It's All Downhill From Here


Last time[1] we took a look at function optimisation, implementing the bisection and golden section search algorithms to find local minima: points at which a function returns a value no greater than at any nearby point. Formally, for a function \(f\), a point \(x\) is a local minimum if and only if
\[ \exists \epsilon>0 \quad \forall y \in [x-\epsilon, x+\epsilon] \quad f(y) \geqslant f(x) \]
where \(\exists\) means there exists, \(\forall\) means for all and \(\in\) means within.

Now this is all fine and dandy, but what if we want to minimise a function that takes two arguments? Or three arguments? Or four, four arguments; mwah, ha, ha, ha, ha!

Er, I seem to be channelling Count von Count. Let me get some garlic...

Seriously though, most real world optimisation problems do involve functions with more than one argument, known as multivariate functions, in contrast to univariate functions, which have just one.
Whilst it's relatively easy to generalise the concept of a bracket to multiple dimensions, it's not so clear how to generalise the rules for updating it at each step, so we shall take an entirely different approach instead.

A Blindfolded Hill Climber

Imagine trying to climb to the top of a hill whilst blindfolded, perhaps as part of some faddish team building exercise. Unable to see which way leads uphill, you might try tentatively feeling with a foot whether or not a particular direction leads uphill and taking a step in that direction if it does.

Figure 1: Our Intrepid Hill Climber
An entire class of multivariate optimisation algorithms, known by analogy as hill climbers, operate in exactly this manner; take a trial step in some direction and if it is better than the current point, move to it. Just as our intrepid hill climber cannot know for sure that the ascended peak isn't just a foothill, as illustrated by figure 1, so such algorithms can only aspire to finding local optima.

To demonstrate the operation of such algorithms, we'll use one to minimise the two dimensional function
\[ f(x, y) = 100 (y - x^2)^2 + (1-x)^2 \]
Figure 2: Rosenbrock's Function
This is known as Rosenbrock's function and, as illustrated by figure 2 in which smaller values are represented by darker hues, it has a long curved valley that runs from the top left, through the origin and up to the top right of the plane, and gently slopes downhill to a minimum value of zero when \(x=y=1\).
Because of these properties, Rosenbrock's function is frequently used to test multivariate minimisation algorithms, forcing them to navigate a curved valley to find a minimum at which the function changes rapidly in some directions and slowly in others.
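As a concrete reference for the sketches that follow, we can transcribe the formula directly into a JavaScript function of two numbers; the name rosenbrock is just for illustration.

function rosenbrock(x, y) {
 // 100(y-x^2)^2 + (1-x)^2, with its minimum of zero at x=y=1
 return 100*Math.pow(y-x*x, 2) + Math.pow(1-x, 2);
}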

In the simplest hill climbing algorithm we generate the trial steps by adding a uniformly distributed random number to each element of the current point. Specifically, we shall use one that is symmetric about zero, giving equal chances of positive or negative changes.

In program 1, we use our ak.uniformRnd[2] to generate uniform random numbers lying between \(-0.1\) and \(+0.1\) that we add to the coordinates x0 and y0 at each step. If the resulting trial point has a smaller function value than the current point, we plot the current point in red, the trial point in green and then replace the former with the latter. When the program ends we consequently have its result plotted in green and the path it took to get there plotted in red.

Program 1: Minimising Rosenbrock's Function
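The interactive program isn't reproduced here but, stripped of its graphical output and with Math.random standing in for ak.uniformRnd, the loop at its heart might look something like the following sketch, in which the starting point and number of steps are chosen purely for illustration.

var x0 = -1, y0 = 2;                // an arbitrary starting point on the valley side
var f0 = rosenbrock(x0, y0);
var x1, y1, f1, step;

for(step=0;step<1000;++step) {
 // uniform random changes between -0.1 and +0.1 in each coordinate
 x1 = x0 + 0.2*Math.random() - 0.1;
 y1 = y0 + 0.2*Math.random() - 0.1;
 f1 = rosenbrock(x1, y1);

 // move to the trial point if it's no worse than the current one
 if(f1<=f0) {
  x0 = x1;
  y0 = y1;
  f0 = f1;
 }
}
console.log(x0, y0, f0);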

If you run the program a few times you'll notice that the algorithm consistently descends to the valley floor in pretty short order, but that afterwards things go downhill fast. Or rather they don't!
The reason for this is that high up on the side of the valley almost half of the directions in which we might take a step lead downhill, resulting in fairly rapid progress. On the valley floor, however, almost every direction leads uphill and the algorithm grinds to a near halt.

In fact, this style of hill climbing algorithm is perhaps the worst possible choice for general purpose function minimisation. Nevertheless, it's very simple and as such I think it's worth implementing as our first multivariate minimisation algorithm. At the very least we can use it as a baseline for comparison with more sophisticated algorithms in future posts.

ak.blindfoldMinimum

Once again, we'll use our familiar technique of checking that the algorithm's arguments are valid before returning a function that simply forwards to another to do the actual work, as shown in listing 1.

Listing 1: ak.blindfoldMinimum
ak.blindfoldMinimum = function(f, steps, rnd) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.blindfoldMinimum');
 }

 steps = ak.nativeType(steps)===ak.UNDEFINED_T ? 1024
                                               : ak.floor(Math.abs(steps));
 if(!isFinite(steps)) {
  throw new Error('invalid number of steps in ak.blindfoldMinimum');
 }

 if(ak.nativeType(rnd)===ak.UNDEFINED_T) {
  rnd = Math.random;
 }
 else if(ak.nativeType(rnd)!==ak.FUNCTION_T) {
  throw new Error('invalid random number generator in ak.blindfoldMinimum');
 }

 return function(x, d) {return minimum(f, x, d, steps, rnd);};
};

Here, ak.blindfoldMinimum expects a function f to minimise, an optional number of steps to take and an optional source of uniform random numbers, rnd. In addition to checking that the types of these arguments are correct, we supply a default of \(1024\) steps if the number to take is not specified and fall back on Math.random if no random number generator is given.
Finally, we return a function that takes a starting point x and a step range d which defers to a minimum function, given in listing 2, that does the actual work.
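For example, assuming that a suitable function f to minimise has already been defined, we might construct a minimiser using these defaults, or one that overrides them, with something like

var minF1 = ak.blindfoldMinimum(f);                     // 1024 steps, Math.random
var minF2 = ak.blindfoldMinimum(f, 10000, Math.random); // explicit steps and rnd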

Listing 2: minimum
function minimum(f, x, d, steps, rnd) {
 if(ak.nativeType(d)===ak.UNDEFINED_T) d = (1+ak.abs(x))/64;
 return ak.nativeType(x)===ak.NUMBER_T ? uniMinimum(f, x, d, steps, rnd)
                                       : multiMinimum(f, x, d, steps, rnd);
}

A significant departure from our previous algorithms is that this type of hill climber is equally effective, or perhaps that should be equally ineffective, for both univariate and multivariate function minimisation. We shall require that the function's argument, and consequently the starting point x, be a number in the first case and an ak.vector in the second.
Passing ak.vector objects as single arguments to multivariate functions will be a ubiquitous design choice in the library since it's significantly simpler than dealing with variable numbers of function arguments. In this case it means that we can easily make the step range d optional by giving a default value that is related to the magnitude of the starting point, correctly dispatched to the appropriate formulae for numbers and vectors by ak.abs.
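For example, to apply ak.blindfoldMinimum to Rosenbrock's function we would wrap the rosenbrock function sketched above in a function of a single ak.vector argument, after which the step range can be given as a number, as a vector or left to its default, as in this sketch.

// Rosenbrock's function as a function of a single two dimensional ak.vector
function f(v) {
 return rosenbrock(v.at(0), v.at(1));
}

var minF = ak.blindfoldMinimum(f);

var x1 = minF(ak.vector([-1, 2]), 0.1);                    // the same range for both elements
var x2 = minF(ak.vector([-1, 2]), ak.vector([0.1, 0.2]));  // a range per element
var x3 = minF(ak.vector([-1, 2]));                         // the default (1+|x|)/64 range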

The uniMinimum function is pretty similar to the min function of program 1, as illustrated by listing 3.

Listing 3: uniMinimum
function uniMinimum(f, x0, d, steps, rnd) {
 var step = 0;
 var x1, f0, f1;

 if(ak.nativeType(x0)!==ak.NUMBER_T || !isFinite(x0)) {
  throw new Error('invalid argument in ak.blindfoldMinimum');
 } 

 if(ak.nativeType(d)!==ak.NUMBER_T || !isFinite(d)) {
  throw new Error('invalid delta in ak.blindfoldMinimum');
 }

 if(isNaN(f0 = f(x0))) f0 = ak.INFINITY;

 while(step++<steps) {
  x1 = x0 + d*(2*rnd()-1);
  if(isNaN(f1 = f(x1))) f1 = ak.INFINITY;

  if(f1<=f0) {
   f0 = f1;
   x0 = x1;
  }
 }
 return !isNaN(f(x0)) ? x0 : ak.NaN;
}

The main differences are that we have a single argument x0, use a while loop rather than an ak.asynchLoop and don't have any graphical output. In addition, we are a little more careful to handle invalid data, throwing exceptions if the starting point and step range are not finite numbers and replacing NaN function results with infinity to force the algorithm to aggressively step away from such points. Finally, you should note that rnd is expected to return a value between \(0\) and \(1\) and that we explicitly transform its results into values between \(-d\) and \(d\) when we compute the trial steps x1.
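That contract makes it easy to swap in other sources of uniform random numbers. Purely as an illustration, a simple deterministic generator such as the minimal standard Park-Miller recurrence could be passed in place of Math.random to make runs reproducible, although in practice we'd more likely reach for one of the library's own generators[2].

// the Park-Miller minimal standard generator, yielding values between 0 and 1
var seed = 12345;
function parkMiller() {
 seed = (16807*seed) % 2147483647;
 return seed/2147483647;
}

var minF = ak.blindfoldMinimum(f, 1024, parkMiller);  // f as defined above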

The multiMinimum function is more or less identical, apart from the fact that it must handle ak.vector objects rather than numbers, as can be seen in listing 4.

Listing 4: multiMinimum
function multiMinimum(f, x, d, steps, rnd) {
 var step = 0;
 var n, i, x0, x1, f0, f1;

 if(ak.type(x)!==ak.VECTOR_T || x.dims()===0) {
  throw new Error('invalid argument in ak.blindfoldMinimum');
 }

 n = x.dims();
 for(i=0;i<n && isFinite(x.at(i));++i);
 if(i<n) throw new Error('invalid argument in ak.blindfoldMinimum');

 if(ak.nativeType(d)===ak.NUMBER_T) {
  d = ak.vector(n, d);
 }
 else if(ak.type(d)!==ak.VECTOR_T || d.dims()!==n) {
  throw new Error('invalid delta in ak.blindfoldMinimum');
 }

 for(i=0;i<n && isFinite(d.at(i));++i);
 if(i<n) throw new Error('invalid delta in ak.blindfoldMinimum');

 d  = d.toArray();
 x0 = x.toArray();
 x1 = new Array(n);

 if(isNaN(f0 = f(x))) f0 = ak.INFINITY;

 while(step++<steps) {
  for(i=0;i<n;++i) x1[i] = x0[i] + d[i]*(2*rnd()-1);
  x = ak.vector(x1);
  if(isNaN(f1 = f(x))) f1 = ak.INFINITY;

  if(f1<=f0) {
   f0 = f1;
   x0 = x.toArray();
  }
 }
 x = ak.vector(x0);
 return !isNaN(f(x)) ? x : ak.vector(n, ak.NaN);
}

So, first off we ensure that x actually is an ak.vector and that its elements are all finite. We shall allow client code to specify different step ranges for different elements using an ak.vector, so in the event that d is a number we replace it with an ak.vector of the correct length with elements equal to that number. Otherwise we check that it's a valid vector in the same way that we did for the starting point.
Before the main body of the loop, we replace d with the array of its elements, store the elements of x in x0 and create an array x1 for the trial step; we need to work with arrays rather than ak.vector objects since the latter are immutable.
This time we construct the trial step by iterating over the elements of the current point and adding random changes lying within the step range of each in turn. We construct a new ak.vector to pass to the function and replace x0 with a copy of its elements if it results in an improvement, safely avoiding the confusion that would result should x0 and x1 ever refer to the same Array object.

Program 2 demonstrates the application of ak.blindfoldMinimum, defined in BlindfoldMinimum.js, to Rosenbrock's function.

Program 2: Using ak.blindfoldMinimum
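Again, the program itself isn't reproduced here, but at its core it amounts to something like the following sketch, reusing the vector-valued wrapper f from above with a starting point and number of steps chosen purely for illustration.

// minimise the wrapped Rosenbrock function from a point on the valley side
var x = ak.blindfoldMinimum(f, 10000)(ak.vector([-1, 2]));
console.log(x.at(0), x.at(1), f(x));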

As in program 1, the result is certainly much closer to the minimum than the starting point, but to be perfectly frank, its convergence to only one or two significant figures is pretty damned shoddy in comparison to our univariate minimisation algorithms.
Fortunately, there's an extremely simple improvement that can get us a good deal closer; run it more than once! Specifically, we can refine a result of the minimiser by using it as a starting point for another call with a smaller step range. If, as we can presumably expect, we are close to the minimum, this gives us a much better chance of hitting upon a point that is closer still at each step, as demonstrated by program 3.

Program 3: Refining ak.blindfoldMinimum
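The essence of the refinement, sketched below under the same assumptions as before, is simply to feed each result back in as the next starting point with a progressively smaller step range.

var minF = ak.blindfoldMinimum(f, 10000);
var x = ak.vector([-1, 2]);
var d = 0.1;
var i;

// each pass restarts the search from the previous result with a tenth of its step range
for(i=0;i<4;++i) {
 x = minF(x, d);
 d /= 10;
}
console.log(x.at(0), x.at(1), f(x));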

Now that's quite an improvement in accuracy, but unfortunately one that comes at a significant computational expense. Our univariate minimisation algorithms narrowed their brackets at a more or less constant rate, so having our multivariate minimiser slow down as it approaches the minimum is something of a disappointment. Nevertheless, the basic concept of iteratively stepping towards a minimum is sound and we shall find in future posts that we can dramatically improve matters by doing so more intelligently...

References

[1] That Shrinking Feeling, www.thusspakeak.com, 2014.

[2] What Are The Chances Of That?, www.thusspakeak.com, 2014.
