The Tripods Are Here!


Last time[1] we discussed the polytope method[2], a multivariate function minimisation algorithm that seeks out a local minimum by stepping away from the worst of a set of points, most commonly a simplex: the multivariate generalisation of a triangle.
Loosely speaking, a local minimum is a point at which a function returns its smallest value in a small region containing that point. More formally, it can be defined as a vector \(\mathbf{x}\) that satisfies
\[ \exists \epsilon>0 \quad \forall \mathbf{y} : |\mathbf{y}-\mathbf{x}| \leqslant \epsilon \quad f(\mathbf{y}) \geqslant f(\mathbf{x}) \]
for a function \(f\), where \(\exists\) means there exists, \(\forall\) means for all, the colon means such that and the vertical bars stand for the magnitude, or length, of the vector between them. Since we're taking the magnitude of the difference between \(\mathbf{x}\) and \(\mathbf{y}\) here, it gives the distance between them.
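To make the definition concrete, here's a small sketch in plain JavaScript that samples random points within a radius of \(\epsilon\) of \(\mathbf{x}\) and checks that none of them yields a smaller function value; a numerical spot check rather than a proof, of course, and the function names are illustrative inventions of mine rather than part of the ak library.

```javascript
// Sketch: check whether x looks like a local minimum of f by sampling
// random points y with |y-x| <= eps and testing f(y) >= f(x).
function looksLikeLocalMin(f, x, eps, samples) {
 var fx = f(x);
 var bound = eps/Math.sqrt(x.length); // per-coordinate offset keeping |y-x| <= eps
 var i, y;

 for(i=0;i<samples;++i) {
  y = x.map(function(xi) { return xi + (2*Math.random()-1)*bound; });
  if(f(y)<fx) return false;
 }
 return true;
}

// f(x, y) = x^2 + y^2 has its only local minimum at the origin
function bowl(v) { return v[0]*v[0] + v[1]*v[1]; }
```

The origin passes the check, since no point can yield a smaller value than zero, whilst any other point will fail it with overwhelming probability given enough samples.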

We got as far as implementing an algorithm for generating regular simplices of any size and location with ak.simplex and in this post we shall finish the job.

The Steps Of The Polytope Method

Figure 1: Flipping The Worst Point
Recall that the essential idea of the polytope method is to flip the worst point of a polytope through the midpoint of the \(n\) other points, as illustrated for two dimensions by figure 1 in which the worst point is marked with a solid circle, its replacement with a hollow one and the midpoint of the other points with a cross.
Noting that the old and new points are an equal distance from the midpoint but in opposite directions, the location of the new point is given by
\[ \begin{align*} \bar{\mathbf{x}} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i\\ \mathbf{x}_{n+1} &= \bar{\mathbf{x}} + \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 2\bar{\mathbf{x}} - \mathbf{x}_0 \end{align*} \]
where \(\sum\) is the summation sign and \(\mathbf{x}_0\) is the worst point.
If this new point is better than all of the old ones, then we try doubling the distance from \(\bar{\mathbf{x}}\) with
\[ \mathbf{x}_{n+2} = \bar{\mathbf{x}} + 2 \times \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 3\bar{\mathbf{x}} - 2\mathbf{x}_0 \]
and use this if it is better than \(\mathbf{x}_{n+1}\).
Finally, if \(\mathbf{x}_{n+1}\) is no better than \(\mathbf{x}_0\) we try shrinking the polytope; first by trying a point halfway between \(\mathbf{x}_0\) and \(\bar{\mathbf{x}}\)
\[ \mathbf{x}_{n+3} = \frac{\mathbf{x}_0 + \bar{\mathbf{x}}}{2} \]
and, if that fails to be an improvement on \(\mathbf{x}_0\), by then shrinking all of the points towards the best point \(\mathbf{x}_n\) with
\[ \mathbf{x}_{i} = \frac{\mathbf{x}_i + \mathbf{x}_n}{2} \]
for \(i\) from \(0\) to \(n-1\).
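To see the arithmetic in action, here's a sketch of these steps using plain arrays of numbers rather than the ak library's vector class; the function names are mine, not the library's.

```javascript
// The midpoint of all points but the worst; xs holds n+1 points and
// worst is the index of the one to exclude.
function midpointExcluding(xs, worst) {
 var n = xs.length-1;
 var mid = xs[0].map(function(){return 0;});
 var i, j;

 for(i=0;i<xs.length;++i) {
  if(i!==worst) for(j=0;j<mid.length;++j) mid[j] += xs[i][j]/n;
 }
 return mid;
}

// x_{n+1} = 2*mid - x0
function reflect(mid, x0)  { return mid.map(function(m, j){return 2*m - x0[j];}); }
// x_{n+2} = 3*mid - 2*x0
function expand(mid, x0)   { return mid.map(function(m, j){return 3*m - 2*x0[j];}); }
// x_{n+3} = (x0 + mid)/2
function contract(mid, x0) { return mid.map(function(m, j){return (x0[j]+m)/2;}); }
```

For the triangle with vertices \((0,0)\), \((2,0)\) and \((0,2)\) and worst vertex \((0,0)\), the midpoint of the other two is \((1,1)\) and the reflected, expanded and contracted points are \((2,2)\), \((3,3)\) and \((0.5,0.5)\) respectively.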

A Note On Efficiency

Derivation 1: The Mean Of All But The Worst
By definition
\[ \hat{\mathbf{x}} = \frac{\sum_{i=0}^{n} \mathbf{x}_i}{n+1} \]
and so
\[ \begin{align*} &\left(\hat{\mathbf{x}} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n}\\ &\quad = \left(\frac{\sum_{i=0}^{n} \mathbf{x}_i}{n+1} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n}\\ &\quad = \frac{\sum_{i=1}^{n} \mathbf{x}_i}{n+1} \times \frac{n+1}{n}\\ &\quad = \frac{\sum_{i=1}^{n} \mathbf{x}_i}{n}\\ &\quad = \bar{\mathbf{x}} \end{align*} \]
Mathematically, this is all perfectly reasonable but, from a computational perspective, calculating the mean of all but the worst point at every step is an expense that we could well do without. Fortunately, we can easily avoid it by keeping track of the mean of every point in the polytope, \(\hat{\mathbf{x}}\), from which we can recover the mean of every point but \(\mathbf{x}_0\) with
\[ \bar{\mathbf{x}} = \left(\hat{\mathbf{x}} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n} \]
as shown by derivation 1.
Now, for all of the steps except the shrinking of every point of the polytope towards the best, we can update \(\hat{\mathbf{x}}\) with
\[ \hat{\mathbf{x}} \leftarrow \hat{\mathbf{x}} + \frac{\mathbf{x}_{n+1} - \mathbf{x}_0}{n+1} \]
where \(\mathbf{x}_{n+1}\) is the new point, which follows from much the same argument as that made for \(\bar{\mathbf{x}}\).
The advantage of this approach is that it requires a constant number of vector arithmetic operations, six in all, to recover \(\bar{\mathbf{x}}\) and update \(\hat{\mathbf{x}}\), rather than the \(n+1\) that calculating \(\bar{\mathbf{x}}\) directly would require.
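We can confirm the algebra numerically with plain arrays; again, these helper names are mine rather than the ak library's.

```javascript
// The mean of every point in the polytope, dividing before summing.
function meanAll(xs) {
 var m = xs[0].map(function(){return 0;});
 var i, j;

 for(i=0;i<xs.length;++i) for(j=0;j<m.length;++j) m[j] += xs[i][j]/xs.length;
 return m;
}

// Recover the mean of all but x0 from xhat, the mean of all n+1 points.
function meanExcluding(xhat, x0, n) {
 return xhat.map(function(h, j){return (h - x0[j]/(n+1)) * (n+1)/n;});
}

// Update the mean of all points after replacing x0 with xnew.
function updateMean(xhat, x0, xnew, n) {
 return xhat.map(function(h, j){return h + (xnew[j]-x0[j])/(n+1);});
}
```

For the points \((4,0)\), \((1,1)\) and \((3,5)\) with \(\mathbf{x}_0=(4,0)\), meanExcluding recovers \((2,3)\), the mean of the latter two points, and replacing \(\mathbf{x}_0\) with \((0,2)\) updates the overall mean to \(\left(\frac{4}{3},\frac{8}{3}\right)\), both to within rounding error.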

ak.polytopeMinimum

The implementation of the polytope method follows our usual scheme of returning a new function that searches for a local minimum from a given starting point, as shown by ak.polytopeMinimum in listing 1.

Listing 1: ak.polytopeMinimum
ak.polytopeMinimum = function(f, threshold, steps) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.polytopeMinimum');
 }

 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? Math.pow(ak.EPSILON, 0.75)
                                                       : Math.abs(threshold);
 if(isNaN(threshold)) {
  throw new Error('invalid convergence threshold in ak.polytopeMinimum');
 }

 steps = ak.nativeType(steps)===ak.UNDEFINED_T ? ak.INFINITY
                                               : ak.floor(Math.abs(steps));
 if(isNaN(steps)) {
  throw new Error('invalid number of steps in ak.polytopeMinimum');
 }

 return function(x, r) {
  return minimum(f, ak.nativeType(x)===ak.ARRAY_T ? x : ak.simplex(x,r), threshold, steps);
 };
};

Its arguments are the function that we want to minimise, f, a convergence threshold to specify the desired accuracy of the result and a maximum number of steps. After ensuring that f is indeed a function, we default threshold to the floating point epsilon raised to the power of three quarters, giving roughly three quarters of the precision of a double precision number, and steps to infinity should they be undefined, throwing exceptions if they are not numbers.
This out of the way, we return a function that takes a starting point, x, and a radius, r, and uses them to construct a simplex that is then passed to the helper function minimum to do the actual minimisation. Note that if the former is a number and/or the latter is undefined then ak.simplex will use its defaults of the origin and one respectively. Furthermore, we allow the user to specify their own initial polytope, rather than use one generated by ak.simplex, by passing an array of ak.vector objects as x.

Before we get to the implementation of minimum there are a few helper functions that we shall need. Firstly, we'll need to calculate the mean of the vectors in the polytope so that we can use our trick for speeding up the calculation of the point through which we'll flip the worst point, as is done by meanElem in listing 2.

Listing 2: meanElem
function meanElem(p) {
 var n = p.length;
 var mean = ak.div(p[0], n);
 var i;

 for(i=1;i<n;++i) mean = ak.add(mean, ak.div(p[i], n));
 return mean;
}

There are a couple of things to note about this function. Firstly, we are using n to indicate the number of points in the polytope rather than the number of points other than the worst; we shall continue to do so throughout the implementation and so must adjust our formulae accordingly. Secondly, we are dividing each vector by n before summing, rather than dividing their sum by n, to eliminate the possibility of numerical overflow; a protective measure that would nearly double the cost of calculating \(\bar{\mathbf{x}}\) directly, making our optimisation even more attractive.
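The danger is easy to demonstrate with coordinates near the largest representable double, roughly 1.8e308: summing before dividing overflows to infinity whilst dividing first does not. A standalone illustration:

```javascript
// A coordinate close to the largest representable double.
var big = 1e308;

// Summing first overflows to Infinity...
var sumThenDivide = (big + big)/2;

// ...whereas dividing each term by the count first stays finite,
// yielding the correct mean of 1e308.
var divideThenSum = big/2 + big/2;
```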
Next, we need to identify the best and worst points of the polytope which we shall do with the minElem and maxElem functions given in listing 3, in both of which the fp argument is expected to be an array of the function's values at each point in the polytope.

Listing 3: minElem And maxElem
function minElem(fp) {
 var n = fp.length;
 var m = 0;
 var i;

 for(i=1;i<n;++i) if(fp[i]<fp[m]) m = i;
 return m;
}

function maxElem(fp) {
 var n = fp.length;
 var m = 0;
 var i;

 for(i=1;i<n;++i) if(fp[i]>=fp[m]) m = i;
 return m;
}

Note that we deliberately use a greater than or equal comparison in the latter so that in the event that every element of fp is equal it will identify a different point than the former. Of course, this will fail in the presence of NaNs, for which every comparison other than not equal is false, so we must be careful to deal with them appropriately in minimum.
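The hazard, and the infinity-replacement remedy, can be seen in a standalone snippet:

```javascript
// Every ordering comparison with NaN is false, so a linear search for
// the maximum will simply step past a NaN value...
var fp = [1, NaN, 3];
var m = 0;
var i;

for(i=1;i<fp.length;++i) if(fp[i]>=fp[m]) m = i;
// ...leaving m pointing at the 3 rather than the NaN.

// Replacing NaNs with Infinity beforehand makes them compare as worst.
var cleaned = fp.map(function(v){return isNaN(v) ? Infinity : v;});
var w = 0;

for(i=1;i<cleaned.length;++i) if(cleaned[i]>=cleaned[w]) w = i;
// w now points at the replaced NaN, correctly identified as the worst.
```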
Specifically, we shall use the same trick that we used before[3][4] of replacing NaNs with infinity so that they will be treated as greater than any finite number, as shown by listing 4.

Listing 4: minimum
function minimum(f, p, eps, steps) {
 var step = 0;
 var n = p.length;
 var fp = new Array(n);
 var mean, i, min, max, m, d, y, z, fy, fz;

 if(n<2) throw new Error('invalid polytope in ak.polytopeMinimum');

 mean = meanElem(p);
 if(ak.type(mean)!==ak.VECTOR_T || mean.dims()===0) {
  throw new Error('invalid polytope in ak.polytopeMinimum');
 }
 for(i=0;i<mean.dims() && isFinite(mean.at(i));++i);
 if(i<mean.dims()) throw new Error('invalid polytope in ak.polytopeMinimum');

 for(i=0;i<n;++i) if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
 min = minElem(fp);
 max = maxElem(fp);

 while(step++<steps && !(ak.diff(fp[min],fp[max])<=eps) && ak.diff(p[min],p[max])>eps) {
  m = ak.mul(ak.sub(mean, ak.div(p[max], n)), n/(n-1));
  d = ak.sub(m, p[max]);
  y = ak.add(m, d);
  fy = f(y);

  if(fy<fp[min]) {
   z = ak.add(y, d);
   fz = f(z);

   if(fz<fy) {
    y  = z;
    fy = fz;
   }
  }
  else if(!(fy<fp[max])) {
   y = ak.div(ak.add(p[max], m), 2);
   fy = f(y);
  }

  if(fy<fp[max]) {
   mean = ak.add(mean, ak.div(ak.sub(y, p[max]), n));

   p[max]  = y;
   fp[max] = fy;

   if(fy<fp[min]) min = max;
   max = maxElem(fp);
  }
  else {
   for(i=0;i<n;++i) {
    if(i!==min) {
     p[i] = ak.div(ak.add(p[i], p[min]), 2);
     if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
    }
   }
   mean = meanElem(p);
   min = minElem(fp);
   max = maxElem(fp);
  }
 }
 return !isNaN(f(p[min])) ? p[min] : ak.vector(p[min].dims(), ak.NaN);
}

Now this is a rather lengthy bit of code and so I think that it bears breaking into smaller parts to explain its operation.

The Control Flow In Detail

Firstly, we create an array, fp, to store the values of the function f at the points of the polytope p as the algorithm progresses. Next we check that the polytope is valid with

if(n<2) throw new Error('invalid polytope in ak.polytopeMinimum');

mean = meanElem(p);
if(ak.type(mean)!==ak.VECTOR_T || mean.dims()===0) {
 throw new Error('invalid polytope in ak.polytopeMinimum');
}
for(i=0;i<mean.dims() && isFinite(mean.at(i));++i);
if(i<mean.dims()) throw new Error('invalid polytope in ak.polytopeMinimum');
requiring that it contains at least two points and that their mean is a vector of at least one dimension with all finite elements, implying that the same is true of them.
We then populate fp with the function values of each point, replacing NaNs with infinity as described above, with

for(i=0;i<n;++i) if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
The final step before entering the main loop of the algorithm is to identify the best and worst points with

min = minElem(fp);
max = maxElem(fp);
The loop's condition is rather subtle since it's designed to cope with some tricky corner cases. The first term is simple enough; check that we haven't taken the maximum number of steps. The second, however, must cope with the possibility that the function's value at the best and worst points is infinite. In this case, ak.diff will return NaN and so we check that it is not less than or equal to eps so that we can at least try to proceed. The call to ak.diff in the third term will similarly return NaN if both the best and worst points have any infinite elements which, since we can't sensibly proceed under such circumstances, should terminate the loop.
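These corner cases are easily reproduced in isolation; the difference of two infinities is NaN, and the direction of each comparison then determines whether the loop proceeds. Here eps simply stands in for the convergence threshold.

```javascript
var eps = 1e-6;

// The difference between two infinite values is NaN...
var d = Math.abs(Infinity - Infinity);

// ...so a negated less-than-or-equal lets the loop proceed...
var functionValuesUnconverged = !(d<=eps);   // true, since NaN<=eps is false

// ...whilst a plain greater-than terminates it.
var pointsStillApart = d>eps;                // false, since NaN>eps is also false
```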

The first thing we do in the body of the loop is calculate \(\bar{\mathbf{x}}\), which we store in m, and the vector from \(\mathbf{x}_0\) to \(\bar{\mathbf{x}}\), which we store in d

m = ak.mul(ak.sub(mean, ak.div(p[max], n)), n/(n-1));
d = ak.sub(m, p[max]);
after which we calculate the location of the new point and its function value with

y = ak.add(m, d);
fy = f(y);
Note that we don't replace NaN with infinity this time, so the rest of the loop must handle it correctly. Now if it's better than the best point we try expanding the step by adding d to y with

if(fy<fp[min]) {
 z = ak.add(y, d);
 fz = f(z);

 if(fz<fy) {
  y  = z;
  fy = fz;
 }
}
The first of these if statements will fail if fy is NaN, ensuring that we don't try to extend a step to a NaN value, and the second will fail if fz is NaN, ensuring that we don't keep an extension that yields one.
If it's no better than the worst point we try shrinking the step with

else if(!(fy<fp[max])) {
 y = ak.div(ak.add(p[max], m), 2);
 fy = f(y);
}
Note that this time we check that fy is not less than fp[max] so that, if the former is NaN, this step will be tried. Once again, we do not bother replacing NaN with infinity, relying upon the remaining code to do the right thing.

Finally, if the new point has a smaller function value than the worst point, which it won't have if it's NaN, we replace the worst point with the new point, updating the mean of the polytope and the best and worst points as we do so

mean = ak.add(mean, ak.div(ak.sub(y, p[max]), n));

p[max]  = y;
fp[max] = fy;

if(fy<fp[min]) min = max;
max = maxElem(fp);
Otherwise, we shrink the entire polytope towards the best point and recalculate its mean and best and worst points with

for(i=0;i<n;++i) {
 if(i!==min) {
  p[i] = ak.div(ak.add(p[i], p[min]), 2);
  if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
 }
}
mean = meanElem(p);
min = minElem(fp);
max = maxElem(fp);
Now this time we do replace NaNs with infinity, since we're storing them in fp and the logic assumes that none of its elements are NaNs.

Phew! This numerical stuff is a fiddly business!

Using ak.polytopeMinimum

Figure 2: Rosenbrock's Function
To demonstrate the use of ak.polytopeMinimum, which can be found in PolytopeMinimum.js, we shall compare its accuracy to that of our random step hill climber, ak.blindfoldMinimum. Once again, we shall use Rosenbrock's function
\[ f(x, y) = 100\left(y-x^2\right)^2 + (1-x)^2 \]
which has a long, gently sloping curved valley, as illustrated by figure 2 in which larger values of the function are plotted as brighter points. It has a minimum of zero at \((1, 1)\) and is often used to test hill climbing algorithms because both shallow slopes and curved paths can be rather difficult for such algorithms to follow.
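A straightforward implementation of Rosenbrock's function in JavaScript follows directly from the formula:

```javascript
// Rosenbrock's function: f(x, y) = 100*(y - x^2)^2 + (1 - x)^2, whose
// minimum of zero lies at (1, 1) at the end of a shallow curved valley
// that follows y = x^2.
function rosenbrock(x, y) {
 var a = y - x*x;
 var b = 1 - x;
 return 100*a*a + b*b;
}
```

Note that points on the valley floor, such as \((-1, 1)\), take only the modest values of the second term, which is what makes the valley so gently sloped.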

In program 1 both minimisers are initialised with an implementation of Rosenbrock's function, f, and a fixed number of steps, n. Note that to specify a fixed number of steps for ak.polytopeMinimum we must set the convergence threshold to zero.
They are then both set off from the same starting point with the same radius, albeit that it has a slightly different meaning for the two algorithms.

Program 1: Using ak.polytopeMinimum

Clearly ak.polytopeMinimum is a tremendous improvement over ak.blindfoldMinimum whilst being no more difficult to use; an impressive result!
Nevertheless, random step algorithms still have their uses and we shall return to them in a future post.

References

[1] The Tripods Are Coming!, www.thusspakeak.com, 2015.

[2] Nelder, J. & Mead, R., A Simplex Method for Function Minimisation, Computer Journal, Vol. 7, pp. 308-313, 1965.

[3] That Shrinking Feeling, www.thusspakeak.com, 2014.

[4] It's All Downhill From Here, www.thusspakeak.com, 2014.
