Last time[1] we discussed the polytope method[2], a multivariate function minimisation algorithm that seeks out a local minimum by stepping away from the worst of a set of points, most commonly a simplex: the multivariate generalisation of a triangle.
Loosely speaking, a local minimum is a point at which a function returns its smallest value in a small region containing that point. More formally, it can be defined as a vector \(\mathbf{x}\) that satisfies
\[
\exists \epsilon>0 \quad \forall \mathbf{y} : \left|\mathbf{y}-\mathbf{x}\right| \leqslant \epsilon \quad f(\mathbf{y}) \geqslant f(\mathbf{x})
\]
for a function \(f\), where \(\exists\) means there exists, \(\forall\) means for all, the colon means such that and the vertical bars stand for the magnitude, or length, of the vector between them. Since we're taking the magnitude of the difference between \(\mathbf{x}\) and \(\mathbf{y}\) here, it gives the distance between them.

We got as far as implementing an algorithm for generating regular simplices of any size and location with ak.simplex and in this post we shall finish the job.
The Steps Of The Polytope Method
Recall that the essential idea of the polytope method is to flip the worst point of a polytope through the midpoint of the \(n\) other points, as illustrated for two dimensions by figure 1 in which the worst point is marked with a solid circle, its replacement with a hollow one and the midpoint of the other points with a cross.

Figure 1: Flipping The Worst Point
Noting that the old and new points are an equal distance from the midpoint but in opposite directions, the location of the new point is given by
\[
\begin{align*}
\bar{\mathbf{x}} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i\\
\mathbf{x}_{n+1} &= \bar{\mathbf{x}} + \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 2\bar{\mathbf{x}} - \mathbf{x}_0
\end{align*}
\]
where \(\sum\) is the summation sign and \(\mathbf{x}_0\) is the worst point.

If this new point is better than all of the old ones, then we try doubling the distance from \(\bar{\mathbf{x}}\) with
\[
\mathbf{x}_{n+2} = \bar{\mathbf{x}} + 2 \times \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 3\bar{\mathbf{x}} - 2\mathbf{x}_0
\]
and use this if it is better than \(\mathbf{x}_{n+1}\).

Finally, if \(\mathbf{x}_{n+1}\) is worse than \(\mathbf{x}_0\) we try shrinking the polytope. Firstly by trying a point halfway between \(\mathbf{x}_0\) and \(\bar{\mathbf{x}}\)
\[
\mathbf{x}_{n+3} = \frac{\mathbf{x}_0 + \bar{\mathbf{x}}}{2}
\]
and, if that fails to be an improvement on \(\mathbf{x}_0\), by then shrinking all of the points towards the best point \(\mathbf{x}_n\) with
\[
\mathbf{x}_{i} = \frac{\mathbf{x}_i + \mathbf{x}_n}{2}
\]
for \(i\) from \(0\) to \(n-1\).
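These candidate points are easy to check numerically. The following sketch, which uses plain arrays rather than the ak library's vector type (the helper mean and the variable names are purely illustrative), computes the flipped, doubled and halved points for a two dimensional simplex:

```javascript
// Candidate points of the polytope method for a 2D simplex, computed with
// plain arrays; the worst point is x0 and xbar is the mean of the others.
function mean(points) {
  var n = points.length, d = points[0].length;
  var m = new Array(d).fill(0);
  points.forEach(function(p) {
    for (var i = 0; i < d; ++i) m[i] += p[i] / n;
  });
  return m;
}

var x0 = [0, 0];                 // the worst point
var others = [[1, 0], [0, 1]];   // the n other points
var xbar = mean(others);         // their midpoint, [0.5, 0.5]

// flip:     x_{n+1} = 2*xbar - x0
var flip = xbar.map(function(x, i) { return 2 * x - x0[i]; });
// double:   x_{n+2} = 3*xbar - 2*x0
var expand = xbar.map(function(x, i) { return 3 * x - 2 * x0[i]; });
// halve:    x_{n+3} = (x0 + xbar)/2
var contract = xbar.map(function(x, i) { return (x + x0[i]) / 2; });

console.log(flip);     // [1, 1]
console.log(expand);   // [1.5, 1.5]
console.log(contract); // [0.25, 0.25]
```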
A Note On Efficiency
Derivation 1: The Mean Of All But The Worst
By definition
\[
\hat{\mathbf{x}} = \frac{\sum_{i=0}^{n} \mathbf{x}_i}{n+1}
\]
and so
\[
\begin{align*}
&\left(\hat{\mathbf{x}} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n}\\
&\quad = \left(\frac{\sum_{i=0}^{n} \mathbf{x}_i}{n+1} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n}\\
&\quad = \frac{\sum_{i=1}^{n} \mathbf{x}_i}{n+1} \times \frac{n+1}{n}\\
&\quad = \frac{\sum_{i=1}^{n} \mathbf{x}_i}{n}\\
&\quad = \bar{\mathbf{x}}
\end{align*}
\]
Mathematically, this is all perfectly reasonable but from a computational perspective, calculating the mean of all but the worst point at every step is an expense that we could well do without. Fortunately, we can easily avoid it by keeping track of the mean of every point in the polytope, \(\hat{\mathbf{x}}\). From this we can easily recover the mean of every point but \(\mathbf{x}_0\) with
\[
\bar{\mathbf{x}} = \left(\hat{\mathbf{x}} - \frac{\mathbf{x}_0}{n+1}\right) \times \frac{n+1}{n}
\]
as shown by derivation 1.

Now, for all of the steps except the shrinking of every point of the polytope towards the best, we can update \(\hat{\mathbf{x}}\) with
\[
\hat{\mathbf{x}} + \frac{\mathbf{x}_{n+1} - \mathbf{x}_0}{n+1}
\]
where \(\mathbf{x}_{n+1}\) is the new point, which follows from much the same argument as that made for \(\bar{\mathbf{x}}\).

The advantage of this approach is that it requires a constant number of six vector arithmetic operations to calculate \(\bar{\mathbf{x}}\) rather than the \(n+1\) that would be required otherwise.
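This bookkeeping is easy to verify numerically; the sketch below, again with plain arrays standing in for the ak vectors and an illustrative meanOf helper, checks both the recovery of \(\bar{\mathbf{x}}\) and the update of \(\hat{\mathbf{x}}\) against directly computed means:

```javascript
// Check that the running mean of the whole polytope recovers the mean of
// every point but the worst and can be updated when the worst is replaced.
function meanOf(points) {
  var n = points.length, d = points[0].length;
  var m = new Array(d).fill(0);
  points.forEach(function(p) {
    for (var i = 0; i < d; ++i) m[i] += p[i] / n;
  });
  return m;
}

var p = [[4, 0], [1, 0], [0, 2]];  // p[0] is the worst point, so n = 2
var n = p.length - 1;
var xhat = meanOf(p);              // the mean of every point

// xbar = (xhat - x0/(n+1)) * (n+1)/n recovers the mean of p[1] and p[2]
var xbar = xhat.map(function(x, i) { return (x - p[0][i] / (n + 1)) * (n + 1) / n; });
console.log(xbar);                 // close to meanOf([p[1], p[2]]), i.e. [0.5, 1]

// replacing the worst point with y shifts xhat by (y - x0)/(n+1)
var y = [2, 2];
var updated = xhat.map(function(x, i) { return x + (y[i] - p[0][i]) / (n + 1); });
console.log(updated);              // close to meanOf([y, p[1], p[2]])
```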
ak.polytopeMinimum
The implementation of the polytope method follows our usual scheme of returning a new function that searches for a local minimum from a given starting point, as shown by ak.polytopeMinimum in listing 1.

Listing 1: ak.polytopeMinimum
ak.polytopeMinimum = function(f, threshold, steps) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.polytopeMinimum');
 }

 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? Math.pow(ak.EPSILON, 0.75) : Math.abs(threshold);
 if(isNaN(threshold)) {
  throw new Error('invalid convergence threshold in ak.polytopeMinimum');
 }

 steps = ak.nativeType(steps)===ak.UNDEFINED_T ? ak.INFINITY : ak.floor(Math.abs(steps));
 if(isNaN(steps)) {
  throw new Error('invalid number of steps in ak.polytopeMinimum');
 }

 return function(x, r) {
  return minimum(f, ak.nativeType(x)===ak.ARRAY_T ? x : ak.simplex(x,r), threshold, steps);
 };
};
Its arguments are the function that we want to minimise, f, a convergence threshold to specify the desired accuracy of the result and a maximum number of steps. After ensuring that f is indeed a function, we provide defaults of three quarters of the digits of a double precision number for the threshold argument and infinity for the steps argument if they are undefined, throwing exceptions should they not be numbers.

This out of the way, we return a function that takes a starting point, x, and radius, r, and uses them to construct a simplex that is then passed to the helper function minimum to do the actual minimisation. Note that if the former is a number and/or the latter is undefined, then ak.simplex will use its defaults of the origin and one respectively. Furthermore, we allow the user to specify their own initial polytope rather than use one generated by ak.simplex by passing it an array of ak.vector objects.

Before we get to the implementation of minimum there are a few helper functions that we shall need. Firstly, we'll need to calculate the mean of the vectors in the polytope so that we can use our trick for speeding up the calculation of the point through which we'll flip the worst point, as is done by meanElem in listing 2.

Listing 2: meanElem

function meanElem(p) {
 var n = p.length;
 var mean = ak.div(p[0], n);
 var i;

 for(i=1;i<n;++i) mean = ak.add(mean, ak.div(p[i], n));
 return mean;
}
There are a couple of things to note about this function. Firstly, we are using n to indicate the number of points in the polytope rather than the number of points other than the worst. We shall continue to do so throughout the implementation and so must adjust our formulae accordingly. Secondly, we are dividing each vector by n rather than their sum to eliminate the possibility of numerical overflow, a protective measure that would nearly double the cost of calculating \(\bar{\mathbf{x}}\) directly, making our optimisation even more attractive.

Next, we need to identify the best and worst points of the polytope which we shall do with the minElem and maxElem functions given in listing 3, in both of which the fp argument is expected to be an array of the function's values at each point in the polytope.

Listing 3: minElem And maxElem

function minElem(fp) {
 var n = fp.length;
 var m = 0;
 var i;

 for(i=1;i<n;++i) if(fp[i]<fp[m]) m = i;
 return m;
}

function maxElem(fp) {
 var n = fp.length;
 var m = 0;
 var i;

 for(i=1;i<n;++i) if(fp[i]>=fp[m]) m = i;
 return m;
}
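Re-typing the pair as standalone functions, their behaviour at the corner cases can be checked directly; the argument values here are arbitrary:

```javascript
// minElem keeps the first of tied elements whilst maxElem keeps the last,
// and a NaN is never selected by either since its comparisons are false.
function minElem(fp) {
  var m = 0;
  for (var i = 1; i < fp.length; ++i) if (fp[i] < fp[m]) m = i;
  return m;
}

function maxElem(fp) {
  var m = 0;
  for (var i = 1; i < fp.length; ++i) if (fp[i] >= fp[m]) m = i;
  return m;
}

console.log(minElem([2, 2, 2]));        // 0 - ties leave the first element
console.log(maxElem([2, 2, 2]));        // 2 - ties move to the last element
console.log(maxElem([3, NaN, 1]));      // 0 - the NaN can never be the worst
console.log(maxElem([3, Infinity, 1])); // 1 - but Infinity always can
```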
Note that we deliberately use a greater than or equal comparison in the latter so that in the event that every element of fp is equal it will identify a different point than the former. Of course, this will fail in the presence of NaNs, for which every comparison other than not equal is false, so we must be careful to deal with them appropriately in minimum.

Specifically, we shall use the same trick that we used before[3][4] of replacing NaNs with infinity so that they will be treated as greater than any finite number, as shown by listing 4.
Listing 4: minimum
function minimum(f, p, eps, steps) {
 var step = 0;
 var n = p.length;
 var fp = new Array(n);
 var mean, i, min, max, m, d, y, z, fy, fz;

 if(n<2) throw new Error('invalid polytope in ak.polytopeMinimum');

 mean = meanElem(p);
 if(ak.type(mean)!==ak.VECTOR_T || mean.dims()===0) {
  throw new Error('invalid polytope in ak.polytopeMinimum');
 }
 for(i=0;i<mean.dims() && isFinite(mean.at(i));++i);
 if(i<mean.dims()) throw new Error('invalid polytope in ak.polytopeMinimum');

 for(i=0;i<n;++i) if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
 min = minElem(fp);
 max = maxElem(fp);

 while(step++<steps && !(ak.diff(fp[min],fp[max])<=eps) && ak.diff(p[min],p[max])>eps) {
  m = ak.mul(ak.sub(mean, ak.div(p[max], n)), n/(n-1));
  d = ak.sub(m, p[max]);
  y = ak.add(m, d);
  fy = f(y);

  if(fy<fp[min]) {
   z = ak.add(y, d);
   fz = f(z);
   if(fz<fy) { y = z; fy = fz; }
  }
  else if(!(fy<fp[max])) {
   y = ak.div(ak.add(p[max], m), 2);
   fy = f(y);
  }

  if(fy<fp[max]) {
   mean = ak.add(mean, ak.div(ak.sub(y, p[max]), n));
   p[max] = y;
   fp[max] = fy;
   if(fy<fp[min]) min = max;
   max = maxElem(fp);
  }
  else {
   for(i=0;i<n;++i) {
    if(i!==min) {
     p[i] = ak.div(ak.add(p[i], p[min]), 2);
     if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
    }
   }
   mean = meanElem(p);
   min = minElem(fp);
   max = maxElem(fp);
  }
 }
 return !isNaN(f(p[min])) ? p[min] : ak.vector(p[min].dims(), ak.NaN);
}
Now this is a rather lengthy bit of code and so I think that it bears breaking into smaller parts to explain its operation.
The Control Flow In Detail
Firstly, we create an array, fp, to store the values of the function f at the points of the polytope p as the algorithm progresses. Next we check that the polytope is valid with

if(n<2) throw new Error('invalid polytope in ak.polytopeMinimum');

mean = meanElem(p);
if(ak.type(mean)!==ak.VECTOR_T || mean.dims()===0) {
 throw new Error('invalid polytope in ak.polytopeMinimum');
}
for(i=0;i<mean.dims() && isFinite(mean.at(i));++i);
if(i<mean.dims()) throw new Error('invalid polytope in ak.polytopeMinimum');

requiring that it contains at least two points and that their mean is a vector of at least one dimension with all finite elements, implying that the same is true of them.
We then populate fp with the function values of each point, replacing NaNs with infinity as described above, with

for(i=0;i<n;++i) if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;

The final step before entering the main loop of the algorithm is to identify the best and worst points with

min = minElem(fp);
max = maxElem(fp);

The loop's condition is rather subtle since it's designed to cope with some tricky corner cases. The first term is simple enough; check that we haven't taken the maximum number of steps. The second, however, must cope with the possibility that the function's value at the best and worst points is infinite. In this case, ak.diff will return NaN and so we check that it is not less than or equal to eps so that we can at least try to proceed. The call to ak.diff in the third term will similarly return NaN if both the best and worst points have any infinite elements which, since we can't sensibly proceed under such circumstances, should terminate the loop.

The first thing we do in the body of the loop is calculate \(\bar{\mathbf{x}}\), which we store in m, and the vector from \(\mathbf{x}_0\) to \(\bar{\mathbf{x}}\), which we store in d

m = ak.mul(ak.sub(mean, ak.div(p[max], n)), n/(n-1));
d = ak.sub(m, p[max]);
after which we calculate the location of the new point and its function value with

y = ak.add(m, d);
fy = f(y);
Note that we don't replace NaN with infinity this time, so the rest of the loop must handle it correctly. Now if it's better than the best point we try expanding the step by adding d to y with

if(fy<fp[min]) {
 z = ak.add(y, d);
 fz = f(z);
 if(fz<fy) { y = z; fy = fz; }
}
The first of these if statements will fail if fy is NaN, ensuring that we don't try to extend a step to a NaN value, and the second will fail if fz is NaN, ensuring that we don't keep an extension that yields one.

If it's no better than the worst point we try shrinking the step with
else if(!(fy<fp[max])) {
 y = ak.div(ak.add(p[max], m), 2);
 fy = f(y);
}

Note that this time we check that fy is not less than fp[max] so that, if the former is NaN, this step will be tried. Once again, we do not bother replacing NaN with infinity, relying upon the remaining code to do the right thing.

Finally, if the new point has a smaller function value than the worst point, which it won't have if it's NaN, we replace the worst point with the new point, updating the mean of the polytope and the best and worst points as we do so
mean = ak.add(mean, ak.div(ak.sub(y, p[max]), n));
p[max] = y;
fp[max] = fy;
if(fy<fp[min]) min = max;
max = maxElem(fp);
Otherwise, we shrink the entire polytope towards the best point and recalculate its mean and best and worst points with

for(i=0;i<n;++i) {
 if(i!==min) {
  p[i] = ak.div(ak.add(p[i], p[min]), 2);
  if(isNaN(fp[i] = f(p[i]))) fp[i] = ak.INFINITY;
 }
}
mean = meanElem(p);
min = minElem(fp);
max = maxElem(fp);

Now this time we do replace NaNs with infinity, since we're storing them in fp and the logic assumes that none of its elements are NaNs.

Phew! This numerical stuff is a fiddly business!
Using ak.polytopeMinimum
Figure 2: Rosenbrock's Function
To demonstrate the use of ak.polytopeMinimum, which can be found in PolytopeMinimum.js, we shall compare its accuracy to that of our random step hill climber, ak.blindfoldMinimum. Once again, we shall use Rosenbrock's function
\[
f(x, y) = 100\left(y-x^2\right)^2 + \left(1-x\right)^2
\]
which has a long, gently sloping curved valley, as illustrated by figure 2 in which larger values of the function are plotted as brighter points. It has a minimum of zero at \((1, 1)\) and is often used to test hill climbing algorithms because both shallow slopes and curved paths can be rather difficult for such algorithms to follow.

In program 1 both minimisers are initialised with an implementation of Rosenbrock's function, f, and a fixed number of steps, n. Note that to specify a fixed number of steps for ak.polytopeMinimum we must set the convergence threshold to zero. They are then both set off from the same starting point with the same radius, albeit one that has a slightly different meaning for the two algorithms.
Program 1: Using ak.polytopeMinimum
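The interactive program can't be reproduced here, but a self-contained miniature of the scheme described above, using plain arrays in place of the ak library's types and forgoing its convergence tests and NaN handling, shows the method homing in on Rosenbrock's minimum; it is a sketch rather than the ak implementation itself:

```javascript
// A miniature of the polytope method applied to Rosenbrock's function,
// run for a fixed number of steps with plain arrays as points.
function rosenbrock(p) {
  var x = p[0], y = p[1];
  return 100 * Math.pow(y - x * x, 2) + Math.pow(1 - x, 2);
}

function polytopeMinimum(f, p, steps) {
  var n = p.length, d = p[0].length, step, i, j;
  var fp = p.map(f);

  for (step = 0; step < steps; ++step) {
    // identify the best and worst points
    var min = 0, max = 0;
    for (i = 1; i < n; ++i) {
      if (fp[i] < fp[min]) min = i;
      if (fp[i] >= fp[max]) max = i;
    }

    // the mean of every point but the worst
    var xbar = new Array(d).fill(0);
    for (i = 0; i < n; ++i) {
      if (i !== max) for (j = 0; j < d; ++j) xbar[j] += p[i][j] / (n - 1);
    }

    // flip the worst point through the mean
    var y = xbar.map(function(x, j) { return 2 * x - p[max][j]; });
    var fy = f(y);

    if (fy < fp[min]) {
      // better than the best: try doubling the step
      var z = xbar.map(function(x, j) { return 3 * x - 2 * p[max][j]; });
      var fz = f(z);
      if (fz < fy) { y = z; fy = fz; }
    }
    else if (fy >= fp[max]) {
      // no better than the worst: try halving the step
      y = xbar.map(function(x, j) { return (x + p[max][j]) / 2; });
      fy = f(y);
    }

    if (fy < fp[max]) {
      // replace the worst point with the new one
      p[max] = y;
      fp[max] = fy;
    }
    else {
      // shrink the whole polytope towards the best point
      for (i = 0; i < n; ++i) {
        if (i !== min) {
          p[i] = p[i].map(function(x, j) { return (x + p[min][j]) / 2; });
          fp[i] = f(p[i]);
        }
      }
    }
  }

  var best = 0;
  for (i = 1; i < n; ++i) if (fp[i] < fp[best]) best = i;
  return p[best];
}

var x = polytopeMinimum(rosenbrock, [[-1, 1], [-0.5, 1], [-1, 1.5]], 1000);
console.log(x, rosenbrock(x)); // close to [1, 1] and zero
```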



Clearly ak.polytopeMinimum is a tremendous improvement over ak.blindfoldMinimum whilst being no more difficult to use; an impressive result!

Nevertheless, random step algorithms still have their uses and we shall return to them in a future post.
References
[1] The Tripods Are Coming!, www.thusspakeak.com, 2015.

[2] Nelder, J. & Mead, R., A Simplex Method for Function Minimisation, Computer Journal, Vol. 7, pp. 308-313, 1965.
[3] That Shrinking Feeling, www.thusspakeak.com, 2014.
[4] It's All Downhill From Here, www.thusspakeak.com, 2014.