The Tripods Are Coming!

Some time ago[1] we took a first look at multivariate function minimisation in which we try to find a point at which a two or more argument function has a local minimum, or in other words a point at which it returns a value no greater than it does at any nearby points. Formally speaking, a vector $$\mathbf{x}$$ is a local minimum of a multivariate function $$f$$ if and only if
$\exists \epsilon>0 \quad \forall \mathbf{y} : |\mathbf{y}-\mathbf{x}| \leqslant \epsilon \quad f(\mathbf{y}) \geqslant f(\mathbf{x})$
where $$\exists$$ means there exists, $$\forall$$ means for all, the colon means such that and the vertical bars stand for the magnitude, or length, of the vector between them and so in this case is the distance between $$\mathbf{x}$$ and $$\mathbf{y}$$.

The approach that we took was that of a blindfolded hill climber; take a tentative step in some direction and if it leads to a better place commit to it, otherwise stay where you are and try again. Our ak.blindfoldMinimum implemented the simplest of such hill climbing algorithms, choosing each trial step at random. We concluded that we should do much better if we chose those steps more intelligently and in this post we shall do just that.

An Educated Guess

Figure 1: Rotating The Triangle
Clearly we cannot ever be certain that a step will lead closer to a local minimum unless we already know where one is, in which case we don't need to search for it. The goal, therefore, is to exploit whatever information we might have about the function to choose one that is at least likely to be an improvement.
The simplest way to do this is to keep track of more than one point and step away from the worst of them. In two dimensions, we can do this with three points by rotating the triangle made by connecting them through a half turn about the midpoint of the edge opposite the worst point, as illustrated by figure 1 in which the worst point and its replacement are marked with a filled and an unfilled circle respectively and the midpoint of the line is marked with a cross.

To work out the position of the new point, simply note that it and the worst point are an equal distance from the point of rotation, but in exactly opposite directions. If we name the three points $$\mathbf{x}_0$$, $$\mathbf{x}_1$$ and $$\mathbf{x}_2$$, with $$\mathbf{x}_0$$ being the worst, the new point is therefore given by
$\mathbf{x}_3 = \frac{\mathbf{x}_1+\mathbf{x}_2}{2} + \left(\frac{\mathbf{x}_1+\mathbf{x}_2}{2} - \mathbf{x}_0\right) = \mathbf{x}_1+\mathbf{x}_2-\mathbf{x}_0$
To find a local minimum we might try simply repeating this transformation until the new point is no better than the old one. However, this runs the risk of the algorithm getting stuck because the step jumped over a better point. A more effective reaction is to shrink the triangle towards the best of the three points which, assuming that it's $$\mathbf{x}_2$$, we can do with
\begin{align*} \mathbf{x}_0 &= \frac{\mathbf{x}_0 + \mathbf{x}_2}{2}\\ \mathbf{x}_1 &= \frac{\mathbf{x}_1 + \mathbf{x}_2}{2} \end{align*}
Unfortunately, we now run the risk that subsequent steps will be too small, slowing down the progress of the algorithm. We can counter this possibility by increasing the size of the triangle after successful steps. Specifically, if the new point is better than all three of the old points we can try doubling its distance from the point of rotation with
\begin{align*} \mathbf{x}_4 &= \frac{\mathbf{x}_1+\mathbf{x}_2}{2} + 2 \times \left(\frac{\mathbf{x}_1+\mathbf{x}_2}{2} - \mathbf{x}_0\right)\\ &= \tfrac{3}{2}\left(\mathbf{x}_1+\mathbf{x}_2\right)-2\mathbf{x}_0\\ &= \tfrac{3}{2}\mathbf{x}_3-\tfrac{1}{2}\mathbf{x}_0 \end{align*}
Figure 2: Rosenbrock's Function
and replacing $$\mathbf{x}_3$$ with $$\mathbf{x}_4$$ if it's even better.

To show this algorithm in action, we shall again use Rosenbrock's function
$f(x, y) = 100\left(y-x^2\right)^2 + (1-x)^2$
which has a long, curved, shallow sloped valley, as shown by figure 2 in which brighter points represent larger values of the function. This has a minimum of zero at $$(1,1)$$ and is often used to test hill climbing algorithms because such gently descending curved paths are quite difficult to follow.

To demonstrate the effectiveness of this approach, program 1 compares its progress to that of a random step hill climber like ak.blindfoldMinimum, with the path taken by it plotted in green and the path taken by the random step hill climber plotted in red.

Program 1: Minimising Rosenbrock's Function

Well I don't know about you, but I'm impressed! With these deliberately chosen steps we almost always reach the vicinity of the minimum before the randomly chosen steps even reach the valley floor.

The Polytope Method

This is more or less Nelder and Mead's simplex method[3] which, following Press et al's[4] lead, we shall henceforth refer to as the polytope method to distinguish it from another algorithm that we shall cover in a future post. The appropriateness of these names stems from the fact that a simplex is the multidimensional generalisation of a triangle and a polytope is the multidimensional generalisation of a polygon. For example, a tetrahedron, or triangular pyramid, is a three dimensional simplex and a cube is a three dimensional polytope.

Rather than rotate the polytope in three or more dimensions, we shall reflect the worst point, $$\mathbf{x}_0$$, through the average of the $$n$$ other points with a generalisation of the formula that we used before
\begin{align*} \bar{\mathbf{x}} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i\\ \mathbf{x}_{n+1} &= \bar{\mathbf{x}} + \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 2\bar{\mathbf{x}} - \mathbf{x}_0 \end{align*}
Here $$\sum$$ is the summation sign, denoting the sum of the term to its right over the range of $$i$$ indicated by the terms below and above it and so $$\bar{\mathrm{x}}$$ is consequently that average.

If $$\mathbf{x}_{n+1}$$ is better than every point in the polytope we shall again try taking a longer step with
$\mathbf{x}_{n+2} = \bar{\mathbf{x}} + 2 \times \left(\bar{\mathbf{x}} - \mathbf{x}_0\right) = 3\bar{\mathbf{x}} - 2\mathbf{x}_0$
To shrink the polytope towards the best point, $$\mathbf{x}_n$$, we simply set
\begin{align*} \mathbf{x}_i = \frac{\mathbf{x}_i + \mathbf{x}_n}{2} \end{align*}
for each $$i$$ from $$0$$ to $$n-1$$, but since this then requires multiple, potentially expensive, re-evaluations of the function we shall first try the point halfway between $$\mathbf{x}_0$$ and $$\bar{\mathbf{x}}$$
$\mathbf{x}_{n+3} = \frac{\mathbf{x}_0 + \bar{\mathbf{x}}}{2}$
whenever $$\mathbf{x}_{n+1}$$ is worse than $$\mathbf{x}_0$$ and only shrink the whole polytope if this too is worse than $$\mathbf{x}_0$$.

Constructing A Simplex

Whilst the formulae defining the various steps are applicable to any polytope, we shall typically want to keep the number of function evaluations to a minimum and to do that we should choose a simplex.
Constructing an $$n$$ dimensional simplex using trigonometry such as we did for a two dimensional simplex in program 1 with the lines

var xt0 = xr0 + d*Math.cos(2*ak.PI*0/3);
var yt0 = yr0 + d*Math.sin(2*ak.PI*0/3);

var xt1 = xr0 + d*Math.cos(2*ak.PI*1/3);
var yt1 = yr0 + d*Math.sin(2*ak.PI*1/3);

var xt2 = xr0 + d*Math.cos(2*ak.PI*2/3);
var yt2 = yr0 + d*Math.sin(2*ak.PI*2/3);

requires some careful bookkeeping about through what angles and in which planes we should rotate each point. Fortunately, there's a much simpler way to do it; start with an $$n-1$$ dimensional simplex and raise up a point in the $$n^{th}$$ dimension from its midpoint. For example, a line of unit length along the $$x$$ axis is a one dimensional simplex with a midpoint at $$x$$ equal to one half. To create a two dimensional simplex we must therefore place a new point at $$(\frac{1}{2}, y)$$ for some value of $$y$$ that we can find by using Pythagoras's theorem, which states that if a right-angled triangle has a hypotenuse, the side opposite the right angle, of length $$c$$ and two other sides of length $$a$$ and $$b$$ then
Figure 3: Finding y
$a^2 + b^2 = c^2$
Specifically, this means that we must have
\begin{align*} 1^2 &= \tfrac{1}{2}^2 + y^2\\ y^2 &= 1 - \tfrac{1}{4} = \tfrac{3}{4}\\ y &= \frac{\sqrt{3}}{2} \end{align*}
as illustrated by figure 3. To construct a three dimensional simplex from this we first need to work out the location of its midpoint. By construction, it must have the same $$x$$ value of one half as our one dimensional simplex. For its $$y$$ value, we simply note that the points in the one dimensional simplex trivially have a $$y$$ value of zero and so it must be the above value divided by three.
To find the $$z$$ value of the new point we again apply Pythagoras's theorem with a hypotenuse of one and the other two sides having lengths of $$z$$ and the distance from the midpoint to the other points. The latter value is made easier to calculate by the first point being at the origin, with every value equal to zero, and so is simply the magnitude of the midpoint vector
$\left(\left(\frac{1}{2}\right)^2 + \left(\frac{\sqrt{3}}{6}\right)^2\right)^\frac{1}{2} = \left(\frac{1}{4} + \frac{3}{36}\right)^\frac{1}{2} = \left(\frac{12}{36}\right)^\frac{1}{2} = \frac{1}{\sqrt{3}}$
The $$z$$ value of the next point is therefore given by
\begin{align*} 1^2 &= \left(\frac{1}{\sqrt{3}}\right)^2 + z^2\\ z^2 &= 1 - \tfrac{1}{3} = \tfrac{2}{3}\\ z &= \frac{\sqrt{2}}{\sqrt{3}} \end{align*}
and so its coordinates are
$\left(\frac{1}{2}, \frac{\sqrt{3}}{6}, \frac{\sqrt{2}}{\sqrt{3}}\right)$
and the midpoint of the three dimensional simplex is given by dividing its $$z$$ value by four
$\left(\frac{1}{2}, \frac{\sqrt{3}}{6}, \frac{\sqrt{2}}{4\sqrt{3}}\right)$
To construct an $$n$$ dimensional simplex from an $$n-1$$ dimensional simplex we simply follow the same procedure; set all but the last element of the new point to those of the current midpoint and use Pythagoras's theorem with a hypotenuse of one and one of the other sides having a length equal to the magnitude of the current midpoint to find the value of the last element. We then update the midpoint by dividing the last element of the new point by the number of dimensions plus one.
Finally, we subtract the midpoint from each of the points to centre the simplex at the origin and divide them by its magnitude to scale its radius to one. We may then multiply them by a desired radius and add to them a desired midpoint to yield a regular simplex of any number of dimensions, of any size and at any position that we might require.

A neat trick, I'm sure you'll agree!

ak.simplex

Given that we only change the last element of the midpoint vector at each step, we can save ourselves some work by also keeping track of the sum of the squares of its elements as we progress from dimension to dimension, adding the square of the new coordinate to it at each step, as is done with r2 by ak.simplex in listing 1, the definition of which can be found in Simplex.js.

Listing 1: ak.simplex
ak.simplex = function(x, r) {
var polytope = [];
var r2 = 0;
var n, o, p, i;

if(ak.nativeType(x)===ak.NUMBER_T) x = ak.vector(x, 0);
if(ak.type(x)!==ak.VECTOR_T || x.dims()===0) {
throw new Error('invalid location in ak.simplex');
}

n = x.dims();
for(i=0;i<n && isFinite(x.at(i));++i);
if(i<n) throw new Error('invalid location in ak.simplex');

r = ak.nativeType(r)===ak.UNDEFINED_T ? 1 : Math.abs(r);
if(!isFinite(r)) throw new Error('invalid radius in ak.simplex');

o = ak.vector(n, 0);
polytope.push(o);
o = o.toArray();

for(i=0;i<n;++i) {
o[i] = Math.sqrt(1-r2);
polytope.push(ak.vector(o));
o[i] /= i+2;
r2 += o[i]*o[i];
}
o = ak.vector(o);
r /= Math.sqrt(r2);
for(i=0;i<=n;++i) polytope[i] = ak.add(x, ak.mul(r, ak.sub(polytope[i], o)));
return polytope;
};


The arguments x and r specify the location and radius of the simplex, defaulting to the origin and one respectively if undefined. Note that for the default location x is required to be the number of dimensions of the function's vector argument. If specified, the location must be a non-empty ak.vector[5] with finite elements and the radius must be a finite number.
The points of the simplex are stored in the polytope array as ak.vector objects and the midpoint's elements are maintained in the array o. Note that after we use the latter to add a point to the polytope we divide its most recently assigned element by i+2 to update the midpoint since at that stage that is the number of points it contains.
The final loop before we return the polytope array performs the scaling and translation to the radius r and location x as described above.

Program 2 prints the points returned by ak.simplex, their mean, their minimum and maximum distances from the location, x, and their minimum and maximum distances from each other.

Program 2: ak.simplex

I think that this is a pretty convincing demonstration that the polytope returned by ak.simplex is a regular simplex having the specified location and radius, at least to within rounding error.
All that's left to do is to use it to help implement a general purpose function minimiser based on the polytope method and we shall do so in the next post.

References

[1] It's All Downhill From Here, www.thusspakeak.com, 2014.

[2] What Are The Chances Of That?, www.thusspakeak.com, 2014.

[3] Nelder, J. & Mead, R., A Simplex Method for Function Minimisation, Computer Journal, Vol. 7, pp. 308-313, 1965.

[4] Press, W.H. et al, Numerical Recipes in C (2nd ed.), Cambridge University Press, 1992.

[5] What's Our Vector Victor?, www.thusspakeak.com, 2014.