Rooting Around For Answers

| 0 Comments

A common task in numerical computing is to approximate the inverse of a function. Specifically, given a function \(f\) such that
\[ y = f(x) \]
we shall often want to find a function \(f^{-1}\) such that
\[ x = f^{-1}(y) \]
We have already seen how useful the inverse of the cumulative distribution function, or CDF, was when we wanted to generate random numbers having a given characteristic function[1]. In that case, given that the CDF was a linear interpolation it was reasonable to approximate the inverse CDF in the same fashion. In contrast, for the normal distribution[2] we used explicit approximation formulae for different regions of the inverse CDF, filled to the rafters with magic numbers.

Neither of these approaches are particularly attractive in general; interpolation demands significant trade-offs between accuracy and efficiency and working out the explicit formula for an approximation to an inverse demands a lot of hard work. Much better then to seek a general purpose algorithmic solution.

Downs To The Left Of Me, Overs To The Right... Here I Am!

Let us suppose that the function in question is continuous and we just happen to have a pair of values \(a\) and \(b\) such that
Figure 1: The Intermediate Value Theorem
\[ \begin{align*} f(a) &\leqslant y\\ f(b) &\geqslant y \end{align*} \]
known as a bracket of \(y\). By the intermediate value theorem we can state with certainty that there is some value \(c\) between \(a\) and \(b\) for which
\[ f(c) = y \]
as illustrated by figure 1, in which there are actually three values at which the graph passes through zero between the negative left hand side and the positive right hand side.

Trivially, the value of the function at the midpoint of \(a\) and \(b\) must be less than, equal to or greater than \(y\). Now this is extremely suggestive of a simple scheme for finding a smaller bracket of \(y\); if the value of the function at the midpoint is smaller than \(y\) replace \(a\) with the midpoint and if its value is larger replace \(b\) instead.
If its value is equal to \(y\) we are, of course, done, but in the vastly more likely case that it isn't we can simply keep repeating the process until it is, or at least until it's close enough.

The Joy Of Bisects

This algorithm is known as the bisection method and is typically used to find roots of arbitrary functions, being those values for which they return zero. Thankfully, generalising root finding algorithms to inversion algorithms is a trivial matter of applying the them to functions of the form
\[ g(x) = f(x) - y \]
although we shall generally do so by adjusting the algorithm rather than creating such functions explicitly.

Now the bisection method is unusually simple for a numerical algorithm. So much so that we can illustrate it with a worked example; find \(x\) such that \(x^2=2\), or in other words calculate the square root of two. Starting with a bracket of \([1.3, 1.6]\) and working to three significant figures, we progress as follows

\(i\)      \(a\)      \(b\)      \(c\)      \(c^2\)      \(\therefore\)
\(1\)      \(1.30\)      \(1.60\)      \(1.45\)      \(2.10\)      \(c \to b\)
\(2\)      \(1.30\)      \(1.45\)      \(1.38\)      \(1.90\)      \(c \to a\)
\(3\)      \(1.38\)      \(1.45\)      \(1.42\)      \(2.02\)      \(c \to b\)
\(4\)      \(1.38\)      \(1.42\)      \(1.40\)      \(1.96\)      \(c \to a\)
\(5\)      \(1.40\)      \(1.42\)      \(1.41\)      \(1.99\)      \(\Box\)

At the \(i^\mathrm{th}\) step, \(c\) is the midpoint of \(a\) and \(b\) and the \(\therefore\) column shows either how we should update the bracket for the next step or, with \(\Box\), that the algorithm should terminate.

This is an incredibly stable algorithm; the repeated halving of the bracket guarantees that its width must eventually converge to that that can be distinguished at the precision to which we are working. We must be a little careful at the limits of that precision however. Taking our worked example a few steps further, we have

\(i\)      \(a\)      \(b\)      \(c\)      \(c^2\)      \(\therefore\)
\(5\)      \(1.40\)      \(1.42\)      \(1.41\)      \(1.99\)      \(c \to a\)
\(6\)      \(1.41\)      \(1.42\)      \(1.42\)      \(2.02\)      \(c \to b\)
\(7\)      \(1.41\)      \(1.42\)      \(1.42\)      \(2.02\)      \(c \to b\)
\(8\)      \(1.41\)      \(1.42\)      \(1.42\)      \(2.02\)      \(c \to b\)
\(\vdots\)      \(\vdots\)      \(\vdots\)      \(\vdots\)      \(\vdots\)      \(\vdots\)

We can easily avoid such infinite regress by terminating the algorithm if the midpoint is equal to either of the endpoints of the bracket. We can also sacrifice accuracy to improve performance by terminating once the width of the bracket falls below some given threshold or if the difference between the value of the function at the midpoint and the desired value does so.

And there's more! Even if the function is discontinuous the bisection method will converge, albeit to the discontinuity rather than the inverse. This follows from the fact that it ensures that the bracket always has one value at which the function is less than or equal to the required value and one at which it is greater than or equal to it. For example, let us try to find \(x\) in the bracket \([-1, 2]\) where \(\frac{1}{x}=0\).

\(i\)      \(a\)      \(b\)      \(c\)      \(\frac{1}{c}\)      \(\therefore\)
\(1\)      \(-1.00\)      \(2.00\)      \(0.50\)      \(2.00\)      \(c \to b\)
\(2\)      \(-1.00\)      \(0.50\)      \(-0.25\)      \(-4.00\)      \(c \to a\)
\(3\)      \(-0.25\)      \(0.50\)      \(0.13\)      \(7.69\)      \(c \to b\)
\(4\)      \(-0.25\)      \(0.13\)      \(-0.06\)      \(-16.67\)      \(c \to a\)
\(5\)      \(-0.06\)      \(0.13\)      \(0.04\)      \(25.00\)      \(c \to b\)
\(6\)      \(-0.06\)      \(0.04\)      \(-0.01\)      \(-100.00\)      \(c \to a\)
\(7\)      \(-0.01\)      \(0.04\)      \(0.02\)      \(50.00\)      \(c \to b\)
\(8\)      \(-0.01\)      \(0.02\)      \(0.01\)      \(100.00\)      \(c \to b\)
\(9\)      \(-0.01\)      \(0.01\)      \(0.00\)      \(\infty\)      \(c \to b\)
\(10\)      \(-0.01\)      \(0.00\)      \(0.00\)      \(\infty\)      \(\Box\)

This verges on embarrassingly stable; it sets the standard for resistance to approximation error rather higher than we shall usually be able to meet. Unfortunately it does rather depend upon our having an initial bracket and it's not entirely obvious how we can be sure to find one. By which I mean impossible, of course.

Sniffing Out A Bracket

A simple way to search for an initial bracket is to simply pick an arbitrary pair of points and then move them progressively further apart until they bracket the target value. Given that we have no idea where the inverse actually is, a reasonable heuristic is to double the distance we move them at each step
\[ \begin{align*} a - \Delta &\to a\\ b + \Delta &\to b\\ 2 \times \Delta &\to \Delta \end{align*} \]
stopping the instant we find a bracket and erroring out if \(\Delta\) overflows. Note that it is quite possible for this scheme to step past an inverse entirely if the function goes from one side of the target to the other within a region smaller than the step size. Unfortunately, there's really not much we can do about it.

The ak.bracketInverse function, given in listing 1, implements this scheme.

Listing 1: ak.bracketInverse
ak.bracketInverse = function(f, y, hint) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.bracketInverse');
 }

 if(!isFinite(y)) {
  throw new Error('invalid target in ak.bracketInverse');
 }

 hint = initHint(hint);
 return growHint(f, y, hint);
};

The initHint allows the user some flexibility in the first guess at a bracket, as shown by listing 2.

Listing 2: initHint
function initHint(hint) {
 var hint_t = ak.nativeType(hint);
 var dx = 0.125;

 if(hint_t===ak.UNDEFINED_T) {
  hint = [-dx, dx];
 }
 else if(hint_t===ak.NUMBER_T) {
  dx *= Math.max(1, Math.abs(hint));
  hint = [hint-dx, hint+dx];
 }
 else if(hint_t!==ak.ARRAY_T
      || hint.length!==2
      || ak.nativeType(hint[0])!==ak.NUMBER_T
      || ak.nativeType(hint[1])!==ak.NUMBER_T
      || hint[0]===hint[1]) {
  throw new Error('invalid hint in ak.bracketInverse');
 }
 return hint;
}

So if the user doesn't provide a hint, we shall use the interval \([-0.125, 0.125]\) by default. If, on the other hand, the user supplies a single number, \(x\), we use the wider of the intervals \([x-0.125, x+0.125]\) and \([x-0.125|x|, x+0.125|x|]\) by multiplying our default dx by the larger of \(1\) and \(|x|\), the absolute value of \(x\). Finally, if the user supplies an interval in the form of an array we simply enforce that it has two different numeric elements.
Having so ensured that we have a reasonable, or at least not entirely unreasonable, initial hint, the growHint function expands it according to our heuristic, as shown in listing 3.

Listing 3: growHint
function val(x) {
 return (ak.nativeType(x)!==ak.ARRAY_T) ? x : x[0];
}

function growHint(f, y, hint) {
 var a = Math.min(hint[0], hint[1]);
 var b = Math.max(hint[0], hint[1]);
 var d = (b-a)/2;

 var fa = val(f(a));
 var fb = val(f(b));

 while(((fa<y && fb<y) || (fa>y && fb>y)) && isFinite(d)) {
  a -= d;
  fa = val(f(a));

  if((fa<y && fb<y) || (fa>y && fb>y)) {
   b += d;
   fb = val(f(b));
  }
  d *= 2;
 }
 if(!isFinite(a) || !isFinite(b)) {
  throw new Error('unable to find bracket in ak.bracketInverse');
 }
 if(!(fa<=y && fb>=y) && !(fa>=y && fb<=y)) {
  throw new Error('unable to find bracket in ak.bracketInverse');
 }
 return [a, b];
}

Note that the loop condition and the final test that a bracket was, in fact, found have both been structured so as to generate an error as quickly as possible in the event that f returns NaN.
The only slightly unusual aspect of this implementation, which can be found in BracketInverse.js, is that we make provision for the function f to return an array rather than a number, in which case we simply treat its first element as its value. The reason why we should wish to do so shall be made clear in the next post.

ak.bisectInverse

We use ak.bracketInverse to ensure that we have a valid initial bracket for the bisection method, or at least that we shall emit an error in the event that we are unable to find one, as is done in the ak.bisectInverse function given in listing 4.

Listing 4: ak.bisectInverse
ak.bisectInverse = function(f, threshold) {
 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? Math.pow(ak.EPSILON, 0.75)
                                                       : Math.abs(threshold);

 if(isNaN(threshold)) {
  throw new Error('invalid convergence threshold in ak.bisectInverse');
 }

 return function(y, hint) {
         return inverse(f, y, ak.bracketInverse(f, y, hint), threshold);
        };
};

This takes a function f and an optional convergence threshold, using the floating point epsilon raised to the power of \(0.75\) if it isn't supplied. It then returns a function implementing a bisection method approximation of the function's inverse as implemented by the inverse function given in listing 5.

Listing 5: inverse
function inverse(f, y, x, eps) {
 var fx = [f(x[0]), f(x[1])];
 var m;

 if(ak.diff(fx[0], y)<=eps || ak.diff(fx[1], y)<=eps) {
  return Math.abs(fx[0]-y)<Math.abs(fx[1]-y) ? x[0] : x[1];
 }

 if(fx[0]>y) {
  m =  x[0];  x[0] =  x[1];  x[1] = m;
  m = fx[0]; fx[0] = fx[1]; fx[1] = m;
 }

 m = bisect(x, fx, y);

 while(ak.diff(x[0], x[1])>2*eps) {
  update(x, fx, m, f(m), y, eps);
  m = bisect(x, fx, y);
 }
 return m;
}

The first thing that we do is to terminate the algorithm immediately if the value of the function at either end of the bracket falls within the user specified, or default, error threshold. Note that we're using the normalised difference, as implemented by ak.diff[3].
In the event that we haven't stuck starting point gold with a reasonable approximation to the inverse, we ensure that the first element of the bracket has a value of the function that is less than y and the second element consequently has one that is greater.
We then enter the body of the algorithm, bisecting the bracket with bisect and updating it with update, both of which are given in listing 6.

Listing 6: bisect And update
function bisect(x, fx, y) {
 var m = x[0]!==x[1] ? x[0]/2 + x[1]/2 : x[0];

 if(m===x[0] || m===x[1]) m = x[0] = x[1] = y-fx[0]<fx[1]-y ? x[0] : x[1];
 return m;
}

function update(x, fx, m, fm, y, eps) {
 var i = fm>y ? 1 : 0;

 if(ak.diff(fm, y)>eps) {
  x[i]  = m;
  fx[i] = fm;
 }
 else {
  x[0] = x[1] = isNaN(fm) ? ak.NaN : m;
 }
}

Note that the reason that bisect is rather more convoluted than might be expected is that it has been designed to behave itself when confronted with the corner cases of numerical error.
Firstly, to avoid overflow we add halfs of the endpoints of the bracket rather than halve their sum. Unfortunately, if the endpoints are equal it is possible that rounding error will raise its ugly head and yield a midpoint that isn't equal to them, so we must treat that as a special case.
Secondly, if the endpoints are sufficiently close to each other then their midpoint won't be representable with floating point and it will consequently be equal to one or the other of them. In that event we should want to terminate the algorithm at the endpoint that evaluates closest to the target value, which we do by setting the midpoint m and both elements of the bracket to that endpoint, ensuring that the while loop in inverse will exit at the next iteration.
The update function simply checks whether the function takes a value above or below the target at the midpoint and updates the relevant endpoint, or bails out in the same fashion as above if it is within the required threshold of the target or is NaN.
Note that these functions can all be found in BisectInverse.js.

Program 1 demonstrates how the bisection method approximation of the inverse of the exponential function compares to the values at which it is evaluated

Program 1: ak.bisectInverse

Now I think that this is a pretty damned convincing demonstration of the accuracy of the bisection method, but if you are unconvinced then please go ahead and try it out on some other functions!
That said, it is by no means the most efficient algorithm for numerically approximating the inverse of a function and in the next post we shall look to improving performance...

References

[1] Characteristically Tricksy, www.thusspakeak.com, 2014.

[2] Define Normal, www.thusspakeak.com, 2014.

[3] Climbing Mount Impractical, www.thusspakeak.com, 2013.

Leave a comment