Every Secant Counts


In recent posts we have concerned ourselves with numerically approximating inverses of functions, seeking a value \(x\) for which \(f(x)\) equals some desired value \(y\).
We began with the bisection method[1] which, starting from a pair of points \(a\) and \(b\) such that
\[ \begin{align*} f(a) &\leqslant y\\ f(b) &\geqslant y \end{align*} \]
known as a bracket, repeatedly used their midpoint to replace the endpoint at which the function lay upon the same side of \(y\) as it did at that midpoint.
Whilst this was a terrifically stable algorithm, it was unfortunately rather slow, so we went on to discuss Newton's method[2] which used the derivative of the function to make a potentially far more accurate estimate of the location of the inverse, albeit at the risk of making a catastrophically worse one; a risk that we eliminated by switching back to bisection in the event that it did so.
Unfortunately, it's not always easy to calculate the derivative of a function, especially if we're numerically approximating it[3], so ideally we should seek an algorithm that is not quite so blind to the behaviour of the function as is the bisection method but that does not require the explicit calculation of derivatives.

The Secant Method

We have already covered using forward finite differences[4] to approximate derivatives
\[ f^\prime(x) \approx \frac{f(x+\delta)-f(x)}{\delta} \]
so it doesn't take a great leap of the imagination to suppose that if we don't actually have the derivatives, we could simply use forward finite differences in Newton's method instead; an approach known as the secant method.

Now, given that the step in Newton's method was
\[ \frac{y - f(x)}{f^\prime(x)} \]
it follows that the step for the secant method is
\[ \frac{y - f(x)}{\frac{f(x+\delta)-f(x)}{\delta}} \]
or equivalently
\[ \frac{y - f(x)}{f(x+\delta)-f(x)}\delta \]
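Expressed as code, a single step of this kind might look something like the following sketch, in which the function name secantStep and the argument delta are purely illustrative rather than part of the ak library:

// a sketch of a single secant step using a forward finite difference;
// secantStep and delta are illustrative names, not part of the ak library
function secantStep(f, y, x, delta) {
 return (y - f(x)) / (f(x+delta) - f(x)) * delta;
}
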
Once again, this is easy to demonstrate with the worked example of finding \(x\) such that \(x^2=2\). As we did for Newton's method we shall work to seven significant figures, implying that a choice of \(\delta=0.001\) shouldn't be too unreasonable. The step is consequently given by
\[ \begin{align*} \frac{2 - x^2}{(x+0.001)^2-x^2} \times 0.001 &= \frac{2 - x^2}{2 \times 0.001 \times x + 0.001^2} \times 0.001\\ &= \frac{2 - x^2}{2x + 0.001} \end{align*} \]
yielding an update rule for the approximation of the inverse of
\[ x + \frac{2 - x^2}{2x + 0.001} \to x \]
or, after rearranging the terms, of
\[ \frac{x^2 + 0.001x + 2}{2x + 0.001} \to x \]
Starting at \(x=2\) again we have

\(i\)      \(x\)      \(\frac{x^2 + 0.001x + 2}{2x + 0.001}\)      \(\left(\frac{x^2 + 0.001x + 2}{2x + 0.001}\right)^2\)
1      2.000000      1.500125      2.250375
2      1.500125      1.416701      2.007042
3      1.416701      1.414217      2.000010
4      1.414217      1.414214      2.000001

as you can confirm using our reverse Polish notation calculator.
Now the approximations at each step are less accurate than those of Newton's method, but not so much so that it requires any more steps to converge. This certainly suggests that the secant method is an effective alternative to Newton's method in the event that we cannot easily compute the derivative of \(f\).
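If you'd rather not key the iterations into the calculator by hand, a few lines of JavaScript along the lines of the following sketch should reproduce the last two columns of the table, albeit printed to full precision rather than rounded to seven significant figures, with console.log simply standing in for whatever output mechanism you prefer.

// iterate the simplified update rule for the worked example, reproducing
// the last two columns of the table above; console.log is just for illustration
var x = 2;
for(var i=1; i<=4; ++i) {
 x = (x*x + 0.001*x + 2) / (2*x + 0.001);
 console.log(i, x, x*x);
}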

Play It Safe Again Sam

Given that the secant method is an approximation of Newton's method it shouldn't come as too much of a surprise that it is no less susceptible to numerical instability. After all, there's no reason to suppose that the approximate gradient is less likely than the actual gradient to set us off in the wrong direction, throw our estimate out to the extremes or fail to significantly improve upon its accuracy.
We should therefore once again safeguard our algorithm against instability by hybridising it with the bisection method, performing a bisection step under exactly the same circumstances as we did for Newton's method. Now since this means that we must maintain a bracket, \([a,b]\), it's rather tempting to use its endpoints to approximate the derivative
\[ f^\prime(x) \approx \frac{f(b)-f(a)}{b-a} \]
freeing us from having to make further function evaluations to calculate them. This is likely to be rather inaccurate when \(a\) and \(b\) aren't very close to each other but, since this is precisely the situation in which we'd expect Newton's method to be rather inaccurate anyway, it probably won't matter too much in practice.
Our implementation will consequently use a step of
\[ \frac{y-f(x)}{\frac{f(b)-f(a)}{b-a}} = \frac{y-f(x)}{f(b)-f(a)}(b-a) \]
which will be the only significant difference between it and our implementation of Newton's method.

ak.secantInverse

Listing 1 illustrates how ak.secantInverse, like ak.bisectInverse and ak.newtonInverse, simply provides a default convergence threshold, should one be required, and returns a function that calls an inverse function with an initial bracket provided by ak.bracketInverse.

Listing 1: ak.secantInverse
ak.secantInverse = function(f, threshold) {
 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? Math.pow(ak.EPSILON, 0.75)
                                                       : Math.abs(threshold);

 if(isNaN(threshold)) {
  throw new Error('invalid convergence threshold in ak.secantInverse');
 }

 return function(y, hint) {
         return inverse(f, y, ak.bracketInverse(f, y, hint), threshold);
        };
};

The inverse function, provided in listing 2, is, as noted above, more or less identical to the one we used for ak.newtonInverse apart from the choice of step and the result of f being the value rather than an array containing the value and the derivative.

Listing 2: inverse
function inverse(f, y, x, eps) {
 var fx = [f(x[0]), f(x[1])];
 var i  = Math.abs(fx[0]-y)<Math.abs(fx[1]-y) ? 0 : 1;
 var m, fm, bi, fn;

 if(ak.diff(fx[i], y)<=eps) return x[i];

 if(fx[0]>y) {
  m =  x[0];  x[0] =  x[1];  x[1] = m;
  m = fx[0]; fx[0] = fx[1]; fx[1] = m;
  i = 1-i;
 }

 m  = x[i];
 fm = fx[i];

 while(ak.diff(x[0], x[1])>eps) {
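  // take a secant step from the endpoint closest to y, approximating the
  // slope with the bracket's endpoints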
  m += (y-fm)/(fx[1]-fx[0]) * (x[1]-x[0]);
  bi = m<=Math.min(x[0], x[1]) || m>=Math.max(x[0], x[1]);

  if(!bi) {
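   // the step stayed within the bracket so take it, flagging a bisection
   // if it doesn't at least halve the difference from the target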
   fn = f(m);
   bi = Math.abs(fn-y)>Math.abs(fm-y)/2;
   update(x, fx, m, fn, y, eps);
  }

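  // fall back to a bisection step if one was flagged and the bracket
  // hasn't already collapsed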
  if(bi && x[0]!==x[1]) bisect(f, x, fx, y, eps);

  i  = y-fx[0]<fx[1]-y ? 0 : 1;
  m  = x[i];
  fm = fx[i];
 }
 return m;
}

We again apply the step to the endpoint of the bracket at which the function is closest to the target y, which we keep track of with the index i, bailing out immediately if the relevant endpoint of the initial bracket has a normalised difference from the target within our convergence threshold.
Next, to simplify updating the endpoints of the bracket as the algorithm progresses, we make sure that the first element of the bracket has a function value less than the target and that the second has a function value greater than it.
On entering the body of the algorithm, we compute a new estimate from the closest endpoint according to the secant method and, if it falls outside of the bracket, note with the boolean variable bi that we must take a bisection step instead. If not, we take the step, but also note that a bisection step should be taken if it doesn't reduce the difference from the target quickly enough, before updating the bracket.
Finally, we update the index of the best endpoint of the bracket after taking a bisection step with bisect if we'd determined that one is required.

The update and bisect functions are also pretty much the same as those we implemented for Newton's method, as shown in listing 3.

Listing 3: update And bisect
function update(x, fx, m, fm, y, eps) {
 var i = fm>y ? 1 : 0;

 if(ak.diff(fm, y)>eps) {
  x[i]  = m;
  fx[i] = fm;
 }
 else {
  x[0] = x[1] = isNaN(fm) ? ak.NaN : m;
 }
}

function bisect(f, x, fx, y, eps) {
 var m = x[0]!==x[1] ? x[0]/2 + x[1]/2 : x[0];

 if(m===x[0] || m===x[1]) x[0] = x[1] = (y-fx[0]<fx[1]-y) ? x[0] : x[1];
 else update(x, fx, m, f(m), y, eps);
}

Once again, update terminates the algorithm by setting both endpoints to the next step if it's close enough to the target and bisect is careful to handle rounding issues by adding halves of the endpoints to calculate the midpoint and terminating in the same manner if it is equal to either endpoint; a situation that will occur if the endpoints are equal to within a single floating point rounding error.
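To see why that equality check matters, consider a bracket whose endpoints are adjacent floating point numbers; the following sketch, in plain JavaScript rather than the ak library, shows that the midpoint must then land back upon one of them.

// the half-sum midpoint of endpoints a single rounding error apart lands
// back on one of them, triggering the terminating branch of bisect
var a = 1;
var b = 1 + Math.pow(2, -52); // the next representable number above one
var m = a/2 + b/2;

console.log(m===a || m===b);  // true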

The definition of ak.secantInverse can be found in SecantInverse.js and its use is illustrated by program 1.

Program 1: ak.secantInverse

Again, the speed at which our inverses operate makes it rather difficult to appreciate that the performance of ak.secantInverse falls between that of ak.bisectInverse and ak.newtonInverse, but we can at least see that it accurately recovers the inverse. You should, of course, try it out on a few more functions to make sure that it works in general.
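If you're reading this away from the interactive page then a sketch of the sort of call that Program 1 makes, assuming that the ak library and SecantInverse.js have been loaded and with console.log standing in for its output, is

var f   = function(x) {return x*x;};
var inv = ak.secantInverse(f);

console.log(inv(2));    // approximately 1.414214
console.log(inv(2, 3)); // per listing 1, the optional hint is passed on to ak.bracketInverse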

Now, this is by no means the last word when it comes to numerically approximating the inverses of functions and we shall no doubt return to the subject in a future post.

References

[1] Rooting Around For Answers, www.thusspakeak.com, 2014.

[2] On The Shoulders Of Gradients, www.thusspakeak.com, 2014.

[3] You’re Going To Have To Think! Why Computer Algebra Won’t Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[4] You’re Going To Have To Think! Why Finite Differences Won’t Cure Your Calculus Blues, www.thusspakeak.com, 2014.
