You’re Going To Have To Think! Why Automatic Differentiation Won’t Cure Your Calculus Blues

| 0 Comments

At the beginning of the second half of this series of posts[1] we briefly covered the history of the differential calculus and described the incredibly useful Taylor's theorem, which states that for a function \(f\)
\[ f(x+\delta) = f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + R_n\\ \min\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \leqslant R_n \leqslant \max\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \; \mathrm{for} \; 0 \leqslant \theta \leqslant 1 \]
where the exclamation mark stands for the factorial, \(f'(x)\) the first derivative of \(f\) at \(x\), \(f''(x)\) the second and \(f^{(n)}(x)\) the \(n^\mathrm{th}\) and where, by convention, the \(0^\mathrm{th}\) derivative is the function itself.

We have so far used it to perform a full error analysis of finite difference algorithms[2] and to automate the calculation of their terms with polynomial curve fitting[3].
We went on to describe the far more accurate polynomial approximation of Ridders' algorithm[4] which treated the symmetric finite difference as a function of the difference \(\delta\).
In the previous post[5] we used expression objects to represent functions as expression trees, from which it was possible to compute expression representing their derivative by applying the relatively simple algebraic rules of differentiation.
Noting that there was little we could do to control cancellation error in the unknown expression for the derivative, assuming we hadn't the appetite to implement a full blown computer algebra system that is, we concluded by combining this with interval arithmetic[6] to automatically keep track of such numerical errors.
Having claimed that short of just such a system our algorithm was as accurate as it was possible to get, you may be a little surprised that there is anything left to discuss.
However, as I also noted, expression objects are fairly memory intensive due to the need to maintain the expression tree and I should ideally prefer something more lightweight.
The question stands as to whether such an algorithm exists.

Given that I'm still writing, it should be more or less self-evident that it does.
And, in another testament to its tremendous power, it's based entirely upon Taylor's theorem.

Surreal Numbers

Like Robinson's non-standard numbers[7], Conway's surreal numbers[8] extend the reals in a way that admits a rigorous definition of infinitesimals.
So what's to stop us from actually using infinitesimals in our calculations?
Nothing. That's what!

Much as we extend the reals to the complex numbers by representing numbers with both a real and an imaginary part, we can extend the reals to the surreals by representing numbers with both a real and an infinitesimal part.
\[ w = a + b \delta \]
Note that we don't actually need to define \(\delta\) so long as it stands for the same infinitesimal throughout our calculations.

Negation, addition and subtraction are trivial operations in which we perform the same action on both parts of the number
\[ \begin{align*} w_1 &= a_1 + b_1 \delta\\ w_2 &= a_2 + b_2 \delta\\ \\ -w &= -a -b\delta\\ w_1 + w_2 &= (a_1+a_2) + (b_1+b_2) \delta\\ w_1 - w_2 &= (a_1-a_2) + (b_1-b_2) \delta \end{align*} \]
Multiplication proceeds much as it does for complex numbers, although we shall discard the term involving the square of \(\delta\).
\[ \begin{align*} w_1 \times w_2 &= (a_1+b_1\delta) + (a_2+b_2\delta)\\ &= a_1 \times a_2 + a_1 \times b_2 \delta + a_2 \times b_1 \delta\\ &= (a_1 \times a_2) + (a_1 \times b_2 + a_2 \times b_1) \delta \end{align*} \]
Division would seem to be a slightly trickier proposition; we're ignoring second and higher order powers of \(\delta\). If the product of two of our surreal numbers has a second order term, surely we'll need it when dividing the product by one of the arguments if we are to have any chance of recovering the other. Or of recovering the correct result when dividing it by any other surreal number, for that matter.
Well, appearances can be deceptive.

Surreal Number Division

The trick is one we're going to exploit time and again when defining arithmetic operations for these numbers; no matter what real number we multiply \(\delta\) by, it's still infinitesimal, which means that we can still use Taylor's theorem!

What we shall therefore do is work out the Taylor series expansion of reciprocation to first order. We can then multiply the numerator by the reciprocal of the denominator. The required series for the reciprocal is given by
\[ \frac{1}{a + b\delta} = \frac{1}{a} + b\delta \times \frac{\mathrm{d}}{\mathrm{d}x} \frac{1}{x} \Bigg|_a \]
where the vertical bar means "evaluated at the subscript".
Hence our rule for reciprocation is
\[ \frac{1}{a + b\delta} = \frac{1}{a} - \frac{b}{a^2}\delta \]
Note that this trick is effectively the chain rule for surreal numbers in that it allows us to calculate the derivative of an expression in which one function is applied to the result of another.
The rule for division is therefore given by
\[ \begin{align*} \frac{w_1}{w_2} &= w_1 \times \frac{1}{w_2}\\ &= (a_1 + b_1\delta) \times \left(\frac{1}{a_2} - \frac{b_2}{a_2^2}\delta\right)\\ &= \frac{a_1}{a_2} + \left(\frac{b_1}{a_2} - \frac{a_1 \times b_2}{a_2^2}\right)\delta \end{align*} \]
Now something about this sidestepping of the need to retain higher order infinitesimals seems a little bit fishy. To be truly comfortable that I haven't done something silly we should probably check that multiplying the ratio of \(w_1\) and \(w_2\) by \(w_2\) yields \(w_1\) as expected.
\[ \begin{align*} \frac{w_1}{w_2} \times w_2 &= \left(\frac{a_1}{a_2} + \left(\frac{b_1}{a_2} - \frac{a_1 \times b_2}{a_2^2}\right)\delta\right) \times (a_2 + b_2\delta)\\ &= \frac{a_1}{a_2} \times a_2 + a_2 \times \left(\frac{b_1}{a_2} - \frac{a_1 \times b_2}{a_2^2}\right)\delta + \frac{a_1}{a_2} \times b_2 \delta\\ &= a_1 + \left(b_1 - \frac{a_1 \times b_2}{a_2}\right) \delta + \frac{a_1 \times b_2}{a_2} \delta\\ &= a_1 + b_1 \delta \end{align*} \]
Well, it's clear that my reservations were undeserved; I really should have had more faith in Taylor's remarkable result.

Whither Algorithm?

Now this consistent extension of real arithmetic is all well and good, but how does it help us to develop a machine precision accurate algorithm for computing derivatives?
Well, the answer, perhaps a little unsurprisingly, lies in Taylor's theorem.

The surreal result of a function is a first order Taylor series expansion about an infinitesimal deviation from the value of that function. By Taylor's theorem, the coefficient by which \(\delta\) is multiplied in the result of a function is therefore equal to its first derivative.

For example consider
\[ \frac{\mathrm{d}}{\mathrm{d}x} \frac{1}{x^2} \Bigg|_2 = -\frac{2}{x^3} \Bigg|_2 = -\frac{2}{8} = -\frac{1}{4} \]
Using surreals we can simply divide 1 by the square of an infinitesimal deviation from 2
\[ \begin{align*} \frac{1}{(2+\delta) \times (2+\delta)} &= \frac{1}{4 + 4\delta}\\ &= \frac{1}{4} - \frac{4}{4^2}\delta\\ &= \frac{1}{4} - \frac{1}{4}\delta\\ \end{align*} \]
Clearly the coefficient of the infinitesimal part is equal to the required derivative.
This use of the Taylor series to compute derivatives is known as automatic differentiation, as the title of this article might have hinted at.

Higher Order Derivatives

One thing I would like to be able to do is compute higher order derivatives and an arithmetic type representing surreal numbers as currently defined won't be much help.
Fortunately we can generalise their definition to facilitate such calculations. Specifically we define a surreal number of order n as
\[ w = a + \sum_{i=1}^n b_i \delta^i \]
where the \(\sum\) represents the sum of the terms to its right for the range of indices below and above it.
To see how this can help us recover higher order derivatives, consider the \(n^\mathrm{th}\) order truncated Taylor series of a function \(f\)
\[ f(x+\delta) = f(x) + \sum_{i=1}^n \tfrac{1}{i!} \delta^i f^{(i)}(x) \]
so if our \(w\) is the result of some function of an \(n^\mathrm{th}\) order surreal equal to \(x+\delta\) (i.e. with all but the first surreal coefficient \(b_i\) equal to zero) we have
\[ \begin{align*} a &= f(x)\\ b_i &= \tfrac{1}{i!} f^{(i)} \end{align*} \]

A Surreal Class

The ak implementation of a surreal type is given in listing 1.

Listing 1: ak.surreal
ak.SURREAL_T = 'ak.surreal';

function Surreal(){}
Surreal.prototype = {TYPE: ak.SURREAL_T, valueOf: function(){return ak.NaN;}};

ak.surreal = function() {
 var s = new Surreal();
 var state = [];
 var arg0  = arguments[0];

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 s.order  = function()  {return state.length-1;};
 s.coeff  = function(i) {return state[Number(i)]+0;};
 s.coeffs = function()  {return state.slice(0);};
 s.deriv  = function(i) {return fact(Number(i))*state[Number(i)];};
 s.derivs = function()  {return derivs(state);};

 s.toString = function() {
  return toString(state, Number.prototype.toString);
 };

 s.toExponential = function(d) {
  return toString(state, Number.prototype.toExponential, d);
 };

 s.toFixed = function(d) {
  return toString(state, Number.prototype.toFixed, d);
 };

 s.toPrecision = function(d) {
  return toString(state, Number.prototype.toPrecision, d);
 };

 return Object.freeze(s);
};

var constructors = {};

Given that this is an arithmetic type, we've defined a TYPE member that we can use to dispatch operations to the appropriate functions via our overloading mechanism[9].
The state variable holds the coefficients of the surreal number, the first representing \(a\) and the other \(i^\mathrm{th}\) representing \(b_i\). As usual, initialisation of the state is handled by dynamic dispatch to methods of a constructors object with accessor methods added afterwards.

ak.surreal Constructors

The first constructor initialises the state with an array of coefficients, as shown in listing 2.

Listing 2: ak.surreal Array Constructor
constructors[ak.ARRAY_T] = function(state, arr) {
 var n = arr.length;
 var i;

 if(n===0) throw new Error('no coefficients in ak.surreal constructor')

 for(i=0;i<n;++i) state[i] = Number(arr[i]);
};

Next up are constructors which specify the order of the surreal, dispatched from that defined in listing 3.

Listing 3: ak.surreal Order Constructor
constructors[ak.NUMBER_T] = function(state, n, args) {
 var arg1 = args[1];

 n = ak.trunc(n);
 if(n<0) throw new Error('negative order in ak.surreal constructor');
 state.length = n+1;

 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1, args);
};

An important point to note is that a 0th order surreal still has one coefficient, that of its real part.
The order constructors that define real and first order infinitesimal coefficients are given in listing 4.

Listing 4: ak.surreal Coefficient Constructor
constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
 var n = state.length;
 var i;

 for(i=0;i<n;++i) state[i] = 0;
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, x, args) {
 var arg2 = args[2];
 constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, x, arg2);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function(state, x) {
 var n = state.length;
 var i;

 state[0] = x;
 for(i=1;i<n;++i) state[i] = 0;
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.NUMBER_T] = function(state, x, d) {
 var n = state.length;
 var i;

 state[0] = x;
 if(n>1) state[1] = d;
 for(i=2;i<n;++i) state[i] = 0;
};

Next we have an order constructor that initialises the coefficients with a function taking their order as an argument, shown in listing 5.

Listing 5: ak.surreal Function Constructor
constructors[ak.NUMBER_T][ak.FUNCTION_T] = function(state, func) {
 var n = state.length;
 var i;

 for(i=0;i<n;++i) state[i] = Number(func(i));
};

Finally, listing 6 defines a constructor that initialises an ak.surreal with another object supporting a similar interface.

Listing 6: ak.surreal Object Constructor
constructors[ak.OBJECT_T] = function(state, obj) {
 var n = obj.order;
 var i;

 n = (ak.nativeType(n)===ak.FUNCTION_T) ? Number(n()) : Number(n);
 if(n<0 || n>ak.INT_MAX-1) throw new Error('invalid order in ak.surreal constructor');

 state.length = n+1;
 for(i=0;i<=n;++i) state[i] = Number(obj.coeff(i));
};

By similar, I mean that it must support an order method or member and a coeff method that are analagous to those defined below for ak.surreal.

ak.surreal Methods

The order, coeff and coeffs methods are relatively simple. The first returns the order of the surreal which, by definition, is one less than the number of coefficients. The second returns the \(i^{th}\) coefficient, with the addition of 0 coercing undefined values that result from reading past the end of the coefficients to NaN, for reasons that will become clear later. The last simply returns the coefficients in an array.
The deriv and derivs methods provide direct access to the values of the derivatives recoverable from the surreal. The former simply multiplies the term in question by the factorial of its order, as returned by fact, and the latter forwards to derivs to transform the terms into derivative, as shown in listing 7.

Listing 7: ak.surreal String Conversion
var factCache = [1];

function fact(i) {
 var n = factCache.length;
 while(n++<=i) factCache[n-1] = (n-1)*factCache[n-2];
 return factCache[i];
}

function derivs(terms) {
 var fac = 1;
 var n = terms.length;
 var i;

 terms = terms.slice(0);

 for(i=0;i<n;++i) {
  terms[i] *= fac;
  fac *= i+1;
 }
 return terms;
}

Note that the \(i^{th}\) element of the factCache array stores \(i!\) and is sequentially populated by fact if it doesn't yet contain the requested factorial, ensuring that each must only be calculated once, reducing the computational cost from \(O(i)\) to \(O(1)\) for repeated calculations, albeit at an increased cost in memory. That I'm happy to accept that increased cost is a consequence of an expectation that derivatives of the same order are likely to be required many times during the course of a program, and that very high order derivatives are unlikely to be needed at all.
In contrast, since the derivs function needs every factorial up to the order of the surreal coefficients, it can afford to calculate each of them in sequence as it iterates through the terms.

The remaining methods provide conversion to strings using standard JavaScript numeric string conversions by forwarding to a toString function given in listing 8.

Listing 8: ak.surreal String Conversion
function toString(state, f, d) {
 var n = state.length;
 var s = [];
 var i;

 s.push(f.call(state[0], d));

 if(n>1) {
  if(state[1]<0) s.push(' - ' + f.call(-state[1], d) + 'd');
  else           s.push(' + ' + f.call( state[1], d) + 'd');
 }

 for(i=2;i<n;++i) {
  if(state[i]<0) s.push(' - ' + f.call(-state[i], d) + 'd^' + i);
  else           s.push(' + ' + f.call( state[i], d) + 'd^' + i);
 }

 return s.join('');
}

Surreal Arithmetic

Now that we have a surreal type all that remains is to work out how to apply arithmetic operations to it. Addition is trivial, as shown in listing 9.

Listing 9: ak.surreal Addition
function add(s0, s1) {
 var n = s0.order();
 var s = s0.coeffs();
 var i;

 if(s1.order()!==n) throw new Error('order mismatch in ak.surreal add');

 for(i=0;i<=n;++i) s[i] += s1.coeff(i);
 return ak.surreal(s);
}

function addSR(s, r) {
 s = s.coeffs();
 s[0] += r;
 return ak.surreal(s);
}

function addRS(r, s) {
 s = s.coeffs();
 s[0] += r;
 return ak.surreal(s);
}

ak.overload(ak.add, [ak.SURREAL_T, ak.SURREAL_T], add);
ak.overload(ak.add, [ak.SURREAL_T, ak.NUMBER_T],  addSR);
ak.overload(ak.add, [ak.NUMBER_T,  ak.SURREAL_T], addRS);

Here we have our first example of the mixed type arithmetic that was central to my argument that my dispatch mechanism for ak arithmetic operations was far and away the best design choice available. Given that the surreals are an extension of the reals, in the sense that surreal numbers with zeros for all their infinitesimal coefficients are arithmetically indistinguishable from real numbers, it makes sense mathematically to equate the latter with the former so as to formalise surreal/real arithmetic. Hence we allow mixed arithmetic with JavaScript numbers and ak.surreal values under the assumption that the former represent reals.
Note that it does not make sense to allow mixed arithmetic with surreals of different orders since this will almost always be a programming error, hence the exception thrown if ak.surreal values of different orders are added.

Subtraction is no less trivial, as illustrated in listing 10.

Listing 10: ak.surreal Subtraction
function sub(s0, s1) {
 var n = s0.order();
 var s = s0.coeffs();
 var i;

 if(s1.order()!==n) throw new Error('order mismatch in ak.surreal sub');

 for(i=0;i<=n;++i) s[i] -= s1.coeff(i);
 return ak.surreal(s);
}

function subSR(s, r) {
 s = s.coeffs();
 s[0] -= r;
 return ak.surreal(s);
}

function subRS(r, s) {
 var n = s.order();
 var i;

 s = s.coeffs();
 s[0] = r - s[0];
 for(i=1;i<=n;++i) s[i] = -s[i];

 return ak.surreal(s);
}

ak.overload(ak.sub, [ak.SURREAL_T, ak.SURREAL_T], sub);
ak.overload(ak.sub, [ak.SURREAL_T, ak.NUMBER_T],  subSR);
ak.overload(ak.sub, [ak.NUMBER_T,  ak.SURREAL_T], subRS);

Next up is multiplication which, with reals at least, is straightforward, as shown in listing 11.

Listing 11: ak.surreal Multiplication
function mulSR(s, r) {
 var n = s.order();
 var i;

 s = s.coeffs();
 for(i=0;i<=n;++i) s[i] *= r;
 return ak.surreal(s);
}

function mulRS(r, s) {
 var n = s.order();
 var i;

 s = s.coeffs();
 for(i=0;i<=n;++i) s[i] *= r;
 return ak.surreal(s);
}

ak.overload(ak.mul, [ak.SURREAL_T, ak.NUMBER_T],  mulSR);
ak.overload(ak.mul, [ak.NUMBER_T,  ak.SURREAL_T], mulRS);

Multiplying a pair of surreals is a slightly more fiddly prospect since there will generally be multiple terms on the left and right hand sides which when multiplied yield an infinitesimal of a given order. Essentially what we need to do is multiply two polynomials in \(\delta\). The process for polynomial multiplication is known as a discrete convolution and takes the form
\[ \sum_{i=0}^n a_i x^i \times \sum_{i=0}^n b_i x^i = \sum_{i=0}^n \sum_{j=0}^i \left(a_j \times b_{i-j}\right) x^i \]
This is more or less how we approached bignum multiplication[10], albeit without the carries.
As noted then there are far more efficient ways to perform this calculation than naively implementing the above formula, but I'm going to keep things simple. We shall at least arrange to apply the process in-place by iterating backwards over the coefficients of the result, as shown in listing 12.

Listing 12: ak.surreal Multiplication
function mul(s0, s1) {
 var n = s0.order();
 var s = s0.coeffs();
 var i, j;

 if(s1.order()!==n) throw new Error('order mismatch in ak.surreal mul');

 for(i=0;i<=n;++i)
 {
  s[n-i] *= s1.coeff(0);

  for(j=1;j<=n-i;++j)
  {
   s[n-i] += s[n-i-j] * s1.coeff(j);
  }
 }
 return ak.surreal(s);
}

ak.overload(ak.mul, [ak.SURREAL_T, ak.SURREAL_T], mul);

Note that we can only get away with this because we are discarding all terms above \(n^\mathrm{th}\) order.

Division of a surreal by a real is another coefficient by coefficient affair, as illustrated by listing 13.

Listing 13: ak.surreal Division
function divSR(s, r) {
 var n = s.order();
 var i;

 s = s.coeffs();
 for(i=0;i<=n;++i) s[i] /= r;
 return ak.surreal(s);
}

ak.overload(ak.div, [ak.SURREAL_T, ak.NUMBER_T],  divSR);

Division by surreals is distinctly more troublesome than the operations defined thus far. We shall implement it by defining a reciprocal function
\[ \mathrm{inv}(w) = \frac{1}{w} \]
and multiplying the left hand side of the division by the reciprocal of the right hand side
\[ \frac{w_0}{w_1} = w_0 \times \mathrm{inv}(w_1) \]
Now it may not be immediately obvious how we can propagate all of the powers of \(\delta\) through a mathematical function like \(\mathrm{inv}\), but it turns out that there's a relatively simple trick; \(\sum_{i=1}^n b_i \delta^i\) is itself an infinitesimal! This duly noted, we have
\[ f\left(a + \sum_{i=1}^n b_i \delta^i\right) = f(a) + \left(\sum_{i=1}^n b_i \delta^i\right) f'(a) + \frac{1}{2}\left(\sum_{i=1}^n b_i \delta^i\right)^2 f''(a) + \dots + \frac{1}{n!}\left(\sum_{i=1}^n b_i \delta^i\right)^n f^{(n)}(a) \]
which is essentially the chain rule expressed with surreal numbers. Evaluating such powers of sums of the infinitesimal terms is a fiddly job, but by no means insurmountable. So as long as we can work out the exact form of the derivatives of a function we can relatively simply generalise it to \(n^\mathrm{th}\) order surreals.
For the \(\mathrm{inv}\) function this equates to
\[ \mathrm{inv}\left(a + \sum_{i=1}^n b_i \delta^i\right) = \frac{1}{a} - \left(\sum_{i=1}^n b_i \delta^i\right) \frac{1}{a^2} + \left(\sum_{i=1}^n b_i \delta^i\right)^2 \frac{1}{a^3} + \dots + (-1)^n \left(\sum_{i=1}^n b_i \delta^i\right)^n \frac{1}{a^{n+1}} \]
To compute the result, we shall fill an array with the real and infinitesimal coefficients of the function and pass it along with the surreal argument to a function that applies this for of the chain rule, given in listing 14.

Listing 14: Applying The Chain Rule
function chain(c, s) {
 var di = s.coeffs();
 var cs = [];
 var n  = di.length;
 var i, j;

 cs.length = n;
 cs[0] = c[0];

 for(j=1;j<n;++j) cs[j] = c[1] * di[j];

 for(i=2;i<n;++i) {
  iPow(s, i, di);
  for(j=i;j<n;++j) cs[j] += c[i] * di[j];
 }
 return ak.surreal(cs);
}

The argument c holds the function's coefficients and s the surreal argument. The variable di holds the \(i^\mathrm{th}\) power of the sum of the infinitesimal terms of the argument and is updated at each step by the iPow function as given in listing 15.

Listing 15: Updating The Infinitesimal Terms
function iPow(s, i, di) {
 var n = s.order();
 var j, k;

 for(j=0;j<=n-i;++j) {
  di[n-j] = 0;
  for(k=1;k<=n+1-(i+j);++k) di[n-j] += di[n-(j+k)] * s.coeff(k);
 }
}

Now this is a fairly terse bit of code so it might not be immediately obvious exactly what it's doing. The important things to note are that in the \(i^\mathrm{th}\) power of the infinitesimal terms there can be no terms of lower order than \(\delta^i\) and that we are not interested in any that are of higher order than \(\delta^n\).
By working backwards from the \(n^\mathrm{th}\) to the \(i^\mathrm{th}\) terms we can compute the result in place since each term must be multiplied by \(\delta\) at least once.
Finally, to convince yourself that the inner loop is correct simply substitute the lower and upper limits of k in the body

di[n-j] += di[(n-j) - 1)] * s.coeff(1);
di[n-j] += di[i-1] * s.coeff((n-j) - (i-1));

This should make it clear that the left and right hand sides of the sums are both of order \(\delta^{n-j}\).
Finally, substituting the lower and upper limits of j in these expression shows that we are not using any invalid indices

di[n] += di[n-1)] * s.coeff(1);
di[n] += di[i-1] * s.coeff(n - (i-1));

di[i] += di[i-1] * s.coeff(1);
di[i] += di[i-1] * s.coeff(1);

The two expressions are identical for the upper limit of j because k can only equal one.
Hopefully that's made everything clear because I can't think of any way of making it clearer!

Implementing \(\mathrm{inv}\) in terms of chain is relatively straightforward as illustrated by listing 16.

Listing 16: ak.surreal Reciprocal
function inv(s) {
 var fx = 1/s.coeff(0);
 var fi = fx;
 var c  = [];
 var n  = s.order()+1;
 var i;

 c.length = n;
 c[0] = fx;

 for(i=1;i<n;++i) {
  fi *= -fx;
  c[i] = fi;
 }
 return chain(c, s);
}

So finally we can implement division by surreals, as shown in listing 17.

Listing 17: ak.surreal Division
function div(s0, s1) {
 return mul(s0, inv(s1));
}

function divRS(r, s) {
 return mulRS(r, inv(s));
}

ak.overload(ak.div, [ak.SURREAL_T, ak.SURREAL_T], div);
ak.overload(ak.div, [ak.NUMBER_T,  ak.SURREAL_T], divRS);

General Arithmetic Operations

Reciprocation shines a light on how we shall go about implementing other operations. As examples we shall use
\[ e^w\\ \ln w \]
I shan't bother with the powers of surreals since, as we have already seen[5], they can be implemented in terms of these operations.

The first step is, of course, to work out the Taylor series expansions of these functions. Fortunately this isn't very difficult; the Taylor series of the exponential is given by
\[ \begin{align*} e^{a+\delta} &= e^a + e^a \times \delta + \frac{1}{2} \times e^a \times \delta^2 + \frac{1}{6} \times e^a \times \delta^3 + \dots\\ &= e^a \times \sum_{i \geqslant 0} \frac{1}{i!} \times \delta^i \end{align*} \]
and that of the logarithm by
\[ \begin{align*} \ln(a+\delta) &= \ln a + \frac{1}{a} \times \delta - \frac{1}{2} \times \frac{1}{a^2} \times \delta^2 + \frac{1}{6} \times \frac{2}{a^3} \times \delta^3 + \dots\\ &= \ln a - \sum_{i \geqslant 1} \frac{(-1)^i}{i \times a^i} \times \delta^i \end{align*} \]
With these formulae to hand, implementing the exponential and logarithmic functions is but a simple matter of programming, as shown in listing 18.

Listing 18: ak.surreal Exponent And Logarithm
function exp(s) {
 var fx = Math.exp(s.coeff(0));
 var c  = [];
 var n  = s.order()+1;
 var f  = 1;
 var i;

 c.length = n;
 c[0] = fx;

 for(i=1;i<n;++i) {
  f *= i;
  c[i] = fx / f;
 }
 return chain(c, s);
}

function log(s) {
 var x  = s.coeff(0);
 var fx = 1/x;
 var fi = -1;
 var c  = [];
 var n  = s.order()+1;
 var i;

 c.length = n;
 c[0] = Math.log(x);

 for(i=1;i<n;++i) {
  fi *= -fx;
  c[i] = fi / i;
 }
 return chain(c, s);
}

ak.overload(ak.exp, ak.SURREAL_T, exp);
ak.overload(ak.log, ak.SURREAL_T, log);

Unfortunately not all arithmetic operations have such simple Taylor series expansions. For example, consider the inverse of the cosine, \(\arccos(x)\). Its first derivative is
\[ \frac{\mathrm{d}}{\mathrm{d}x} \arccos(x) = - \frac{1}{\sqrt{1-x^2}} \]
If you were to differentiate this some several times in pursuit of a pattern for the Taylor series coefficients, and I wouldn't recommend that you try because it's an immensely tedious exercise, you'd be out of luck. Well, that is to say I couldn't spot one so I gave up and cheated; I used the symbolicDerivative class[5] to generate them for me, as shown in listing 19.

Listing 19: ak.surreal Inverse Cosine
var symbolicDerivsCache = {};

function symbolicCoeffs(f, s) {
 var coeffs = [];
 var cache = symbolicDerivsCache[f]
 var fac = 1;
 var n = s.order()+1;
 var z, dfz, i;

 if(symbolicDerivsCache[f]) {
  cache = symbolicDerivsCache[f];
  z = cache.z;
  dfz = cache.dfz;
 }
 else {
  z  = ak.varExpr();
  dfz = [ak[f](z)];
  cache = {z: z, dfz: dfz};
  symbolicDerivsCache[f] = cache;
 }

 z.value(ak.approxExpr(s.coeff(0)));

 coeffs.length = n;
 coeffs[0] = dfz[0].approx();

 for(i=1;i<n;++i) {
  if(!dfz[i]) dfz[i] = ak.symbolicDerivative(dfz[i-1], z);
  fac *= i;

  coeffs[i] = dfz[i].approx() / fac;
 }
 return coeffs;
}

function acos(s) {
 return chain(symbolicCoeffs('acos', s), s);
}

ak.overload(ak.acos, ak.SURREAL_T, acos);

The bulk of the symbolicCoeffs function involves maintaining a cache of ak expression objects representing derivatives of increasing orders of ak arithmetic operations, much as we did for factorials with fact and for much the same reason. Fortunately for the sake of memory usage there aren't that many functions for which we shall need to rely upon symbolic differentiation for Taylor series coefficients, as you will see if you take a look at Surreal.js where you will also find the definitions of the rest of the ak.surreal arithmetic operations, enumerated in listing 20.

Listing 20: ak.surreal Overloads
ak.overload(ak.abs,  ak.SURREAL_T, abs);
ak.overload(ak.acos, ak.SURREAL_T, acos);
ak.overload(ak.asin, ak.SURREAL_T, asin);
ak.overload(ak.atan, ak.SURREAL_T, atan);
ak.overload(ak.cos,  ak.SURREAL_T, cos);
ak.overload(ak.cosh, ak.SURREAL_T, cosh);
ak.overload(ak.exp,  ak.SURREAL_T, exp);
ak.overload(ak.inv,  ak.SURREAL_T, inv);
ak.overload(ak.log,  ak.SURREAL_T, log);
ak.overload(ak.neg,  ak.SURREAL_T, neg);
ak.overload(ak.sin,  ak.SURREAL_T, sin);
ak.overload(ak.sinh, ak.SURREAL_T, sinh);
ak.overload(ak.sqrt, ak.SURREAL_T, sqrt);
ak.overload(ak.tan,  ak.SURREAL_T, tan);
ak.overload(ak.tanh, ak.SURREAL_T, tanh);

ak.overload(ak.add,  [ak.SURREAL_T, ak.SURREAL_T], add);
ak.overload(ak.add,  [ak.SURREAL_T, ak.NUMBER_T],  addSR);
ak.overload(ak.add,  [ak.NUMBER_T,  ak.SURREAL_T], addRS);
ak.overload(ak.cmp,  [ak.SURREAL_T, ak.SURREAL_T], cmp);
ak.overload(ak.cmp,  [ak.SURREAL_T, ak.NUMBER_T],  cmpSR);
ak.overload(ak.cmp,  [ak.NUMBER_T,  ak.SURREAL_T], cmpRS);
ak.overload(ak.dist, [ak.SURREAL_T, ak.SURREAL_T], dist);
ak.overload(ak.dist, [ak.SURREAL_T, ak.NUMBER_T],  distSR);
ak.overload(ak.dist, [ak.NUMBER_T,  ak.SURREAL_T], distRS);
ak.overload(ak.div,  [ak.SURREAL_T, ak.SURREAL_T], div);
ak.overload(ak.div,  [ak.SURREAL_T, ak.NUMBER_T],  divSR);
ak.overload(ak.div,  [ak.NUMBER_T,  ak.SURREAL_T], divRS);
ak.overload(ak.eq,   [ak.SURREAL_T, ak.SURREAL_T], eq);
ak.overload(ak.eq,   [ak.SURREAL_T, ak.NUMBER_T],  eqSR);
ak.overload(ak.eq,   [ak.NUMBER_T,  ak.SURREAL_T], eqRS);
ak.overload(ak.ge,   [ak.SURREAL_T, ak.SURREAL_T], ge);
ak.overload(ak.ge,   [ak.SURREAL_T, ak.NUMBER_T],  geSR);
ak.overload(ak.ge,   [ak.NUMBER_T,  ak.SURREAL_T], geRS);
ak.overload(ak.gt,   [ak.SURREAL_T, ak.SURREAL_T], gt);
ak.overload(ak.gt,   [ak.SURREAL_T, ak.NUMBER_T],  gtSR);
ak.overload(ak.gt,   [ak.NUMBER_T,  ak.SURREAL_T], gtRS);
ak.overload(ak.le,   [ak.SURREAL_T, ak.SURREAL_T], le);
ak.overload(ak.le,   [ak.SURREAL_T, ak.NUMBER_T],  leSR);
ak.overload(ak.le,   [ak.NUMBER_T,  ak.SURREAL_T], leRS);
ak.overload(ak.lt,   [ak.SURREAL_T, ak.SURREAL_T], lt);
ak.overload(ak.lt,   [ak.SURREAL_T, ak.NUMBER_T],  ltSR);
ak.overload(ak.lt,   [ak.NUMBER_T,  ak.SURREAL_T], ltRS);
ak.overload(ak.mul,  [ak.SURREAL_T, ak.SURREAL_T], mul);
ak.overload(ak.mul,  [ak.SURREAL_T, ak.NUMBER_T],  mulSR);
ak.overload(ak.mul,  [ak.NUMBER_T,  ak.SURREAL_T], mulRS);
ak.overload(ak.ne,   [ak.SURREAL_T, ak.SURREAL_T], ne);
ak.overload(ak.ne,   [ak.SURREAL_T, ak.NUMBER_T],  neSR);
ak.overload(ak.ne,   [ak.NUMBER_T,  ak.SURREAL_T], neRS);
ak.overload(ak.pow,  [ak.SURREAL_T, ak.SURREAL_T], pow);
ak.overload(ak.pow,  [ak.SURREAL_T, ak.NUMBER_T],  powSR);
ak.overload(ak.pow,  [ak.NUMBER_T,  ak.SURREAL_T], powRS);
ak.overload(ak.sub,  [ak.SURREAL_T, ak.SURREAL_T], sub);
ak.overload(ak.sub,  [ak.SURREAL_T, ak.NUMBER_T],  subSR);
ak.overload(ak.sub,  [ak.NUMBER_T,  ak.SURREAL_T], subRS);

Surreal Derivative Accuracy

So just how accurate are derivatives calculated with surreals?
To demonstrate, program 1 plots the difference between the surreal and symbolic first and second derivatives of an arbitrary function as red and blue lines respectively.

Program 1: Surreal Derivative Accuracy

That you're seeing a purple line at zero is because surreal differentiation is no less accurate than symbolic differentiation, although you might want to edit the function f to convince yourself that this example wasn't a fluke.

The Holy Grail?

So we finally have an algorithm for computer differentiation that is as accurate as possible without a computer algebra system and parsimonious with memory to boot.
I must admit that I have something of a soft spot for this approach having independently discovered it a few years ago. Even though I subsequently discovered that I wasn't the first, I still find it to be more satisfying than symbolic differentiation and it is my weapon of choice for difficult, by which I mean extremely lengthy, differentiation calculations.

Cancellation Error

Unfortunately automatic differentiation suffers from cancellation error in the derivative just as much as symbolic differentiation does. Indeed, it is doubly cursed since, unlike symbolic differentiation, we couldn't automatically manipulate the form of the calculation to try to reduce its effect even if we wanted to, although we could at least again use interval arithmetic to monitor its effect.

Numerical Approximation

As with symbolic differentiation we must still take care when combining automatic differentiation with numerical approximation; the derivative of an approximation of a function may not be a good approximation of the derivative of that function.
The solution is more or less the same as that we used for expression objects; we must simply compute the derivatives explicitly and use them to calculate the coefficients of the function's Taylor series. As with expression objects we can do this either using a general purpose algorithm such as the polynomial approximation, or with a specific approximation of the derivative of the function, or, if we're very fortunate, with the actual formula for the derivative.

Clearly this is no better or no worse than taking the same approach with expression objects, but somehow it seems a more natural union to me. With automatic differentiation we are not trying to maintain an exact representation of the mathematical expression so resorting to approximation doesn't seem quite so jarring.
Although, as I have already admitted, I am a little biased.

L'Hopital's Rule

A neat application of ak.surreal is to perform divisions that appear, on the face of it, impossible. For example, consider
\[ f(x) = \frac{\sin x}{x} \]
What is its value when \(x\) equals zero? If we were to compute it with floating point arithmetic we'd get a result of NaN, but it should in fact be one.
This rather surprising result can be demonstrated using, you guessed it, Taylor’s theorem!
The Taylor series expansions of the numerator and denominator are
\[ \begin{align*} \sin (x+\delta) &= \sin x + \delta \times \cos x - \delta^2 \times \sin x - \delta^3 \times \cos x + \dots\\ x + \delta &= x + \delta \end{align*} \]
Setting \(x\) to zero yields
\[ \begin{align*} f(0 + \delta) &= \frac{\sin 0 + \delta \times \cos 0 - \delta^2 \times \sin 0 - \delta^3 \times \cos 0 + \dots}{0 + \delta}\\ &= \frac{\delta - \delta^3 + \dots}{\delta}\\ &= 1 - \delta^2 + \dots \end{align*} \]
This use of derivatives to evaluate ratios of functions that equate to zero is known as L'Hopital's rule. Note that if the first order terms of the numerator and denominator were also zero we could continue on to the second order terms in the same fashion.
Since our surreal type keeps track of the coefficients of the Taylor series we can easily extend division to exploit L'Hopital's rule, allowing us to correctly evaluate such ratios. By way of an example, consider the ratio of two fourth order surreals whose first three coefficients are zero
\[ \frac{a_3 \delta^3 + a_4 \delta^4}{b_3 \delta^3 + b_4 \delta^4} \]
Now we can simply divide the top and bottom of the fraction by \(\delta^3\) to get a well-defined result
\[ \frac{a_3 + a_4 \delta}{b_3 + b_4 \delta} \]
We can do this by constructing a new pair of surreals whose non-zero terms are shifted to the left, as demonstrated by listing 21.

Listing 21: ak.surrealDiv
ak.surrealDiv = function(s0, s1, d) {
 var a0, a1, i0, i1, n;

 if(ak.nativeType(d)===ak.UNDEFINED_T) d = 0;
 d = Math.abs(d);

 if(ak.type(s0)!==ak.SURREAL_T) {
  throw new Error('non-surreal dividend in ak.surrealDiv');
 }

 if(ak.type(s1)!==ak.SURREAL_T) {
  throw new Error('non-surreal divisor in ak.surrealDiv');
 }

 if(ak.nativeType(d)!==ak.NUMBER_T) {
  throw new Error('non-numeric delta in ak.surrealDiv');
 }

 if(s1.order()!==s0.order()) {
  throw new Error('order mismatch in ak.surrealDiv');
 }

 n  = s0.order();
 i0 = 0;
 i1 = 0;

 while(i1<=n && Math.abs(s0.coeff(i1))<=d && Math.abs(s1.coeff(i1))<=d) ++i1;

 if(i1===0) return div(s0, s1);

 a0 = new Array(n+1);
 a1 = new Array(n+1);

 while(i1<=n) {
  a0[i0] = s0.coeff(i1);
  a1[i0] = s1.coeff(i1);
  ++i0;
  ++i1;
 }
 while(i0<=n) {
  a0[i0] = ak.NaN;
  a1[i0] = ak.NaN;
  ++i0;
 }
 return div(ak.surreal(a0), ak.surreal(a1));
}

The first loop searches for the first index i1 at which one of the surreal arguments has a coefficient with a magnitude greater than the threshold d. The following loops first copy higher order coefficients to lower order indices, the above-mentioned left shift, and then pad empty coefficients with NaNs. The resulting surreals can then simply be passed to ak.div which will return the ratio to the highest order possible, with incalculable coefficients represented by NaNs.
Here, finally, is the reason why I chose to return NaNs rather than throw exceptions when reading past the end of the coefficients; L'Hopital's rule results in the loss of higher order coefficients and rather than change the order of the result and be forced to implement mixed order arithmetic it was far simpler to just return NaNs.
Note that whilst surreal arithmetic is still subject to cancellation error, with a careful choice of d, ak.surrealDiv can at least mitigate against it for division.

This then is my last word on numerical differentiation and I hope I have successfully demonstrated my point that to develop accurate numerical algorithms you will often need to think carefully about the underlying mathematics of the expressions that you're trying to approximate.
Or, in other words, that you're going to have to think!

References

[1] You're Going To Have To Think! Why [Insert Algorithm Here] Won't Cure Your Calculus Blues, www.thusspakeak.com, 2013.

[2] You're Going To Have To Think! Why Finite Differences Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[3] You're Going To Have To Think! Why Polynomial Approximation Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[4] Ridders, C.J.F., Advances in Engineering Software, Vol. 4. No. 2., Elsevier, 1982.

[5] You're Going To Have To Think! Why Computer Algebra Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[6] You're Going To Have To Think! Why Interval Arithmetic Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.

[7] Robinson, A. Non-standard Analysis (New Ed.), Princeton University Press, 1996.

[8] Knuth, D. Surreal Numbers; How Two Ex-Students Turned on to Pure Mathematics and Found Total Happiness, Addison Wesley, 1974.

[9] Climbing Mount Impractical, www.thusspakeak.com, 2013.

[10] You're Going To Have To Think! Why Arbitrary Precision Arithmetic Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.

Based upon an article I wrote for ACCU's Overload magazine.

Leave a comment