We began this arc of posts with a potted history of the differential calculus; from its origin in the infinitesimals of the 17^{th} century, through its formalisation with Analysis in the 19^{th} and the eventual bringing of rigour to the infinitesimals in the 20^{th}.
Those matrices with the same number of rows and columns, say \(n\), whose elements are equal to one if the column index is equal to the row index and zero otherwise we call identity matrices, denoted by \(\mathbf{I}\). These have the property that for all vectors \(\mathbf{v}\) with \(n\) elements
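In element form, that property is simply

\[
\left(\mathbf{I} \times \mathbf{v}\right)_i = \sum_j \mathbf{I}_{i,j} \times \mathbf{v}_j = \mathbf{v}_i
\]

since only the diagonal term of the sum survives.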
That this can be a source of errors is best demonstrated by considering the geometric interpretation of linear algebra in which we treat a vector as Cartesian coordinates identifying a point.
For example, in two dimensions we have
If it so happens that \(c \div a\) equals \(d \div b\) then we are in trouble since the two dimensional plane is transformed into a one dimensional line.
For example, if
Such matrices are known as singular matrices and attempting to invert them is analogous to dividing by zero. They are not, however, the cause of our problems. Rather, we are troubled by matrices that are nearly singular.
To understand what this means, consider the two dimensional matrix
The points clearly end up very close to, rather than exactly upon, a straight line.
To a mathematician, who is equipped with the infinitely precise real numbers, this isn’t really a problem. To a computer programmer, who is not, it causes no end of trouble. One dimension of the original data has been compressed to small deviations from the line. These deviations are represented by the least significant digits in the coordinates of the points and much information about the original positions is necessarily rounded off and lost. Any attempt to reverse the process cannot recover the original positions of the points with very much accuracy; cancellation error is the inevitable consequence.
That our approximations should lead to such matrices becomes clear when you consider the right hand side of the equation. We choose \(\delta\) as small as possible in order to minimise approximation error with the result that the function is evaluated at a set of relatively close points. These evaluations will generally lie close to a straight line since the behaviour of the function will largely be governed by the first two terms in its Taylor series expansion.
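To make this concrete, consider the left hand side of the two equation system in \(f'(x)\) and \(f''(x)\) for points at \(\delta\) and \(2\delta\) above \(x\), with \(\delta = 10^{-4}\) chosen purely for illustration:

```javascript
// coefficients of f'(x) and f''(x) from the truncated Taylor series,
// row i being [d_i, d_i*d_i/2] for d_1 = delta and d_2 = 2*delta
var delta = 1e-4;
var lhs = [
 [delta,   delta*delta/2],
 [2*delta, 2*delta*delta]
];

// scaling each row by its leading coefficient shows that the rows are
// nearly parallel, [1, delta/2] versus [1, delta], and hence that the
// matrix is nearly singular for small delta
var r0 = [1, lhs[0][1]/lhs[0][0]];
var r1 = [1, lhs[1][1]/lhs[1][0]];
var separation = Math.abs(r1[1]-r0[1]);  // delta/2, vanishing as delta shrinks
```

The smaller we make \(\delta\), the closer the scaled rows come to coinciding.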
Unfortunately, the approach we have used to mitigate cancellation error is horribly inefficient since we must solve a new system of simultaneous equations each time we increase the order of the approximating polynomials.
The question is whether there is a better way.
Specifically, if we define
Ridders' algorithm^{[3]} avoids this problem by evaluating the polynomial at zero without explicitly computing its coefficients. That this is even possible might seem a little unlikely, but nevertheless there is more than one way to do so.
Given a pair of points \((x_0,y_0)\) and \((x_1,y_1)\), the line that passes through them consists of the points \((x,y)\) where
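One common way of writing that condition is

\[
y = \frac{(x_1 - x) \times y_0 + (x - x_0) \times y_1}{x_1 - x_0}
\]

which manifestly recovers \(y_0\) at \(x_0\) and \(y_1\) at \(x_1\).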
Given a set of \(n\) points \((x_i,y_i)\) and a value \(x\) at which we wish to evaluate the polynomial that passes through them, we define the formulae
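In one common notation for Neville's scheme, which may differ in detail from the symbols used here, these are

\[
\begin{align*}
P_{i,i} &= y_i\\
P_{i,j} &= \frac{(x - x_i) \times P_{i+1,j} - (x - x_j) \times P_{i,j-1}}{x_j - x_i} \quad \mathrm{for} \; i < j
\end{align*}
\]

so that \(P_{1,n}\) is the value at \(x\) of the polynomial passing through all \(n\) points.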
If we arrange the formulae in a triangular tableau, we can see that as the calculation progresses each value is derived from the pair above and below to its left.
Further noting that we only need the lower diagonal values when we do so allows us to implement Neville's algorithm with minimal memory usage, as shown in listing 12 and implemented in NevilleInterpolate.js.
Here
A call to the
It might not be immediately obvious that this function correctly applies Neville's algorithm due to its rather terse implementation. I must confess that once I realised that I didn’t need to keep the entire tableau in memory I couldn’t resist rewriting it a few times to improve its efficiency still further.
The key to understanding its operation is the fact that the \(x\) values are iterated over in reverse order.
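To see that reverse iteration at work, here is a minimal sketch of the idea as a plain closure rather than the NevilleInterpolate.js class, whose interface may well differ; the current row of the tableau is updated in place, and iterating over the earlier abscissae in reverse order ensures that each entry is read before it is overwritten.

```javascript
// Returns a function that adds one point at a time and evaluates, at the
// target x, the polynomial through every point supplied so far.
function nevilleInterpolate(x) {
 var xs = [];
 var p = [];

 return function(xi, yi) {
  var k, i;
  xs.push(xi);
  p.push(yi);
  k = xs.length-1;

  // reverse order: p[i+1] has already been updated, p[i] has not
  for(i=k-1;i>=0;--i) {
   p[i] = ((x-xs[i])*p[i+1] - (x-xs[k])*p[i])/(xs[k]-xs[i]);
  }
  return p[0];
 };
}
```

For example, feeding it the points \((0,0)\), \((1,1)\) and \((2,4)\) of \(y=x^2\) with a target of \(x=3\) yields \(9\) once all three points have been seen.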
The tableau representation of the algorithm is also helpful in demonstrating why Neville's algorithm works, as shown in derivation 1.
As the order of the polynomial increases we expect the result to get progressively closer to the correct value until the point at which cancellation error takes over. The absolute differences between successive approximations should consequently form a more or less decreasing sequence up to that point.
The algorithm therefore begins with a relatively large \(\delta\) and, shrinking it at each step, computes the symmetric finite difference and refines the polynomial approximation for \(\delta\) equal to 0. The value with the smallest absolute difference from that of the previous step is used as the approximation of the derivative with the algorithm terminating when the step starts to grow.
Now there is almost certainly going to be some numerical noise in the value of the polynomial at each step, so rather than stop as soon as the difference increases we shall terminate when the difference is twice the smallest found so far.
Note that the smallest absolute difference provides a rough estimate of the error in the approximation.
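A bare-bones sketch of that scheme follows; it is not the RidderDerivative.js implementation, and the initial \(\delta\), the shrinkage factor of 1.4 and the iteration cap are illustrative assumptions of mine.

```javascript
// In-place Neville refinement: adds a point and evaluates, at the target x,
// the polynomial through every point seen so far.
function nevilleRefine(x) {
 var xs = [], p = [];
 return function(xi, yi) {
  var k, i;
  xs.push(xi); p.push(yi);
  k = xs.length-1;
  for(i=k-1;i>=0;--i) p[i] = ((x-xs[i])*p[i+1] - (x-xs[k])*p[i])/(xs[k]-xs[i]);
  return p[0];
 };
}

// Shrink delta at each step, refine the polynomial through the symmetric
// finite differences at delta equal to zero and keep the value with the
// smallest change from the previous step, stopping once the change is
// twice the smallest seen so far.
function riddersSketch(f, x) {
 var d = 0.1*(Math.abs(x)+1);
 var refine = nevilleRefine(0);
 var prev = refine(d, (f(x+d)-f(x-d))/(2*d));
 var best = prev;
 var err = Number.POSITIVE_INFINITY;
 var i, next, diff;

 for(i=0;i<20;++i) {
  d /= 1.4;
  next = refine(d, (f(x+d)-f(x-d))/(2*d));
  diff = Math.abs(next-prev);

  if(diff<err) {err = diff; best = next;}
  else if(diff>2*err) break;
  prev = next;
 }
 return best;
}
```

Applied to the exponential at zero this recovers the derivative to a good many decimal places with only a handful of function evaluations.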
The rate at which we shrink \(\delta\) and the termination criterion are based upon the implementation of this algorithm given in Numerical Recipes in C^{[4]}. There are a couple of important differences, however.
Firstly, they use the penultimate value in the final row of the Neville's algorithm tableau as a second approximation to the derivative, having the same polynomial order as the previous iteration's approximation but with a smaller \(\delta\). This is used to improve the termination criterion for the algorithm.
Secondly, we exploit the fact that the symmetric finite difference doesn’t depend upon the sign of \(\delta\). At each step we can refine the polynomial twice with the same symmetric difference; once with the positive \(\delta\), once with the negative. This doubles the order of the approximating polynomial and forces it to be symmetric about 0 with no additional evaluations of the function but may lead to increased cost in the application of Neville's algorithm. This is a reasonable trade-off if the cost of the function evaluations is the primary concern.
Listing 13 shows the implementation of Ridders' algorithm as defined in RidderDerivative.js.
Here the
Note that
Program 7 shows the actual (black line) and estimated (red line) decimal digits of accuracy in the approximate derivative of the exponential function using our implementation of Ridders' algorithm.
This represents a vast improvement in both accuracy and error estimation over every algorithm we have studied thus far. The relative error in these results is on average roughly \(2 \times 10^{-15}\), just one decimal order of magnitude worse than the theoretical minimum.
But that's one decimal order of magnitude of improvement just waiting to be improved...
[2] You're Going To Have To Think! Why Finite Differences Won’t Cure Your Calculus Blues, www.thusspakeak.com, 2013.
[3] Ridders, C.J.F., Advances in Engineering Software, Vol. 4, No. 2, Elsevier, 1982.
[4] Press, W.H. et al., Numerical Recipes in C (2nd ed.), Cambridge University Press, 1992.
We then covered Taylor’s theorem which states that, for a function \(f\)
\[
f(x+\delta) = f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + R_n\\
\min\left(\tfrac{1}{(n+1)!} \delta^{n+1} \times f^{(n+1)}(x+\theta\delta)\right) \leqslant R_n \leqslant \max\left(\tfrac{1}{(n+1)!} \delta^{n+1} \times f^{(n+1)}(x+\theta\delta)\right) \; \mathrm{for} \; 0 \leqslant \theta \leqslant 1
\]
where \(f'(x)\) stands for the first derivative of \(f\) at \(x\), \(f''(x)\) for the second and \(f^{(n)}(x)\) for the \(n^\mathrm{th}\) with the convention that the \(0^\mathrm{th}\) derivative of a function is the function itself and the exclamation mark represents the factorial.

We went on to use it in a comprehensive analysis of finite difference approximations to the derivative in which we discovered that their accuracy is a balance between approximation error and cancellation error, that it always depends upon the unknown behaviour of higher derivatives of the function and that improving accuracy by increasing the number of terms in the approximation is a rather tedious exercise.
Of these issues, the last rather stands out; from tedious to automated is often but a simple matter of programming. Of course we shall first have to figure out an algorithm, but fortunately we shall be able to do so with relative ease using, you guessed it, Taylor’s theorem.
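Before doing so, a quick numerical sanity check of the theorem's remainder bounds, using \(f = \exp\), \(x = 0\), \(\delta = 0.1\) and \(n = 4\) chosen purely for illustration; here \(f^{(n+1)}(x+\theta\delta) = e^{\theta\delta}\) runs from \(1\) to \(e^{0.1}\), so the bounds are easily computed.

```javascript
// truncate the Taylor series of exp about zero after the n-th term
var delta = 0.1;
var n = 4;
var sum = 0;
var fac = 1;
var k;

for(k=0;k<=n;++k) {
 sum += Math.pow(delta, k)/fac;
 fac *= k+1;
}

// after the loop fac equals (n+1)! and the remainder R_n must lie between
// delta^(n+1)/(n+1)! times the extreme values of the (n+1)-th derivative
var remainder = Math.exp(delta) - sum;
var lower = Math.pow(delta, n+1)/fac;
var upper = lower * Math.exp(delta);
```

The remainder does indeed fall between the two bounds, as the theorem demands.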
Finding The Truncated Taylor Series
The first step is to truncate the series to its first \(n\) terms and make the simple substitution of \(y = x+\delta\), giving
\[
f(y) \approx f(x) + (y-x) \times f'(x) + \tfrac{1}{2} (y-x)^2 \times f''(x) + \dots + \tfrac{1}{n!} (y-x)^n \times f^{(n)}(x)
\]
or
\[
(y-x) \times f'(x) + \tfrac{1}{2} (y-x)^2 \times f''(x) + \dots + \tfrac{1}{n!} (y-x)^n \times f^{(n)}(x) = f(y) - f(x)
\]
to \(n^{\mathrm{th}}\) order in \(y-x\).

Next we evaluate \(f\) at \(n\) points in the vicinity of \(x\), say \(y_1\) to \(y_n\), yielding
\[
\begin{align*}
(y_1-x) \times f'(x) + \tfrac{1}{2} (y_1-x)^2 \times f''(x) + \dots + \tfrac{1}{n!} (y_1-x)^n \times f^{(n)}(x) &= f(y_1) - f(x)\\
(y_2-x) \times f'(x) + \tfrac{1}{2} (y_2-x)^2 \times f''(x) + \dots + \tfrac{1}{n!} (y_2-x)^n \times f^{(n)}(x) &= f(y_2) - f(x)\\
&\dots\\
(y_n-x) \times f'(x) + \tfrac{1}{2} (y_n-x)^2 \times f''(x) + \dots + \tfrac{1}{n!} (y_n-x)^n \times f^{(n)}(x) &= f(y_n) - f(x)
\end{align*}
\]
Now the only unknowns in these equations are the derivatives of \(f\) so they are effectively a set of simultaneous linear equations of those derivatives and can be solved using the standard technique of eliminating variables.

By way of an example, consider the equations
\[
\begin{array}{ll}
E_1:& 2x + 4y + 3z = 20\\
E_2:& 3x + 2y + 2z = 13\\
E_3:& 4x + 3y + 4z = 21
\end{array}
\]
To recover the value of \(x\), we begin by eliminating \(z\) from the set of equations, which we can do with
\[
\begin{array}{ll}
E_4 = E_1 - \tfrac{3}{4}E_3:& -x + \tfrac{7}{4}y = \tfrac{17}{4}\\
E_5 = E_2 - \tfrac{1}{2}E_3:& \phantom{-}x + \tfrac{1}{2}y = \tfrac{5}{2}
\end{array}
\]
Next we eliminate \(y\) with
\[
\begin{array}{ll}
E_6 = E_4 - \tfrac{7}{2}E_5:& -\tfrac{9}{2}x = \tfrac{17}{4} - \tfrac{35}{4} = -\tfrac{18}{4} = -\tfrac{9}{2}
\end{array}
\]
and hence \(x\) is equal to 1.

At each step in this process we transform a system of \(n\) equations of \(n\) unknowns into a system of \(n-1\) equations of \(n-1\) unknowns and it is therefore supremely well suited to a recursive implementation, as shown in listing 1.
Listing 1: Solving For x
function solve(s, n) {
 var lhs = s.lhs;
 var rhs = s.rhs;
 var ln = lhs[n];
 var i, j, li;

 for(i=0;i<n;++i) {
  li = lhs[i];
  for(j=0;j<n;++j) li[j] -= li[n]*ln[j]/ln[n];
  rhs[i] -= li[n]*rhs[n]/ln[n];
 }
 return n>0 ? solve(s, n-1) : rhs[0]/ln[0];
}
This is a straightforward implementation of the above process, with a call to solve(s, n) eliminating the \(n^\mathrm{th}\) variable from the system of equations represented by s, as should be clear if you consider what would happen to li[j] in the inner loop if we looped up to j===n. That we don't bother actually setting it to zero is simply because subsequent steps won't depend upon it.
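Applying it to the example equations \(E_1\) to \(E_3\) above recovers \(x = 1\); the solver is repeated here so that the snippet stands alone.

```javascript
// the recursive elimination solver described above
function solve(s, n) {
 var lhs = s.lhs, rhs = s.rhs, ln = lhs[n];
 var i, j, li;
 for(i=0;i<n;++i) {
  li = lhs[i];
  for(j=0;j<n;++j) li[j] -= li[n]*ln[j]/ln[n];
  rhs[i] -= li[n]*rhs[n]/ln[n];
 }
 return n>0 ? solve(s, n-1) : rhs[0]/ln[0];
}

// the example equations E1, E2 and E3 from above
var s = {
 lhs: [[2, 4, 3], [3, 2, 2], [4, 3, 4]],
 rhs: [20, 13, 21]
};
var x = solve(s, 2);  // eliminate the third then the second unknown
```

Here the arithmetic happens to be exact in floating point, so x comes out as precisely 1.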
The first thing we need to do before we can use this to compute the derivative of a function is to decide what values we should choose for each \(y_i\) when setting up the system of equations.
Assuming that we have some value \(e\) that is of the correct order of magnitude for the finite differences we shall use
\[
y_i = x + i \times e \times (|x|+1)
\]
so as to balance between absolute and relative differences for small and large \(x\), as is done in listing 2.

Listing 2: Setting Up The Equations
function system(f, x, e, n) {
 var lhs = [];
 var rhs = [];
 var abs_x = Math.abs(x);
 var f_x = f(x);
 var i, j, y, fac, dn, li;

 for(i=0;i<n;++i) {
  y = x + (i+1)*e*(abs_x+1);
  fac = 1;
  li = [];

  for(j=0;j<n;++j) {
   fac *= j+1;
   dn = Math.pow(y-x, j+1);
   li[j] = dn/fac;
  }
  lhs[i] = li;
  rhs[i] = f(y)-f_x;
 }
 return {lhs: lhs, rhs: rhs};
}
For an \(n^\mathrm{th}\) order approximation, I have chosen
\[
e = ((n+1)\times\epsilon)^\frac{1}{n+1}
\]
for an error of \(O\left(\epsilon^\frac{n}{n+1}\right)\). The factor of \(n+1\) is a pragmatic, albeit somewhat naive, device intended to reduce numerical errors for large \(n\).
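For concreteness, these two quantities are easily tabulated; the helper names are mine and Number.EPSILON stands in for \(\epsilon\). For \(n\) equal to 4 the step scale comes out at roughly \(10^{-3}\) and the expected error at a little above \(10^{-13}\).

```javascript
// the step scale e for an n-th order approximation, as chosen above
function stepScale(n) {
 return Math.pow((n+1)*Number.EPSILON, 1/(n+1));
}

// the expected order of the error, epsilon to the power n/(n+1)
function expectedError(n) {
 return Math.pow(Number.EPSILON, n/(n+1));
}
```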
Polynomial Derivative Approximation
Listing 3 provides an implementation of a polynomial derivative approximation using the functions defined above.

Listing 3: A Polynomial Derivative Approximation
ak.polynomialDerivative = function(f, n) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('non-function passed to ak.polynomialDerivative');
 }
 if(ak.nativeType(n)!==ak.NUMBER_T) {
  throw new Error('non-numeric number of terms passed to ak.polynomialDerivative');
 }
 n = ak.trunc(n);
 if(!(n>0)) {
  throw new Error('non-positive order passed to ak.polynomialDerivative');
 }

 var e = Math.pow((n+1)*ak.EPSILON, 1/(n+1));
 return function(x) {return solve(system(f, x, e, n), n-1);};
};
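Stripped of the ak type checks, the whole scheme can be exercised end to end. The following sketch simply restates the solver and the system setup in plain JavaScript so that it runs standalone; it is not the PolynomialDerivative.js source.

```javascript
// eliminate variables recursively, returning the first unknown
function solve(s, n) {
 var lhs = s.lhs, rhs = s.rhs, ln = lhs[n];
 var i, j, li;
 for(i=0;i<n;++i) {
  li = lhs[i];
  for(j=0;j<n;++j) li[j] -= li[n]*ln[j]/ln[n];
  rhs[i] -= li[n]*rhs[n]/ln[n];
 }
 return n>0 ? solve(s, n-1) : rhs[0]/ln[0];
}

// set up the truncated Taylor series equations at n points near x
function system(f, x, e, n) {
 var lhs = [], rhs = [];
 var abs_x = Math.abs(x);
 var f_x = f(x);
 var i, j, y, fac, li;
 for(i=0;i<n;++i) {
  y = x + (i+1)*e*(abs_x+1);
  fac = 1;
  li = [];
  for(j=0;j<n;++j) {
   fac *= j+1;
   li[j] = Math.pow(y-x, j+1)/fac;
  }
  lhs[i] = li;
  rhs[i] = f(y)-f_x;
 }
 return {lhs: lhs, rhs: rhs};
}

function polynomialDerivative(f, n) {
 var e = Math.pow((n+1)*Number.EPSILON, 1/(n+1));
 return function(x) {return solve(system(f, x, e, n), n-1);};
}

var d = polynomialDerivative(Math.exp, 4);
// d(0) should agree with the exact derivative of one to roughly a dozen
// decimal places, in line with the expected error order
```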
Program 1 plots the base 10 logarithm of the absolute error in the polynomial derivative of the exponential function at zero, or in other words the number of accurate decimal places, against the order of the approximation as a black line and that of the expected order of the error \(\epsilon^\frac{n}{n+1}\) as a red line.
Program 1: Polynomial Derivative Errors



This certainly seems to be a step in the right direction; as we increase the order of our polynomials the number of accurate digits grow more or less as expected. We should consequently expect that by increasing the order of the polynomials still further we should bring the error arbitrarily close to \(\epsilon\), as demonstrated by program 2.
Program 2: Higher Order Errors



Something has gone horribly wrong! As we expend ever increasing effort in improving the approximation, the errors are getting worse.
The reason for this disastrous result should be apparent if you spend a little time studying our simultaneous equation solver. We are performing large numbers of subtractions and divisions; if this doesn’t scream cancellation error at you then you haven’t been paying attention!
We can demonstrate the problem quite neatly if we use our interval arithmetic class^{[1]} to keep track of floating point errors. You will recall with interval arithmetic we keep track of an upper and lower bound on a floating point calculation by choosing the most pessimistic bounds on the result of an operation and then rounding the lower bound down and the upper bound up.
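By way of a reminder, here is the idea in miniature; a simplification of ak.interval rather than its actual code and, since JavaScript cannot change the hardware rounding mode, the bounds are instead nudged outwards by roughly a unit in the last place.

```javascript
// nudge a bound outwards by roughly one unit in the last place, a crude
// substitute for directed rounding
function roundDown(x) {return x - (Math.abs(x)*Number.EPSILON + Number.MIN_VALUE);}
function roundUp(x)   {return x + (Math.abs(x)*Number.EPSILON + Number.MIN_VALUE);}

// the most pessimistic bounds on a difference pair the lower bound of the
// first operand with the upper bound of the second and vice versa
function sub(a, b) {
 return {lb: roundDown(a.lb - b.ub), ub: roundUp(a.ub - b.lb)};
}

var r = sub({lb: 2, ub: 3}, {lb: 1, ub: 1.5});
// r brackets every possible difference, from a little below 0.5 to a
// little above 2
```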
The first thing we need is a function to convert a system of equations with floating point terms into one with ak.interval terms, as given in listing 4.

Listing 4: Converting Terms To Intervals
function systemBounds(s) {
 var n = s.rhs.length;
 var i, j, li;

 for(i=0;i<n;++i) {
  li = s.lhs[i];
  for(j=0;j<n;++j) li[j] = ak.interval(li[j]);
  s.rhs[i] = ak.interval(s.rhs[i]);
 }
 return s;
}
Note that this is a little optimistic as it assumes just a single rounding error in the calculation of each term. Fortunately, for our test case of the exponential function evaluated at zero this isn't so far off the mark.
Next we need a function to solve this new system of equations, which pretty much just involves replacing the inbuilt arithmetic operations in the original solver with ak overloaded ones, as shown in listing 5.

Listing 5: Solving The Interval System
function solveBounds(s, n) {
 var lhs = s.lhs;
 var rhs = s.rhs;
 var ln = lhs[n];
 var i, j, li;

 for(i=0;i<n;++i) {
  li = lhs[i];
  for(j=0;j<n;++j) li[j] = ak.sub(li[j], ak.div(ak.mul(li[n], ln[j]), ln[n]));
  rhs[i] = ak.sub(rhs[i], ak.div(ak.mul(li[n], rhs[n]), ln[n]));
 }
 return n>0 ? solveBounds(s, n-1) : ak.div(rhs[0], ln[0]);
}
Glueing these functions together we can calculate the error bounds of the polynomial derivative approximation, as shown in listing 6.
Listing 6: Polynomial Derivative Error Bounds
ak.polynomialDerivativeBounds = function(f, n) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('non-function passed to ak.polynomialDerivativeBounds');
 }
 if(ak.nativeType(n)!==ak.NUMBER_T) {
  throw new Error('non-numeric number of terms passed to ak.polynomialDerivativeBounds');
 }
 n = ak.trunc(n);
 if(!(n>0)) {
  throw new Error('non-positive order passed to ak.polynomialDerivativeBounds');
 }

 var e = Math.pow((n+1)*ak.EPSILON, 1/(n+1));
 return function(x) {return solveBounds(systemBounds(system(f, x, e, n)), n-1);};
};
Program 3 plots the approximate number of decimal places of agreement between the upper and lower bounds of the interval result of the approximation as a black line and those of the expected order of the error as a red line.
Program 3: Cancellation Errors



This clearly shows that our higher order polynomial approximations are suffering from massive loss of precision. We can’t plot results above 11^{th} order since the intervals are infinite; these calculations have effectively lost all digits of precision.
Note that the intersection of the two curves provides a reasonable choice of 4 for the polynomial order we should use to calculate this particular derivative.
Improving The Algorithm
Our algorithm is, in fact, the first step in the Gaussian elimination technique for solving simultaneous linear equations. The second stage is that of back-substitution, during which the remaining unknowns are recovered by recursively substituting each unknown back into the equations as it is revealed. We are able to skip this step because we are only interested in one of the unknowns.

However, as it stands, our algorithm is not particularly well implemented.
Firstly, if every value in a column in the left hand side is zero, an unlikely but not impossible prospect, we shall end up dividing zero by zero.
Secondly, and more importantly, by eliminating rows in reverse order of the magnitude of \(\delta\), we risk dividing by small values and unnecessarily amplifying rounding errors; a far better scheme is to choose the row with the largest absolute value in the column we are eliminating.
Listing 7 provides a more robust simultaneous equation solver that corrects these weaknesses.
Listing 7: A Robust Solver
function stableSolve(s, n) {
 var lhs = s.lhs;
 var rhs = s.rhs;
 var abs_nn = Math.abs(lhs[n][n]);
 var i, j, abs_in, t, li, ln;

 for(i=0;i<n;++i) {
  abs_in = Math.abs(lhs[i][n]);

  if(abs_in>abs_nn) {
   abs_nn = abs_in;
   t = lhs[i]; lhs[i] = lhs[n]; lhs[n] = t;
   t = rhs[i]; rhs[i] = rhs[n]; rhs[n] = t;
  }
 }
 ln = lhs[n];

 if(abs_nn>0) {
  for(i=0;i<n;++i) {
   li = lhs[i];
   for(j=0;j<n;++j) li[j] -= li[n]*ln[j]/ln[n];
   rhs[i] -= li[n]*rhs[n]/ln[n];
  }
 }
 return n>0 ? stableSolve(s, n-1) : rhs[0]/lhs[0][0];
}
Unfortunately, whilst this certainly improves the general stability of our algorithm, it does not make much difference to its accuracy.
A more effective approach is to try to minimise the number of arithmetic operations required by the algorithm. We can do this by exploiting the very same observation that led us to the symmetric finite difference^{[2]}; that the difference between a pair of evaluations of the function an equal distance above and below \(x\) depends only on the odd ordered derivatives.
\[
\delta \times f'(x) + \tfrac{1}{6} \delta^3 \times f'''(x) + \dots + \tfrac{1}{(2n-1)!}\delta^{2n-1} \times f^{(2n-1)}(x) = \tfrac{1}{2} \left(f(x+\delta) - f(x-\delta)\right)
\]
This will yield an error of the same order as our first implementation with approximately half the number of simultaneous equations and consequently roughly one eighth of the arithmetic operations.

Listing 8 shows the implementation of a function to set up a system of equations based on symmetric differences.
Listing 8: Setting Up The Symmetric Equations
function symmetricSystem(f, x, e, n) {
 var lhs = [];
 var rhs = [];
 var abs_x = Math.abs(x);
 var i, j, y, d, fac, dn, li;

 for(i=0;i<n;++i) {
  y = abs_x + (i+1)*e*(abs_x+1);
  d = y-abs_x;
  fac = 1;
  li = [];

  for(j=0;j<n;++j) {
   dn = Math.pow(d, 2*j+1);
   li[j] = dn/fac;
   fac *= (2*j+2) * (2*j+3);
  }
  lhs[i] = li;
  rhs[i] = 0.5*(f(x+d)-f(x-d));
 }
 return {lhs: lhs, rhs: rhs};
}
Applying the stable solver to this system of equations yields a doubly improved polynomial derivative approximation, as given in listing 9.
Listing 9: A Symmetric Polynomial Derivative Approximation
ak.symmetricPolynomialDerivative = function(f, n) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('non-function passed to ak.symmetricPolynomialDerivative');
 }
 if(ak.nativeType(n)!==ak.NUMBER_T) {
  throw new Error('non-numeric number of terms passed to ak.symmetricPolynomialDerivative');
 }
 n = ak.trunc(n);
 if(!(n>0)) {
  throw new Error('non-positive order passed to ak.symmetricPolynomialDerivative');
 }

 var e = Math.pow((2*n+1)*ak.EPSILON, 1/(2*n+1));
 return function(x) {return stableSolve(symmetricSystem(f, x, e, n), n-1);};
};
Program 4 illustrates the impact upon the number of accurate digits in the results of our improved approximations.
Program 4: Symmetric Polynomial Errors



If you can see any breaks in the black line representing the actual error it's because the approximation is exact for those values of \(n\). This happens in all of the browsers I test with and I must confess that it came as something of a surprise to me since my C++ implementation isn't exact anywhere in this range! Could it be that JavaScript is more accurate for floating point arithmetic?
Well, no. They're both using the same standard after all.
I suspect that the reason is that IEEE floating point arithmetic is allowed to promote numbers to the most accurate available hardware representation during a calculation before rounding them down when they are moved from registers to RAM. In an interpreted language like JavaScript the intermediate values in an expression will be moved to RAM immediately, scuppering any chance of improving accuracy in this fashion. I believe that I just got lucky that the rounding actually improves the result in some cases (albeit for different cases in different browsers, presumably because of different choices for the order of execution of subexpressions by different JavaScript engines).
To demonstrate how much of the accuracy we see in this example is pure luck we'll need to compute bounds for the approximation as we did for our original implementation. Listing 10 gives an implementation of the stable solver using interval arithmetic with which we can do so.
Listing 10: A Robust Bounds Solver
function stableSolveBounds(s, n) {
 var lhs = s.lhs;
 var rhs = s.rhs;
 var abs_nn = ak.abs(lhs[n][n]);
 var i, j, abs_in, t, li, ln;

 for(i=0;i<n;++i) {
  abs_in = ak.abs(lhs[i][n]);

  if(abs_in.lb()>abs_nn.lb()) {
   abs_nn = abs_in;
   t = lhs[i]; lhs[i] = lhs[n]; lhs[n] = t;
   t = rhs[i]; rhs[i] = rhs[n]; rhs[n] = t;
  }
 }
 ln = lhs[n];

 if(abs_nn.ub()>0) {
  for(i=0;i<n;++i) {
   li = lhs[i];
   for(j=0;j<n;++j) li[j] = ak.sub(li[j], ak.div(ak.mul(li[n], ln[j]), ln[n]));
   rhs[i] = ak.sub(rhs[i], ak.div(ak.mul(li[n], rhs[n]), ln[n]));
  }
 }
 return n>0 ? stableSolveBounds(s, n-1) : ak.div(rhs[0], lhs[0][0]);
}
The main difference between this and the original stable solver, apart from the use of the ak overloaded arithmetic operations, is that when we select which row to use to eliminate a variable we pick the one with the largest worst case magnitude coefficient or, in other words, we test the lower bounds of the absolute values of the intervals.

Listing 11 shows the implementation of the bounds calculation for the symmetric polynomial derivative.
Listing 11: Symmetric Polynomial Derivative Error Bounds
ak.symmetricPolynomialDerivativeBounds = function(f, n) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('non-function passed to ak.symmetricPolynomialDerivativeBounds');
 }

 if(ak.nativeType(n)!==ak.NUMBER_T) {
  throw new Error('non-numeric number of terms passed to ak.symmetricPolynomialDerivativeBounds');
 }

 n = ak.trunc(n);
 if(!(n>0)) {
  throw new Error('non-positive order passed to ak.symmetricPolynomialDerivativeBounds');
 }

 var e = Math.pow((2*n+1)*ak.EPSILON, 1/(2*n+1));

 return function(x) {
  return stableSolveBounds(systemBounds(symmetricSystem(f, x, e, n)), n-1);
 };
};
Program 5 demonstrates the presence of cancellation error in the symmetric polynomial derivative approximation.
Program 5: Symmetric Cancellation Errors



Choosing the point at which the observed cancellation error crosses the expected error, shown as a red line, suggests a 7^{th} order polynomial, corresponding again to 4 simultaneous equations.
Now these polynomial approximation algorithms, implemented in PolynomialDerivative.js, are certainly an improvement upon finite differences since they automate the annoying process of working out the difference formulae. However, we are still clearly constrained by the trade-off between approximation and cancellation error.
This problem is fundamental to the basic approach of these algorithms; computing the coefficients of the Taylor series by solving simultaneous equations is inherently numerically unstable.
To understand why, it is illuminating to recast the problem as one of linear algebra.
A Problem Of Linear Algebra
By the rules of matrix and vector arithmetic we have, for a matrix \(\mathbf{M}\) and vectors \(\mathbf{v}\) and \(\mathbf{w}\)
\[
\begin{align*}
\mathbf{w} &= \mathbf{M} \times \mathbf{v}\\
\mathbf{w}_i &= \sum_j \mathbf{M}_{i,j} \times \mathbf{v}_j
\end{align*}
\]
where subscripts \(i,j\) of a matrix denote the \(j^\mathrm{th}\) column of the \(i^\mathrm{th}\) row, a subscript \(i\) of a vector denotes its \(i^\mathrm{th}\) element and the capital sigma denotes the sum of the expression to its right for every valid value of the symbol beneath it.
We can consequently represent the simultaneous equations of our improved algorithm with a single matrix-vector equation if we choose the elements of \(\mathbf{M}\), \(\mathbf{v}\) and \(\mathbf{w}\) such that
\[
\begin{align*}
\mathbf{M}_{i,j} &= \tfrac{1}{(2j-1)!} \left(i \times \delta\right)^{2j-1}\\
\mathbf{v}_j &= f^{(2j-1)}(x)\\
\mathbf{w}_i &= \tfrac{1}{2} \left(f(x+i\times\delta) - f(x-i\times\delta)\right)
\end{align*}
\]
for \(i\) and \(j\) from one up to and including \(n\).
Those matrices with the same number of rows and columns, say \(n\), whose elements are equal to one if the column index is equal to the row index and zero otherwise we call identity matrices, denoted by \(\mathbf{I}\). These have the property that for all vectors \(\mathbf{v}\) with \(n\) elements
\[
\mathbf{I} \times \mathbf{v} = \mathbf{v} \times \mathbf{I} = \mathbf{v}
\]
If we can find a matrix \(\mathbf{M}^{-1}\) such that
\[
\mathbf{M}^{-1} \times \mathbf{M} = \mathbf{I}
\]
known as the inverse of \(\mathbf{M}\), we have
\[
\begin{align*}
\mathbf{M}^{-1} \times \mathbf{w} &= \mathbf{M}^{-1} \times \mathbf{M} \times \mathbf{v}\\
&= \mathbf{I} \times \mathbf{v}\\
&= \mathbf{v}
\end{align*}
\]
Our algorithm is therefore, in some sense, equivalent to inverting the matrix \(\mathbf{M}\).
That this can be a source of errors is best demonstrated by considering the geometric interpretation of linear algebra in which we treat a vector as Cartesian coordinates identifying a point.
For example, in two dimensions we have
\[
\begin{align*}
\mathbf{M} &=
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix}\\
\mathbf{v} &=
\begin{pmatrix}
x \\
y
\end{pmatrix}\\
\mathbf{M} \times \mathbf{v} &=
\begin{pmatrix}
a \times x + b \times y \\
c \times x + d \times y
\end{pmatrix}
\end{align*}
\]
Hence every two by two matrix represents a function that takes a point in the plane and returns another point in the plane.
If it so happens that \(c \div a\) equals \(d \div b\) then we are in trouble since the two dimensional plane is transformed into a one dimensional line.
For example, if
\[
\begin{pmatrix}
x' \\
y'
\end{pmatrix}
=
\begin{pmatrix}
2 & 4 \\
3 & 6
\end{pmatrix}
\times
\begin{pmatrix}
x \\
y
\end{pmatrix}
=
\begin{pmatrix}
2x + 4y \\
3x + 6y
\end{pmatrix}
\]
then
\[
y' = 1\tfrac{1}{2}x'
\]
Given the result of such a function there is no possible way to identify which of the infinite number of inputs might have yielded it.
Such matrices are known as singular matrices and attempting to invert them is analogous to dividing by zero. They are not, however, the cause of our problems. Rather, we are troubled by matrices that are nearly singular.
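Returning to the example above, a short sketch in plain JavaScript (the apply helper and sample points are my own, not part of the ak library) confirms that every point the singular matrix touches lands exactly on that line:

```javascript
// Apply the singular matrix (2 4; 3 6) to some points and confirm
// that every image lies on the line y' = 1.5 * x'.
function apply(m, v) {
  return [m[0][0]*v[0] + m[0][1]*v[1],
          m[1][0]*v[0] + m[1][1]*v[1]];
}

var m = [[2, 4], [3, 6]];
var points = [[1, 0], [0, 1], [2, -3], [0.5, 0.25]];

points.forEach(function(v) {
  var w = apply(m, v);
  console.log(w, w[1] === 1.5*w[0]);  // always true: the plane has collapsed
});
```

However many distinct inputs we feed in, the outputs carry only one dimension of information, so no inverse function can exist.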
To understand what this means, consider the two dimensional matrix
\[
\begin{pmatrix}
\tfrac{1}{4} & \tfrac{1}{2}\\
\tfrac{3}{8} & \tfrac{7}{8}
\end{pmatrix}
\]
Program 6 demonstrates multiplying a set of equally spaced points on the unit square by this matrix.
Program 6: Near Singular Matrix Multiplication



The points clearly end up very close to, rather than exactly upon, a straight line.
To a mathematician, who is equipped with the infinitely precise real numbers, this isn’t really a problem. To a computer programmer, who is not, it causes no end of trouble. One dimension of the original data has been compressed to small deviations from the line. These deviations are represented by the least significant digits in the coordinates of the points and much information about the original positions is necessarily rounded off and lost. Any attempt to reverse the process cannot recover the original positions of the points with very much accuracy; cancellation error is the inevitable consequence.
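A sketch along the lines of Program 6 (plain JavaScript of my own, not the blog's interactive program) makes the compression measurable; for this matrix the image of a point \((x, y)\) deviates from the line \(y' = 1\tfrac{1}{2}x'\) by just \(\tfrac{1}{8}y\):

```javascript
// Map grid points on the unit square through the nearly singular matrix
// (1/4 1/2; 3/8 7/8) and measure how far each image lies from y' = 1.5*x'.
function apply(m, v) {
  return [m[0][0]*v[0] + m[0][1]*v[1],
          m[1][0]*v[0] + m[1][1]*v[1]];
}

var m = [[1/4, 1/2], [3/8, 7/8]];
var i, j, w, deviation;

for(i=0;i<=4;++i) {
  for(j=0;j<=4;++j) {
    w = apply(m, [i/4, j/4]);
    deviation = w[1] - 1.5*w[0];  // algebraically equal to y/8
    console.log(w, deviation);
  }
}
```

The deviations are at most an eighth of the coordinates themselves, so the least significant digits are doing all of the work of distinguishing the original points.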
That our approximations should lead to such matrices becomes clear when you consider the right hand side of the equation. We choose \(\delta\) as small as possible in order to minimise approximation error with the result that the function is evaluated at a set of relatively close points. These evaluations will generally lie close to a straight line since the behaviour of the function will largely be governed by the first two terms in its Taylor series expansion.
Unfortunately, the approach we have used to mitigate cancellation error is horribly inefficient since we must solve a new system of simultaneous equations each time we increase the order of the approximating polynomial.
The question is whether there is a better way.
Ridders' Algorithm
The trick is to turn the problem on its head; rather than approximate the function with a polynomial and use its coefficients to approximate the derivative, we treat the symmetric finite difference itself as a function of \(\delta\) and approximate it with a polynomial.
Specifically, if we define
\[
g_{f,x}(\delta) = \frac{f(x+\delta) - f(x-\delta)}{2\delta}
\]
and we vigorously wave our hands, we have
\[
f'(x) = g_{f,x}(0)
\]
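To see why this is plausible, consider the exponential function at zero, for which \(g_{f,x}(\delta)\) equals \(\sinh(\delta)/\delta\) and so tends to \(f'(0)=1\) as \(\delta\) shrinks; a minimal sketch, assuming nothing beyond Math.exp:

```javascript
// The symmetric finite difference of exp at zero, viewed as a function
// of delta. Analytically it equals sinh(delta)/delta, which tends to 1.
function g(f, x, delta) {
  return (f(x+delta) - f(x-delta)) / (2*delta);
}

[0.5, 0.1, 0.01, 0.001].forEach(function(delta) {
  console.log(delta, g(Math.exp, 0, delta));  // approaches 1 from above
});
```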
Now, if we approximate \(g_{f,x}\) with an \(n^\mathrm{th}\) order polynomial \(\hat{g}^n_{f,x}\) then we can approximate the derivative of \(f\) by evaluating it at zero
\[
f'(x) \approx \hat{g}^n_{f,x}(0)
\]
We could try to find an explicit formula for \(\hat{g}^n_{f,x}\) by evaluating \(g_{f,x}\) at \(n\) nonzero values of \(\delta\) and solving the resulting set of linear equations, as we did in our first polynomial algorithm, but we would end up with the same inefficiency as before when seeking the optimal order.
Ridders' algorithm^{[3]} avoids this problem by evaluating the polynomial at zero without explicitly computing its coefficients. That this is even possible might seem a little unlikely, but nevertheless there is more than one way to do so.
Neville's Algorithm
We can easily compute the value of the polynomial that passes through a pair of points, trivially a straight line, without explicitly computing its coefficients with a kind of pro rata formula.
Given a pair of points \((x_0,y_0)\) and \((x_1,y_1)\), the line that passes through them consists of the points \((x,y)\) where
\[
\frac{y-y_0}{x-x_0} = \frac{y_1-y_0}{x_1-x_0}
\]
This can be rearranged to yield an equation for the line
\[
y = y_0 + \frac{y_1-y_0}{x_1-x_0} \times (x-x_0)
\]
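Transcribed directly into JavaScript, the formula gives a two point interpolation in a single expression (a trivial sketch of my own, not part of the ak library):

```javascript
// Evaluate at x the straight line through (x0,y0) and (x1,y1)
// without computing its coefficients explicitly.
function line(x0, y0, x1, y1, x) {
  return y0 + (y1-y0)/(x1-x0) * (x-x0);
}

console.log(line(1, 2, 3, 6, 2));  // the line through (1,2) and (3,6) at x=2: 4
```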
Neville's interpolation algorithm, described in Numerical Recipes in C^{[4]}, generalises this approach to any number of points, or equivalently any order of polynomial, and takes the form of a recursive set of formulae.
Given a set of \(n\) points \((x_i,y_i)\) and a value \(x\) at which we wish to evaluate the polynomial that passes through them, we define the formulae
\[
\begin{array}{ll}
p_{i,i}(x) = y_i &\quad 1 \leqslant i \leqslant n\\
p_{i,j}(x) = \frac{(x-x_j) \times p_{i,j-1}(x) + (x_i-x) \times p_{i+1,j}(x)}{x_i-x_j} &\quad 1 \leqslant i < j \leqslant n
\end{array}
\]
from which we retrieve the value of the polynomial at \(x\) with \(p_{1,n}(x)\).
If we arrange the formulae in a triangular tableau, we can see that as the calculation progresses each value is derived from the pair above and below to its left.
\[
\begin{matrix}
p_{1,1}(x) & & &\\
& p_{1,2}(x) & &\\
p_{2,2}(x) & & p_{1,3}(x) &\\
& p_{2,3}(x) & & p_{1,4}(x)\\
p_{3,3}(x) & & p_{2,4}(x) &\\
& p_{3,4}(x) & &\\
p_{4,4}(x) & & &
\end{matrix}
\]
This shows why this algorithm is so useful; adding a new point introduces a new set of values without affecting those already calculated. We can consequently increase the order of the polynomial approximation without having to recalculate the entire tableau.
Noting further that we only need the lower diagonal values when we do so allows us to implement Neville's algorithm with minimal memory usage, as shown in listing 12 and implemented in NevilleInterpolate.js.
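Before turning to the optimised implementation, it is perhaps worth seeing the recursive formulae transcribed literally. This naive sketch of my own recomputes shared values rather than keeping a tableau, so it is far slower, but it makes the definition plain:

```javascript
// Naive recursive Neville: evaluate at x the polynomial through the
// points (xs[i], ys[i]), directly from the formulae above.
function neville(xs, ys, x, i, j) {
  if(i===undefined) { i = 0; j = xs.length-1; }  // p_{1,n} in the text's notation
  if(i===j) return ys[i];
  return ((x-xs[j]) * neville(xs, ys, x, i, j-1)
        + (xs[i]-x) * neville(xs, ys, x, i+1, j)) / (xs[i]-xs[j]);
}

// The quadratic through (1,1), (2,4) and (3,9) is x squared, so this prints 16.
console.log(neville([1, 2, 3], [1, 4, 9], 4));
```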
Listing 12: Neville's Algorithm
ak.NEVILLE_INTERPOLATE_T = 'ak.nevilleInterpolate';

function refine(xn, yn, xi, yi, x) {
 var n = xi.length;
 var i, t;

 for(i=0;i<n;++i) {
  t = yi[i]; yi[i] = yn; yn = t;
  yn = (yn*(x-xn) + yi[i]*(xi[n-1-i]-x)) / (xi[n-1-i]-xn);
 }

 xi.push(xn);
 yi.push(yn);
 return yn;
}

function NevilleInterpolate(){}

NevilleInterpolate.prototype = {
 TYPE: ak.NEVILLE_INTERPOLATE_T,
 valueOf: function(){return ak.NaN;}
};

ak.nevilleInterpolate = function(x) {
 var n = new NevilleInterpolate();
 var xi = [];
 var yi = [];

 n.refine = function(xn, yn) {return refine(xn, yn, xi, yi, x);};
 return Object.freeze(n);
};
Here refine does all of the work, updating the tableau according to the above formulae. The ak.nevilleInterpolate function is responsible for creating a closure in which the xi and yi variables representing it are defined. A call to the refine method of the object it returns defers to refine to compute the next lower diagonal and returns its last entry.
It might not be immediately obvious that this function correctly applies Neville's algorithm due to its rather terse implementation. I must confess that once I realised that I didn’t need to keep the entire tableau in memory I couldn’t resist rewriting it a few times to improve its efficiency still further.
The key to understanding its operation is the fact that the \(x\) values are iterated over in reverse order.
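To convince ourselves that it really does implement the algorithm, we can run a standalone copy of refine (with my own scaffolding in place of the ak closure) on points of the cubic \(x^3\), evaluating at \(x=2\); each call raises the order of the interpolating polynomial by one:

```javascript
// A standalone copy of refine: xi and yi hold the abscissae and the
// lower diagonal of the tableau, x is the evaluation point.
function refine(xn, yn, xi, yi, x) {
  var n = xi.length, i, t;

  for(i=0;i<n;++i) {
    t = yi[i]; yi[i] = yn; yn = t;
    yn = (yn*(x-xn) + yi[i]*(xi[n-1-i]-x)) / (xi[n-1-i]-xn);
  }

  xi.push(xn);
  yi.push(yn);
  return yn;
}

// Feed in points of x cubed one at a time, evaluating at x=2.
var xi = [], yi = [], x = 2;
var p1 = refine( 0,  0, xi, yi, x);  // constant:   0
var p2 = refine( 1,  1, xi, yi, x);  // line:       2
var p3 = refine(-1, -1, xi, yi, x);  // quadratic:  2 (the quadratic through them is y=x)
var p4 = refine( 3, 27, xi, yi, x);  // cubic:      8, i.e. 2 cubed
console.log(p1, p2, p3, p4);
```

The fourth refinement recovers the unique cubic through the four points, which is \(x^3\) itself, without ever having stored more than the lower diagonal.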
The tableau representation of the algorithm is also helpful in demonstrating why Neville's algorithm works, as shown in derivation 1.
Derivation 1: Proving Neville's Algorithm
The result of Neville's algorithm for \(n\) points is trivially an \((n-1)^\mathrm{th}\) order polynomial since each value in the tableau is one order higher in \(x\) than those to its left used to calculate it and the leftmost values are constants and hence of order 0.
All that remains is to prove that it passes through the set of points \((x_i,y_i)\).
Consider a single point \((x_k,y_k)\), some \(i\) less than \(k\) and some \(j\) greater than \(k\)
\[
p_{k,k}(x_k) = y_k\\
p_{i,k}(x_k) = \frac{(x_k-x_k) \times p_{i,k-1}(x_k) + (x_i-x_k) \times p_{i+1,k}(x_k)}{x_i - x_k} = \frac{(x_i-x_k) \times p_{i+1,k}(x_k)}{x_i - x_k} = p_{i+1,k}(x_k)\\
p_{k,j}(x_k) = \frac{(x_k-x_j) \times p_{k,j-1}(x_k) + (x_k-x_k) \times p_{k+1,j}(x_k)}{x_k - x_j} = \frac{(x_k-x_j) \times p_{k,j-1}(x_k)}{x_k - x_j} = p_{k,j-1}(x_k)
\]
If we apply these equalities recursively, we have
\[
\begin{align*}
p_{i,k}(x_k) &= y_k\\
p_{k,j}(x_k) &= y_k
\end{align*}
\]
so that there are two diagonal lines of \(y_k\)s running up and down from \(p_{k,k}(x_k)\) at the left of the tableau, albeit not necessarily of the same length, or indeed of more than the leftmost value.
If the final value lies on one of these diagonals, we are done. If not, then consider the value calculated from the second entry on each of these diagonals
\[
\begin{align*}
p_{k-1,k+1}(x_k) &= \frac{(x_k - x_{k+1}) \times p_{k-1,k}(x_k) + (x_{k-1} - x_k) \times p_{k,k+1}(x_k)}{x_{k-1} - x_{k+1}}\\
&= \frac{(x_k - x_{k+1}) \times y_k + (x_{k-1} - x_k) \times y_k}{x_{k-1} - x_{k+1}}\\
&= \frac{(x_{k-1} - x_{k+1}) \times y_k}{x_{k-1} - x_{k+1}} = y_k
\end{align*}
\]
If we continue to fill in the tableau we find that every value lying between the diagonals in the tableau, including \(p_{1,n}(x_k)\), must therefore be equal to \(y_k\).
A Better Polynomial Approximation
We could use interval arithmetic again to decide when to stop increasing the order of the approximating polynomial. Ridders' algorithm takes a different approach, however.
As the order of the polynomial increases we expect the result to get progressively closer to the correct value until the point at which cancellation error takes over. The absolute differences between successive approximations should consequently form a more or less decreasing sequence up to that point.
The algorithm therefore begins with a relatively large \(\delta\) and, shrinking it at each step, computes the symmetric finite difference and refines the polynomial approximation for \(\delta\) equal to 0. The value with the smallest absolute difference from that of the previous step is used as the approximation of the derivative with the algorithm terminating when the step starts to grow.
Now there is almost certainly going to be some numerical noise in the value of the polynomial at each step, so rather than stop as soon as the difference increases we shall terminate when it grows to at least twice the smallest found so far.
Note that the smallest absolute difference provides a rough estimate of the error in the approximation.
The rate at which we shrink \(\delta\) and the termination criterion are based upon the implementation of this algorithm given in Numerical Recipes in C. There are a couple of important differences however.
Firstly, they use the penultimate value in the final row of the Neville's algorithm tableau as a second approximation to the derivative, having the same polynomial order as the previous iteration's approximation but with a smaller \(\delta\). This is used to improve the termination criterion for the algorithm.
Secondly, we exploit the fact that the symmetric finite difference doesn’t depend upon the sign of \(\delta\). At each step we can refine the polynomial twice with the same symmetric difference; once with the positive \(\delta\) and once with the negative. This doubles the order of the approximating polynomial and forces it to be symmetric about zero with no additional evaluations of the function, but may lead to increased cost in the application of Neville's algorithm. This is a reasonable trade-off if the cost of the function evaluations is the primary concern.
Listing 13 shows the implementation of Ridders' algorithm as defined in RidderDerivative.js.
Listing 13: Ridders' Algorithm
function ridderRefine(i, f, x, dx) {
 var xa = x + dx;
 var xb = x - dx;
 var df = (f(xa)-f(xb))/(xa-xb);

 i.refine(xb-xa, df);
 return i.refine(xa-xb, df);
}

function ridderApply(f, x, d) {
 var c1 = 7/5;
 var c2 = c1*c1;
 var i = ak.nevilleInterpolate(0);
 var dx = d * (Math.abs(x) + 1);
 var y0 = ridderRefine(i, f, x, dx);
 var y1 = ridderRefine(i, f, x, dx/=c1);
 var y = y1;
 var e1 = Math.abs(y0-y1);
 var e2 = e1;

 while(e2<2*e1) {
  y0 = y1;
  y1 = ridderRefine(i, f, x, dx/=c2);
  e2 = Math.abs(y0-y1);

  if(e2<e1) {
   e1 = e2;
   y = y1;
  }
 }
 return {val: y, err: e1};
}

ak.ridderDerivative = function(f, d) {
 var r;

 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('non-function passed to ak.ridderDerivative');
 }

 if(ak.nativeType(d)===ak.UNDEFINED_T) d = 0.1;

 if(ak.nativeType(d)!==ak.NUMBER_T) {
  throw new Error('non-numeric distance passed to ak.ridderDerivative');
 }

 if(!(d>0)) {
  throw new Error('non-positive distance passed to ak.ridderDerivative');
 }

 r = function(x) {return ridderApply(f, x, d).val;};
 r.apply = function(x) {return ridderApply(f, x, d);};
 return Object.freeze(r);
};
Here the ridderApply function manages adding samples to the Neville interpolation of the derivative in ridderRefine. ak.ridderDerivative simply checks the validity of its arguments and defines a default \(\delta\) of 0.1 before returning an object that forwards the calculation to ridderApply.
Note that ridderDerivative is a functional; a function that takes a function as its argument and returns a function as its result, in this case an approximation of the derivative. Since ridderApply returns an object containing both the approximate derivative and the estimated error, the result is given an apply method to provide access to the latter.
Program 7 shows the actual (black line) and estimated (red line) decimal digits of accuracy in the approximate derivative of the exponential function using our implementation of Ridders' algorithm.
Program 7: Ridders' Algorithm Errors



This represents a vast improvement in both accuracy and error estimation over every algorithm we have studied thus far. The relative error in these results is on average roughly \(2 \times 10^{-15}\), just one decimal order of magnitude worse than the theoretical minimum.
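We can check that claim independently of the ak library with a stripped down, self-contained sketch of the algorithm (argument validation omitted and all names my own):

```javascript
// A minimal, self-contained sketch of Ridders' algorithm: an incremental
// Neville interpolation of the symmetric difference, evaluated at zero.
function ridders(f, x, d) {
  var c1 = 7/5, c2 = c1*c1;
  var xi = [], yi = [];
  var dx = (d || 0.1) * (Math.abs(x) + 1);

  function refine(xn, yn) {            // Neville refinement at delta = 0
    var n = xi.length, i, t;
    for(i=0;i<n;++i) {
      t = yi[i]; yi[i] = yn; yn = t;
      yn = (yn*(0-xn) + yi[i]*(xi[n-1-i]-0)) / (xi[n-1-i]-xn);
    }
    xi.push(xn);
    yi.push(yn);
    return yn;
  }

  function step(dx) {                  // two refinements per symmetric difference
    var df = (f(x+dx) - f(x-dx)) / (2*dx);
    refine(-2*dx, df);
    return refine(2*dx, df);
  }

  var y0 = step(dx);
  var y1 = step(dx /= c1);
  var y = y1, e1 = Math.abs(y0-y1), e2 = e1;

  while(e2 < 2*e1) {                   // stop once the differences start growing
    y0 = y1;
    y1 = step(dx /= c2);
    e2 = Math.abs(y0-y1);
    if(e2 < e1) { e1 = e2; y = y1; }
  }
  return {val: y, err: e1};
}

console.log(ridders(Math.exp, 0));     // val should be extremely close to 1
```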
But that is still one decimal order of magnitude just waiting to be improved upon...
References
[1] You're Going To Have To Think! Why Interval Arithmetic Won’t Cure Your Floating Point Blues, www.thusspakeak.com, 2013.
[2] You're Going To Have To Think! Why Finite Differences Won’t Cure Your Calculus Blues, www.thusspakeak.com, 2013.
[3] Ridders, C.J.F., Advances in Engineering Software, Vol. 4, No. 2, Elsevier, 1982.
[4] Press, W.H. et al, Numerical Recipes in C (2nd ed.), Cambridge University Press, 1992.
Based upon an article I wrote for ACCU's Overload magazine.