In the previous post^{[1]} I discussed the foundations of the differential calculus. Initially defined in the 17^{th} century in terms of the rather vaguely defined infinitesimals, it was not until the 19^{th} century that Cauchy gave it a rigorous definition with his formalisation of the concept of a limit. Fortunately for us, the infinitesimals were given a solid footing in the 20^{th} century with Conway’s surreal numbers^{[2]} and Robinson’s nonstandard numbers^{[3]}, saving us from the annoyingly complex reasoning that Cauchy’s approach requires.
Finally, we discussed the mathematical power tool of numerical computing; Taylor’s theorem. This states that for any sufficiently differentiable function \(f\)
You may recall that I used forward finite differencing as an example of cancellation error in the first post in this series^{[4]}. This technique replaces the infinitesimal \(\delta\) in the definition of the derivative with a finite, but small, quantity.
We found that the optimal choice of this finite \(\delta\) was driven by a trade off between approximation error and cancellation error. With some fairly vigorous hand waving, we concluded that it was the square root of \(\epsilon\); the difference between 1 and the smallest floating point number greater than 1.
This time, and I fancy I can hear your collective groans of dismay, I shall dispense with the hand waving.
Assuming that \(f\) introduces relative rounding errors within some integer \(n_f\) multiples of \(\tfrac{1}{2}\epsilon\) in the region of \(x\) and that \(x\) has already accumulated a relative rounding error of some integer \(n_x\) multiples of \(\tfrac{1}{2}\epsilon\) then, if we wish to approximate the derivative as accurately as possible we should choose \(\delta\) to minimise
Now this is a function of \(\delta\) of the form
To leading order in \(\epsilon\) and \(\delta\) the worst case absolute error in the forward finite difference approximation of the derivative is therefore
when \(\delta\) is equal to
The first is that these expressions depend on the relative rounding errors of \(x\) and \(f\). We can dismiss these problems out of hand since if we have no inkling as to how accurate \(x\) or \(f\) are then we clearly cannot possibly have any expectation whatsoever that we can accurately approximate the derivative.
The second, and slightly more worrying, is that the error depends upon the very value we are trying to approximate; the derivative of \(f\) at \(x\). Fortunately, we can recover the error to leading order in \(\epsilon\) by replacing it with the finite difference itself.
The third, and by far the most significant, is that both expressions depend upon the behaviour of the unknown second derivative of \(f\).
Unfortunately this is ever so slightly more difficult to weasel our way out of. By which I of course mean that in general it is entirely impossible to do so.
Given the circumstances, the best thing we can really do is to guess how the second derivative behaves. For purely pragmatic reasons we might assume that
Substituting this into our error expression yields
This is little more than an irritation if our guess is significantly larger than the second derivative since the true error will be smaller than our estimate and we will still have a valid, if pessimistic, bound.
If our guess is significantly smaller, however, we’re in trouble since we shall in this case underestimate the error. Unfortunately there is little we can do about it.
One thing worth noting is that if the second derivative is very large at \(x\) then the derivative will be rapidly changing in its vicinity. The limiting case as the second derivative grows in magnitude is a discontinuity at which the derivative doesn’t exist.
If we have a very large second derivative, we can argue that the derivative is in some sense approaching nonexistence and that we should need to be aware of this and plan our calculations accordingly.
We have one final issue to address before implementing our algorithm; we have assumed that we can exactly represent both \(\delta\) and \(x+\delta\). Given that our expression for the optimal choice of \(\delta\) involves a floating point square root operation this is, in general, unlikely to be the case.
Fortunately we can easily find the closest floating point value to \(\delta\) for which our assumptions hold with
Listing 1 provides the implementation of the forward finite difference.
Note that by default we assume that the function
In the body of the returned function we multiply
As an example, let us apply our forward difference to the exponential function with arguments from 10 to 10. We can therefore expect that \(n_f\) is equal to one and \(n_x\) to zero and hence that our approximation of the error is
To compare our approximation of the error with the exact error we shall count the number of leading zeros after the decimal point of each, which we can do by negating the base 10 logarithm of their values.
Program 1 plots the leading zeros in the decimal fraction of our approximate error as a red line and in that of the true error as a black line, with larger values on the vertical axis thus implying smaller values for the errors.
Our approximation clearly increasingly underestimates the error as the absolute value of \(x\) increases. This shouldn’t come as too much of a surprise since our assumption about the behaviour of the second derivative grows increasingly inaccurate as the magnitude of \(x\) increases for the exponential function.
Nevertheless, our approximation displays the correct overall trend and is nowhere catastrophically inaccurate, at least to my eye.
The question remains as to whether we can do any better.
If we approximate the derivative with the finite difference between a small step above and a small step below \(x\) we can arrange for this term to cancel out. Specifically, the expression
Now this is a rather impressive order of magnitude better in \(\delta\) than the forward finite difference considering that it involves no additional evaluations of \(f\).
That said, it is not at all uncommon that both the value of the function and its derivative are required, in which case the forward finite difference can get one of its function evaluations for free.
With a similar analysis to that we made for the forward finite difference, given in derivation 4, we find that the optimal choice of \(\delta\) must minimise
This time the quantity we wish to minimise is a function of \(\delta\) of the form
when \(\delta\) is equal to
Unfortunately in order to achieve this we have compounded the problem of unknown quantities in the error and the choice of \(\delta\).
The optimal choice of \(\delta\) is now dependent on the properties of the third rather than the second derivative of \(f\) so we cannot use our previous argument that it may in some sense be reasonable to ignore it.
Furthermore, the resulting approximation error is dependant on both the second and the third derivatives of \(f\). We can deal with the first problem in the same way as we did before. In the name of pragmatism we assume that
We could arbitrarily decide that it has a similar form to that we chose for it during our analysis of the forward finite difference
Derivation 6 shows that we should choose
Substituting these guessed derivatives back into our error formula yields an estimated error of
Listing 2 gives the implementation of the symmetric finite difference which can be found, together with the forward finite difference, in FiniteDifference.js.
Note that, as with
Following the forward finite difference example, we'll apply the symmetric difference to the exponential function with arguments from 10 to 10 and so can assume that \(n_f\) is equal to one and \(n_x\) is equal to zero yielding an error approximation of
Program 2 plots the negation of the base 10 logarithm of our approximation of the error in this numerical approximation of the derivative of the exponential function as a red line and the true error as a black line with larger values on the vertical axis again implying smaller errors.
Clearly the error in the symmetric finite difference is smaller than that in the forward finite difference, although it appears that the accuracy of our approximation of that error isn’t quite so good. That said, the average ratios between the number of decimal places in the true error and the approximate error of the two algorithms are not so very different; 1.21 for the forward finite difference and 1.24 for the symmetric finite difference.
Not too shabby if you ask me.
But the question still remains as to whether we can do any better.
Unfortunately, doing so makes a full error analysis even more tedious than those we have already suffered through. I therefore propose, and I suspect that this will be to your very great relief, that we revert to our original handwaving analysis.
In doing so our choice of \(\delta\) shall not be significantly impacted, but we shall have to content ourselves with a less satisfactory estimate of the error in the approximation.
We shall start by returning to Taylor’s theorem again.
By the same argument from pragmatism that we have so far used we should therefore choose
With sufficient patience, we might continue in this manner, creating ever more accurate approximations at the expense of increased calls to \(f\).
Unfortunately, not only is this extremely tiresome, but we cannot escape the fact that the error in such approximations shall always depend upon the behaviour of unknown higher order derivatives.
There must be a better way...
[2] Knuth, D. Surreal Numbers; How Two ExStudents Turned on to Pure Mathematics and Found Total Happiness, Addison Wesley, 1974.
[3] Robinson, A. Nonstandard Analysis (New Ed.), Princeton University Press, 1996.
[4] You're Going To Have To Think! Why [Insert Technique Here] Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.
Finally, we discussed the mathematical power tool of numerical computing; Taylor’s theorem. This states that for any sufficiently differentiable function \(f\)
\[
f(x+\delta) = f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + R_n\\
\min\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \leqslant R_n \leqslant \max\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \; \mathrm{for} \; 0 \leqslant \theta \leqslant 1
\]
If we do not place a limit on the number of terms, we have
\[
\begin{align*}
f(x+\delta) &= f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \dots + \tfrac{1}{n!}\delta^n \times f^{(n)} + \dots \\
&= \sum_{i \geqslant 0} \tfrac{1}{i!} \delta^i f^{(i)}(x)
\end{align*}
\]
Note that here \(f'(x)\) stands for the first derivative of \(f\) at \(x\), \(f''(x)\) for the second and \(f^{(n)}(x)\) for the \(n^{th}\) with the convention that the \(0^{th}\) derivative of a function is the function itself. The capital sigma stands for the sum of the expression to its right for every unique value of \(i\) that satisfies the inequality beneath it and \(i!\) stands for the factorial of \(i\), being the product of every integer between 1 and \(i\), with convention that the factorial of 0 is 1.You may recall that I used forward finite differencing as an example of cancellation error in the first post in this series^{[4]}. This technique replaces the infinitesimal \(\delta\) in the definition of the derivative with a finite, but small, quantity.
We found that the optimal choice of this finite \(\delta\) was driven by a trade off between approximation error and cancellation error. With some fairly vigorous hand waving, we concluded that it was the square root of \(\epsilon\); the difference between 1 and the smallest floating point number greater than 1.
This time, and I fancy I can hear your collective groans of dismay, I shall dispense with the hand waving.
Forward Finite Difference
Given some small finite positive \(\delta\), the forward finite difference is given by
\[
\frac{f(x+\delta)  f(x)}{\delta}
\]
Using Taylor's theorem the difference between this value and the derivative is equal to
\[
\tfrac{1}{2} \delta \times f''(y)
\]
for some \(y\) between \(x\) and \(x + \delta\).Assuming that \(f\) introduces relative rounding errors within some integer \(n_f\) multiples of \(\tfrac{1}{2}\epsilon\) in the region of \(x\) and that \(x\) has already accumulated a relative rounding error of some integer \(n_x\) multiples of \(\tfrac{1}{2}\epsilon\) then, if we wish to approximate the derivative as accurately as possible we should choose \(\delta\) to minimise
\[
\frac{n_f \epsilon}{\delta} f(x) + \left(\tfrac{1}{2} n_f + 1\right) \epsilon f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y)
\]
for some \(y\) in the interval \(\left[x + \tfrac{1}{2} n_x \epsilon x,\; x + \tfrac{1}{2} n_x \epsilon x + \delta\right]\), as shown in derivation 1.
Derivation 1: Forward Finite Difference Approximation Error
From Taylor's theorem we have
We shall assume that \(f\) introduces a proportional rounding error no worse than some integer \(n_f\) multiples of \(\tfrac{1}{2}\epsilon\) for the arguments used in the finite difference and that \(x\) has a proportional rounding error of some integer \(n_x\) multiples of \(\tfrac{1}{2}\epsilon\).
We shall further assume that we can represent \(\delta\) exactly and that the sum of it and \(x\) introduces no additional rounding error.
Under these assumptions the floating point result of the forward finite difference calculation is bounded by
\[
f(x+\delta) = f(x) + \delta f'(x) + \tfrac{1}{2}\delta^2f''(y)
\]
for some \(y\) between \(x\) and \(x+\delta\).We shall assume that \(f\) introduces a proportional rounding error no worse than some integer \(n_f\) multiples of \(\tfrac{1}{2}\epsilon\) for the arguments used in the finite difference and that \(x\) has a proportional rounding error of some integer \(n_x\) multiples of \(\tfrac{1}{2}\epsilon\).
We shall further assume that we can represent \(\delta\) exactly and that the sum of it and \(x\) introduces no additional rounding error.
Under these assumptions the floating point result of the forward finite difference calculation is bounded by
\[
\begin{align*}
&\frac{\left[f\left(x + \tfrac{1}{2} n_x \epsilon x +\delta\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)  f\left(x + \tfrac{1}{2} n_x \epsilon x\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)\right] \left(1 \pm \tfrac{1}{2}\epsilon\right)}{\delta} \left(1 \pm \tfrac{1}{2}\epsilon\right)\\
&\quad = \left[\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x +\delta\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{\delta}  \frac{f\left(x + \tfrac{1}{2} n_x \epsilon x\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{\delta}\right] \left(1 \pm \tfrac{1}{2}\epsilon\right)^2
\end{align*}
\]
Taking the terms in the square brackets individually we have
\[
\begin{align*}
&\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x +\delta\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{\delta}\\
&\quad = \frac{\left[f(x) + \left(\tfrac{1}{2} n_x \epsilon x + \delta\right) f'(x) + \tfrac{1}{2} \left(\tfrac{1}{2} n_x \epsilon x + \delta\right)^2 f''(y_1)\right] \left(1 \pm \tfrac{1}{2} n_f \epsilon \right)}{\delta}\\
&\quad = \frac{1 \pm \tfrac{1}{2} n_f \epsilon}{\delta} f(x) + \frac{\tfrac{1}{2} n_x \epsilon x}{\delta} f'(x) + \left(1 \pm \tfrac{1}{2} n_f \epsilon\right) f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1)\\
&\quad \quad + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)\\
&\\
&\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{\delta}\\
&\quad = \frac{\left[f(x) + \left(\tfrac{1}{2} n_x \epsilon x\right) f'(x) + \tfrac{1}{2} \left(\tfrac{1}{2} n_x \epsilon x\right)^2 f''(y_0)\right] \left(1 \pm \tfrac{1}{2} n_f \epsilon \right)}{\delta}\\
&\quad = \frac{1 \pm \tfrac{1}{2} n_f \epsilon}{\delta} f(x) + \frac{\tfrac{1}{2} n_x \epsilon x}{\delta} f'(x) + O\left(\frac{\epsilon^2}{\delta}\right)
\end{align*}
\]
yielding a difference of
\[
\begin{align*}
&\pm \frac{n_f \epsilon}{\delta} f(x) + \left(1 \pm \tfrac{1}{2} n_f \epsilon\right) f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)\\
&\quad = f'(x) \pm \frac{n_f \epsilon}{\delta} f(x) \pm \tfrac{1}{2} n_f \epsilon f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)
\end{align*}
\]
Finally, multiplying by \(\left(1 \pm \tfrac{1}{2} \epsilon\right)^2 = 1 \pm \epsilon + O\left(\epsilon^2\right)\) gives
\[
\begin{align*}
&f'(x) \pm \frac{n_f \epsilon}{\delta} f(x) \pm \tfrac{1}{2} n_f \epsilon f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1) \pm \epsilon f'(x) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)\\
&\quad = f'(x) \pm \frac{n_f \epsilon}{\delta} f(x) + \left(\pm \tfrac{1}{2} n_f \pm 1\right) \epsilon f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)
\end{align*}
\]
for a worst case absolute error of
\[
\frac{n_f \epsilon}{\delta} f(x) + \left(\tfrac{1}{2} n_f + 1\right) \epsilon f'(x) + \tfrac{1}{2} (n_x \epsilon x + \delta) f''(y_1) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon \delta\right)
\]
Now this is a function of \(\delta\) of the form
\[
f(x) = \frac{a}{x} + bx + c
\]
Such functions, for positive \(a\), \(b\) and \(x\), have a minimum value of \(2 \sqrt{ab} + c\) at \(x = \sqrt{a/b}\) (taking the positive square roots) as shown in derivation 2.To leading order in \(\epsilon\) and \(\delta\) the worst case absolute error in the forward finite difference approximation of the derivative is therefore
\[
\sqrt{2 n_f f(x) f''(y) \epsilon} + \left(\tfrac{1}{2} n_x x f''(y) + \left(\tfrac{1}{2} n_f + 1\right)f'(x)\right) \, \epsilon
\]
Derivation 2: The Minimum Of a/x + bx + c
Recall that the turning points of a function \(f\), that is to say the minima, maxima and inflections, occur where the derivative is zero.
We can use the second derivative to find out what kind of turning point this is; positive implies a minimum, negative a maximum and zero an inflection.
\[
\begin{align*}
f(x) &= \frac{a}{x} + bx + c\\
f'(x) &= \frac{a}{x^2} + b\\
f'\left(\sqrt{a/b}\right) &= \frac{a}{a/b} + b = b + b = 0\\
f\left(\sqrt{a/b}\right) &= \frac{a}{\sqrt{a/b}} + b \sqrt{a/b} + c = 2 \sqrt{ab} + c
\end{align*}
\]
Note that this is only a real number if \(a\) and \(b\) have the same sign.We can use the second derivative to find out what kind of turning point this is; positive implies a minimum, negative a maximum and zero an inflection.
\[
\begin{align*}
f''(x) &= 2 \frac{a}{x^3}\\
f''\left(\sqrt{a/b}\right) &= 2 \frac{a}{(a/b) \times \sqrt{a/b}} = 2 \frac{b}{\sqrt{a/b}}
\end{align*}
\]
If both \(a\) and \(b\) are positive and we choose the positive square root of their ratio then this value is positive and we have a minimum.
when \(\delta\) is equal to
\[
\sqrt{2 \leftn_f \frac{f(x)}{f''(y)} \right \epsilon}
\]
Now these expressions provide a very accurate estimate of the optimal choice of \(\delta\) and the potential error in the approximation of the derivative that results from that choice. There are, however, a few teensyweensy little problems.The first is that these expressions depend on the relative rounding errors of \(x\) and \(f\). We can dismiss these problems out of hand since if we have no inkling as to how accurate \(x\) or \(f\) are then we clearly cannot possibly have any expectation whatsoever that we can accurately approximate the derivative.
The second, and slightly more worrying, is that the error depends upon the very value we are trying to approximate; the derivative of \(f\) at \(x\). Fortunately, we can recover the error to leading order in \(\epsilon\) by replacing it with the finite difference itself.
The third, and by far the most significant, is that both expressions depend upon the behaviour of the unknown second derivative of \(f\).
Unfortunately this is ever so slightly more difficult to weasel our way out of. By which I of course mean that in general it is entirely impossible to do so.
Given the circumstances, the best thing we can really do is to guess how the second derivative behaves. For purely pragmatic reasons we might assume that
\[
f''(y) = \frac{f(x)}{(x+1)^2}
\]
since this yields
\[
\delta = \sqrt{2 n_f \epsilon} \times (x+1)
\]
The advantage of having a \(\delta\) of this form is that it helps alleviate both rounding and cancellation errors; it is everywhere larger than \(\sqrt{2 n_f \epsilon} x\) and hence mitigates against rounding error for large \(x\) and it is nowhere smaller than \(\sqrt{2 n_f \epsilon}\) and hence mitigates against cancellation error for small \(x\).Substituting this into our error expression yields
\[
\frac{\sqrt{2 n_f \epsilon}}{x + 1} f(x) + \tfrac{1}{2} \frac{n_x x f(x)}{\left(x+1\right)^2} \epsilon + \left(\tfrac{1}{2} n_f + 1\right) f'(x) \epsilon
\]
Of course this estimate will be inaccurate if the second derivative differs significantly from our guess.This is little more than an irritation if our guess is significantly larger than the second derivative since the true error will be smaller than our estimate and we will still have a valid, if pessimistic, bound.
If our guess is significantly smaller, however, we’re in trouble since we shall in this case underestimate the error. Unfortunately there is little we can do about it.
One thing worth noting is that if the second derivative is very large at \(x\) then the derivative will be rapidly changing in its vicinity. The limiting case as the second derivative grows in magnitude is a discontinuity at which the derivative doesn’t exist.
If we have a very large second derivative, we can argue that the derivative is in some sense approaching nonexistence and that we should need to be aware of this and plan our calculations accordingly.
We have one final issue to address before implementing our algorithm; we have assumed that we can exactly represent both \(\delta\) and \(x+\delta\). Given that our expression for the optimal choice of \(\delta\) involves a floating point square root operation this is, in general, unlikely to be the case.
Fortunately we can easily find the closest floating point value to \(\delta\) for which our assumptions hold with
\[
\left(x + \sqrt{2 n_f \epsilon} \times (x+1)\right)  x
\]
Naturally this will have some effect on the error bounds, but since it will only be \(O(\epsilon \delta)\) it will not impact our approximation of them.Listing 1 provides the implementation of the forward finite difference.
Listing 1: ak.forwardDifference
ak.forwardDifference = function(f, e) { if(ak.nativeType(f)!==ak.FUNCTION_T) { throw new Error('nonfunction passed to ak.forwardDifference'); } if(ak.nativeType(e)===ak.UNDEFINED_T) { e = Math.sqrt(2*ak.EPSILON); } if(ak.nativeType(e)!==ak.NUMBER_T) { throw new Error('nonnumeric epsilon passed to ak.forwardDifference'); } if(e<=0) { throw new Error('nonpositive epsilon passed to ak.forwardDifference'); } return function(x) { var d = e*(Math.abs(x)+1); var x1 = x + d; return (f(x1)f(x))/(x1x); }; };
Note that by default we assume that the function
f
introduces one relative rounding error, a reasonable assumption for the functions defined in Math
. The client can pass a second argument e
if it is known to introduce a greater error.In the body of the returned function we multiply
e
by \(x+1\) to yield our \(\delta\) and use the trick described above to find and use the closest floating point value to it.As an example, let us apply our forward difference to the exponential function with arguments from 10 to 10. We can therefore expect that \(n_f\) is equal to one and \(n_x\) to zero and hence that our approximation of the error is
\[
\frac{\sqrt{2\epsilon}}{x+1} f(x) + \tfrac{3}{2} f'(x) \epsilon
\]
Since the derivative of the exponential function is the exponential function itself, we can accurately calculate the true error by taking the absolute difference between it and the finite difference approximation.To compare our approximation of the error with the exact error we shall count the number of leading zeros after the decimal point of each, which we can do by negating the base 10 logarithm of their values.
Program 1 plots the leading zeros in the decimal fraction of our approximate error as a red line and in that of the true error as a black line, with larger values on the vertical axis thus implying smaller values for the errors.
Program 1: Forward Finite Difference Error



Our approximation clearly increasingly underestimates the error as the absolute value of \(x\) increases. This shouldn’t come as too much of a surprise since our assumption about the behaviour of the second derivative grows increasingly inaccurate as the magnitude of \(x\) increases for the exponential function.
Nevertheless, our approximation displays the correct overall trend and is nowhere catastrophically inaccurate, at least to my eye.
The question remains as to whether we can do any better.
Symmetric Finite Difference
Returning to Taylor’s theorem we can see that the term whose coefficient is a multiple of the second derivative is that of \(\delta^2\). This has the very useful property that it takes the same value for both \(+\delta\) and \(\delta\).
Derivation 3: The Symmetric Finite Difference
From Taylor's theorem we have
The symmetric finite difference is therefore given by
\[
\begin{align*}
f(x\delta) &= f(x)  \delta f'(x) + \tfrac{1}{2} \delta^2 f''(x)  \tfrac{1}{6} \delta^3 f'''(y_0)\\
f(x+\delta) &= f(x) + \delta f'(x) + \tfrac{1}{2} \delta^2 f''(x) + \tfrac{1}{6} \delta^3 f'''(y_1)
\end{align*}
\]
for some \(y_0\) between \(x\delta\) and \(x\) and \(y_1\) between \(x\) and \(x+\delta\).The symmetric finite difference is therefore given by
\[
\begin{align*}
&\frac{f(x+\delta)  f(x\delta)}{2\delta}\\
&\quad = \frac{2 \delta f'(x) + \tfrac{1}{6} \delta^3 f''(y_0) + \tfrac{1}{6} \delta^3 f'''(y_1)}{2\delta}\\
&\quad = f'(x) + \tfrac{1}{12} \delta^2 \left(f'''(y_0) + f'''(y_1)\right)
\end{align*}
\]
By the intermediate value theorem, for a continuous function \(f\) there must be a point \(x\) between \(x_0\) and \(x_1\) such that
\[
f(x) = \frac{f(x_0) + f(x_1)}{2}
\]
If the second derivative of our function is continuous then
\[
\frac{f(x+\delta)  f(x\delta)}{2\delta} = f'(x) + \tfrac{1}{6} \delta^2 f'''(y)
\]
for some \(y\) between \(x\delta\) and \(x+\delta\).
\[
\frac{f(x+\delta)  f(x\delta)}{2 \delta}
\]
differs from the derivative at \(x\) by
\[
\tfrac{1}{6} \delta^2 \times f'''(y)
\]
for some \(y\) between \(x\delta\) and \(x+\delta\) as shown in derivation 3.Now this is a rather impressive order of magnitude better in \(\delta\) than the forward finite difference considering that it involves no additional evaluations of \(f\).
That said, it is not at all uncommon that both the value of the function and its derivative are required, in which case the forward finite difference can get one of its function evaluations for free.
With a similar analysis to that we made for the forward finite difference, given in derivation 4, we find that the optimal choice of \(\delta\) must minimise
\[
\frac{n_f \epsilon}{2\delta} f(x) + \left(\tfrac{1}{2} n_f + 1 \right) f'(x) \epsilon + \tfrac{1}{2} n_x x f''(x) \epsilon + \tfrac{1}{6} \delta^2 f'''(y)
\]
Derivation 4: Symmetric Finite Difference Approximation Error
From Taylor's theorem we have
Making the same assumptions as before about rounding errors in both \(f\) and \(x\), the floating point result of the symmetric finite difference calculation is bounded by
Multiplying by \(\left(1 \pm \tfrac{1}{2} \epsilon\right)^2 = 1 \pm \epsilon + O\left(\epsilon^2\right)\) yields
\[
f(x+\delta) = f(x) + \delta f'(x) + \tfrac{1}{2} \delta^2 f''(x) + \tfrac{1}{6} \delta^3 f'''(y)
\]
for some \(y\) between \(x\) and \(x+\delta\).Making the same assumptions as before about rounding errors in both \(f\) and \(x\), the floating point result of the symmetric finite difference calculation is bounded by
\[
\begin{align*}
&\frac{\left[f\left(x + \tfrac{1}{2} n_x \epsilon x + \delta\right)\left(1 \pm \tfrac{1}{2} n_f \epsilon\right)  f\left(x + \tfrac{1}{2} n_x \epsilon x  \delta\right)\left(1 \pm \tfrac{1}{2} n_f \epsilon\right)\right]\left(1 \pm \tfrac{1}{2}\epsilon\right)}{2\delta}\left(1 \pm \tfrac{1}{2}\epsilon\right)\\
&\quad = \left[\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x + \delta\right)\left(1 \pm \tfrac{1}{2} n_f \epsilon\right)}{2\delta}  \frac{f\left(x + \tfrac{1}{2} n_x \epsilon x  \delta\right)\left(1 \pm \tfrac{1}{2} n_f \epsilon\right)}{2\delta}\right]\left(1 \pm \tfrac{1}{2}\epsilon\right)^2
\end{align*}
\]
Taking the terms in the square brackets again
\[
\begin{align*}
&\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x +\delta\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{2\delta}\\
&\quad = \frac{\left[
\begin{array}{l}
f(x)
+ \left(\tfrac{1}{2} n_x \epsilon x + \delta\right) f'(x)
+ \tfrac{1}{2} \left(\tfrac{1}{2} n_x \epsilon x + \delta\right)^2 f''(x)\\
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + \tfrac{1}{6} \left(\tfrac{1}{2} n_x \epsilon x + \delta\right)^3 f'''(y_1)
\end{array}
\right] \left(1 \pm \tfrac{1}{2} n_f \epsilon \right)}{2\delta}\\
&\quad = \frac{1 \pm \tfrac{1}{2} n_f \epsilon}{2\delta} f(x) + \frac{\tfrac{1}{2} n_x \epsilon x}{2\delta} f'(x) + \tfrac{1}{2} \left(1 \pm \tfrac{1}{2} n_f \epsilon\right) f'(x) + \tfrac{1}{4} (n_x \epsilon x + \delta) f''(x)\\
&\quad \quad + \tfrac{1}{12} \delta^2 f'''(y_1) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right) \\
&\\
&\frac{f\left(x + \tfrac{1}{2} n_x \epsilon x \delta\right) \left(1 \pm \tfrac{1}{2}n_f\epsilon\right)}{2\delta}\\
&\quad = \frac{\left[
\begin{array}{l}
f(x) + \left(\tfrac{1}{2} n_x \epsilon x  \delta\right) f'(x) + \tfrac{1}{2} \left(\tfrac{1}{2} n_x \epsilon x  \delta\right)^2 f''(x)\\
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + \tfrac{1}{6} \left(\tfrac{1}{2} n_x \epsilon x  \delta\right)^3 f'''(y_0)
\end{array}
\right] \left(1 \pm \tfrac{1}{2} n_f \epsilon \right)}{2\delta}\\
&\quad = \frac{1 \pm \tfrac{1}{2} n_f \epsilon}{2\delta} f(x) + \frac{\tfrac{1}{2} n_x \epsilon x}{2\delta} f'(x)  \tfrac{1}{2} \left(1 \pm \tfrac{1}{2} n_f \epsilon\right) f'(x) + \tfrac{1}{4} (n_x \epsilon x + \delta) f''(x)\\
&\quad \quad  \tfrac{1}{12} \delta^2 f'''(y_0) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)
\end{align*}
\]
for a difference of
\[
\begin{align*}
&\pm \frac{n_f\epsilon}{2\delta} f(x) + \left(1 \pm \tfrac{1}{2} n_f \epsilon\right) f'(x) + \tfrac{1}{2} n_x x f''(x) \epsilon + \tfrac{1}{12} \delta^2 (f'''(y_0) + f'''(y_1))\\
&\quad + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)\\
&\quad = f'(x) \pm \frac{n_f \epsilon}{2\delta}f(x) \pm \tfrac{1}{2} n_f \epsilon f'(x) + \tfrac{1}{2} n_x x \epsilon f''(x) + \tfrac{1}{12} \delta^2 (f'''(y_0) + f'''(y_1))\\
&\quad \quad + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)
\end{align*}
\]
or, by the mean value theorem
\[
f'(x) \pm \frac{n_f \epsilon}{2\delta}f(x) \pm \tfrac{1}{2} n_f \epsilon f'(x) + \tfrac{1}{2} n_x x \epsilon f''(x) + \tfrac{1}{6} \delta^2 f'''(y) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)
\]
for some \(y\) between \(x\delta\) and \(x+\delta\).Multiplying by \(\left(1 \pm \tfrac{1}{2} \epsilon\right)^2 = 1 \pm \epsilon + O\left(\epsilon^2\right)\) yields
\[
f'(x) \pm \frac{n_f \epsilon}{2\delta}f(x) \pm \tfrac{1}{2} n_f \epsilon f'(x) + \tfrac{1}{2} n_x x \epsilon f''(x) + \tfrac{1}{6} \delta^2 f'''(y) \pm \epsilon f'(x) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)
\]
for a worst case absolute error of
\[
\frac{n_f \epsilon}{2\delta}f(x) + \left(\tfrac{1}{2} n_f + 1\right) f'(x) \epsilon + \tfrac{1}{2} n_x x f''(x) \epsilon + \tfrac{1}{6} \delta^2 f'''(y) + O\left(\frac{\epsilon^2}{\delta}\right) + O\left(\epsilon\delta\right)
\]
This time the quantity we wish to minimise is a function of \(\delta\) of the form
\[
f(x) = \frac{a}{x} + b x^2 + c
\]
which, as shown in derivation 5, given positive \(b\) is minimised by
\[
f\left(\left(\frac{a}{2b}\right)^\tfrac{1}{3}\right) = \left(\frac{27}{4}\right)^\tfrac{1}{3} a^\tfrac{2}{3} b^\tfrac{1}{3} + c
\]
To leading order in \(\epsilon\) and \(\delta\) the minimum relative error in the symmetric finite difference approximation to the derivative is therefore
\[
\left(\frac{9}{32}\right)^\frac{1}{3} (n_f f(x) \epsilon)^\frac{2}{3} f'''(y)^\frac{1}{3} + (\tfrac{1}{2}n_f + 1) \epsilon f'(x) + \tfrac{1}{2} n_x x f''(x) \epsilon
\]
Derivation 5: The Minimum Of a/x + bx^{2} + c
We find a turning point of \(f\)with
\[
\begin{align*}
f(x) &= \frac{a}{x} + bx^2 + c\\
f'(x) &= \frac{a}{x^2} + 2bx\\
f'\left(\left(\frac{a}{2b}\right)^\frac{1}{3}\right) &= a \left(\frac{2b}{a}\right)^\frac{2}{3} + 2b\left(\frac{a}{2b}\right)^\frac{1}{3}\\
&= (2b)^\frac{2}{3} \; a^\frac{1}{3} + (2b)^\frac{2}{3} \; a^\frac{1}{3} = 0\\
f\left(\left(\frac{a}{2b}\right)^\frac{1}{3}\right) &= a \left(\frac{2b}{a}\right)^\frac{1}{3} + b\left(\frac{a}{2b}\right)^\frac{2}{3} + c\\
&= \left(2^\frac{1}{3} \times 4^\frac{1}{3} \times \tfrac{1}{4}^\frac{1}{3} + \tfrac{1}{4}^\frac{1}{3}\right) \; a^\frac{2}{3} \; b^\frac{1}{3} + c\\
&= \left(2 \times \tfrac{1}{4}^\frac{1}{3} + \tfrac{1}{4}^\frac{1}{3}\right) \; a^\frac{2}{3} \; b^\frac{1}{3} + c\\
&= \left(\frac{27}{4}\right)^\frac{1}{3} \; a^\frac{2}{3} \; b^\frac{1}{3} + c
\end{align*}
\]
We have a second derivative of
\[
\begin{align*}
f''(x) &= 2\frac{a}{x^3} + 2b\\
f''\left(\left(\frac{a}{2b}\right)^\frac{1}{3}\right) &= 2 \frac{a}{a/(2b)} + 2b = 6b
\end{align*}
\]
so if \(b\) is positive we have a minimum.
\[
\left(\tfrac{3}{2} \leftn_f \frac{f(x)}{f'''(y)}\right \epsilon\right)^\frac{1}{3}
\]
Now this error is \(O\left(\epsilon^\frac{2}{3}\right)\) and we should therefore expect the symmetric finite difference to be significantly more accurate than the forward finite difference with its \(O\left(\epsilon^\frac{1}{2}\right)\) error.Unfortunately in order to achieve this we have compounded the problem of unknown quantities in the error and the choice of \(\delta\).
The optimal choice of \(\delta\) is now dependent on the properties of the third rather than the second derivative of \(f\) so we cannot use our previous argument that it may in some sense be reasonable to ignore it.
Furthermore, the resulting approximation error is dependant on both the second and the third derivatives of \(f\). We can deal with the first problem in the same way as we did before. In the name of pragmatism we assume that
\[
f'''(y) = \frac{f(x)}{(x+1)^3}
\]
giving a \(\delta\) of
\[
\left(\tfrac{3}{2} n_f \epsilon\right)^\frac{1}{3} \times (x+1)
\]
It’s a little more difficult to justify a guess about the form of the second derivative since it plays no part in the choice of \(\delta\).We could arbitrarily decide that it has a similar form to that we chose for it during our analysis of the forward finite difference
\[
f''(x) = \frac{f(x)}{(x+1)^2}
\]
but this strikes me as somewhat inadequate since it is not consistent with our assumed behaviour of the third derivative. Instead I should prefer something that satisfies
\[
\frac{\mathrm{d}}{\mathrm{d}x}f''(x) = \frac{f(x)}{(x+1)^3}
\]
since this is approximately consistent with our guess.Derivation 6 shows that we should choose
\[
f''(x) = 0.43016 \frac{f(x)}{(x+1)^2}  0.32472 \frac{f'(x)}{x+1}
\]
for positive \(x\)
\[
f''(x) = f(x)  f'(x)
\]
for \(x\) equal to zero and
\[
f''(x) = 3.07960 \frac{f(x)}{(x+1)^2}  2.32472 \frac{f'(x)}{x+1}
\]
for negative \(x\), with terms given to 5 decimal places.
Derivation 6: A Consistent Second Derivative
Consider first
The second term has the required form so if we can find a way to cancel out the first we shall have succeeded. Adding a second term whose derivative includes a term of the same form might just do the trick.
\[
\begin{align*}
f''(x) &= a \frac{f(x)}{(x+1)^2}\\
\frac{\mathrm{d}}{\mathrm{d}x}f''(x) &= a \frac{f'(x)}{(x+1)^2} 2 \; \mathrm{sign}(x) \; a \; \frac{f(x)}{(x+1)^3}
\end{align*}
\]
where \(a\) is a constant and \(\mathrm{sign}(x)\) is the sign of \(x\). For simplicity’s sake, we shall declare the derivative of the absolute value of \(x\) at 0 to be 0 rather than undefined.The second term has the required form so if we can find a way to cancel out the first we shall have succeeded. Adding a second term whose derivative includes a term of the same form might just do the trick.
\[
\begin{align*}
f''(x) &= a \frac{f(x)}{(x+1)^2} + b \frac{f'(x)}{x+1}\\
\frac{\mathrm{d}}{\mathrm{d}x}f''(x) &= a\frac{f'(x)}{(x+1)^2}  2 \; \mathrm{sign}(x) \; a \; \frac{f(x)}{(x+1)^3} + b \; \frac{f''(x)}{x+1}  \mathrm{sign}(x) \; b \; \frac{f'(x)}{(x+1)^2}\\
&= (a  \mathrm{sign}(x) \; b) \frac{f'(x)}{(x+1)^2}  2 \; \mathrm{sign}(x) \; a \; \frac{f(x)}{(x+1)^3} + \frac{b}{x+1} f''(x)\\
&= \left(a  \mathrm{sign}(x) \; b + b^2\right) \frac{f'(x)}{(x+1)^2} + a (b  2 \; \mathrm{sign}(x)) \frac{f(x)}{(x+1)^3}
\end{align*}
\]
We therefore require
\[
\begin{align*}
a  \mathrm{sign}(x) \; b + b^2 &= 0\\
a (b  2 \; \mathrm{sign}(x)) &= 1
\end{align*}
\]
Solving for \(a\) and \(b\) yields the serendipitously unique result.
Substituting these guessed derivatives back into our error formula yields an estimated error of
\[
\left[\left(\frac{9}{32}\right)^\frac{1}{3}\frac{(n_f\epsilon)^\frac{2}{3}}{x+1} + 0.21508\frac{n_x x\epsilon}{(x+1)^2}\right] \times f(x) + \left[\tfrac{1}{2}n_f \epsilon + 0.16236 \frac{n_x x\epsilon}{x+1} + \epsilon\right] \times f'(x)
\]
for positive \(x\)
\[
\left[\left(\frac{9}{32}\right)^\frac{1}{3}\frac{(n_f\epsilon)^\frac{2}{3}}{x+1} + \tfrac{1}{2}n_x x\epsilon\right] \times f(x) + \left[\tfrac{1}{2}n_f\epsilon + \tfrac{1}{2}n_x x\epsilon + \epsilon\right] \times f'(x)
\]
for \(x\) equal to zero and
\[
\left[\left(\frac{9}{32}\right)^\frac{1}{3}\frac{(n_f\epsilon)^\frac{2}{3}}{x+1} + 1.53980\frac{n_x x\epsilon}{(x+1)^2}\right] \times f(x) + \left[\tfrac{1}{2}n_f \epsilon + 1.16236 \frac{n_x x\epsilon}{x+1} + \epsilon\right] \times f'(x)
\]
for negative \(x\).Listing 2 gives the implementation of the symmetric finite difference which can be found, together with the forward finite difference, in FiniteDifference.js.
Listing 2: ak.symmetricDifference
ak.symmetricDifference = function(f, e) { if(ak.nativeType(f)!==ak.FUNCTION_T) { throw new Error('nonfunction passed to ak.symmetricDifference'); } if(ak.nativeType(e)===ak.UNDEFINED_T) { e = Math.pow(3*ak.EPSILON/2, 1/3); } if(ak.nativeType(e)!==ak.NUMBER_T) { throw new Error('nonnumeric epsilon passed to ak.symmetricDifference'); } if(e<=0) { throw new Error('nonpositive epsilon passed to ak.symmetricDifference'); } return function(x) { var d = e*(Math.abs(x)+1); var x0 = x  d; var x1 = x + d; return (f(x1)f(x0))/(x1x0); }; };
Note that, as with
ak.forwardDifference
, we do not use \(2\delta\) directly but rather use the difference between the closest floating point numbers to \(x+\delta\) and \(x\delta\).Following the forward finite difference example, we'll apply the symmetric difference to the exponential function with arguments from 10 to 10 and so can assume that \(n_f\) is equal to one and \(n_x\) is equal to zero yielding an error approximation of
\[
\left(\frac{9}{32}\right)^\frac{1}{3}\frac{\epsilon^\frac{2}{3}}{x+1} f(x) + \tfrac{3}{2} f'(x) \epsilon
\]
for all \(x\).Program 2 plots the negation of the base 10 logarithm of our approximation of the error in this numerical approximation of the derivative of the exponential function as a red line and the true error as a black line with larger values on the vertical axis again implying smaller errors.
Program 2: Symmetric Finite Difference Error



Clearly the error in the symmetric finite difference is smaller than that in the forward finite difference, although it appears that the accuracy of our approximation of that error isn’t quite so good. That said, the average ratios between the number of decimal places in the true error and the approximate error of the two algorithms are not so very different; 1.21 for the forward finite difference and 1.24 for the symmetric finite difference.
Not too shabby if you ask me.
But the question still remains as to whether we can do any better.
Higher Order Finite Differences
As it happens we can, although I doubt that this comes as much of a surprise. We do so by recognising that the advantage of the symmetric finite difference stems from the fact that terms dependant upon the second derivative largely cancel out. If we can arrange for higher derivatives to cancel out we should be able to improve accuracy still further.Unfortunately, doing so makes a full error analysis even more tedious than those we have already suffered through. I therefore propose, and I suspect that this will be to your very great relief, that we revert to our original handwaving analysis.
In doing so our choice of \(\delta\) shall not be significantly impacted, but we shall have to content ourselves with a less satisfactory estimate of the error in the approximation.
We shall start by returning to Taylor’s theorem again.
\[
f(x+\delta) = f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \tfrac{1}{6}\delta^3 f'''(x) + \tfrac{1}{24}\delta^4 \times f^{(4)}(x) + O\left(\delta^5\right)
\]
From this we find that the numerator of the symmetric finite difference is
\[
f(x+\delta)  f(x\delta) = 2\delta \times f'(x) + \tfrac{1}{3}\delta^3 f'''(x) + O\left(\delta^5\right)
\]
Performing the same calculation with \(2\delta\) yields
\[
f(x+2\delta)  f(x2\delta) = 4\delta \times f'(x) + \tfrac{8}{3}\delta^3 f'''(x) + O\left(\delta^5\right)
\]
With a little algebra it is simple to show that
\[
f'(x) = \frac{8\left(f(x+\delta)f(x\delta)\right)  \left(f(x+2\delta)f(x2\delta)\right)}{12\delta} + O\left(\delta^4\right)
\]
Assuming that each evaluation of \(f\) introduces a single rounding error and that the arguments are in all cases exact this mean that the optimal choice of \(\delta\) is of order \(\epsilon^\frac{1}{5}\) as shown in derivation 7.
Derivation 7: The Optimal Choice Of δ
Noting that multiplying by a power of 2 never introduces a rounding error, the floating point approximation to out latest finite difference is bounded by
\[
\frac{\left[\begin{array}{}8\left(f(x+\delta)\;\left(1\pm\tfrac{1}{2}\epsilon\right)  f(x\delta)\;\left(1\pm\tfrac{1}{2}\epsilon\right)\right)\;\left(1\pm\tfrac{1}{2}\epsilon\right)\\
\quad \quad  \left(f(x+2\delta)\;\left(1\pm\tfrac{1}{2}\epsilon\right)  f(x2\delta)\;\left(1\pm\tfrac{1}{2}\epsilon\right)\right)\;\left(1\pm\tfrac{1}{2}\epsilon\right)\end{array}\right]\;\left(1\pm\tfrac{1}{2}\epsilon\right)}{12\delta}\;\left(1\pm\tfrac{1}{2}\epsilon\right)^2
\]
which simplifies to
\[
\begin{align*}
&\frac{8\left(f(x+\delta)f(x\delta)\right)  \left(f(x+2\delta)f(x2\delta)\right) + O(\epsilon)}{12\delta}\\
&\quad = f'(x) + O\left(\delta^4\right) + O\left(\frac{\epsilon}{\delta}\right)
\end{align*}
\]
The order of the error is consequently minimised when
\[
\begin{align*}
O\left(\delta^4\right) &= O\left(\frac{\epsilon}{\delta}\right)\\
O(\delta) &= O\left(\epsilon^\frac{1}{5}\right)
\end{align*}
\]
By the same argument from pragmatism that we have so far used we should therefore choose
\[
\delta = (x+1)\times\epsilon^\frac{1}{5}
\]
to yield an \(O\left(\epsilon^\frac{4}{5}\right)\) error.With sufficient patience, we might continue in this manner, creating ever more accurate approximations at the expense of increased calls to \(f\).
Unfortunately, not only is this extremely tiresome, but we cannot escape the fact that the error in such approximations shall always depend upon the behaviour of unknown higher order derivatives.
There must be a better way...
References
[1] You're Going To Have To Think! Why [Insert Algorithm Here] Won't Cure Your Calculus Blues, www.thusspakeak.com, 2013.[2] Knuth, D. Surreal Numbers; How Two ExStudents Turned on to Pure Mathematics and Found Total Happiness, Addison Wesley, 1974.
[3] Robinson, A. Nonstandard Analysis (New Ed.), Princeton University Press, 1996.
[4] You're Going To Have To Think! Why [Insert Technique Here] Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.
Based upon an article I wrote for ACCU's Overload magazine.
Leave a comment