Having discussed the various ways in which we can represent numbers on computers

To illustrate the development and analysis of algorithms for mathematical computation I shall continue to use the example of numerical differentiation with which we seek to approximate as accurately as possible the derivative of a function, primarily because it is the simplest non-trivial numerical calculation that I know of.

Before we do so, it will be useful to discuss exactly what we mean by differentiation and the tools that we shall exploit in the development and analysis of our algorithms.

Credit for its discovery is often given to the 17

The central idea of the differential calculus was that of the infinitesimals; hypothetical entities that have a smaller absolute value than any real number other than zero.

To see how infinitesimals are used to calculate the slope of a graph, first consider the slope of a straight line connecting two points on it, say \((x_0,y_0)\) and \((x_1,y_1)\).

Instead we shall set \(x_0\) equal to \(x\) and \(x_1\) equal to \(x\) plus some infinitesimal quantity \(\mathrm{d}x\). Since the latter is closer to \(x\) than any real number, it should yield a result closer to the slope at \(x\) than any real number. The actual slope can therefore be recovered by discarding any infinitesimals in that result.

For example, consider the slope of the function \(x^2\).

We define a function to be continuous if an infinitesimal change in its argument yields an infinitesimal change of the same or higher order in its value. If the same can be said of its derivative, we say that the function is smooth. For a smooth function f we therefore have \(f(x+\mathrm{d}x)=f(x)+\mathrm{d}f\), where \(\mathrm{d}f\) is an infinitesimal of at least the same order as \(\mathrm{d}x\). Given this we can obtain Leibniz’s notation

Treating \(\frac{\mathrm{d}}{\mathrm{d}x}\) as an operator as above, we recover the notation for repeated differentiation. We square it if we differentiate twice, cube it if thrice, and so forth, to obtain

Given constant \(c\), the exponential function \(e^x\), its inverse \(\ln x\), the trigonometric functions \(\sin x\) and \(\cos x\), and functions \(f\) and \(g\) such that \(y=f(x)\) and \(z=g(y)\) some further useful identities are

Given any real number, we can halve it to give us a number that is closer to zero. Halving again yields a number closer still, as does halving a third time. Repeating this process over and over again yields a sequence of numbers that shrink arbitrarily close to zero, so where exactly are the infinitesimals to be found?

This lack of rigour did not escape Newton and Leibniz's contemporaries; the philosopher George Berkeley

Despite these objections, many mathematicians were unwilling to dismiss the differential calculus due to its incredible usefulness.

For example, consider the equation governing the straight line distance \(s\) travelled in a time \(t\) by a frictionless body from a standing start under a constant acceleration \(a\)

His idea was to define the derivative as the limit of a sequence of ever more accurate approximations to it. Specifically, that

For example, consider again the derivative of \(x^2\)

Now this is how we defined the derivative in the first post in this series but, whilst it's certainly a step in the right direction, it’s not yet quite enough. What exactly do we mean when we say \(\Delta x\) tends to zero? Do we repeatedly halve it? Do we start with a positive value less than 1 and square it, then cube it, then raise it to the 4

Cauchy’s great achievement was to rigorously define the limit of a sequence and, in doing so, discover analysis; the mathematics of limits.

We say that the limit of a function \(f(x)\) as \(x\) tends to \(c\) is equal to \(l\) if and only if for any given positive \(\epsilon\), no matter how small, we can find a positive \(\delta\) such that the absolute difference between \(f(x)\) and \(l\) is always less than \(\epsilon\) if the absolute difference between \(x\) and \(c\) is less than \(\delta\). In mathematical notation we write this as

With this definition we are not dependent upon the manner in which we approach a limit, only upon the size of its terms.

We now define the derivative \(\frac{\mathrm{d}f(x)}{\mathrm{d}x}\) with

For example, the derivative of \(x^2\) is \(2x\) since for all \(x\)

As we did with infinitesimals, we can derive the various identities of the differential calculus with this rigorous definition of a limit. Derivation 2 provides a proof of the product rule in these terms, for example.

That this proof is so much longer and more difficult than the one using infinitesimals is something that mathematicians have had to learn to live with; the price of rigour is generally paid in ink.

The rigour that Cauchy brought to the differential calculus was part of a great revolution in mathematical thinking that took place during the 19

Their approach was to embed something akin to Cauchy’s limits into the very idea of a number. Loosely speaking, Robinson defined a non-standard number as an infinite sequence of real numbers with arithmetic operations and functions applied on an element by element basis. For example

Now consider the standard number whose elements are a strictly decreasing sequence of positive numbers. For example

So here we have a genuine, bona fide infinitesimal with all of the properties of those in Newton and Leibniz’s differential calculus!

If a non-standard number \(z\) can be represented by a real number plus an infinitesimal non-standard number, we call that real number the standard part of \(z\), or \(\mathrm{st}(z)\). We can thus define the derivative of a function \(f\) as

For example, let’s consider the derivative of \(x^2\) a third time.

The problem with this loose definition of the non-standard numbers is that the vast majority of them are comprised of oscillating or random sequences which cannot meaningfully be ordered by the less than and greater than operators. Fortunately, the formal definition addresses this deficiency by demonstrating that it is possible to define rules by which we can do so, albeit ones which we cannot ever hope to write down in full.

Very roughly speaking, these rules unambiguously equate oscillating/random sequences with convergent or divergent sequences. With

Robinson proved that the non-standard numbers do not lead to any logical contradictions other than those, if any, that are consequences of the standard reals and that his infinitesimals have exactly the properties of Newton’s.

We can therefore dispense with the full limit notation and simply go back to using our original infinitesimals.

Now that we know exactly how the differential calculus is defined we are nearly ready to begin analysing numerical differentiation algorithms.

Before we can start, however, there is one last piece of mathematics that we shall need.

It demonstrates how a sufficiently differentiable function can be approximated within some region to within some error by a polynomial. Specifically, it shows that

By sufficiently differentiable we mean that all of the derivatives of \(f\) up to the \(n+1^{th}\) must exist, and that all of the derivatives of \(f\) up to the \(n^{th}\) must be continuous, between \(x\) and \(x+\delta\), inclusive of the bounds in the latter case but not in the former.

In fact, the error term is

If we put no limit upon \(n\) we recover the Taylor series of a function \(f\) in the region of \(x\)

Note that this identity holds if and only if the sum has a well defined limit under Cauchy’s definition.

The reason why Taylor’s theorem is of such utility in numerical computing is that it provides us with an explicit formula for approximating a function with a polynomial and bounds on the error that results from doing so.

Such polynomials are very easy to mathematically and numerically manipulate and thus can dramatically simplify many mathematical computations; they are used to very great effect in Physics, for example.

Furthermore, it gives us an explicit formula for the error in the value of a function that results from an error in its argument, such as might occur through floating point rounding.

So, now we have a thorough grasp of the differential calculus and are equipped with the numerical power tool of Taylor’s theorem, we are ready to scrutinise some of the numerical algorithms for approximating the derivatives of functions.

I’m afraid we shall have to wait until next time before we do so, however.

[2]

[3]

[4]

[5]

[6]

[7]

[8] Berkeley, G.

[9] Robinson, A.

[10] Knuth, D.

^{[1]-[5]}and come to the very sensible conclusion that, perhaps in combination with interval arithmetic^{[6]}, IEEE 754 floating point^{[7]}is the best of the lot, we're ready to explore how best to wield it.To illustrate the development and analysis of algorithms for mathematical computation I shall continue to use the example of numerical differentiation with which we seek to approximate as accurately as possible the derivative of a function, primarily because it is the simplest non-trivial numerical calculation that I know of.

Before we do so, it will be useful to discuss exactly what we mean by differentiation and the tools that we shall exploit in the development and analysis of our algorithms.

### On The Differential Calculus

The aim of the differential calculus is the calculation of the instantaneous rate of change of functions or, equivalently, the slopes of their graphs at any given point.Credit for its discovery is often given to the 17

^{th}century mathematicians Newton and Leibniz despite the fact that many of the concepts were discovered centuries before by the Greeks, the Indians and the Persians. That said, the contributions of Newton and Leibniz were hardly insignificant and their notations are still used today; \(\overset{\cdot}{y}\) from Newton and \(\frac{\mathrm{d}y}{\mathrm{d}x}\), or \(\frac{\mathrm{d}}{\mathrm{d}x}y\), from Leibniz.The central idea of the differential calculus was that of the infinitesimals; hypothetical entities that have a smaller absolute value than any real number other than zero.

To see how infinitesimals are used to calculate the slope of a graph, first consider the slope of a straight line connecting two points on it, say \((x_0,y_0)\) and \((x_1,y_1)\).

\[
\frac{y_1-y_0}{x_1-x_0}
\]

To compute its slope at a point x we should ideally want to set both \(x_0\) and \(x_1\) equal to \(x\) but unfortunately this yields the meaningless quantity \(\frac{0}{0}\).Instead we shall set \(x_0\) equal to \(x\) and \(x_1\) equal to \(x\) plus some infinitesimal quantity \(\mathrm{d}x\). Since the latter is closer to \(x\) than any real number, it should yield a result closer to the slope at \(x\) than any real number. The actual slope can therefore be recovered by discarding any infinitesimals in that result.

For example, consider the slope of the function \(x^2\).

\[
\begin{align*}
\frac{(x+\mathrm{d}x)^2-x^2}{(x+\mathrm{d}x)-x}
&= \frac{x^2 + 2 \times x \times \mathrm{d}x + \mathrm{d}x^2 - x^2}{x + \mathrm{d}x - x}\\
&= \frac{2 \times x \times \mathrm{d}x + \mathrm{d}x^2}{\mathrm{d}x}\\
&= 2 \times x + \mathrm{d}x\\
&\approx 2 \times x
\end{align*}
\]

where the wavy equals sign means approximately equal to in the sense that no real number is closer to the result.We define a function to be continuous if an infinitesimal change in its argument yields an infinitesimal change of the same or higher order in its value. If the same can be said of its derivative, we say that the function is smooth. For a smooth function f we therefore have \(f(x+\mathrm{d}x)=f(x)+\mathrm{d}f\), where \(\mathrm{d}f\) is an infinitesimal of at least the same order as \(\mathrm{d}x\). Given this we can obtain Leibniz’s notation

\[
\frac{f(x+\mathrm{d}x) - f(x)}{(x+\mathrm{d}x)-x} =
\frac{f(x) + \mathrm{d}f - f(x)}{x+\mathrm{d}x-x} =
\frac{\mathrm{d}f}{\mathrm{d}x} =
\frac{\mathrm{d}}{\mathrm{d}x}f
\]

That we require the function to be smooth rather than simply continuous might come as a bit of a surprise, but stems from the fact that a function does not have a well defined value at a discontinuity.Treating \(\frac{\mathrm{d}}{\mathrm{d}x}\) as an operator as above, we recover the notation for repeated differentiation. We square it if we differentiate twice, cube it if thrice, and so forth, to obtain

\[
\begin{align*}
2^{nd} \mathrm{derivative:} & \left(\frac{\mathrm{d}}{\mathrm{d}x}\right)^2 f = \frac{\mathrm{d}^2 f}{\mathrm{d}x^2}\\
3^{rd} \mathrm{derivative:} & \left(\frac{\mathrm{d}}{\mathrm{d}x}\right)^3 f = \frac{\mathrm{d}^3 f}{\mathrm{d}x^3}\\
n^{th} \mathrm{derivative:} & \left(\frac{\mathrm{d}}{\mathrm{d}x}\right)^n f = \frac{\mathrm{d}^n f}{\mathrm{d}x^n}
\end{align*}
\]

We can recover the various identities of the differential calculus using infinitesimals. In derivation 1, for example, we prove the product rule for the derivative of the product of two functions \(f\) and \(g\)
\[
\frac{\mathrm{d}(f \times g)}{\mathrm{d}x} = f \times \frac{\mathrm{d}g}{\mathrm{d}x} + \frac{\mathrm{d}f}{\mathrm{d}x} \times g
\]

Derivation 1: Proving The Product Rule With Infinitesimals

Given smooth, continuous functions \(f\) and \(g\)

\[
\begin{align*}
\frac{\mathrm{d}(f \times g)}{\mathrm{d}x}
&= \frac{f(x+\mathrm{d}x) \times g(x+\mathrm{d}x) - f(x) \times g(x)}{(x + \mathrm{d}x) - x}\\
\\
&= \frac{(f(x)+\mathrm{d}f(x)) \times (g(x)+\mathrm{d}g(x)) - f(x) \times g(x)}{\mathrm{d}x}\\
\\
&= \frac{f(x) \times g(x) + f(x) \times \mathrm{d}g(x) + \mathrm{d}f(x) \times g(x) + \mathrm{d}f(x) \times
\mathrm{d}g(x) - f(x) \times g(x)}{\mathrm{d}x}\\
\\
&= \frac{f(x) \times \mathrm{d}g(x) + \mathrm{d}f(x) \times g(x) + \mathrm{d}f(x) \times \mathrm{d}g(x)}{\mathrm{d}x}\\
\\
&= f(x) \times \frac{\mathrm{d}g(x)}{\mathrm{d}x} + \frac{\mathrm{d}f(x)}{\mathrm{d}x} \times g(x) + \frac{\mathrm{d}f(x) \times \mathrm{d}g(x)}{\mathrm{d}x}\\
\\
& \approx f(x) \times \frac{\mathrm{d}g(x)}{\mathrm{d}x} + \frac{\mathrm{d}f(x)}{\mathrm{d}x} \times g(x)\\
\\
&= f \times \frac{\mathrm{d}g}{\mathrm{d}x} + \frac{\mathrm{d}f}{\mathrm{d}x} \times g
\end{align*}
\]

Given constant \(c\), the exponential function \(e^x\), its inverse \(\ln x\), the trigonometric functions \(\sin x\) and \(\cos x\), and functions \(f\) and \(g\) such that \(y=f(x)\) and \(z=g(y)\) some further useful identities are

\[
\begin{align*}
\frac{\mathrm{d}c}{\mathrm{d}x} &= 0 & \frac{\mathrm{d}x^n}{\mathrm{d}x} &= nx^{n-1}\\
\\
\frac{\mathrm{d}e^x}{\mathrm{d}x} &= e^x & \frac{\mathrm{d}\ln x}{\mathrm{d}x} &= \frac{1}{x}\\
\\
\frac{\mathrm{d}\sin x}{\mathrm{d}x} &= \cos x & \frac{\mathrm{d}\cos x}{\mathrm{d}x} &= -\sin x\\
\\
\frac{\mathrm{d}z}{\mathrm{d}x} &= \frac{\mathrm{d}z}{\mathrm{d}y} \times \frac{\mathrm{d}y}{\mathrm{d}x} & \frac{\mathrm{d}x}{\mathrm{d}y} &= 1 \bigg/ \frac{\mathrm{d}y}{\mathrm{d}x}
\end{align*}
\]

Whilst infinitesimals provide a reasonably intuitive framework for the differential calculus it is not a particularly rigorous one. What exactly does it mean to say that an infinitesimal is smaller in magnitude than any real number other than zero?Given any real number, we can halve it to give us a number that is closer to zero. Halving again yields a number closer still, as does halving a third time. Repeating this process over and over again yields a sequence of numbers that shrink arbitrarily close to zero, so where exactly are the infinitesimals to be found?

This lack of rigour did not escape Newton and Leibniz's contemporaries; the philosopher George Berkeley

^{[8]}, for example, criticised the calculus for its “fallacious way of proceeding to a certain Point on the Supposition of an Increment, and then at once shifting your Supposition to that of no Increment” and derided the infinitesimals as the “ghosts of departed quantities”.Despite these objections, many mathematicians were unwilling to dismiss the differential calculus due to its incredible usefulness.

For example, consider the equation governing the straight line distance \(s\) travelled in a time \(t\) by a frictionless body from a standing start under a constant acceleration \(a\)

\[
s = \tfrac{1}{2} a t^2
\]

If we take the derivative of this with respect to time, we recover the equation governing the speed \(v\) of that body after the time \(t\)
\[
v = \frac{\mathrm{d}s}{\mathrm{d}t} = at
\]

That equations of motion such as these could be experimentally verified was rather strong evidence that the differential calculus was valid. It was not until some 150 years later that this was conclusively demonstrated, however.
### On Analysis

It was Cauchy who made the great leap forward in setting the differential calculus on a secure foundation. He did it not by giving the infinitesimals a rigorous definition but by doing away with them quite entirely.His idea was to define the derivative as the limit of a sequence of ever more accurate approximations to it. Specifically, that

\[
\frac{\mathrm{d}f(x)}{\mathrm{d}x} \; \mathrm{is\,the\,limit\,of} \; \frac{f(x + \Delta x)}{\Delta x} \; \mathrm{as} \; \Delta x \; \mathrm{tends\,to} \; 0
\]

or in conventional notation
\[
\frac{\mathrm{d}f(x)}{\mathrm{d}x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x)}{\Delta x}
\]

where lim stands for the limit of the expression that follows it.For example, consider again the derivative of \(x^2\)

\[
\begin{align*}
\frac{(x + \Delta x)^2 - x^2}{(x + \Delta x) - x} &= \frac{x^2 + 2 \times x \times \Delta x + \Delta x^2 - x^2}{\Delta x}\\
&= \frac{2 \times x \times \Delta x + \Delta x^2}{\Delta x}\\
&= 2x + \Delta x \underset{\Delta x \to 0}=2x
\end{align*}
\]

where the final equals sign means *equals as \(\Delta x\) tends to zero*.Now this is how we defined the derivative in the first post in this series but, whilst it's certainly a step in the right direction, it’s not yet quite enough. What exactly do we mean when we say \(\Delta x\) tends to zero? Do we repeatedly halve it? Do we start with a positive value less than 1 and square it, then cube it, then raise it to the 4

^{th}power and so on? Does it actually*matter*?Cauchy’s great achievement was to rigorously define the limit of a sequence and, in doing so, discover analysis; the mathematics of limits.

We say that the limit of a function \(f(x)\) as \(x\) tends to \(c\) is equal to \(l\) if and only if for any given positive \(\epsilon\), no matter how small, we can find a positive \(\delta\) such that the absolute difference between \(f(x)\) and \(l\) is always less than \(\epsilon\) if the absolute difference between \(x\) and \(c\) is less than \(\delta\). In mathematical notation we write this as

\[
\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; (\left|x-c\right|<\delta \implies \left|f(x)-l\right|<\epsilon)
\]

where \(\forall\) is the universal quantifier and should be read as *for all*, \(\exists\) is the existential quantifier and should be read as*there exists*, \(\implies\) means*implies*and the vertical bars denote the absolute value of the expression between them.With this definition we are not dependent upon the manner in which we approach a limit, only upon the size of its terms.

We now define the derivative \(\frac{\mathrm{d}f(x)}{\mathrm{d}x}\) with

\[
\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \left(\left|\Delta x\right|<\delta \implies \left|\frac{f(x + \Delta x)}{\Delta x} - \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right|<\epsilon\right)
\]

If such a limit exists, we say that the function is differentiable at \(x\).For example, the derivative of \(x^2\) is \(2x\) since for all \(x\)

\[
\left| \Delta x \right| < \epsilon \implies \left| \frac{(x+\Delta x)^2 - x^2}{\Delta x} - 2x \right| = \left| 2x + \Delta x - 2x \right| = \left| \Delta x \right| < \epsilon
\]

So in this case we choose \(\delta = \epsilon\).As we did with infinitesimals, we can derive the various identities of the differential calculus with this rigorous definition of a limit. Derivation 2 provides a proof of the product rule in these terms, for example.

Derivation 2: Proving The Product Rule With Limits

Given a differentiable function \(f\) we have

Now, given a second differentiable function \(g\), we have

\[
\forall \epsilon_f > 0 \;\; \exists \delta > 0 \;\; \left(\left|\Delta x\right| < \delta \implies \left|\frac{f(x + \Delta x) - f(x)}{\Delta x} - \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right| < \epsilon_f\right)\\
\\
\begin{align*}
\left|\Delta x\right| < \delta &\implies \left|f(x+\Delta x)-f(x)-\Delta x \times \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right| < \left|\Delta x\right| \times \epsilon_f\\
\\
&\implies f(x+\Delta x) - \left(f(x)+\Delta x \times \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \in \pm \left|\Delta x\right| \times \epsilon_f\\
\\
&\implies f(x+\Delta x) \in \left(f(x)+\Delta x \times \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \pm \left|\Delta x\right| \times \epsilon_f
\end{align*}
\]

where \(\in\) means *within*.Now, given a second differentiable function \(g\), we have

\[
\begin{align*}
&f(x+\Delta x) \times g(x + \Delta x)\\
&\quad\in \left(\left(f(x)+\Delta x \times \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \pm \left|\Delta x\right| \times \epsilon_f\right)\\
&\quad\quad \times \left(\left(g(x)+\Delta x \times \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right) \pm \left|\Delta x\right| \times \epsilon_g\right)\\
&\\
&\quad\in \left(f(x)+\Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \times \left(g(x)+\Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right)\\
&\quad\quad \pm \left(
\left(f(x)+\Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \left|\Delta x\right| \epsilon_g
+\left(g(x)+\Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right) \left|\Delta x\right| \epsilon_f
+\Delta x^2 \epsilon_f \epsilon_g\right)\\
&\\
&= f(x) \times g(x) + f(x) \Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x} + g(x) \Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x} + \Delta x^2 \frac{\mathrm{d}f(x)}{\mathrm{d}x} \frac{\mathrm{d}g(x)}{\mathrm{d}x}\\
&\quad\quad \pm
\left(
\left(f(x)+\Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \left|\Delta x\right| \epsilon_g
+\left(g(x)+\Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right) \left|\Delta x\right| \epsilon_f
+\Delta x^2 \epsilon_f \epsilon_g
\right)
\end{align*}
\]

Rearranging yields
\[
\begin{align*}
&\frac{f(x+\Delta x) \times g(x + \Delta x) - f(x) \times g(x)}{\Delta x} - \left(f(x) \frac{\mathrm{d}g(x)}{\mathrm{d}x} + g(x) \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right)\\
&\\
&\quad \in \Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x} \frac{\mathrm{d}g(x)}{\mathrm{d}x}
\pm
\left(
\left(f(x)+\Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \epsilon_g
+\left(g(x)+\Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right) \epsilon_f
+\Delta x \epsilon_f \epsilon_g
\right)
\end{align*}
\]

or
\[
\begin{align*}
&\left|\frac{f(x+\Delta x) \times g(x + \Delta x) - f(x) \times g(x)}{\Delta x} - \left(f(x) \frac{\mathrm{d}g(x)}{\mathrm{d}x} + g(x) \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right)\right|\\
&\\
&\quad < \left| \Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x} \frac{\mathrm{d}g(x)}{\mathrm{d}x}
+
\left(
\left(f(x)+\Delta x \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right) \epsilon_g
+\left(g(x)+\Delta x \frac{\mathrm{d}g(x)}{\mathrm{d}x}\right) \epsilon_f
+\Delta x \epsilon_f \epsilon_g
\right)
\right|
\end{align*}
\]

We can choose \(\delta\) so that \(\left|\Delta x\right|\) and, since \(f\) and \(g\) are differentiable,
\(\epsilon_f\) and \(\epsilon_g\) are all as small as we like so it follows that
\[
\begin{align*}
&\forall \epsilon > 0 \;\; \exists \delta > 0\\
&\quad \left(\left|\Delta x\right| < \delta \implies \left|\frac{f(x+\Delta x) \times g(x + \Delta x) - f(x) \times g(x)}{\Delta x} - \left(f(x) \frac{\mathrm{d}g(x)}{\mathrm{d}x} + g(x) \frac{\mathrm{d}f(x)}{\mathrm{d}x}\right)\right| < \epsilon\right)
\end{align*}
\]

and therefore, by definition, that
\[
\frac{\mathrm{d}\left(f\left(x\right) \times g\left(x\right)\right)}{\mathrm{d}x} = f(x) \times \frac{\mathrm{d}g(x)}{\mathrm{d}x} + g(x) \times \frac{\mathrm{d}f(x)}{\mathrm{d}x}
\]

or
\[
\frac{\mathrm{d}\left(f \times g\right)}{\mathrm{d}x} = f \times \frac{\mathrm{d}g}{\mathrm{d}x} + \frac{\mathrm{d}f}{\mathrm{d}x} \times g
\]

Phew!
That this proof is so much longer and more difficult than the one using infinitesimals is something that mathematicians have had to learn to live with; the price of rigour is generally paid in ink.

The rigour that Cauchy brought to the differential calculus was part of a great revolution in mathematical thinking that took place during the 19

^{th}century. Indeed, almost all of the techniques of modern mathematics were developed during this period.### Infinitesimals 2.0

In the latter half of the 20^{th}century the infinitesimals enjoyed something of a renaissance. Both Robinson, with his non-standard numbers^{[9]}, and Conway, with his surreal numbers^{[10]}, developed consistent number systems in which infinitesimals could be given a rigorous definition.Their approach was to embed something akin to Cauchy’s limits into the very idea of a number. Loosely speaking, Robinson defined a non-standard number as an infinite sequence of real numbers with arithmetic operations and functions applied on an element by element basis. For example

\[
\begin{align*}
x &= (x_0, x_1, x_2, x_3, \dots)\\
y &= (y_0, y_1, y_2, y_3, \dots)\\
\\
x+y &= (x_0+y_0, x_1+y_1, x_2+y_2, x_3+y_3, \dots)
\end{align*}
\]

Loosely speaking, two non-standard numbers \(x\) and \(y\) are considered equal if for all but a finite set of indices \(i\), \(x_i = y_i\) or, in limit notation, that
\[
\exists n \;\; \forall i>n \;\; \left(x_i = y_i\right)
\]

The remaining comparison operators are similarly defined and the real numbers are the subset of the non-standard numbers whose elements are identical for all but a finite set of indices.Now consider the standard number whose elements are a strictly decreasing sequence of positive numbers. For example

\[
\delta = \left(1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \dots \right)
\]

By the rules of non-standard arithmetic this number is greater than zero since every element in it is greater than zero. Furthermore, it is smaller than any positive real number \(x\) since there will be some \(n\) for which \(\frac{1}{2^n}\) is smaller than \(x\) and hence so will be the infinite sequence of elements of \(\delta\) after the \(n^{th}\).So here we have a genuine, bona fide infinitesimal with all of the properties of those in Newton and Leibniz’s differential calculus!

If a non-standard number \(z\) can be represented by a real number plus an infinitesimal non-standard number, we call that real number the standard part of \(z\), or \(\mathrm{st}(z)\). We can thus define the derivative of a function \(f\) as

\[
\mathrm{st}\left(\frac{f(x+\delta)-f(x)}{\delta}\right)
\]

for a standard real \(x\) and an infinitesimal non-standard \(\delta\), if this value exists.For example, let’s consider the derivative of \(x^2\) a third time.

\[
\begin{align*}
\delta &= \left(\delta_0, \delta_1, \delta_2, \delta_3, \dots\right)\\
\\
x+\delta &= \left(x+\delta_0, x+\delta_1, x+\delta_2, x+\delta_3, \dots\right)\\
\\
\left(x+\delta\right)^2 &= \left(\left(x+\delta_0\right)^2, \left(x+\delta_1\right)^2, \left(x+\delta_2\right)^2, \left(x+\delta_3\right)^2, \dots\right)\\
\\
\frac{\left(x+\delta\right)^2 -x^2}{\delta} &= \frac{\left(\left(x+\delta_0\right)^2 -x^2, \left(x+\delta_1\right)^2 - x^2, \left(x+\delta_2\right)^2 - x^2, \left(x+\delta_3\right)^2 - x^2, \dots\right)}{\left(\delta_0, \delta_1, \delta_2, \delta_3, \dots\right)}\\
\\
&= \frac{\left(2 \times x \times \delta_0 + \delta_0^2, 2 \times x \times \delta_1 + \delta_1^2, 2 \times x \times \delta_2 + \delta_2^2, 2 \times x \times \delta_3 + \delta_3^2, \right)}{\left(\delta_0, \delta_1, \delta_2, \delta_3, \dots\right)}\\
\\
&= \left(2x + \delta_0, 2x + \delta_1, 2x + \delta_2, 2x + \delta_3, \dots \right)\\
\\
\mathrm{st}\left(\frac{(x+\delta)^2-x^2}{\delta}\right) &= 2x
\end{align*}
\]

Note that at no point did we rely upon the value of \(\delta\), just that it was an infinitesimal, and hence the result stands for all infinitesimals.The problem with this loose definition of the non-standard numbers is that the vast majority of them are comprised of oscillating or random sequences which cannot meaningfully be ordered by the less than and greater than operators. Fortunately, the formal definition addresses this deficiency by demonstrating that it is possible to define rules by which we can do so, albeit ones which we cannot ever hope to write down in full.

Very roughly speaking, these rules unambiguously equate oscillating/random sequences with convergent or divergent sequences. With

*magic*.Robinson proved that the non-standard numbers do not lead to any logical contradictions other than those, if any, that are consequences of the standard reals and that his infinitesimals have exactly the properties of Newton’s.

We can therefore dispense with the full limit notation and simply go back to using our original infinitesimals.

Now that we know exactly how the differential calculus is defined we are nearly ready to begin analysing numerical differentiation algorithms.

Before we can start, however, there is one last piece of mathematics that we shall need.

### Taylor's Theorem

For those who seek to develop numerical algorithms, Taylor’s theorem is perhaps the single most useful result in mathematics.It demonstrates how a sufficiently differentiable function can be approximated within some region to within some error by a polynomial. Specifically, it shows that

\[
f(x+\delta) = f(x) + \delta \times f^{\prime}(x) + \tfrac{1}{2}\delta^2 \times f^{\prime\prime}(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + O\left(\delta^{n+1}\right)
\]

where we are using the notational convention of \(f^\prime(x)\) for the derivative of \(f\) evaluated at \(x\), \(f^{\prime\prime}(x)\) for the second derivative and \(f^{(n)}(x)\) for the \(n^{th}\) derivative. The exclamation mark is the factorial of the value preceding it and \(O\left(\delta^n+1\right)\) is an error term of order \(\delta^n+1\).By sufficiently differentiable we mean that all of the derivatives of \(f\) up to the \(n+1^{th}\) must exist, and that all of the derivatives of \(f\) up to the \(n^{th}\) must be continuous, between \(x\) and \(x+\delta\), inclusive of the bounds in the latter case but not in the former.

In fact, the error term is

*exactly*defined by a more accurate statement of Taylor’s theorem
\[
f(x+\delta) = f(x) + \delta \times f^{\prime}(x) + \tfrac{1}{2}\delta^2 \times f^{\prime\prime}(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + R_n
\]

where
\[
R_n = \tfrac{1}{\left(n+1\right)!}\delta^{(n+1)} \times f^{(n+1)}(y)
\]

for some \(y\) between \(x\) and \(x+\delta\).If we put no limit upon \(n\) we recover the Taylor series of a function \(f\) in the region of \(x\)

\[
\begin{align*}
f(x+\delta) &= f(x) + \delta \times f^{\prime}(x) + \tfrac{1}{2}\delta^2 \times f^{\prime\prime}(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + \dots\\
\\
&= \sum_{i \geqslant 0} \tfrac{1}{i!} \delta^i \times f^{(i)}(x)
\end{align*}
\]

where \(\sum\) is the sum of the expression to its right for every unique value of \(i\) that satisfies the inequality beneath it, the factorial of 0 being 1 and the 0^{th}derivative of a function being the function itself.Note that this identity holds if and only if the sum has a well defined limit under Cauchy’s definition.

The reason why Taylor’s theorem is of such utility in numerical computing is that it provides us with an explicit formula for approximating a function with a polynomial and bounds on the error that results from doing so.

Such polynomials are very easy to mathematically and numerically manipulate and thus can dramatically simplify many mathematical computations; they are used to very great effect in Physics, for example.

Furthermore, it gives us an explicit formula for the error in the value of a function that results from an error in its argument, such as might occur through floating point rounding.

So, now we have a thorough grasp of the differential calculus and are equipped with the numerical power tool of Taylor’s theorem, we are ready to scrutinise some of the numerical algorithms for approximating the derivatives of functions.

I’m afraid we shall have to wait until next time before we do so, however.

### References

[1]*You're Going To Have To Think! Why [Insert Technique Here] Won't Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[2]

*You're Going To Have To Think! Why Fixed Point Arithmetic Won’t Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[3]

*You're Going To Have To Think! Why Arbitrary Precision Arithmetic Won’t Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[4]

*You're Going To Have To Think! Why Rational Arithmetic Won’t Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[5]

*You're Going To Have To Think! Why Computer Algebra Won’t Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[6]

*You're Going To Have To Think! Why Interval Arithmetic Won’t Cure Your Floating Point Blues*, www.thusspakeak.com, 2013.[7]

*IEEE standard for binary floating-point arithmetic*. ANSI/IEEE std 754-1985, 1985.[8] Berkeley, G.

*The Analyst; Or, A Discourse Addressed to an Infidel Mathematician*, Printed for J. Tonson, 1734.[9] Robinson, A.

*Non-standard Analysis (New Ed.)*, Princeton University Press, 1996.[10] Knuth, D.

*Surreal Numbers; How Two Ex-Students Turned on to Pure Mathematics and Found Total Happiness*, Addison Wesley, 1974.
Based upon an article I wrote for ACCU's Overload magazine.

## Leave a comment