# You’re Going To Have To Think! Why Computer Algebra Won’t Cure Your Calculus Blues

We began the second half of this series of posts[1] with a brief history of the differential calculus and a description of perhaps the most powerful mathematical tool for numerical computing, Taylor's theorem, which states that for a function $$f$$
$f(x+\delta) = f(x) + \delta \times f'(x) + \tfrac{1}{2}\delta^2 \times f''(x) + \dots + \tfrac{1}{n!} \delta^n \times f^{(n)}(x) + R_n\\ \min\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \leqslant R_n \leqslant \max\left(\tfrac{1}{(n+1)!} \delta^{(n+1)} \times f^{(n+1)}(x+\theta\delta)\right) \; \mathrm{for} \; 0 \leqslant \theta \leqslant 1$
where the exclamation mark stands for the factorial, $$f'(x)$$ the first derivative of $$f$$ at $$x$$, $$f''(x)$$ the second and $$f^{(n)}(x)$$ the $$n^\mathrm{th}$$ and where, by convention, the $$0^\mathrm{th}$$ derivative is the function itself.
We then used Taylor’s theorem to perform a detailed analysis of finite difference approximations of various orders of derivatives and, in the previous instalment, sought to use polynomial approximations to automate the calculation of their formulae.
We concluded with Ridders' algorithm[2] which treated the symmetric finite difference as a function of the change in the argument, $$\delta$$, and used a polynomial approximation of it to estimate the value of the difference with a $$\delta$$ of zero.
You will no doubt recall that this was a significant improvement over every algorithm we had previously examined with an average error roughly just one decimal order of magnitude worse than the theoretical minimum.

Given that this series of posts has not yet concluded, the obvious question is whether or not we can close that gap.
The obvious answer is that we can.
Sort of.

One of the surprising facts about differentiation is that it is almost always possible to find the expression for the derivative of a function if you have the expression for the function itself. This might not seem surprising until you consider the inverse operation; there are countless examples where having the expression for the derivative doesn't mean that we can find the expression for the function.
This is enormously suggestive of a method by which we can further improve our calculation of derivatives; get the computer to generate the correct expression for us.
Figure 1: The Golden Ratio Tree
 × ÷ + 1 2 1 √ 5

### Computer Algebra Revisited

We first discussed computer algebra as part of our quest to find an infinite precision numeric type[3]. The idea was to represent an expression as a tree rather than directly compute its value. Figure 1 shows how we might represent the golden ratio
$\tfrac{1}{2}\left(1+\sqrt{5}\right)$
For example, listing 1 gives the definition of the ak integer expression object

Listing 1: ak.intExpr
ak.EXPRESSION_T = 'ak.expression';

function Expression(){}
Expression.prototype = {TYPE: ak.EXPRESSION_T};

ak.intExpr = function(x) {
var e = new Expression();
var state = ak.bignum(x);

e.SUB    = 'num';
e.approx = function()  {return state.toNumber();};
e.exact  = function(i) {return intExact(state, i);};
e.sign   = function()  {return state.at(0);};

return Object.freeze(e);
};


You may recall that my proposal to compute the digits of the decimal expansion of the expression one a time with the exact member function (with negative indices to the left of the decimal point and positive to the right) as a means of effectively maintaining infinite precision was ultimately doomed to failure because of the impossibility of comparing equal values. There was simply no way, in general, to decide when to give up.

However, the simplicity and near universal applicability of the rules of differentiation gives these expression objects something of a reprieve; they are supremely well suited for automating those rules.

### Symbolic Differentiation

Key to automating the rules of differentiation are the facts that we can identify the type of an expression with its SUB member and that we can distinguish between variables with the id method of ak.varExpr objects, as defined in listing 2.

Listing 2: ak.varExpr
var varID  = zero;
var varDef = ak.intExpr(zero);

ak.varExpr = function() {
var e = new Expression();
var state = {id: varID};

e.SUB    = 'var';
e.id     = function() {return state.id;};
e.value  = function() {return varValue(e, state, arguments);};

e.approx = function()  {return state.val.approx();};;
e.exact  = function(i) {return state.val.exact(i);};
e.sign   = function()  {return state.val.sign()};

e.value(varDef);

return Object.freeze(e);
};

function varValue(e, state, args) {
if(args.length>0) {
if(ak.type(args[0])!==ak.EXPRESSION_T) {
throw new Error('non-expression value passed to ak.varExpr');
}
state.val = args[0];
}
return state.val;
}


We exploit the former in much the same way that we use ak.type to dispatch arithmetic operation to the appropriate function, as shown in listing 3.

Listing 3: ak.symbolicDerivative
var zero = ak.intExpr(0);
var one  = ak.intExpr(1);
var two  = ak.intExpr(2);

var symbolicDerivative = {};

ak.symbolicDerivative = function(f, x) {
if(ak.type(f)!==ak.EXPRESSION_T) {
throw new Error('non-expression function passed to ak.symbolicDerivative');
}

if(ak.type(x)!==ak.EXPRESSION_T || x.SUB!=='var') {
throw new Error('non-expression variable argument passed to ak.symbolicDerivative');
}

return symbolicDerivative[f.SUB](f, x);
};


Now we can simply add members to the symbolicDerivative object that apply symbolic transformations to $$f$$ appropriate to the type of expression it represents.
The most trivial is the rule that the derivative of a constant is zero, as shown in listing 4.

Listing 4: Constant Derivative
symbolicDerivative['num'] = function(f, x) {
return zero;
};


The next simplest rule is that for the derivative of a variable; one if it's with respect to itself and zero otherwise. This is where we exploit the id method of ak.varExpr objects, as illustrated by listing 5.

Listing 5: Variable Derivative
symbolicDerivative['var'] = function(f, x) {
return f.id()===x.id() ? one : zero;
};


Now the general plan of action is to apply the chain rule to recursively construct the derivative of a complex expression in terms of the derivative of an expression with respect to its argument and the derivatives of its arguments with respect to the variable.
$\frac{\mathrm{d}f(g)}{\mathrm{d}x} \Bigg|_x = \frac{\mathrm{d}f}{\mathrm{d}y} \Bigg|_{g(x)} \times \frac{\mathrm{d}g}{\mathrm{d}x} \Bigg|_x$
For a simple example consider the exponential function whose derivative is equal to itself.
Applying the chain rule we have
$\frac{\mathrm{d}}{\mathrm{d}x} e^{f(x)} = e^{f(x)} \times \frac{\mathrm{d}f}{\mathrm{d}x} \Bigg|_x$
as implemented in listing 6.

Listing 6: Exponential Derivative
symbolicDerivative['exp'] = function(f, x) {
return ak.mul(f, ak.symbolicDerivative(f.arg(), x));
};


Note that because ak.symbolicDerivative returns ak expression objects we can use the overloaded arithmetic operations to recursively construct the derivative of an expression according to the chain rule.
One thing that should be relatively obvious is that the derivative of the exponential function itself is now exact to machine precision, as demonstrated by program 1.

Program 1: Symbolic Derivative Errors

Here we use an ak.approxExpr object to set the value of the variable. This is a previously undescribed expression object from Expression.js that recognises the futility of infinite precision and only supports approximate evaluation, as shown in listing 7.

Listing 7: ak.approxExpr
ak.approxExpr = function(x) {
var e = new Expression();
x = Number(x);

e.SUB    = 'num';
e.approx = function() {return x;};

return Object.freeze(e);
};


Moving on, addition and subtraction expression objects are barely less trivial to differentiate than constants and variables, as shown by listing 8.

Listing 8: Addition And Subtraction Derivatives
symbolicDerivative['add'] = function(f, x) {
};

symbolicDerivative['sub'] = function(f, x) {
return ak.sub(ak.symbolicDerivative(f.lhs(), x), ak.symbolicDerivative(f.rhs(), x));
};


A more complex example is multiplication, for which we need the product rule
$\frac{\mathrm{d}(f \times g)}{\mathrm{d}x} = f \times \frac{\mathrm{d}g}{\mathrm{d}x} + \frac{\mathrm{d}f}{\mathrm{d}x} \times g$
Listing 9 gives the implementation of the derivative of a product, albeit with the terms of the addition in the opposite order to the above formula

Listing 9: Multiplication Derivative
symbolicDerivative['mul'] = function(f, x) {
var lhs = ak.mul(ak.symbolicDerivative(f.lhs(), x), f.rhs());
var rhs = ak.mul(f.lhs(), ak.symbolicDerivative(f.rhs(), x));
};


In the case of division we are effectively multiplying one function by the reciprocal of another, so we must apply both the chain rule and the product rule
\begin{align*} \frac{\mathrm{d}}{\mathrm{d}x} \frac{f(x)}{g(x)} &= \frac{\mathrm{d}}{\mathrm{d}x}\left(f(x) \times \frac{1}{g(x)}\right)\\ &= f(x) \times \frac{\mathrm{d}}{\mathrm{d}x} \frac{1}{g(x)} + \frac{1}{g(x)} \times \frac{\mathrm{d}}{\mathrm{d}x} f(x)\\ &= -\frac{f(x)}{g(x)^2} \times \frac{\mathrm{d}}{\mathrm{d}x}g(x) + \frac{1}{g(x)} \times \frac{\mathrm{d}}{\mathrm{d}x} f(x) \end{align*}
An implementation is given in listing 10

Listing 10: Division Derivative
symbolicDerivative['div'] = function(f, x) {
var lhs = ak.div(ak.symbolicDerivative(f.lhs(), x), f.rhs());
var ratio = ak.div(f.lhs(), ak.pow(f.rhs(), two));
var rhs = ak.mul(ratio, ak.symbolicDerivative(f.rhs(), x));
return ak.sub(lhs, rhs);
};


in which the terms have been swapped again to use subtraction of the first from the second instead of adding the former's negation to the latter.

As a final example, we shall take a look at raising one expression to the power of another. Now it is not entirely obvious how to do this. Differentiating a variable raised to a constant non-zero power should be familiar enough
$\frac{\mathrm{d}}{\mathrm{d}x} x^c = c \times x^{c-1}$
However, differentiating a constant raised to the power of a variable isn't quite so straightforward
$\frac{\mathrm{d}}{\mathrm{d}x} c^x = ?$
The trick is to exponentiate the logarithm of $$c^x$$
$\frac{\mathrm{d}}{\mathrm{d}x} c^x = \frac{\mathrm{d}}{\mathrm{d}x} e^{\ln c^x} = \frac{\mathrm{d}}{\mathrm{d}x} e^{x \ln c}$
after which the result is trivially
$\ln c \times e^{x \ln c} = \ln c \times c^x$
Combining this with the chain rule, the product rule and the fact that the derivative of the exponential is equal to itself yields
\begin{align*} \frac{\mathrm{d}}{\mathrm{d}x} f(x)^{g(x)} &= \frac{\mathrm{d}}{\mathrm{d}x} e^{g(x) \times \ln f(x)}\\ &= \frac{\mathrm{d}}{\mathrm{d}x} \left(g(x) \times \ln f(x)\right) \times e^{g(x) \times \ln f(x)}\\ &= \left(\ln f(x) \times \frac{\mathrm{d}}{\mathrm{d}x} g(x) + g(x) \times \frac{\mathrm{d}}{\mathrm{d}x} \ln f(x)\right) \times f(x)^{g(x)}\\ &= \left(\ln f(x) \times \frac{\mathrm{d}}{\mathrm{d}x} g(x) + \frac{g(x)}{f(x)} \times \frac{\mathrm{d}}{\mathrm{d}x} f(x)\right) \times f(x)^{g(x)} \end{align*}
Listing 11 shows the implementation of the derivative of the power function

Listing 11: Power Derivative
symbolicDerivative['pow'] = function(f, x) {
var lhs = ak.mul(ak.symbolicDerivative(f.rhs(), x), ak.log(f.lhs()));
var rhs = ak.mul(ak.div(f.rhs(), f.lhs()), ak.symbolicDerivative(f.lhs(), x));
};


I shan't bother describing the remaining arithmetic operations but shall instead direct you to SymbolicDerivative.js in which they are implemented.

### Exact Differentiation?

On the face of it we have finally achieved what we set out to do; implement an algorithm that can calculate the derivative of a function to machine precision.
In using expression objects to symbolically differentiate expressions we are able to generate an expression that is mathematically identical to the derivative.
All that remains is to evaluate it.
Unfortunately there are a few problems.

### Memory Use

The first problem is that the expressions representing derivatives can grow in complexity at an alarming rate.
For example, consider the function
$f(x) = e^{x^2}$
Differentiating with respect to $$x$$ yields
$\frac{\mathrm{d}}{\mathrm{d}x} f(x) = 2x \times e^{x^2}$
Differentiating again yields
$\frac{\mathrm{d}^2}{\mathrm{d}x^2} f(x) = 2e^{x^2} + 4x^2 \times e^{x^2}$
Once more and we have
$\frac{\mathrm{d}^3}{\mathrm{d}x^3} f(x) = 4x \times e^{x^2} + 8x \times e^{x^2} + 8x^3 \times e^{x^2}$
Clearly the trees representing these expressions will grow quite rapidly unless we algebraically simplify them. For example, the third derivative can be simplified to
$\frac{\mathrm{d}^3}{\mathrm{d}x^3} f(x) = \left(12x + 8x^3\right) \times e^{x^2}$
reducing the number of arithmetic operations from 15 to 7.

Unfortunately implementing a computer algebra system that is capable of performing such simplifications is no easy task. Much like chess programs they require large databases of valid transformations of expressions, some heuristic that accurately captures our sense of simplicity and expensive brute-force search algorithms to traverse the variety of ways in which those transformations can be applied.
Unless we are willing to expend a very great deal more effort we shall have to accept the fact that symbolic differentiation will be something of a memory hog.

### Cancellation Error

The next problem also stems from the difficulty in simplifying expressions but is rather more worrying. Consider redundant expressions which though complex, when fully simplified, yield trivial expressions.
As an example consider
$f(x) = \frac{x}{x}$
Trivially, this has a derivative of zero for all $$x$$, but when we apply our expression objects to compute the derivative, they will blindly follow the rules to yield
$\frac{\mathrm{d}}{\mathrm{d}x} f(x) = \frac{\mathrm{d}}{\mathrm{d}x} \left(x \times \frac{1}{x}\right) = \frac{1}{x} - x \times \frac{1}{x^2}$
Mathematically, this is identically equal to 0 for all $$x$$, but since we are using floating point some errors will inevitably creep in. We shall consequently be subtracting two nearly equal numbers which is the very incantation we use to summon the dreaded cancellation error.
Program 2 demonstrates the result of this calculation for $$x$$ from 0.01 to 1 in steps of 0.01 and clearly shows the effect of cancellation on the result.

Program 2: Symbolic Derivative Cancellation Errors

Whilst this is a simple example, it is indicative of a wider problem. Even if we have taken great care to ensure that the expression representing a function is not susceptible to cancellation error we do not know whether the same can be said of the expression representing its derivative.

We can at least use interval arithmetic to monitor numerical error in the calculation of derivatives. You will recall that interval arithmetic works by placing bounds on arithmetic operations by computing the worst case smallest and largest results given the bounds on its arguments and numerical rounding[4].
Rather than making changes to the ak expression objects, we shall simply traverse the expression tree performing an equivalent calculation with ak.interval objects. This is made simple by the fact that the SUB members of arithmetic operation expressions are identical to the method names of the equivalent operations in ak, as shown in listing 13.

Listing 13: ak.symbolicDerivativeBounds
function bounds(df) {
if(df.SUB==='var' || df.SUB==='num') return ak.interval(df.approx());
if(df.arg) return ak[df.SUB](bounds(df.arg()));
return ak[df.SUB](bounds(df.lhs()), bounds(df.rhs()));
}

ak.symbolicDerivativeBounds = function(f, x) {
var df = symbolicDerivative[f.SUB](f, x);
var b  = {};

b.approx = df.approx();
b.bounds = function() {return bounds(df);};
return Object.freeze(b);
};


Program 3 demonstrates the calculation of bounds for the derivative of $$x \div x$$ and clearly reveals the presence of significant cancellation error near zero.

Program 3: Symbolic Derivative Bounds

This combination of expression objects and interval arithmetic is as effective an algorithm for the calculation of derivatives as is possible without implementing a fully functional computer algebra system.

### Numerical Approximation

There are many mathematical functions for which we have no simple formulae and must instead rely upon approximations. These approximations are often iterative in nature, stopping when some convergence criteria is satisfied.
If we naively implement an iterative approximation using an expression object as the argument we shall almost certainly run into trouble.
Not only is the resulting expression tree is liable to be extremely large, it may take a different form for different values of the argument. If we generate an approximating expression using the initial value of the argument, we may very well introduce significant errors for other values.
To accurately represent iterative functions with expression objects we shall need to implement expression objects representing loops, conditional statements and so on and so forth.
Unfortunately working out the symbolic derivatives of such expressions is, in general, a tricky proposition.

Assuming that we have done so, or that the approximation takes the same form for all values of the argument, we still aren't quite out of the woods. The derivative of an accurate approximating expression for a function is not necessarily an accurate approximating expression for the derivative of that function.
For example, consider the function
$g(x) = f(x) \times \left(1+\frac{\sin(100 \times x)}{100}\right)$
which serves as an approximation to $$f(x)$$.
Now, this is trivially everywhere equal to within 1% of the value of $$f(x)$$, but if we calculate its derivative we find that
\begin{align*} \frac{\mathrm{d}}{\mathrm{d}x} g(x) &= \frac{\mathrm{d}}{\mathrm{d}x} f(x) \times \left(1+\frac{\sin(100 \times x)}{100}\right) + f(x) \times 100 \times \frac{\cos(100 \times x)}{100}\\ &= \frac{\mathrm{d}}{\mathrm{d}x} f(x) \times \left(1+\frac{\sin(100 \times x)}{100}\right) + f(x) \times \cos(100 \times x) \end{align*}
which can differ from the actual derivative by as much as 100% of the value of $$f(x)$$!

In such cases we may be better served by a polynomial approximation algorithm since these can effectively smooth out this high frequency noise by terminating once the derivative starts growing as $$\delta$$ shrinks close to its wavelength.
This observation hints at how we must proceed if we wish to use expression objects in conjunction with numerical approximations of functions.

Rather than represent the approximation of the function with an expression tree, we must create a new type of expression object that implements the approximation. The derivative member function must then return an expression object representing the derivative of the function.
In the worst case this may simply be an expression object implementing a polynomial approximation algorithm. Better still would be an expression object implementing a specifically designed approximation of the derivative. Best of all is the case when the derivative is known in closed form, such as the derivative of a numerical integration for example, in which case it can be an expression tree.

So whilst symbolic differentiation is a powerful tool, we must still think carefully about how we use it.

### References

[1] You're Going To Have To Think! Why [Insert Algorithm Here] Won't Cure Your Calculus Blues, www.thusspakeak.com, 2013.

[2] Ridders, C.J.F., Advances in Engineering Software, Vol. 4. No. 2., Elsevier, 1982.

[3] You're Going To Have To Think! Why Computer Algebra Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.

[4] You're Going To Have To Think! Why Interval Arithmetic Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.

Based upon an article I wrote for ACCU's Overload magazine.