You're Going To Have To Think! Why Computer Algebra Won't Cure Your Floating Point Blues


We have seen how fixed point[1][2] and rational[3] arithmetic do not completely solve the supposed deficiencies of floating point arithmetic, so the question still remains as to whether we can do any better.
The heart of the problem is that it is fundamentally impossible to numerically represent irrationals with a finite number of digits. We could conceivably represent algebraic irrationals, those that yield zero for some polynomial with integer terms, with a list of primes and the rational powers to which they should be raised. The bookkeeping would be an absolute nightmare though, to say nothing of the complexity of adding and subtracting such numbers.
But even with such a representation we would be flummoxed by transcendental irrationals like \(\pi\).
Instead, we might try to explicitly manipulate mathematical formulae rather than approximately evaluate their intermediate steps. For example, when taking the square root of two we should return a result that represents the operation itself rather than an approximation of its result. When the calculation is complete we could then evaluate it to any precision we desire, one digit at a time, effectively moving from arbitrary precision to infinite precision, addressing all of the weaknesses of our other numerical types.
Figure 1: The Golden Ratio Tree
÷ +
1 2 1

Manipulating string representations of formulae would be rather unwieldy, so instead we shall represent them with trees. For example, the formula for the golden ratio
\[ \tfrac{1}{2} (1 + \sqrt{5}) \]
can be represented by the tree given in figure 1.

The nodes of the tree should be interpreted as the application of the operation they contain to the results of the nodes below it, with the leaf nodes being equated to the numbers they contain.
Such representations are often referred to as expression objects since the values they contain capture complete expressions rather than their results.

An Expression Class

The simplest types of expressions are integer constants and it is these that I shall use to illustrate the approach taken for ak expression objects, as shown in listing 1.

Listing 1: ak.intExpr
ak.EXPRESSION_T = 'ak.expression';

function Expression(){}
Expression.prototype = {TYPE: ak.EXPRESSION_T};

ak.intExpr = function(x) {
 var e = new Expression();
 var state = ak.bignum(x);

 e.SUB    = 'num';
 e.approx = function()  {return state.toNumber();};
 e.exact  = function(i) {return intExact(state, i);};
 e.sign   = function()  {return;};

 return Object.freeze(e);

All expressions objects will be of type ak.EXPRESSION_T so the SUB member is provided to distinguish between subtypes.

We have two ways to evaluate an expression, approximate and exact. Given that the state of an integer expression is an ak.bignum approximate evaluation is trivially done by converting it to a Number object. Exact evaluation is provided with the exact and sign methods. The former returns the decimal digit in the \(i^\mathrm{th}\) place, or in other words the digit that should be multiplied by \(10^i\), and the latter, surprise surprise, the sign.
Extracting the \(i^\mathrm{th}\) decimal digit entails some careful slicing into the digits of the bignum and is implemented in the helper function intExact, given in listing 2.

Listing 2: ak.intExpr intExact
function intExact(state, i) {
 var o, d, n, m;

 i = ak.trunc(i);
 if(!(i>=0 && i<ak.DEC_DIG*state.size())) return ak.NaN;

 o = i%ak.DEC_DIG;
 d = (i-o)/ak.DEC_DIG;
 n = Math.abs(;
 m = Math.pow(10, o);

 return n>=m ? ((n-n%m)/m)%10 : ak.NaN;

An important point to note is that if the index is negative, and consequently represents a fractional digit, or if it exceeds the number of decimal digits of the integer, intExact returns NaN. All of the expression objects will this approach to indicate that we have exhausted digits to the left or right of the decimal point since it won't always be obvious in advance when we will do so.

Spigot Algorithms

It is clearly a trivial matter to extract individual digits from integers, but what might not be so obvious is that we can do the same for irrationals like \(\pi\), as shown in listing 3.

Listing 3: ak.piExpr
ak.piExpr = function() {
 var e = new Expression();
 var state = {};

 e.SUB    = 'num';
 e.approx = function()  {return ak.PI;};
 e.exact  = function(i) {return piExact(state, i);};
 e.sign   = function()  {return 1;};

 return Object.freeze(e);

Clearly the heavy lifting is done by the piExact function, given in listing 4.

Listing 4: ak.piExpr piExact
var zero = ak.bignum(0);  var one    = ak.bignum(1);
var two  = ak.bignum(2);  var three  = ak.bignum(3);
var four = ak.bignum(4);  var seven  = ak.bignum(7);
var ten  = ak.bignum(10); var thirty = ak.bignum(30);

function piExact(state, i) {
 var q2, q2r, d, rdt;

 i = ak.trunc(-i)+1;
 if(!(i>=1 && i<=ak.INT_MAX)) return ak.NaN;

 if(!(i>=state.i)) {
  state.i = 0;   state.d = 3;
  state.k = one; state.l = three;
  state.q = one; state.r = zero;  state.t = one;

 while(i>state.i) {
  q2  = ak.add(state.q, state.q);
  q2r = ak.add(q2, state.r);
  d   = ak.div(ak.add(q2r, state.q), state.t);
  rdt = ak.sub(state.r, ak.mul(d, state.t));

  if(, q2), rdt), state.t)) {
   state.d =;
   state.q = ak.mul(state.q, ten);
   state.r = ak.mul(rdt, ten);
  else {
   state.q = ak.mul(state.q, state.k);
   state.r = ak.mul(q2r, state.l);
   state.t = ak.mul(state.t, state.l);
   state.k = ak.add(state.k, one);
   state.l = ak.add(state.l, two);
 return state.d;

This is more or less a straight port of Gibbons's algorithm[4] from Haskell to JavaScript and, given that the digits aren't cached, is only appropriate for traversal from more to less significant digits, as demonstrated in program 1.

Program 1: ak.piExpr

Gibbons's algorithm is an example of a spigot algorithm, so named because they compute the digits of their results one at a time without reference to previous digits, like drips from a leaky spigot.
Now it may not be immediately obvious quite how this algorithm is generating the digits of \(\pi\) but it is essentially a simple idea; find a base in which \(\pi\) has simple, predictable digits and convert from that base to base ten one digit at a time. To find one we must redefine exactly what we mean by the base of a number, as I shall show by doing the same for \(e\).

Recall that, by using Taylor's theorem, we were able to represent \(e\) as an infinite series.
\[ e = 1 + 1 + \frac 1 2 + \frac 1 6 + \frac 1 {24} + ... + \frac 1 {n!} + ... \]
where the exclamation mark means the factorial, the product of every number from one to the number preceding it.
Trivially, this means that
\[ e - 1 = 1 + \frac 1 2 + \frac 1 6 + \frac 1 {24} + ... + \frac 1 {n!} + ... \]
which we can rewrite as
\[ e - 1 = 1 + \frac 1 2 \times \left(1 + \frac 1 3 \times \left(1 + \frac 1 4 \times \left(1 + ... + \frac 1 n \times \left(1 + ...\right) \right) \right) \right) \]
This isn't so different in form to how we would represent it in decimal
\[ e - 1 = 1 + \frac 1 {10} \times \left(7 + \frac 1 {10} \times \left(1 + \frac 1 {10} \times \left(8 + ...\right) \right) \right) \]
if we just extend the definition of the base to allow digits to be multiplied by different values. This might seem a little odd, but we do it all the time when working with, well, time.

To convert our weird base representation to decimal one digit at a time we must first note that it can be rewritten as
\[ \begin{align*} e - 1 &= f_2 \circ f_3 \circ f_4 \circ ... \circ f_n \circ ...\\ f_k(x) &= \frac{x+k}{k} \end{align*} \]
where \(\circ\) is the composition operator, so \(f \circ g (x) = f(g(x))\).
Each of these functions maps the interval \([1, 2]\) onto a subset of itself and chains of them onto ever smaller subsets. A consequence of this is that if
\[ f_2 \circ ... \circ f_n (1) \]
\[ f_2 \circ ... \circ f_n (2) \]
have the same integer part, then it must be the leading digit of \(e-1\). When we find that digit, say \(d\), we define a new function
\[ g(x) = 10 \times (f_2 \circ ... \circ f_n (x) - d) \]
and carry on composing the remaining functions with it until we get agreement on the integer part again, which must then be the first decimal digit.
Explicitly calling nested functions would be prohibitively expensive, but fortunately we don't have to. If we restrict ourselves to rational arithmetic, which is not much of a burden given that our starting values are rationals and the functions map rationals to rationals, we have
\[ \begin{align*} f_k\left(\frac{x_n}{x_d}\right) &= \frac{\dfrac{x_n}{x_d}+k}{k}\\ &= \frac{x_n + k \times x_d}{k \times x_d} \end{align*} \]
If we represent such rationals with a vector we can rewrite this as
\[ f^\prime_k\left( \begin{pmatrix} x_n\\ x_d \end{pmatrix} \right) = \begin{pmatrix} 1 & k\\ 0 & k \end{pmatrix} \times \begin{pmatrix} x_n\\ x_d \end{pmatrix} = \begin{pmatrix} x_n + k \times x_d\\ k \times x_d \end{pmatrix} \]
by the rules of matrix-vector multiplication, and consequently
\[ f^\prime_n \circ ... \circ f^\prime_m \left(\begin{pmatrix} x_n\\ x_d \end{pmatrix}\right) = \prod_{i=n}^m \begin{pmatrix} 1 & i\\ 0 & i \end{pmatrix} \times \begin{pmatrix} x_n\\ x_d \end{pmatrix} \]
where \(\Pi\) stands for the product of the terms following it evaluated at each of the values from that beneath up to and including that above it.
If we represent the accumulated products with
\[ \begin{pmatrix} q & r\\ s & t \end{pmatrix} \]
then stripping the leading digit is achieved with the matrix multiplication
\[ \begin{pmatrix} 10 & -10d\\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} q & r\\ s & t \end{pmatrix} = \begin{pmatrix} 10q-10ds & 10r-10dt\\ s & t \end{pmatrix} \]
by the rules of matrix-matrix multiplication.
Noting that \(s\) is equal to zero leads to the implementation given in listing 5.

Listing 5: ak.eExpr
ak.eExpr = function() {
 var e = new Expression();
 var state = {};

 e.SUB    = 'num';
 e.approx = function()  {return ak.E;};
 e.exact  = function(i) {return eExact(state, i);};
 e.sign   = function()  {return 1;};

 return Object.freeze(e);

function eExact(state, i) {
 var qr, d, rdt;

 i = ak.trunc(-i)+1;
 if(!(i>=1 && i<=ak.INT_MAX)) return ak.NaN;

 if(!(i>=state.i)) {
  state.i = 0;     state.d = 1;   state.k = three;
  state.q = one;   state.r = two; state.t = two;

 while(i>state.i) {
  qr  = ak.add(state.q, state.r);
  d   = ak.div(qr, state.t);
  rdt = ak.sub(state.r, ak.mul(d, state.t));

  if(, state.q), rdt), state.t)) {
   state.d =;
   state.q = ak.mul(state.q, ten);
   state.r = ak.mul(rdt, ten);
  else {
   state.r = ak.mul(state.k, qr);
   state.t = ak.mul(state.k, state.t);
   state.k = ak.add(state.k, one);
 return i>1 ? state.d : state.d+1;

Like ak.piExpr, ak.eExpr performs no caching of digits and is therefore suited for generation of digits in the order of most to least significant, as demonstrated by program 2.

Program 2: ak.eExpr

ak Expression Overloads

To illustrate how overloads will work for expression objects we shall restrict ourselves to approx evaluation for now.
Listing 6 shows how ak.abs and ak.neg are implemented.

Listing 6: ak Expression abs And neg
function abs(x) {
 var e = new Expression();
 e.SUB    = 'abs';
 e.arg    = function() {return x;};
 e.approx = function() {return Math.abs(x.approx());};
 return Object.freeze(e);
ak.overload(ak.abs, ak.EXPRESSION_T, abs);

function neg(x) {
 var e = new Expression();
 e.SUB    = 'neg';
 e.arg    = function() {return x;};
 e.approx = function() {return -x.approx();};
 return Object.freeze(e);
ak.overload(ak.neg, ak.EXPRESSION_T, neg);

These are pretty good exemplars for how operators work for expression objects; rather than performing calculations, they create new expressions object with methods that implement the arithmetic operations that they represent. In addition to the class method that we use to distinguish between different types of expressions, they provide an arg method to provide access to the argument of the function.
Binary operators work in much the same fashion, as shown in listing 7.

Listing 7: ak Expression add And mul
function add(lhs, rhs) {
 var e = new Expression();
 e.SUB    = 'add';
 e.lhs    = function() {return lhs;};
 e.rhs    = function() {return rhs;};
 e.approx = function() {return lhs.approx()+rhs.approx();};
 return Object.freeze(e);
ak.overload(ak.add, [ak.EXPRESSION_T, ak.EXPRESSION_T], add);

function mul(lhs, rhs) {
 var e = new Expression();
 e.SUB    = 'mul';
 e.lhs    = function() {return lhs;};
 e.rhs    = function() {return rhs;};
 e.approx = function() {return lhs.approx()*rhs.approx();};
 return Object.freeze(e);
ak.overload(ak.mul, [ak.EXPRESSION_T, ak.EXPRESSION_T], mul);

Here we provide lhs and rhs methods for access to both of the arguments of the binary operators.

The full set of expression overloads is given in listing 8.

Listing 8: ak Expression Overloads
ak.overload(ak.abs,  ak.EXPRESSION_T, abs);
ak.overload(ak.acos, ak.EXPRESSION_T, acos);
ak.overload(ak.asin, ak.EXPRESSION_T, asin);
ak.overload(ak.atan, ak.EXPRESSION_T, atan);
ak.overload(ak.cos,  ak.EXPRESSION_T, cos);
ak.overload(ak.cosh, ak.EXPRESSION_T, cosh);
ak.overload(ak.exp,  ak.EXPRESSION_T, exp);
ak.overload(ak.inv,  ak.EXPRESSION_T, inv);
ak.overload(ak.log,  ak.EXPRESSION_T, log);
ak.overload(ak.neg,  ak.EXPRESSION_T, neg);
ak.overload(ak.sin,  ak.EXPRESSION_T, sin);
ak.overload(ak.sinh, ak.EXPRESSION_T, sinh);
ak.overload(ak.sqrt, ak.EXPRESSION_T, sqrt);
ak.overload(ak.tan,  ak.EXPRESSION_T, tan);
ak.overload(ak.tanh, ak.EXPRESSION_T, tanh);

ak.overload(ak.add,  [ak.EXPRESSION_T, ak.EXPRESSION_T], add);
ak.overload(ak.cmp,  [ak.EXPRESSION_T, ak.EXPRESSION_T], cmp);
ak.overload(ak.dist, [ak.EXPRESSION_T, ak.EXPRESSION_T], dist);
ak.overload(ak.div,  [ak.EXPRESSION_T, ak.EXPRESSION_T], div);
ak.overload(ak.eq,   [ak.EXPRESSION_T, ak.EXPRESSION_T], eq);
ak.overload(,   [ak.EXPRESSION_T, ak.EXPRESSION_T], ge);
ak.overload(,   [ak.EXPRESSION_T, ak.EXPRESSION_T], gt);
ak.overload(ak.le,   [ak.EXPRESSION_T, ak.EXPRESSION_T], le);
ak.overload(,   [ak.EXPRESSION_T, ak.EXPRESSION_T], lt);
ak.overload(ak.mod,  [ak.EXPRESSION_T, ak.EXPRESSION_T], mod);
ak.overload(ak.mul,  [ak.EXPRESSION_T, ak.EXPRESSION_T], mul);
ak.overload(,   [ak.EXPRESSION_T, ak.EXPRESSION_T], ne);
ak.overload(ak.pow,  [ak.EXPRESSION_T, ak.EXPRESSION_T], pow);
ak.overload(ak.sub,  [ak.EXPRESSION_T, ak.EXPRESSION_T], sub);

The implementations of all of the ak expression objects are in Expression.js.

Variable Expressions

Expression objects really come into their own once we add support for variables, as is done in listing 9.

Listing 9: ak.varExpr
var varID  = zero;
var varDef = ak.intExpr(zero);

ak.varExpr = function() {
 var e = new Expression();
 var state = {id: varID};
 varID = ak.add(varID, one);

 e.SUB   = 'var';    = function() {return;};
 e.value = function() {return varValue(e, state, arguments);};

 e.approx = function()  {return state.val.approx();};;
 e.exact  = function(i) {return state.val.exact(i);};
 e.sign   = function()  {return state.val.sign()};


 return Object.freeze(e);

function varValue(e, state, args) {
 if(args.length>0) {
  state.val = args[0];
 return state.val;

The varExpr forwards to the approx, exact and sign methods of the value it contains, which we can set by passing an argument to its value method which, with or without an argument, returns the variable's value.
Given such variable expressions it is a trivial matter to construct function expressions, as shown by program 3.

Program 3: Function Expressions

As convenient as this undoubtedly is, the really important feature of ak.varExpr is that instances have identities, each returning a unique value from a call to its id method. This might not seem like such a big deal, but it's absolutely essential.

Computer Algebra Systems

Figure 2: My Calculator Is Pretty Damn Cool

Figure 3: Closed Form Differentiation

Figure 4: Closed Form Integration

Figure 5: Well Maybe Not That Cool
The combination of identifiable variables and a traversable expression trees makes it possible to implement a function that simplifies an expression. For example, the expression \(x \times y \div x\) can be simplified to \(y\). This is pretty much how computer algebra systems like Mathematica, MathCAD and even some top of the range calculators operate, as illustrated in figure 2.

We can go further still; if simplification can transform an expression tree than why can't it do the same for differentials, or integrals or any other algebraic manipulations?

Figure 3 illustrates the calculation of the derivative of \(e^x\) at 1 and conclusively demonstrates that computer algebra systems can completely avoid the problems that arise from cancellation error when approximating it using finite differences.
Figure 4 shows the calculation of the indefinite integral of \(x \ln(x)\). This is quite a tricky integral unless you are familiar with the technique of integration by parts, which my calculator evidently is.
To check that this is the correct answer we need only differentiate it and confirm that we get the expression being integrated.
\[ \begin{align*} \frac{\mathrm{d}}{\mathrm{d}x}&\left(\frac{x^2 \ln x}{2} - \frac{x^2}{4} + c \right)\\ &= \left(\frac{2x \ln x}{2} + \frac{x^2}{2x}\right) - \left(\frac{2x}{4}\right)\\ &= x \ln x + \frac{x}{2} - \frac{x}{2}\\ &= x \ln x \end{align*} \]
Closed form results are impossible to achieve in general, however. As an example, consider the expression
\[ \int_{-\infty}^c e^{-x^2}\mathrm{d}x \]
My calculator’s evaluation of this is somewhat less impressive, as shown in figure 5.

In practice computer algebra systems are extremely good at applying lengthy sequences of relatively simple symbolic manipulations but tend to struggle when more subtle sequences of transformations are required.

Exact Evaluation

Now let’s finally return to the issue of exact evaluation.
I’m afraid that I must admit that I’m not entirely sure how to do it. Deriving spigot algorithms for specific values is one thing, but for functions it's something else entirely.

It might not be too unreasonably difficult for basic arithmetic, but I suspect that for numerical algorithms such as integration and differentiation it might prove a little trickier.
Unfortunately there is one major problem that such an approach cannot address. Recall that we use NaNs to indicate that we have exhausted every non-zero digit after the decimal point.
For many expressions it is not a simple task to determine whether or not we have exhausted its digits. As an example, consider the expression
\[ \sin^2 x + \cos^2 x - 1 \]
This is exactly equal to zero for every value of \(x\), but the first two terms won’t be exactly representable as a decimal fractions for almost all values of \(x\). This expression shall inevitably be trapped in the endless production of zeros, hopelessly searching for a non-zero trailing digit.
The only way we shall escape this fate is to implement a full blown computer algebra system that can simplify all such difficult expressions into forms that can be computed in finite time.

Derivation 1: The Consequence Of Inconsistency
Using the common notation where \(\wedge\) means and, \(\vee\) means or, \(\neg\) means not and \(\Rightarrow\) means therefore, assume that for an axiomatic system \(A\) and a proposition \(P\) we have proven that \(P \wedge \neg P\).
Then for a proposition \(Q\) we have
\[ \begin{align*} &P \wedge \neg P\\ \Rightarrow &P \vee Q\\ \Rightarrow &\neg (\neg P \wedge \neg Q)\\ \Rightarrow &\neg (\neg Q)\\ \Rightarrow &\neg \neg Q\\ \Rightarrow &Q \end{align*} \]
If we define \(Q\) as \(\neg R\) then we have trivially demonstrated that we can also prove any proposition false.

Note that this also means that the proposition

\(A\) is consistent

is provably true.
Unfortunately no such system currently exists and, I am sorry to report, no such system ever will.
In 1931 Kurt Gödel proved that for any sufficiently complex axiomatic system, that is to say a set of definitions, assumed truths and definition of valid propositions, there are either infinitely many propositions that cannot be proved or disproved or there exists a proposition that can be proven both true and false. Now you might think that one little contradiction isn't such a terrible price to pay to be able to prove every proposition true or false, but unfortunately axiomatic systems are tremendously sensitive to contradictions since they imply that every proposition can be proven both true and false, as shown in derivation 1.

The bar for complexity is surprisingly low. Any system of axioms from which we can derive integer arithmetic with both addition and multiplication will fall foul of Gödel's remarkable proof. This caused something of a stir in the mathematical community, who had hitherto been labouring under the illusion that with such basic mathematical foundations it should, at least in principle, be possible to prove or disprove every proposition.
Modern mathematicians have come to terms with the fact that there are undecidable propositions because the alternative is to render all of mathematics trivial and uninteresting.

Gödel went on to show that it is impossible to prove that any axiomatic system can be proven to be consistent without the addition of further axioms, creating a new system that is subject to exactly the same problem. Unless, that is, the system is actually inconsistent, as was illustrated in derivation 1. At some point you simply have to take it on faith and assume that a system of axioms is consistent.
The implication that the axiomatic foundations of modern mathematics may not be consistent might be of some concern to those of a philosophical bent, but for the jobbing mathematician the fact that a hundred years or so of intense scrutiny hasn't yet revealed any inconsistencies leads to a certain confidence that they are.
Alan Turing took things a step further by demonstrating that it wasn’t even always possible to determine whether a proposition was decidable or not.

So in principal expression objects seem promising, but in practice they are not capable of entirely solving the problem of exact numerical computing. Next!


[1] You're Going To Have To Think! Why Fixed Point Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[2] You're Going To Have To Think! Why Arbitrary Precision Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[3] You're Going To Have To Think! Why Rational Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[4] Gibbons, J. Unbounded Spigot Algorithms for the Digits of Pi, American Mathematical Monthly. Vol. 113. No. 4. Pages 318-328, 2006.

Based upon an article I wrote for ACCU's Overload magazine.

Leave a comment