You're Going To Have To Think! Why Interval Arithmetic Won't Cure Your Floating Point Blues


So far we have discussed floating point arithmetic[1], fixed point arithmetic[2], arbitrary precision arithmetic[3], rational arithmetic[4] and computer algebra[5] and have found that, with the notable exception of the unfortunately impossible to achieve infinite precision that the last of these seems to promise, they all suffer from the problems that floating point arithmetic is so often criticised for.

No matter what number representation we use, we shall in general have to think carefully about how we use them.
We must design our algorithms so that we keep approximation errors in check, only allowing them to grow as quickly as is absolutely unavoidable.
We must ensure that these algorithms can, as far as is possible, identify their own modes of failure and issue errors if and when they arise.
Finally, we must use them cautiously and with full awareness of their numerical properties.

Now, all of this sounds like hard work, so it would be nice if we could get the computer to do the analysis for us.
Whilst it would be extremely difficult to automate the design of stable numerical algorithms, thankfully there is a numeric type that can keep track of the errors as they accumulate.

Interval Arithmetic

Recall that, assuming we are rounding to nearest, the basic floating point arithmetic operations are guaranteed to introduce a proportional error of no greater than \(1 \pm \frac{1}{2} \epsilon\). It’s not wholly unreasonable to assume that more complex floating point functions built into our chips and libraries will also do no worse than \(1 \pm \frac{1}{2} \epsilon\).
This suggests that it might be possible to represent a number with upper and lower bounds of accuracy and propagate the growth of these bounds through arithmetic operations.
To capture the upper and lower bound of the result of a calculation accurately we need to perform it twice; once rounding towards minus infinity and once rounding towards plus infinity. Unfortunately, JavaScript provides no facility for manipulating the rounding mode.

We shall, instead, propagate proportional errors at a rate of \(1 \pm 1\frac{1}{2} \epsilon\) since we shall be able to do this without switching the IEEE rounding mode. Specifically, we can multiply a floating point result by \(1+\epsilon\) to get an upper bound and multiply it by \(1-\epsilon\) to get a lower bound. The rate of error propagation results from widening the \(1 \pm \frac{1}{2} \epsilon\) proportional error in the result by this further \(1 \pm \epsilon\) to yield the bounds.
Naturally, this will not work correctly for denormalised numbers since they have a zero rather than a one before the decimal point, so we shall have to treat them as a special case. Specifically, rather than multiplying by \(1 \pm \epsilon\), we shall add and subtract the smallest non-zero floating point number, conveniently provided by ak.MIN_VALUE.

Before we apply rounding error to the upper and lower bounds of the result a calculation we shall want them to represent the largest and the smallest possible result respectively. We therefore define the basic arithmetic operations as follows.
\[ \begin{align*} (a,b)+(c,d) &= (a+c,b+d)\\ (a,b)-(c,d) &= (a-d,b-c)\\ (a,b)\times(c,d) &= (\min(a \times c, a \times d, b \times c, b \times d), \max(a \times c, a \times d, b \times c, b \times d))\\ (a,b)\div(c,d) &= (\min(a \div c, a \div d, b \div c, b \div d), \max(a \div c, a \div d, b \div c, b \div d)) \end{align*} \]
As an example, consider the square root of 2. Working with 3 accurate decimal digits of precision, this would be represented by the interval
\[ (1.413,1.415) \]
Squaring this result would yield
\[ (1.413 \times 1.413 - 0.001, 1.415 \times 1.415 + 0.001) \]
\[ (1.996, 2.003) \]
which clearly straddles the exact result of 2 and furthermore gives a good indication of the numerical error.
Unfortunately, as is often the case, the devil is in the details.

The first thing we need to decide is whether the bounds are open or closed; whether the value represented is strictly between the bounds or can be equal to them. If we choose the former we cannot represent those numbers which are exact in floating point, like zero for example, so we shall choose the latter.
However, if either of the bounds is infinite we should prefer to consider them as open so as to simplify the rules of our interval arithmetic. To ensure consistency we shall map a lower bound of \(+\infty\) to the maximum floating point value and an upper bound of \(-\infty\) to the negated maximum floating point value.

A consequence of treating infinite bounds as open is that multiplying them by zero can yield zero rather than NaN. In particular, we have as special case of
\[ (-\infty, \infty) \times (0, 0) = (0, 0) \]
and hence
\[ \frac{(0,0)}{(0,0)} = (-\infty, \infty) \]
This might seem a bit odd, but it makes perfect sense once we realise that (-∞, ∞) is pronounced any number.
Making certain assumptions we can consistently, albeit not particularly mathematically soundly, define division by intervals containing zero or either infinity as follows. Given
\[ \begin{align*} 0 & < a \leqslant b\\ 0 & < c\\ 0 & < d\\ 0 & < e\\ 0 & < f \end{align*} \]
we define
\[ \begin{align*} (-\infty,\infty) &= \frac{(a,b)}{(0,0)} = \frac{(-b,-a)}{(0,0)} = \frac{(-c,d)}{(0,0)} = \frac{(0,0)}{(0,0)}\\ &= \frac{(a,b)}{(-e,f)} = \frac{(-b,-a)}{(-e,f)} = \frac{(-c,d)}{(-e,f)} = \frac{(0,0)}{(-e,f)}\\ &= \frac{(-c,d)}{(0,f)} = \frac{(-c,d)}{(-e,0)}\\ (0,\infty) &= \frac{(a,b)}{(0,f)} = \frac{(-b,-a)}{(-e,0)} = \frac{(0,d)}{(0,f)} = \frac{(-c,0)}{(-e,0)}\\ (-\infty,0) &= \frac{(a,b)}{(-e,0)} = \frac{(-b,-a)}{(0,f)} = \frac{(0,d)}{(-e,0)} = \frac{(-c,0)}{(0,f)} \end{align*} \]
\[ \begin{align*} (0,\infty) &= \frac{(d,\infty)}{(f,\infty)} = \frac{(-\infty,-c)}{(-\infty,-e)}\\ (-\infty,0) &= \frac{(d,\infty)}{(-\infty,-e)} = \frac{(-\infty,-c)}{(f,\infty)} \end{align*} \]

An Interval Class

Listing 1 gives the definition of an interval number class.

Listing 1: ak.interval
ak.INTERVAL_T = 'ak.interval';

function Interval(){}
Interval.prototype = {TYPE: ak.INTERVAL_T, valueOf: function(){return ak.NaN;}};

ak.interval = function() {
 var i     = new Interval();
 var arg0  = arguments[0];
 var state = {lb: 0, ub: 0};

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 if(isNaN( || isNaN(state.ub)) { = ak.NaN;
  state.ub = ak.NaN;

 if( ak.INFINITY) =  ak.MAX_VALUE;
 if(state.ub===-ak.INFINITY) state.ub = -ak.MAX_VALUE;  = function() {return;};
 i.ub  = function() {return state.ub;};
 i.mid = function() {return 0.5*(;};

 i.toNumber = i.mid;
 i.toString = function() {return '[' + + ',' + state.ub + ']';};

 i.toExponential = function(d) {
  return '['','+state.ub.toExponential(d)+']';

 i.toFixed = function(d) {
  return '['','+state.ub.toFixed(d)+']';

 i.toPrecision = function(d) {
  return '['','+state.ub.toPrecision(d)+']';

 return Object.freeze(i);

var constructors = {};

As usual, the class only provides property accessor and type conversion methods, which all have very simple implementations. The lb and ub properties are the defining elements, being the lower and upper bounds of the interval respectively.
The state is initialised by dispatching calls to a constructors object based on the types of the arguments after which it is validated by propagating NaNs to both bounds and handling infinites as described above.

ak.interval Constructors

The constructors are similar in structure to those of our other classes, as shown in listing 2.

Listing 2: ak.interval Constructors
constructors[ak.NUMBER_T] = function(state, x, args) {
 var arg1 = args[1];
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, x, arg1);

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, lb, ub) {
 var i = sort(Number(lb), Number(ub)); =;
 state.ub = i.ub;

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state, x) {
 var y = Number(x);
 var i = widen(y, y) =;
 state.ub = i.ub;

constructors[ak.OBJECT_T] = function(state, obj) {
 var lb = ak.nativeType( ? Number( : Number(;
 var ub = ak.nativeType(obj.ub)===ak.FUNCTION_T ? Number(obj.ub()) : Number(obj.ub);
 var i  = sort(lb, ub); =;
 state.ub = i.ub;

Note that if the object is constructed with a pair of numbers, either explicitly or via properties of an initialising object, they are assumed to be the exact bounds of the interval and are put into the correct order with a call to sort. If constructed with a single number, it is assumed to be an inexact value and we add our conservative error estimates with a call to widen
The definitions of sort and widen are given in listing 3.

Listing 3: ak.interval sort And widen
var EPS_SUB = 1-ak.EPSILON;
var EPS_SUP = 1+ak.EPSILON;

function sort(x, y) {
 return x<=y ? {lb: x, ub: y} : {lb: y, ub: x};

function widenLB(lb) {
 if(lb>ak.MIN_NORMAL)       lb *= EPS_SUB;
 else if(lb<-ak.MIN_NORMAL) lb *= EPS_SUP;
 else                       lb -= ak.MIN_VALUE;
 return lb

function widenUB(ub) {
 if(ub>ak.MIN_NORMAL)       ub *= EPS_SUP;
 else if(ub<-ak.MIN_NORMAL) ub *= EPS_SUB;
 else                       ub += ak.MIN_VALUE;
 return ub;

function widen(lb, ub) {
 var i = sort(lb, ub); = widenLB(;
 i.ub = widenUB(i.ub);
 return i;

The sort function is simple enough, but widen could bear some further explanation. We have already noted that denormalised numbers must accumulate errors additively rather than multiplicitively, which we do in the final clauses of the widening functions.
The tricky bit is to make sure that the lower bound always widens towards minus infinity and the upper bound towards plus infinity. We do this by carefully choosing whether to multiplicatively widen normal valued bounds towards or away from zero depending on their signs.

ak.interval Overloads

As usual, overloaded functions do most of the work and, as usual, neg is amongst the simplest, as shown in listing 4.

Listing 4: ak.interval neg
function neg(x) {
 return ak.interval(-x.ub(),;
ak.overload(ak.neg, ak.INTERVAL_T, neg);

Since sign manipulation is exact, this function introduces no additional error to the interval.
Unusually, abs isn't one of the simplest, as shown in listing 5.

Listing 5: ak.interval abs
function abs(x) {
 if(>=0) return x;
 if(x.ub()<=0) return ak.interval(-x.ub(),;
 return ak.interval(0, Math.max(, x.ub()));
ak.overload(ak.abs, ak.INTERVAL_T, abs);

The subtlety here is that if we have a negative lower bound and a positive upper bound, then the smallest absolute value in the interval is zero and not the smaller absolute bound.

In comparison, addition and subtraction are relatively simple, as illustrated by listing 6.

Listing 6: ak.interval add And sub
function add(lhs, rhs) {
 return ak.interval(widen(, lhs.ub()+rhs.ub()));
ak.overload(ak.add, [ak.INTERVAL_T, ak.INTERVAL_T], add);

function sub(lhs, rhs) {
 return ak.interval(widen(, lhs.ub();
ak.overload(ak.sub, [ak.INTERVAL_T, ak.INTERVAL_T], sub);

Things start getting a little tricky when it comes to multiplication, as shown in listing 7.

Listing 7: ak.interval mul
function mul(lhs, rhs) {
 var ll, lu, ul, uu, lb, ub;

 if(isNaN( || isNaN( return ak.interval(ak.NaN, ak.NaN);

 ll = *;
 lu = * rhs.ub();
 ul = lhs.ub() *;
 uu = lhs.ub() * rhs.ub();

 if(isNaN(ll)) ll = 0;
 if(isNaN(lu)) lu = 0;
 if(isNaN(ul)) ul = 0;
 if(isNaN(uu)) uu = 0;

 lb = Math.min(Math.min(ll, lu), Math.min(ul, uu));
 ub = Math.max(Math.max(ll, lu), Math.max(ul, uu));

 return ak.interval(widen(lb, ub));
ak.overload(ak.mul, [ak.INTERVAL_T, ak.INTERVAL_T], mul);

Now this is essentially the scheme described above where the bounds are set to the minimum and maximum of all possible products. However, since we are treating infinities as open bounds, we must take care to ensure that when multiplied by zero they yield zero rather than NaN. Since the only multiplicative operations that result in NaN are those involving a NaN or two or those of an infinity and zero, by handling the former with an early return we can safely replace those that arise from the latter with zeros.

Division is similarly based on that scheme, albeit with a few more corner cases, as illustrated in listing 8.

Listing 8: ak.interval div
function div(lhs, rhs) {
 var ll, lu, ul, uu, lb, ub;

 if(isNaN( || isNaN( return ak.interval(ak.NaN, ak.NaN);

 if(>0 || rhs.ub()<0) {
  ll = /;
  lu = / rhs.ub();
  ul = lhs.ub() /;
  uu = lhs.ub() / rhs.ub();

  if(isNaN(ll)) ll =  ak.INFINITY;
  if(isNaN(lu)) lu = -ak.INFINITY;
  if(isNaN(ul)) ul = -ak.INFINITY;
  if(isNaN(uu)) uu =  ak.INFINITY;

  lb = Math.min(Math.min(ll, lu), Math.min(ul, uu));
  ub = Math.max(Math.max(ll, lu), Math.max(ul, uu));

  return ak.interval(widen(lb, ub));

 if(( < 0 && rhs.ub() > 0)
 || ( && rhs.ub()===0)
 || ( < 0 && lhs.ub() > 0)) {
  lb = -ak.INFINITY;
  ub =  ak.INFINITY;
 else if(( &&>=0)
      || (rhs.ub()===0 && lhs.ub()<=0)) {
  lb =  0;
  ub =  ak.INFINITY;
 else if(( && lhs.ub()<=0)
      || (rhs.ub()===0 &&>=0)) {
  lb = -ak.INFINITY;
  ub =  0;
 return ak.interval(lb, ub);
ak.overload(ak.div, [ak.INTERVAL_T, ak.INTERVAL_T], div);

Once again, we handle NaNs with an early return to simplify later cases.
If the right hand side of the division does not include zero, if the upper bound is negative or the lower bound is positive, we have a similar implementation to that of multiplication. The only possible source of NaNs is the division of one infinity by another and, given that infinities represent open bounds and that arbitrarily large numbers divided by arbitrarily large numbers result in arbitrarily large numbers, they should be replaced with infinities. Since ak.interval ensures that lower bounds cannot be +∞ and upper bounds cannot be -∞ we can easily identify which NaNs should correspond to which infinities.
The last three cases handle division by zero in what I hope is a reasonably consistent fashion. If the right hand side straddles, or is identically equal to, zero then the result of the division can be any number, as represented by (-∞, +∞). Similarly, if the left hand side straddles zero and the right hand side includes zero then the result could be any number. Finally, if the left and right hand sides represent any number of a given sign, including zero, we return the relavent semi-infinite interval.

Listing 9 shows the full complement of ak.interval overloads.

Listing 9: ak.interval Overloads
ak.overload(ak.abs,  ak.INTERVAL_T, abs);
ak.overload(ak.acos, ak.INTERVAL_T, acos);
ak.overload(ak.asin, ak.INTERVAL_T, asin);
ak.overload(ak.atan, ak.INTERVAL_T, atan);
ak.overload(ak.cos,  ak.INTERVAL_T, cos);
ak.overload(ak.cosh, ak.INTERVAL_T, cosh);
ak.overload(ak.exp,  ak.INTERVAL_T, exp);
ak.overload(ak.inv,  ak.INTERVAL_T, inv);
ak.overload(ak.log,  ak.INTERVAL_T, log);
ak.overload(ak.neg,  ak.INTERVAL_T, neg);
ak.overload(ak.sin,  ak.INTERVAL_T, sin);
ak.overload(ak.sinh, ak.INTERVAL_T, sinh);
ak.overload(ak.sqrt, ak.INTERVAL_T, sqrt);
ak.overload(ak.tan,  ak.INTERVAL_T, tan);
ak.overload(ak.tanh, ak.INTERVAL_T, tanh);

ak.overload(ak.add,  [ak.INTERVAL_T, ak.INTERVAL_T], add);
ak.overload(ak.dist, [ak.INTERVAL_T, ak.INTERVAL_T], dist);
ak.overload(ak.div,  [ak.INTERVAL_T, ak.INTERVAL_T], div);
ak.overload(ak.eq,   [ak.INTERVAL_T, ak.INTERVAL_T], eq);
ak.overload(ak.mod,  [ak.INTERVAL_T, ak.INTERVAL_T], mod);
ak.overload(ak.mul,  [ak.INTERVAL_T, ak.INTERVAL_T], mul);
ak.overload(,   [ak.INTERVAL_T, ak.INTERVAL_T], ne);
ak.overload(ak.pow,  [ak.INTERVAL_T, ak.INTERVAL_T], pow);
ak.overload(ak.sub,  [ak.INTERVAL_T, ak.INTERVAL_T], sub);

The complete implementation of ak.interval can be found in Interval.js.

Using ak.interval

Note that the only comparisons provided are equality and inequality since intervals can overlap, or even be subsets and supersets of one another, making it rather difficult to define exactly what it means to say one is less than or greater than another.

Program 1 illustrates the use ak.interval of objects for calculations, specifically for solving quadratic equations using the formula we learnt as children; that
\[ a x^2 + b x + c = 0 \]
has solutions
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
Program 1: Using ak.interval

The functions solve and quad use the ak.calc RPN calculator, which in turn uses the overloaded functions, to compute the solutions to the quadratic equation and the value of the left hand side for a given argument.
I think that it's pretty clear that the results of these calculations both include the exact results and capture the uncertainty introduced by working with finite precision.
Note that we use the two argument ak.interval constructors for integers that we know can be exactly represented.


Unfortunately interval arithmetic is not entirely foolproof. One problem is that it can yield overly pessimistic results if an interval appears more than once in an expression. For example, consider the result of multiplying the interval \((-1, 1)\) by itself.
Ignoring rounding error, doing so yields
\[ \begin{align*} (-1, 1) \times (-1, 1) &= \left(\min(-1 \times -1, -1 \times 1, 1 \times -1, 1 \times 1), \max(-1 \times -1, -1 \times 1, 1 \times -1, 1 \times 1)\right)\\ &= (-1, 1) \end{align*} \]
rather than the correct result of \((0, 1)\).
This can be particularly troublesome if such expressions appear as the denominator in a division. For example, given
\[ \begin{align*} x &= (-1, 1)\\ y &= (\tfrac{1}{2}, 1)\\ z &= (0, 1) \end{align*} \]
and again ignoring rounding errors, we have
\[ \begin{align*} \frac{z}{x \times x + y} &= \frac{(0, 1)}{(-1, 1) + (\tfrac{1}{2}, 1)}\\ &= \frac{(0, 1)}{(-\tfrac{1}{2}, 2)}\\ &= (-\infty, \infty) \end{align*} \]
rather than
\[ \begin{align*} \frac{z}{x \times x + y} &= \frac{(0, 1)}{(0, 1) + (\tfrac{1}{2}, 1)}\\ &= \frac{(0, 1)}{(\tfrac{1}{2}, 2)}\\ &= (0, 2) \end{align*} \]
If we wish to ensure that such expressions yield as accurate results as possible we shall have to rearrange them so that we avoid aliasing. For example, rather than
\[ x \times x + 3x - 1 \]
we should prefer
\[ (x + 1.5)^2 - 3.25 \]
Provided we keep in mind the fact that interval arithmetic can be pessimistic we can still use it naively to give us a warning of possible precision loss during a calculation.
Unfortunately there is a much bigger problem that we must consider.

Precisely Wrong

Consider the finite difference approximation to the derivative
\[ \frac{\mathrm{d}f}{\mathrm{d}x} \approx \frac{f(x + \delta) - f(x)}{\delta} \]
Recall that such calculations are sensitive to cancellation error when \(\delta\) is very small. By using our interval type we might identify if such errors have occurred.
Listing 10 illustrates how we might do so for the derivative of the exponential function at 1 with integer i leading binary zeros in \(\delta\).

Listing 10: An Approximate Derivative
var d  = Math.pow(2, -i);
var x  = ak.interval(1, 1);
var xd = ak.interval(1+d, 1+d);
var df = ak.calc(xd,'e^',x,'e^','-');

var dfdx = ak.interval(, df.ub()/d);

Note that because we have chosen d to be a power of two then, so long as it's no smaller than the floating point epsilon, 1+d is exactly representable which is why we can safely use an exact interval to represent it. In the final line we are exploiting the fact that a division by a negative power of 2 is also exact.

Program 2 plots the number of accurate binary digits in the finite difference approximation in black and the number of precise binary digits in the finite difference formula in red against the position of the most significant binary digit in \(\delta\).

Program 2: Finite Difference Precision

Clearly there’s a linear relationship between the number of leading zeros of \(\delta\) and the lack of precision in the finite difference calculation. Unfortunately, it is initially in the opposite sense to that between the number of leading zeros and the accuracy of the approximation.
We must therefore be extremely careful to distinguish between these two types of error. If we consider precision alone we are liable to very precisely calculate the wrong number.

So, whilst intervals are an extremely useful tool for ensuring that errors in precision do not grow to significantly impact the accuracy of a calculation, they cannot be used blindly. As has consistently been the case, we shall have to think carefully about our calculations if we wish to have confidence in their results.


[1] You're Going To Have To Think! Why [Insert Technique Here] Won't Cure Your Floating Point Blues,, 2013.

[2] You're Going To Have To Think! Why Fixed Point Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[3] You're Going To Have To Think! Why Arbitrary Precision Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[4] You're Going To Have To Think! Why Rational Arithmetic Won’t Cure Your Floating Point Blues,, 2013.

[5] You're Going To Have To Think! Why Computer Algebra Won’t Cure Your Floating Point Blues,, 2013.

Based upon an article I wrote for ACCU's Overload magazine.

Leave a comment