You’re Going To Have To Think! Why Fixed Point Arithmetic Won’t Cure Your Floating Point Blues

| 0 Comments

Fixed point arithmetic is perhaps the simplest alternative to floating point. Fixed point numbers maintain a fixed number of digits after the point, rather than a fixed number of digits of precision. Typically they are represented by an integer with the assumption that some constant number of the least significant base 10 digits fall below the decimal point. For example, assuming 2 decimal places, we would represent π with 314.
Listing 1 provides an implementation of a base 10 fixed point number class.

Listing 1: ak.fixed
ak.FIXED_T = 'ak.fixed';

function Fixed(){}
Fixed.prototype = {TYPE: ak.FIXED_T, valueOf: function(){return ak.NaN;}};

ak.fixed = function() {
 var f     = new Fixed();
 var arg0  = arguments[0];
 var state = {places: 0, digits: 0, scale: 1};

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 if(!(state.places>=0 && state.places<=ak.DEC_DIG)) {
  throw new Error('invalid decimal places in ak.fixed');
 }

 if     (state.digits<-ak.DEC_MAX) state.digits = -ak.INFINITY;
 else if(state.digits> ak.DEC_MAX) state.digits =  ak.INFINITY;

 state.scale = Math.pow(10, state.places);

 f.digits = function() {return state.digits;};
 f.places = function() {return state.places;};
 f.scale  = function() {return state.scale;};

 f.toNumber = function() {return state.digits/state.scale;};
 f.toString = function() {return toString(state);};

 return Object.freeze(f);
};

var constructors = {};

This is a typical example of how types are implemented in the ak library. We use JavaScript's prototypical class mechanism to provide a shared TYPE string that we'll use to dispatch our overloaded function calls and a valueOf method that we'll return to shortly. We then dynamically add methods that reference private state via closures and finally freeze the object so that it can't be changed.

ak.fixed Constructors

Object construction is managed by a hand rolled dynamic dispatch to a constructors object based on the types of the arguments, as shown in listing 2.

Listing 2: ak.fixed Constructors
constructors[ak.NUMBER_T] = function(state, places, args) {
 var arg1 = args[1];
 state.places = ak.trunc(places);
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1);
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, digits) {
 state.digits = ak.trunc(digits);
};

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
 state.digits = 0;
};

constructors[ak.OBJECT_T] = function(state, obj) {
 state.places = (ak.nativeType(obj.places)===ak.FUNCTION_T)
              ? ak.trunc(obj.places()) 
              : ak.trunc(obj.places);

 state.digits = (ak.nativeType(obj.digits)===ak.FUNCTION_T)
              ? ak.trunc(obj.digits())
              : ak.trunc(obj.digits);
};

Here, as with our approach to overloading, we rely upon JavaScript's provision for accessing members with an array syntax. If the first argument of ak.fixed is a Number then it is interpreted as the number of decimal places. If it is followed by a second Number then that is interpreted as the digits. If there's no second argument, then the fixed point number will be equal to zero and if a second argument is supplied and it is not a Number, then the dispatch will fail and an exception will be thrown.
If ak.fixed is passed an Object then the places and digits members of its state will be set to the places and digits members of that object or, if they are functions, with the results of calling them. This allows us to initialise an ak.fixed object with another ak.fixed object or with an initialiser of the form {places:places, digits:digits}.

Once the constructors have initialised the state we enforce its validity. Firstly, we throw an exception if the required number of decimal places exceeds the number of decimal digits of precision, given by ak.DEC_DIG. Secondly, we overflow to \(\pm\infty\) if the magnitude of the digits exceeds the maximum integer with as many digits as the decimal precision allows, given by ak.DEC_MAX.
We then compute the scale; the power of 10 by which the digits should be divided to recover the value.

ak.fixed Methods

Arithmetic objects in the ak library have few, if any, methods beyond read access to member data and type conversions. ak.fixed is again typical, providing read access to the digits, the number of decimal places and the scale, and conversions to numbers and strings.
The reason that there are no methods that change the member data is because ak.fixed is supposed to be a number, rather than an object containing a number. Changing its places or digits would therefore make no more sense than changing the digits of 42.
Having immutable state also confers the significant advantage that we only need to validate it during initialisation.

The operation of the accessors and conversion to number should be self-evident and the implementation of toString is given in listing 3.

Listing 3: ak.fixed toString
function toString(state) {
 var r, l;

 if(state.places===0 || !isFinite(state.digits)) {
  return state.digits.toFixed(0);
 }

 r = state.digits % state.scale;
 l = (state.digits-r) / state.scale;
 r = state.scale + Math.abs(r);

 return l.toFixed(0) + '.' + r.toFixed(0).slice(1);
}

This function has a few subtleties including, arguably, one outright hack. To understand what it's doing, it's important to know exactly how the % operator works in JavaScript. It is defined as the remainder of a division that is rounded towards zero or, equivalently, by
\[ a\,\%\,b = \mathrm{sign}(a) \times (|a|\,\%\,|b|) \]
where \(\mathrm{sign}(a)\) is the sign of \(a\).

The result of subtracting r from l is consequently the exact multiple of state.scale with the greatest magnitude less than or equal to that of l and dividing it by state.scale is exactly equal to the integer part of the fixed point number.
The arguably hacky step is replacing r with the result of adding state.scale to its absolute value. That we should use its absolute value should be clear since the sign is already represented in the integer part. That we should add the scale is, I think, less clear.
The reason that we do this is because the fractional part of the fixed point value might have leading zeros. To make sure that such zeros appear in the string representation we add a leading non-zero digit to it which we subsequently remove when we construct the string. As it happens popping a 1 in front of ak.DEC_MAX results in a number smaller than the largest representable integer, so we can be sure that adding the scale will not cause any rounding and we can simply strip the first character of the string representation of the result to yield, leading zeros and all, that of the fractional part.

Now it might seem a bit odd that the toString method returns an actual string representation of the fixed point value whilst the valueOf method always returns ak.NaN, allowing automatic conversion to strings but not to numbers, which require an explicit call to toNumber instead.
The reason for this is that it's all too easy to forget to use the ak overloaded functions and use standard JavaScript ones instead. If I were to allow automatic conversion to Number, then just one such mistake could be enough to switch an entire calculation back to floating point, leading to extremely subtle bugs.
Unfortunately, if an object that doesn't support valueOf is used where a number is expected, JavaScript will attempt to parse the result of toString as a number. By adding a valueOf method that always returns ak.NaN, I can be certain that automatic conversions to numbers are guaranteed to fail.
toString is comparatively trouble free, however, and is actually quite convenient from a UI perspective. It's important to note, however, that we cannot rely upon automatic conversion to strings when concatenating with the + operator since valueOf will be called instead of toString.

ak.fixed Overloads

Given that ak.fixed has very few methods, it should come as no great surprise that most of its behaviour is determined by overloaded functions, the simplest of which being ak.abs and ak.neg, as illustrated by listing 4.

Listing 4: ak.fixed abs And neg
function abs(f) {
 return ak.fixed(f.places(), Math.abs(f.digits()));
}
ak.overload(ak.abs, ak.FIXED_T, abs);

function neg(f) {
 return ak.fixed(f.places(), -f.digits());
}
ak.overload(ak.neg, ak.FIXED_T, neg);

Note that these functions do not manipulate ak.fixed state directly, but rather operate upon the values returned by the property accessors, constructing a new object for the result. These too are typical examples of how overloaded functions will work in the ak library.

The next simplest functions are comparisons, some of which are given in listing 5.

Listing 5: ak.fixed Comparisons
function eq(f0, f1) {
 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed eq');
 }
 return f0.digits()===f1.digits();
}
ak.overload(ak.eq,  [ak.FIXED_T, ak.FIXED_T], eq);

function lt(f0, f1) {
 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed lt');
 }
 return f0.digits()<f1.digits();
}
ak.overload(ak.lt,  [ak.FIXED_T, ak.FIXED_T], lt);

function cmp(f0, f1) {
 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed cmp');
 }
 return ak.cmp(f0.digits(), f1.digits());
}
ak.overload(ak.cmp, [ak.FIXED_T, ak.FIXED_T], cmp);

The important thing to note about these functions is that they throw exceptions if their arguments have different numbers of decimal places. Defining the results of operations with such arguments is a rather tricky proposition, so I have taken the easy route of treating them as if they were entirely distinct types. Having made that decision the comparison function can simply compare the digits of their arguments.

Addition and subtraction are equally straightforward, as shown in listing 6.

Listing 6: ak.fixed add And sub
function add(f0, f1) {
 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed add');
 }
 return ak.fixed(f0.places(), f0.digits()+f1.digits());
}
ak.overload(ak.add, [ak.FIXED_T, ak.FIXED_T], add);

function sub(f0, f1) {
 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed sub');
 }
 return ak.fixed(f0.places(), f0.digits()-f1.digits());
}
ak.overload(ak.sub, [ak.FIXED_T, ak.FIXED_T], sub);

Note that, overflow aside, these functions introduce no rounding errors into the result.

Multiplication and division require a little more thought since we must take into account the scale. Given two fixed point numbers \(x_1\) and \(x_2\) we have
\[ \begin{align*} x_1 &= \frac{d_1}{s}\\ x_2 &= \frac{d_2}{s} \end{align*} \]
for integer digits \(d_1\) and \(d_2\) and common scale \(s\). Multiplication and division therefore yield
\[ x_1 \times x_2 = \frac{d_1 \times d_2}{s \times s} = \frac{(d_1 \times d_2) \div s}{s}\\ x_1 \div x_2 = \frac{d_1 \div d_2}{s \div s} = \frac{(d_1 \div d_2) \times s}{s} \]
Note that the numerators of these fractions are not necessarily integers so we must round them, as shown in listing 7.

Listing 7: ak.fixed mul And div
function mul(f0, f1) {
 var dig = ak.round(f0.digits()*f1.digits()/f0.scale());

 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed mul');
 }
 return ak.fixed(f0.places(), dig);
}
ak.overload(ak.mul, [ak.FIXED_T, ak.FIXED_T], mul);

function div(f0, f1) {
 var dig = ak.round(f0.scale()*f0.digits()/f1.digits());

 if(f0.places()!==f1.places()) {
  throw new Error('places mismatch in ak.fixed div');
 }
 return ak.fixed(f0.places(), dig);
}
ak.overload(ak.div, [ak.FIXED_T, ak.FIXED_T], div);

We can be sure that the result of these rounding operations will be as expected since even the largest decimal integers have a few unused bits of precision that can be safely lost during the floating point multiplications and divisions.

The full set of ak.fixed overloads is given in listing 8.

Listing 8: ak.fixed Overloads
ak.overload(ak.abs,  ak.FIXED_T, abs);
ak.overload(ak.inv,  ak.FIXED_T, inv);
ak.overload(ak.neg,  ak.FIXED_T, neg);
ak.overload(ak.sqrt, ak.FIXED_T, sqrt);

ak.overload(ak.add, [ak.FIXED_T, ak.FIXED_T], add);
ak.overload(ak.cmp, [ak.FIXED_T, ak.FIXED_T], cmp);
ak.overload(ak.div, [ak.FIXED_T, ak.FIXED_T], div);
ak.overload(ak.eq,  [ak.FIXED_T, ak.FIXED_T], eq);
ak.overload(ak.ge,  [ak.FIXED_T, ak.FIXED_T], ge);
ak.overload(ak.gt,  [ak.FIXED_T, ak.FIXED_T], gt);
ak.overload(ak.le,  [ak.FIXED_T, ak.FIXED_T], le);
ak.overload(ak.lt,  [ak.FIXED_T, ak.FIXED_T], lt);
ak.overload(ak.mod, [ak.FIXED_T, ak.FIXED_T], mod);
ak.overload(ak.mul, [ak.FIXED_T, ak.FIXED_T], mul);
ak.overload(ak.ne,  [ak.FIXED_T, ak.FIXED_T], ne);
ak.overload(ak.sub, [ak.FIXED_T, ak.FIXED_T], sub);

The full source for ak.fixed can be found in Fixed.js.

The Strengths Of Fixed Point

So now we know exactly how to implement fixed point numbers we should probably ask ourselves why we should want to.
Well, there are three principal advantages to fixed point numbers.

The first, and by far the least important, advantage is that unlike IEEE 754-1985 floating point, it can exactly represent decimal fractions. This is trotted out ad nauseum as a reason to use fixed point but, given that a revision of the standard describes the implementation of base 10 floating point numbers, it isn’t particularly compelling.
The second, slightly more important, advantage is that, if implemented with integers, fixed point arithmetic can be dramatically faster on some platforms. Unfortunately JavaScript isn't one of them.
The third, and by far the most important, advantage is that, as noted above, addition and subtraction introduce no error beyond that in their arguments, as illustrated in program 1.

Program 1: Fixed Versus Floating Point Addition

This means that it is much easier to reason about addition using fixed point arithmetic; if we are summing numbers that can be exactly represented we have no rounding issues at all and need only worry about overflow.
This, together with the ability to represent some decimal fractions exactly is why many financial transactions are mandated to use base 10 fixed point numbers.

The Weaknesses Of Fixed Point

Unfortunately, multiplication and division are rather more problematic.

Our innate bias for decimal numbers means that we can make the mistake of attributing too much significance to the fact that decimal fixed point numbers can exactly represent decimal fractions (at least those that don't have too many digits after the point) and too little to the fact that they don't handle other kinds of fractions particularly well, as illustrated in program 2.

Program 2: Fixed Versus Floating Point Division

I don't for a minute think that anybody would be surprised by this, but suspect that that is exactly why it's not considered as big a problem as decimal fractions not being exactly representable in binary floating point; we're so used to it that it doesn't seem to be a particularly big deal.
If we were to use decimal floating point instead of fixed point we could reduce the magnitude of such rounding errors to not much greater than those of binary floating point, but the point is that a decimal representation doesn't remove rounding errors, it just moves them.
Given that binary representations pack a wider range of numbers into their bits, and consequently have smaller rounding errors in general, we would be well advised to lose our unreasonable attachment to decimals.

The focus on places, rather than precision, means that fixed point numbers of any base are severely affected by execution order in expressions involving multiplication and division. To demonstrate by just how much, let us now assume that we are using 2 decimal place fixed point numbers to calculate the value of the expression
\[ 1,000 \times 1,000 \div 1,000,000 \]
If we calculate this in the order
\[ (1,000 \times 1,000) \div 1,000,000 \]
we get a result of one, whereas if we calculate it in the order
\[ 1,000 \times (1,000 \div 1,000,000) \]
we get a result of zero.
This is a significantly less accurate result than that we get with floating point, as illustrated in program 3.

Program 3: Sensitivity To Execution Order

The problem is that multiplicative errors accumulate proportionally rather than absolutely. Additively fixed point numbers behave themselves but the instant we start multiplying and dividing they start acting up.

A further problem with fixed point numbers is that, because they firmly nail down the position of the point, they overflow much more quickly than floating point numbers, as demonstrated by program 4.

Program 4: Fixed Point Overflow

Because of these weaknesses, fixed point isn't particularly suitable for general purpose arithmetic and is consequently not a credible replacement for floating point, at least in this form.


Based upon an article I wrote for ACCU's Overload magazine.

Leave a comment