As well as required arithmetic operations, such as addition, subtraction, multiplication and division, the IEEE 754 floating point standard^{[1]} has a number of recommended functions. For example
finite determines whether its argument is neither infinite nor NaN and isnan determines whether its argument is NaN; behaviours that shouldn't be particularly surprising since they're more or less equivalent to JavaScript's isFinite and isNaN functions respectively.

One recommended function that JavaScript does not provide, and which I should like to add to the ak library, is nextafter, which returns the first representable floating point number after its first argument in the direction towards its second.
A Bit Fiddly
Figure 1: IEEE 754 Single Precision Floating Point
±a_{1}a_{2}a_{3}...a_{8}b_{1}b_{2}b_{3}...b_{23}

Binary Exponent (a_{1}...a_{8}) | Decimal Exponent | Binary Value
00000000 | 0   | ±0.b_{1}b_{2}b_{3}...b_{23}×2^{-126}
00000001 | 1   | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{-126}
00000010 | 2   | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{-125}
00000011 | 3   | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{-124}
...      | ... | ...
01111111 | 127 | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{0}
10000000 | 128 | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{1}
...      | ... | ...
11111100 | 252 | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{125}
11111101 | 253 | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{126}
11111110 | 254 | ±1.b_{1}b_{2}b_{3}...b_{23}×2^{127}
11111111 | 255 | ±∞ if b=0, NaN otherwise
If we were working with a relatively low level language, such as C, then my first instinct when implementing this function would be to reinterpret the memory containing its floating point argument as an array of bytes and use bitwise operations to extract the sign bit as a boolean and the bits of the exponent and significand as integers. Recall that IEEE floating point numbers take the form
\[
\pm b \times 2^a
\]
where \(b\) is the significand and \(a\) is the exponent, and are arranged in memory with the sign bit first, followed by the bits of the exponent and then those of the significand, as summarised in figure 1 for 32 bit single precision numbers. Note that finite numbers with nonzero exponent bits have an implied leading digit of one and are referred to as normal, whereas those whose exponents' bits are all zero have an implied leading digit of zero and are referred to as subnormal.

Now we could use integer operations to calculate the bits of the next floating point number, put them back into an array of bytes and finally reinterpret its contents as the floating point result.
Unfortunately, JavaScript isn't a relatively low level language and won't let us do this sort of thing, so we shall have to come up with an alternative approach.
Go Forth And Multiply
If you have a particularly good memory you will recall that we did something similar when widening the bounds of our interval arithmetic numerical type^{[2]}, interval. This sought to keep track of numerical errors by defining the results of arithmetic operations as the widest possible intervals given the ranges of values in their interval arguments, finally widening their bounds to reflect rounding error.

For example, for positive normal values \(l_0\), \(u_0\), \(l_1\) and \(u_1\) we defined addition as
\[
\left[l_0, u_0\right] + \left[l_1, u_1\right] = \left[\left(l_0+l_1\right) \times \left(1-\epsilon\right), \left(u_0+u_1\right) \times \left(1+\epsilon\right)\right]
\]
where \(\epsilon\) is the difference between one and the smallest floating point number greater than one, known as the floating point epsilon.

The final multiplication of the lower and upper bounds by \(1-\epsilon\) and \(1+\epsilon\) has the advantage of being simple to implement but unfortunately frequently overestimates the effect of rounding. Nevertheless, the idea of using multiplication to find the next floating point number is an appealing one.
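As an aside, we can verify the defining property of \(\epsilon\), and of the \(\frac12\epsilon\) that we shall need shortly, directly in JavaScript; here \(2^{-52}\) is the double precision epsilon, also available in modern engines as Number.EPSILON:

```javascript
// The floating point epsilon for IEEE 754 doubles is 2^-52: the smallest
// power of two whose sum with one is distinguishable from one.
var eps = Math.pow(2, -52);

console.log(1 + eps > 1);      // true: 1+eps is the next float above one
console.log(1 + eps/4 === 1);  // true: anything much smaller rounds away
console.log(1 - eps/2 < 1);    // true: but half of it survives below one
```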
Derivation 1: Next Towards Zero
Given a positive normal floating point number
\[
x = 1 \, . \, b_1 \, b_2 \, \dots \, b_{n-1} \, b_n \, \times \, 2^a
\]
then, using exact arithmetic, we have
\[
\begin{align*}
&x \times \left(1-\tfrac12\epsilon\right) = x - x \times \tfrac12\epsilon\\
&\quad= 1 \, . \, b_1 \, b_2 \, \dots \, b_{n-1} \, b_n \, 0 \, 0 \, \dots \, 0 \, 0 \, \times \, 2^a\\
&\quad- \, 0 \, . \, 0 \, 0 \, \dots \, 0 \, 0 \, 1 \, b_1 \, b_2 \, \dots \, b_{n-1} \, b_n \, \times \, 2^a
\end{align*}
\]
Firstly, if any of the bits \(b_i\) are nonzero then the \((n+1)^\mathrm{th}\) bit of the subtraction must include a carry from the subtractions of the following bits and so it will result in zero with another carry.

Now if the last nonzero bit of \(x\) is \(b_k\) this yields
\[
1 \, . \, b_1 \, b_2 \, \dots \, b_{k-1} \, 0 \, 1 \, \dots \, 1 \mid 0 \, c_1 \, c_2 \, \dots \, c_{n-1} \, c_n \, \times \, 2^a
\]
for some bits \(c_i\) and where the vertical bar marks the end of the bits of the significand, which will round down to
\[
1 \, . \, b_1 \, b_2 \, \dots \, b_{k-1} \, 0 \, 1 \, \dots \, 1 \, \times \, 2^a
\]
as required.

Secondly, if all of the bits \(b_i\) are zero then, by the same derivation of the value of \(1-\frac12\epsilon\), we have
\[
\begin{align*}
x \times \left(1-\tfrac12\epsilon\right) &= 0 \, . \, 1 \, 1 \, \dots \, 1 \mid 1 \, 0 \, \dots \, 0 \, \times \, 2^a\\
&= 1 \, . \, 1 \, \dots \, 1 \, 1 \, \times \, 2^{a-1}
\end{align*}
\]
which is exactly representable as a floating point number if \(a\) is greater than the most negative normal exponent.

Finally, for negative floating point numbers we simply negate them before the multiplication and rounding and then negate the result to prove that this yields the next closest floating point number to zero.
The first thing to note is that the difference between one and the greatest floating point number that is less than one is, in fact, \(\frac12\epsilon\) since the results of arithmetic operations are normalised to have a leading digit of one whenever possible and consequently
\[
\begin{align*}
1 - \epsilon = &1.0 \dots 00 \times 2^0\\
- &0.0 \dots 01 \times 2^0\\
= &0.1 \dots 11 \times 2^0\\
= &1.1 \dots 10 \times 2^{-1}\\
\\
1 - \tfrac12\epsilon = &1 - \epsilon + \tfrac12\epsilon\\
= &1.1 \dots 10 \times 2^{-1}\\
+ &0.0 \dots 01 \times 2^{-1}\\
= &1.1 \dots 11 \times 2^{-1}
\end{align*}
\]
This means that for any finite floating point number that's greater than the smallest normal number, or less than its negation, we can find the next closest one to zero by multiplying it by \(1-\frac12\epsilon\), as proven in derivation 1.

The next thing to note is that, even though \(1+\epsilon\) is the first floating point number greater than one, \(x \times (1+\epsilon)\) is not necessarily the first floating point number away from zero after \(x\), as demonstrated by program 1 which calculates the difference between \(x\) and the first floating point number that's less than \(x \times (1+\epsilon)\) for some randomly chosen values of \(x\).
Program 1: Not The Next After
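Program 1 itself isn't reproduced here, but a sketch along its lines, using a few fixed values of \(x\) rather than random ones, might look like:

```javascript
// For each x, compute x*(1+eps), step back to the first float below it by
// multiplying by 1-eps/2, and print the difference from x; a nonzero result
// means that x*(1+eps) overshot the next float after x.
var eps = Math.pow(2, -52);
var xs = [0.625, 0.875, 1.25, 1.75];

xs.forEach(function(x) {
 var up = x*(1+eps);            // not necessarily the next float after x
 var below = up*(1-0.5*eps);    // the first float less than up
 console.log(x, below - x);
});
```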



A simple way to see why this fails is to multiply \(1\frac34\) by \(1+\epsilon\) using exact arithmetic to yield
\[
\begin{align*}
1\tfrac34 + 1\tfrac34 \times \epsilon = &1 \, . \, 1 \, 1 \, 0 \, \dots \, 0 \, 0 \mid 0 \, 0 \, 0 \, \dots \, 0 \, 0 \, \times 2^0\\
+ &0 \, . \, 0 \, 0 \, 0 \, \dots \, 0 \, 1 \mid 1 \, 1 \, 0 \, \dots \, 0 \, 0 \, \times 2^0\\
= &1 \, . \, 1 \, 1 \, 0 \, \dots \, 0 \, 1 \mid 1 \, 1 \, 0 \, \dots \, 0 \, 0 \, \times 2^0
\end{align*}
\]
where, as in derivation 1, the vertical bar marks the end of the significand. When this is rounded to the nearest floating point number we get
\[
1\tfrac34 \times (1+\epsilon) = 1 \, . \, 1 \, 1 \, 0 \, \dots \, 1 \, 0 \, \times 2^0
\]
which is clearly a step too far.

Now we could implement nextafter away from zero by multiplying its argument \(x\) by \(1+\epsilon\), then multiplying the result by \(1-\tfrac12\epsilon\) and returning the latter if it's not equal to \(x\) and the former if it is. We can, however, do better than that.
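A sketch of this first approach for positive normal arguments, with nextUpNaive as a hypothetical name of my own rather than anything in the ak library, might be:

```javascript
var eps = Math.pow(2, -52);

// Next float away from zero for a positive normal x: x*(1+eps) is at most
// one step too far, so step back and keep whichever candidate is correct.
function nextUpNaive(x) {
 var first = x*(1+eps);           // one or two steps above x
 var second = first*(1-0.5*eps);  // one step back toward zero
 return second!==x ? second : first;
}

console.log(nextUpNaive(1) === 1 + eps);              // true
console.log(nextUpNaive(0.875) === 0.875 + eps/2);    // true: one ulp up
```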
The Next Division
The trick is to divide \(x\) by \(1-\frac12\epsilon\) instead, as demonstrated by program 2.
Program 2: The Next After All
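Program 2 isn't reproduced here, but the division trick, together with a comparison against multiplication by \(1+\epsilon\), can be sketched as:

```javascript
var eps = Math.pow(2, -52);

// Dividing a positive normal x by 1-eps/2 yields the next float away from
// zero in a single operation.
function nextUp(x) { return x/(1-0.5*eps); }

console.log(nextUp(1) === 1 + eps);             // true
console.log(nextUp(0.875) === 0.875 + eps/2);   // true: a single step
console.log(0.875*(1+eps) === 0.875 + eps);     // true: two steps too many
```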



Derivation 2: Bounds On The Division's Result
First, recall that a geometric series is defined as
\[
S_n = \sum_{i=1}^n a r^{i-1} = a \frac{1-r^n}{1-r}
\]
where \(\sum\) is the summation sign. Setting \(r\) to \(\frac12\epsilon\), \(a\) to one and \(n\) to infinity yields
\[
\sum_{i=1}^\infty \left(\tfrac12\epsilon\right)^{i-1} = \frac{1}{1-\tfrac12\epsilon}
\]
and so
\[
\frac{x}{1-\tfrac12\epsilon} = x \times \left(1 + \tfrac12\epsilon + \sum_{i=3}^\infty \left(\tfrac12\epsilon\right)^{i-1}\right)
\]
Next, we have
\[
\begin{align*}
\sum_{i=3}^\infty \left(\tfrac12\epsilon\right)^{i-1} &= \sum_{i=1}^\infty \tfrac14\epsilon^2 \left(\tfrac12\epsilon\right)^{i-1}
= \frac{\tfrac14\epsilon^2}{1-\tfrac12\epsilon}
\end{align*}
\]
Finally, we can derive bounds for this with
\[
\begin{array}{rcccl}
\tfrac12\epsilon^2 - \tfrac18\epsilon^3 &<& \tfrac12\epsilon^2 &<& \tfrac12\epsilon\\
\tfrac14\epsilon^2 - \tfrac18\epsilon^3 &<& \tfrac14\epsilon^2 &<& \tfrac12\epsilon - \tfrac14\epsilon^2\\
\tfrac14\epsilon^2 \left(1 - \tfrac12\epsilon\right) &<& \tfrac14\epsilon^2 &<& \tfrac12\epsilon \left(1 - \tfrac12\epsilon\right)\\
\tfrac14\epsilon^2 &<& \frac{\tfrac14\epsilon^2}{1 - \tfrac12\epsilon} &<& \tfrac12\epsilon
\end{array}
\]
and consequently
\[
x \times \left(1+\tfrac12\epsilon\right) < \frac{x}{1-\tfrac12\epsilon} < x \times \left(1+\epsilon\right)
\]
Now it's not entirely obvious why this works and so you may be forgiven if you're a little sceptical that it will do so for all normal floating point numbers.

To see why it does, first note that
\[
\begin{align*}
x \div \left(1-\tfrac12\epsilon\right) &< x \times \left(1+\epsilon\right)\\
x \div \left(1-\tfrac12\epsilon\right) &> x \times \left(1+\tfrac12\epsilon\right)
\end{align*}
\]
as proven by derivation 2.

This means that the result of the division will only add bits to \(x\) after the last digit in its significand and that they must be rounded up when it is represented as a floating point number.
Nifty, eh?
Those Guys Ain't Normal
Now everything that we've done so far depends upon the fact that normal floating point numbers have an implied leading digit of one and, since subnormal numbers don't, we shall have to handle them differently.

Fortunately, they're much easier to deal with than normal numbers since the least significant digit of every subnormal number corresponds to the same power of two. In fact, this is true of all floating point numbers with a magnitude that is smaller than twice the smallest normal number.
For any such number we need simply add or subtract the smallest positive subnormal number to find the next floating point number after or before it. JavaScript conveniently defines Number.MIN_VALUE to be precisely that number, which we've also copied into ak.MIN_VALUE.

We must also take care to treat nonfinite numbers as special cases. Specifically, the next numbers towards zero after plus and minus infinity are plus and minus the largest finite floating point number, provided by JavaScript's
Number.MAX_VALUE and also copied into ak.MAX_VALUE. Furthermore, the next number after NaN towards any number should also be NaN, as should be the next number towards NaN after any number.

Finally, we shall define the next number after any number towards itself to be itself.
Implementation Details
Now that we've worked out how to find the next floating point number after any given floating point number in any given direction using nothing but arithmetic operations, all that's left to do is implement it. This is done by the ak.nextAfter function given in listing 1.

Listing 1: ak.nextAfter
var M = 2*ak.MIN_NORMAL;
var R = 1.0-0.5*ak.EPSILON;

ak.nextAfter = function(x, y) {
 x = Number(x);
 y = Number(y);

 if(y>x) {
  if(x>=M)  return x/R;
  if(x<=-M) return x!==-ak.INFINITY ? x*R : -ak.MAX_VALUE;
  return x+ak.MIN_VALUE;
 }
 else if(y<x) {
  if(x<=-M) return x/R;
  if(x>=M)  return x!==ak.INFINITY ? x*R : ak.MAX_VALUE;
  return x-ak.MIN_VALUE;
 }
 return isNaN(y) ? y : x;
};
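Assuming that the ak constants equal their standard JavaScript counterparts, which is an assumption about the library's definitions rather than a quotation of its source, listing 1 can be checked in isolation with:

```javascript
// A self-contained check of the scheme in listing 1, with the ak constants
// filled in from their assumed standard JavaScript values.
var ak = {
 EPSILON:    Math.pow(2, -52),
 MIN_NORMAL: Math.pow(2, -1022),
 MIN_VALUE:  Number.MIN_VALUE,
 MAX_VALUE:  Number.MAX_VALUE,
 INFINITY:   Infinity
};

var M = 2*ak.MIN_NORMAL;
var R = 1.0-0.5*ak.EPSILON;

ak.nextAfter = function(x, y) {
 x = Number(x);
 y = Number(y);

 if(y>x) {
  if(x>=M)  return x/R;
  if(x<=-M) return x!==-ak.INFINITY ? x*R : -ak.MAX_VALUE;
  return x+ak.MIN_VALUE;
 }
 else if(y<x) {
  if(x<=-M) return x/R;
  if(x>=M)  return x!==ak.INFINITY ? x*R : ak.MAX_VALUE;
  return x-ak.MIN_VALUE;
 }
 return isNaN(y) ? y : x;
};

console.log(ak.nextAfter(1, 2) === 1+ak.EPSILON);            // true
console.log(ak.nextAfter(1, 0) === 1-0.5*ak.EPSILON);        // true
console.log(ak.nextAfter(0, 1) === Number.MIN_VALUE);        // true
console.log(ak.nextAfter(Infinity, 0) === Number.MAX_VALUE); // true
console.log(isNaN(ak.nextAfter(1, NaN)));                    // true
console.log(ak.nextAfter(3, 3) === 3);                       // true
```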
I don't think that there's anything particularly surprising about this function, given that it simply branches to the appropriate arithmetic operation for any given number. There is perhaps a slight subtlety in the final return statement which will be reached if x equals y or if either is NaN; if y is NaN and x isn't then we should return NaN, which is why we need the final test.

Program 3 demonstrates how to use ak.nextAfter to find the next floating point numbers toward plus and minus infinity for numbers of various magnitudes, using our ak.normalInvCDF implementation of the inverse of the normal cumulative distribution function to generate them.
Program 3: Using ak.nextAfter



All that's left to say is that the implementation of ak.nextAfter can be found in NextAfter.js.
References
[1] IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985, 1985.

[2] You're Going To Have To Think! Why Interval Arithmetic Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.