Pseudo Algorithmical

| 0 Comments

It is often the case that numerical algorithms require a source of random numbers; we have used them for an, admittedly rather crude, optimisation algorithm[1] and for the selection of initial cluster representatives in the \(k\) means clustering algorithm[2].
The notion of generating random numbers with computers, which are meticulously designed to operate in an entirely predictable manner, is admittedly a rather odd one. One approach is to deliberately design elements of the computer to be extremely sensitive to physical conditions like temperature. Another is to detune a radio and use the noise as a source of randomness[3]. Finally, we might exploit the fundamentally probabilistic nature of the universe as described by quantum mechanics[4].
Unfortunately, such sources of randomness are typically too slow for general use and we must instead settle for sequences of numbers that are almost random.

Wait, what's that?

Pseudo Random Numbers

You may be forgiven if you think that the idea of a sequence of numbers being almost random is a bit dodgy; surely either they are or they aren't! What we actually mean by it is a sequence of numbers that, whilst categorically not random, pass many of the statistical tests that we use to try to identify randomness.
Such sequences are known as pseudo random numbers and every programming language that I'm familiar with uses them for their standard random number generators. Perhaps the simplest of them, at least from an implementation perspective, are those produced by multiplicative congruential generators, which take the form
\[ x_{i+1} = \left(a \times x_i\right)\;\%\;m \]
where \(x\;\%\;m\) means the remainder from the integer division of \(x\) by \(m\).
Here, \(a\) is known as the multiplier, \(m\) the modulus and \(x_0\) the seed, all of which are positive integers. Note that, by the properties of the remainder, we can further insist that both \(a\) and \(x_0\) are strictly less than \(m\) since, for positive integers \(i\) and \(j\), we have
\[ \begin{align*} \left((a + i \times m) \times (x + j \times m)\right)\;\%\;m &= \left(a \times x + i \times m \times x + a \times j \times m + i \times j \times m^2\right)\;\%\;m\\ &= \left(a \times x\right)\;\%\;m \end{align*} \]
which follows from the fact that adding integer multiples of \(m\) to a number trivially has no effect upon the remainder when dividing by \(m\).
Note that this also implies that \(m\) must be greater than one, since neither \(a\) nor \(x_0\) equal zero.

For example, if we choose
\[ \begin{align*} a & = 5\\ m & = 7\\ x_0 & = 3 \end{align*} \]
we get the sequence
\[ \begin{align*} x_1 &= (5 \times 3)\;\%\;7 = 15\;\%\;7 = 1\\ x_2 &= (5 \times 1)\;\%\;7 = \phantom{0}5\;\%\;7 = 5\\ x_3 &= (5 \times 5)\;\%\;7 = 25\;\%\;7 = 4\\ x_4 &= (5 \times 4)\;\%\;7 = 20\;\%\;7 = 6\\ x_5 &= (5 \times 6)\;\%\;7 = 30\;\%\;7 = 2\\ x_6 &= (5 \times 2)\;\%\;7 = 10\;\%\;7 = 3 \end{align*} \]
which, having returned to the seed, repeats indefinitely with
\[ x_{i+7} = x_i \]
Now the first six terms in this sequence are a pretty random looking rearrangement of the integers one to six, which suggests that given sufficiently large values for \(a\) and \(m\) we might generate lengthy sequences of random looking shuffles of the integers between one and \(m-1\).

One obvious way in which this can fail is if \(a\) equals one, in which case every term must equal \(x_0\), which is about as far from random as it is possible to get. Note that choosing \(a\) greater than one means that we must also choose \(m\) greater than two.
Slightly less obvious is that we must be careful with our choices of \(a\) and \(m\), as is trivially demonstrated with
\[ \begin{align*} a & = 2^{12}\\ m & = 2^{15} \end{align*} \]
which yields the sequence
\[ \begin{align*} x_1 &= (2^{12} \times x_0)\;\%\;2^{15}\\ x_2 &= (2^{12} \times x_1)\;\%\;2^{15} = (2^{24} \times x_0)\;\%\;2^{15} = (2^9 \times x_0)\;\%\;2^{15}\\ x_3 &= (2^{12} \times x_2)\;\%\;2^{15} = (2^{21} \times x_0)\;\%\;2^{15} = (2^6 \times x_0)\;\%\;2^{15}\\ x_4 &= (2^{12} \times x_3)\;\%\;2^{15} = (2^{18} \times x_0)\;\%\;2^{15} = (2^3 \times x_0)\;\%\;2^{15}\\ x_5 &= (2^{12} \times x_4)\;\%\;2^{15} = (2^{15} \times x_0)\;\%\;2^{15} = 0 \end{align*} \]
after which we get stuck at zero, proving that it is not enough for \(a\) and \(m\) to simply be large.

Derivation 1: Common Factors
The remainder of \(x_i\) divided by \(m\) can be expressed as
\[ x_i - \bigg{\lfloor} \frac{x_i}{m} \bigg{\rfloor} \times m \]
where \(\lfloor x \rfloor\) is the largest integer less than or equal to \(x\), or in other words its floor.
Consequently, we have
\[ \begin{align*} (a^\prime b \times x_i)\;\%\;m^\prime b &= a^\prime b x_i - \bigg{\lfloor} \frac{a^\prime b x_i}{m^\prime b} \bigg{\rfloor} \times m^\prime b\\ &= a^\prime b x_i - \bigg{\lfloor} \frac{a^\prime x_i}{m^\prime} \bigg{\rfloor} \times m^\prime b\\ &= \left(a^\prime x_i - \bigg{\lfloor} \frac{a^\prime x_i}{m^\prime} \bigg{\rfloor} \times m^\prime\right) \times b\\ &= \left(a^\prime x_i \; \% \; m^\prime\right) \times b \end{align*} \]
and
\[ (a \times x_i^\prime b)\;\%\;m^\prime b = (a b \times x_i^\prime)\;\%\;m^\prime b = \left(a x_i^\prime \; \% \; m^\prime\right) \times b \]
Note that if either \(a\) and \(m\) or \(x_0\) and \(m\) have a common factor of \(b\), so that
\[ \begin{align*} a &= a^\prime \times b\\ m &= m^\prime \times b \end{align*} \]
or
\[ \begin{align*} x_0 &= x_0^\prime \times b\\ m &= m^\prime \times b \end{align*} \]
for positive integers \(a^\prime\), \(b\), \(x_0^\prime\) and \(m^\prime\), then every number in the sequence, except for \(x_0\) in the first case, will have the same factor in common with the modulus, as shown by derivation 1.
Furthermore, if \(b\) equals one then no number in the sequence can have any common factors with the modulus.
Consequently, to maximise the number of steps before it repeats, we should choose a modulus with as few unique prime factors as possible and a multiplier and seed that have none of them in common with it.

By The Power Of, Er, Two

Given all of this it is extremely tempting to choose a modulus that is a power of two since that means that we can replace the remainder with a, typically much faster, bitwise and
\[ (a \times x) \;\%\; 2^m = (a \times x) \; \& \; \left(2^m - 1\right) \]
Unfortunately, it is still not enough to simply choose a multiplier and seed that are odd since some choices will yield sequences that repeat sooner than others. For example, compare
\[ \begin{align*} a & = \phantom{0}5\\ m & = 16\\ x_0 & = \phantom{0}1\\ \\ x_1 &= (5 \times \phantom{0}1)\;\%\;16 = \phantom{0}5\;\%\;16 = \phantom{0}5\\ x_2 &= (5 \times \phantom{0}5)\;\%\;16 = 25\;\%\;16 = \phantom{0}9\\ x_3 &= (5 \times \phantom{0}9)\;\%\;16 = 45\;\%\;16 = 13\\ x_4 &= (5 \times 13)\;\%\;16 = 65\;\%\;16 = \phantom{0}1 \end{align*} \]
with
\[ \begin{align*} a & = \phantom{0}7\\ m & = 16\\ x_0 & = \phantom{0}1\\ \\ x_1 &= (7 \times 1)\;\%\;16 = \phantom{0}7\;\%\;16 = 7\\ x_2 &= (7 \times 7)\;\%\;16 = 49\;\%\;16 = 1 \end{align*} \]
Moreover, even if we manage to choose a multiplier and seed that yield a sequence that repeats after as many steps as possible, that's still not enough for it to pass the statistical tests for randomness, although the reasons why are rather subtle.

So subtle, in fact, that even IBM managed to get it wrong.

The Amazing RANDU

Back in the sixties IBM included the RANDU generator in the libraries that they distributed for their mainframe computers, defined by
\[ \begin{align*} a & = 65539\\ m & = 2^{31} \end{align*} \]
whose multiplier yields a sequence of maximal length for that modulus whenever the seed is odd.
The great advantage that that particular multiplier had was that it allowed the multiplication to be replaced by bit shifts and addition, which were significantly faster. Specifically
\[ \begin{align*} (65539 \times x)\;\%\;2^{31} &= \left(\left(2^{16} + 2 + 1\right) \times x\right) \; \& \; \left(2^{31}-1\right)\\ &= ((x<<16) + (x<<1) + x) \; \& \; \left(2^{31}-1\right) \end{align*} \]
where \(x \& m\) denotes the result of the bitwise and operation of \(x\) and \(m\).
Note that this implies that the mainframes in question were using twos-complement 32 bit signed integers, in which negative numbers have the most leftmost bit set and rightmost bits equal to those of the positive number that results from adding \(2^{32}\) to them. The benefit of this scheme is that, as far as the binary representation is concerned, addition and subtraction behave exactly the same way that they do for 32 bit unsigned integers.
The bitwise and consequently removes the sign from the result, leaving behind a 31 bit unsigned integer which can be exactly represented by a 32 bit signed integer.

On the face of it the numbers in this sequence look pretty random, as demonstrated by program 1 which relies upon the fact that JavaScript's bitwise operations convert their arguments to just such 32 bit signed integers.

Program 1: RANDU

Marsaglia's Theorem

To see the problem, we need the first part of Marsaglia's theorem[5] which states that if we use a multiplicative congruential generator to randomly generate a sequence of \(n\) dimensional vectors \(\mathbf{v}_i\) whose elements are all between zero and one with
\[ \left(\mathbf{v}_i\right)_j = \frac{x_{i+j}}{m} \]
then given a set of \(n\) integers \(c_i\), at least one of which is non-zero, such that
\[ \left(\sum_{i=0}^{n-1} c_i \times a^i\right)\;\%\;m = 0 \]
where \(\sum\) is the summation sign, then the vectors will be constrained to fewer than
\[ \sum_{i=0}^{n-1} |c_i| \]
\(n-1\) dimensional subspaces, where the vertical bars represent the magnitude of the terms between them, as proven in derivation 2. Note that by analogy with such subspaces in three dimensions being two dimensional planes, these are typically referred to as hyperplanes.

Derivation 2: Marsaglia's Theorem, Part One
First note that if, for a set of \(n\) dimensional vectors \(\mathbf{v}_i\), there exist constants \(a_j\) and \(b\) such that
\[ \sum_{j=0}^{n-1} a_j \times \left(\mathbf{v}_i\right)_j = b \]
where at least one of the \(a_j\) is non-zero then they must be constrained to an \(n-1\) dimensional hyperplane.
Secondly, given the bounds that we have already established for \(a\), \(x_0\) and \(m\), the terms in the sequence must be greater than zero and less than \(m\) and so
\[ 0 < \left(\mathbf{v}_i\right)_j < 1 \]
Next, since
\[ x_{i+j}\;\%\;m = \left(a \times x_{i+j-1}\right)\;\%\;m = \left(a^2 \times x_{i+j-2}\right)\;\%\;m = \dots = \left(a^j \times x_i\right)\;\%\;m \]
we also have
\[ \left(\sum_{j=0}^{n-1} c_j \times x_{i+j}\right)\;\%\;m = \left(\sum_{j=0}^{n-1} c_j \times a^j \times x_i\right)\;\%\;m = \left(x_i \times \sum_{j=0}^{n-1} c_j \times a^j\right)\;\%\;m = 0 \]
and so
\[ \sum_{j=0}^{n-1} c_j \times x_{i+j} \]
must be an integer multiple of \(m\) and
\[ \frac{\sum_{j=0}^{n-1} c_j \times x_{i+j}}{m} = \sum_{j=0}^{n-1} c_j \times \frac{x_{i+j}}{m} = \sum_{j=0}^{n-1} c_j \times \left(\mathbf{v}_i\right)_j \]
must be an integer.

Finally, given the bounds upon the elements of the vectors \(\mathbf{v}_i\), it must be the case that
\[ \sum_{c_i < 0} c_i < \sum_{j=0}^{n-1} c_j \times \left(\mathbf{v}_i\right)_j < \sum_{c_i > 0} c_i \]

and so the vectors must be confined to fewer hyperplanes than
\[ \sum_{c_i > 0} c_i - \sum_{c_i < 0} c_i = \sum_{i=0}^{n-1} |c_i| \]

The easiest way to demonstrate Marsaglia's theorem is to deliberately choose bad values for \(a\) and \(m\). For example, if we use
\[ \begin{align*} a &= 43691\\ m &= 2^{16} = 65536 \end{align*} \]
to randomly generate two dimensional vectors in the unit square, then we have
\[ -1 + 3 \times a = 2 \times m \]
and we should consequently find that such vectors lie upon fewer than four straight lines, as is confirmed by program 2.

Program 2: Marsiglia's Theorem In Action

If you change a to 311 in program 2, you will see that this problem lies entirely with the multiplier.

Hursley, We Have A Problem

The implications of Marsaglia's theorem for RANDU can be revealed by squaring its multiplier
\[ a^2 = \left(2^{16}+3\right)^2 = 2^{32} + 6 \times 2^{16} + 9 \]
The remainder of this square after division by the modulus is trivially
\[ a^2\;\%\;m = \left(2^{32} + 6 \times 2^{16} + 9\right)\;\%\;2^{31} = 6 \times 2^{16} + 9 \]
and so
\[ a^2\;\%\;m - 6 \times a + 9 = \left(6 \times 2^{16} + 9\right) - \left(6 \times 2^{16} + 18\right) + 9 = 0 \]
Rearranging yields
\[ \left(9 - 6 \times a + a^2\right)\;\%\;m = 0 \]
which means that if RANDU is used to generate vectors in the unit cube then they will lie upon fewer than sixteen planes!

Program 3 shows that this is indeed the case by counting the number of unique values of
\[ \frac{9 \times x_i - 6 \times x_{i+1} + x_{i+2}}{m} \]
for terms of the sequence generated by RANDU.

Program 3: Counting The Planes

The final part of Marsaglia's theorem shows that the maximum number of hyperplanes upon which \(n\) dimensional vectors generated by a multiplicative congruential generator with a modulus of \(m\) fall must be fewer than
\[ \left(n! \times m \right)^\frac{1}{n} \]
where the exclamation mark stands for the factorial. For three dimensional vectors generated with a modulus of \(2^{31}\) this comes to a maximum of 2343 hyperplanes, of which RANDU manages to visit just 0.64%.

Linear Congruential Generators

We can generalise multiplicative congruential generators by adding a constant increment to each term before calculating the remainder
\[ x_{i+1} = \left(a \times x_i + c\right)\;\%\;m \]
Such schemes are known as linear congruential generators but, unfortunately, don't make it any easier to pick values that yield good pseudo random sequences and we must rely upon choices that have stood the test of time, as we shall do when it comes to implementing a few in the next post.

References

[1] It's All Downhill From Here, www.thusspakeak.com, 2014.

[2] K Means Business, www.thusspakeak.com, 2014.

[3] random.org.

[4] Quantum Random Numbers Server, Australian National University.

[5] Marsaglia, G., Random Numbers Fall Mainly in the Planes, Proceedings of the National Academy of Sciences, Vol. 61, No. 1, 1968.

Leave a comment