The Weak, The Inappropriate And The B-D Shuffle

| 0 Comments

We have been looking[1][2] at how we might approximate random variables with pseudo random number generators, such as the sequences generated with linear congruences of the form
\[ x_{i+1} = \left(a \times x_i\right)\;\%\;m \]
where \(x\;\%\;m\) represents the remainder of the integer division of \(x\) by \(m\), \(a\) is known as the multiplier, \(c\) as the increment, \(m\) as the modulus and \(x_0\) as the seed.
Whilst these are supremely easy to implement and extremely efficient in terms of both speed of execution and use of memory, choosing appropriate values for \(a\), \(c\), \(m\) and \(x_0\) is a rather difficult proposition; get it wrong and they will produce weak pseudo random numbers and, unfortunately, the ways in which we can do so are extremely subtle. For example, if we use such sequences to randomly generate two-dimensional vectors then poor choices will result in their lying upon relatively few lines, as demonstrated by program 1 using our ak.congruentialRnd generator which, like Math.random, scales the numbers that it returns so that they are greater than or equal to zero and strictly less than one.

Program 1: Getting It Very, Very Wrong!

Marsaglia's theorem[3] shows that this problem occurs in higher dimensions too, so that in \(n\) dimensions the vectors might be constrained to a surprisingly small number of \(n-1\) dimensional subspaces.

First, Give Them A Good Shuffle...

One thing we might try to improve the randomness of linear congruential pseudo random number generators is to shuffle them, that is to say observe them non-sequentially with

\(\displaystyle{ x^\prime_i = x_{j_i} }\)

where \(j_i\) is a randomised sequence of integers.

Unfortunately advancing from the seed to a randomly selected element of the sequence at every step would be prohibitively expensive, but we can do almost as well by advancing a random number of elements from the current one instead, as illustrated by program 2.

Program 2: Randomly Advancing Through The Sequence

The randomisation of the advance through the sequence is done here with the lines

d = ak.floor(16*rnd2());
for(j=0;j<d;++j) rnd1();

which throw away between zero and fifteen numbers of the sequence before one is used as a pseudo random number.
Whilst this certainly breaks up the lines rather effectively, on average program 2 uses nine and a half times as many pseudo random numbers as program 1 which, although a good deal better than picking elements from the sequence entirely at random, is hardly ideal.

The MacLaren-Marsaglia Shuffle

We can significantly reduce the computational expense of shuffling the sequence with a trick developed by MacLaren and Marsaglia[4]. Rather than throwing away any of the pseudo random numbers, they suggested filling a buffer with them and using a second generator to pick which to use at each step. That number is then replaced with a fresh one from the first random number generator, allowing us to jump both forwards and backwards through the sequence, eventually using all of them, as is done in program 3.

Program 3: The MacLaren-Marsaglia Shuffle

This time the randomisation of the progress through the sequence is done with the lines

j = ak.floor(v.length*rnd2());
x1 = v[j];
v[j] = rnd1();

which pick the buffered number to use and replace, breaking up the lines no less effectively than program 2 but with just twice as many pseudo random numbers than were used in program 1.

The Bays-Durham Shuffle

We can do better still with a trick invented by Bays and Durham[5]; use the previous value taken from the buffer of the pseudo random numbers to pick the next one to use and replace, as demonstrated by program 4.

Program 4: The Bays-Durham Shuffle

Note that now we're randomising the path through the sequence with

j = ak.floor(v.length*x0);
x1 = v[j];
v[j] = rnd();

As you can see, this breaks up the lines just as effectively as our previous schemes but only needs one pseudo random number at each step!

ak.baysDurhamRnd

It's a relatively trivial task to implement a general purpose Bays-Durham shuffle for pseudo random number generators, as illustrated by listing 1.

Listing 1: ak.baysDurhamRnd
ak.baysDurhamRnd = function(rnd, n) {
 var v, i, x;

 if(ak.nativeType(rnd)!==ak.FUNCTION_T) {
  throw new Error('invalid generator in ak.baysDurhamRnd');
 }

 if(ak.nativeType(n)===ak.UNDEFINED_T) n = 16;
 else if(ak.nativeType(n)!==ak.NUMBER_T) {
  throw new Error('non-numeric buffer size in ak.baysDurhamRnd');
 }
 n = Number(n);

 if(n!==ak.floor(n) || n<2) {
  throw new Error('invalid buffer size in ak.baysDurhamRnd');
 }

 v = new Array(n);
 for(i=0;i<n;++i) v[i] = rnd();
 x = rnd();

 return function() {
  i = ak.floor(n*x);
  x = v[i];
  v[i] = rnd();
  return x;
 };
};

This function takes a pseudo random number generator rnd and returns a new one that applies the Bays-Durham shuffle with a buffer length of n to the numbers that it generates in almost exactly the same way as we did in program 4, the principal difference being the tests that the arguments are valid and the defaulting of the buffer length to sixteen if it is not specified.

Program 5 demonstrates how to use ak.baysDurhamRnd to break up the lines in the pseudo random number generator from program 1.

Program 5: Using ak.baysDurhamRnd

The implementation of ak.baysDurhamRnd can be found in BaysDurhamRnd.js.

References

[1] Pseudo Algorithmical, www.thusspakeak.com, 2016.

[2] Talkin' 'Bout My Generators, www.thusspakeak.com, 2016.

[3] Marsaglia, G., Random Numbers Fall Mainly in the Planes, Proceedings of the National Academy of Sciences, Vol. 61, No. 1, 1968.

[4] MacLaren, M. & Marsaglia, G., Uniform Random Number Generators, Journal of the ACM, Vol. 12, No. 1, 1965.

[5] Bays, C. & Durham, S., Improving a Poor Random Number Generator, ACM Transactions on Mathematical Software, Vol. 2, No. 1, 1976.

Leave a comment