Define Normal


The normal distribution, \(N(\mu, \sigma)\), is perhaps the most important in statistics, cropping up in all sorts of places. Its parameters, \(\mu\) and \(\sigma\), are equal to its mean and standard deviation respectively, and its PDF is given by
\[ \phi_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{(x-\mu)^2}{2\sigma^2}} \]
which takes the familiar form of a bell curve, as shown in figure 1.

Figure 1: The Normal PDF

The CDF is simply given by the integral of the PDF
\[ \Phi_{\mu,\sigma}(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{(z-\mu)^2}{2\sigma^2}}\mathrm{d}z \]
Unfortunately there's no closed-form formula for this integral, but we can still use it to derive some basic properties of the normal distribution. For example, if \(X \sim N(\mu, \sigma)\) then \(aX+b \sim N(a\mu+b, |a|\sigma)\), as shown in derivation 1; in particular, \(\frac{X-\mu}{\sigma} \sim N(0, 1)\), known as the standard normal distribution.

Derivation 1: The Distribution Of aX+b
By the substitution
\[ y = az+b \implies z = \frac{y-b}{a}, \mathrm{d}z = \frac{1}{a}\mathrm{d}y \]
we have
\[ \begin{align*} \Phi_{\mu,\sigma}(x) &= \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{(z-\mu)^2}{2\sigma^2}}\mathrm{d}z = \int_{-a\infty}^{ax+b}\frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{\left(\tfrac{y-b}{a}-\mu\right)^2}{2\sigma^2}}\frac{1}{a}\mathrm{d}y\\ &= \int_{-a\infty}^{ax+b}\frac{1}{\sqrt{2\pi}(a\sigma)} e^{-\tfrac{\left(y-b-a\mu\right)^2}{2(a \sigma)^2}}\mathrm{d}y = \Phi_{a\mu+b,|a|\sigma}(ax+b) \end{align*} \]
Note that the absolute value of \(a\) appears since if \(a\) is negative then the factor of \(\tfrac{1}{a}\) makes the integrand negative, whilst the bounds of integration run from larger to smaller, negating the result once more and so leaving it positive.

It's fairly straightforward to show that its characteristic function, or CF, is given by
\[ \hat{\phi}_{\mu,\sigma}(t) = e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \]
as is done in derivation 2. A consequence of this is that, since the CF of the sum of two independent random variables is the product of their CFs, we can see by inspection that the sum of two independent normally distributed random variables is also normally distributed
\[ \hat{\phi}_{\mu_1,\sigma_1}(t) \times \hat{\phi}_{\mu_2,\sigma_2}(t) = e^{i \mu_1 t -\frac{1}{2} \sigma_1^2 t^2} \times e^{i \mu_2 t -\frac{1}{2} \sigma_2^2 t^2} = e^{i \left(\mu_1 + \mu_2\right) t -\frac{1}{2} \left(\sigma_1^2 + \sigma_2^2\right) t^2} = \hat{\phi}_{\mu_1+\mu_2,\sqrt{\sigma_1^2+\sigma_2^2}}(t) \]
Now this might not seem as useful a function to have at our disposal as the CDF, but it's central to the importance of the normal distribution.
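
As a quick sanity check of this identity we can evaluate both sides numerically, with plain Math calls standing in for complex arithmetic; this snippet is purely illustrative and isn't part of the ak library.

function cfRe(m, s, t) { return Math.exp(-s*s*t*t/2)*Math.cos(m*t); }
function cfIm(m, s, t) { return Math.exp(-s*s*t*t/2)*Math.sin(m*t); }

var t = 0.7, m1 = 1, s1 = 2, m2 = 3, s2 = 4;

// the real and imaginary parts of the product of the two CFs
var prodRe = cfRe(m1,s1,t)*cfRe(m2,s2,t) - cfIm(m1,s1,t)*cfIm(m2,s2,t);
var prodIm = cfRe(m1,s1,t)*cfIm(m2,s2,t) + cfIm(m1,s1,t)*cfRe(m2,s2,t);

// both should agree with the CF of N(4, sqrt(20)) to within rounding error
prodRe - cfRe(m1+m2, Math.sqrt(s1*s1+s2*s2), t); // approximately zero
prodIm - cfIm(m1+m2, Math.sqrt(s1*s1+s2*s2), t); // approximately zero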

Derivation 2: The Normal CF
\[ \hat{\phi}_{\mu,\sigma}(t) = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{(x-\mu)^2}{2\sigma^2}} e^{itx} \mathrm{d}x = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{1}{2\sigma^2}\left((x-\mu)^2 - 2\sigma^2 itx\right)} \mathrm{d}x\\ \]
Expanding the quadratic in the exponent yields
\[ \begin{align*} (x-\mu)^2 - 2\sigma^2 itx &= x^2 - 2 \mu x + \mu^2 - 2\sigma^2 itx\\ &= x^2 - 2\left(\mu + \sigma^2 it\right) x + \mu^2\\ &= \left(x - \left(\mu + \sigma^2 it\right)\right)^2 + \mu^2 - \left(\mu + \sigma^2 it\right)^2\\ &= \left(x - \left(\mu + \sigma^2 it\right)\right)^2 - 2\mu \sigma^2 it - \sigma^4 i^2 t^2\\ &= \left(x - \mu^\ast\right)^2 - 2\mu \sigma^2 it + \sigma^4 t^2 \end{align*} \]
where \(\mu^\ast = \mu + \sigma^2 it\).
Substituting this back into the exponent gives
\[ \begin{align*} \hat{\phi}_{\mu,\sigma}(t) &= \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{\left(x - \mu^\ast\right)^2}{2\sigma^2}} e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \mathrm{d}x\\ &= e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{\left(x - \mu^\ast\right)^2}{2\sigma^2}} \mathrm{d}x\\ &= e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \end{align*} \]
since the integrand is a normal PDF, albeit with the complex mean \(\mu^\ast\) (a step that strictly requires a contour integration argument), and the integral must therefore equal one.

The Central Limit Theorem

Let us assume that we have some random variable \(X\) with a distribution that has a well defined, or in other words finite, mean \(\mu\) and variance \(\sigma^2\). If we define
\[ Y = X - \mu \]
then the central limit theorem states that the mean of \(n\) observations of \(Y\) is normally distributed with mean zero and variance \(\frac{\sigma^2}{n}\) as \(n\) tends to infinity, as proven in derivation 3.

Derivation 3: The Central Limit Theorem
Applying Taylor's theorem to the CF of the distribution of \(\frac{Y}{n}\), we have
\[ \begin{align*} E\left[e^{it\tfrac{Y}{n}}\right] = E\left[e^{i\tfrac{t}{n}Y}\right] &= E\left[1 + i\frac{t}{n}Y + \tfrac{1}{2}i^2 \left(\frac{t}{n}\right)^2 Y^2 + O\left(\left(\frac{t}{n}\right)^3\right)\right]\\ &= E[1] + i\frac{t}{n} E[Y] - \tfrac{1}{2}\frac{t^2}{n^2} E\left[Y^2\right] + O\left(\left(\frac{t}{n}\right)^3\right)\\ &= 1 + i\frac{t}{n} E[X-\mu] - \tfrac{1}{2}\frac{t^2}{n^2} E\left[(X-\mu)^2\right] + O\left(\left(\frac{t}{n}\right)^3\right)\\ &= 1 - \tfrac{1}{2}\frac{t^2}{n^2} \sigma^2 + O\left(\left(\frac{t}{n}\right)^3\right)\\ &\underset{n \to \infty}{=} 1 - \tfrac{1}{2}\frac{t^2}{n^2} \sigma^2 \end{align*} \]
By the properties of the CF, the sum of \(n\) observations of \(\frac{Y}{n}\) has the CF
\[ \left(E\left[e^{it\tfrac{Y}{n}}\right]\right)^n \underset{n\to\infty}{=} \left(1 - \tfrac{1}{2}\frac{t^2}{n^2} \sigma^2\right)^n = \left(1 - \tfrac{1}{n}\tfrac{1}{2}\frac{t^2}{n} \sigma^2\right)^n \underset{n\to\infty}{=} e^{-\tfrac{1}{2}\tfrac{\sigma^2}{n}t^2} \]
So the CF of the average of \(n\) observations of \(Y\) tends to that of \(N\left(0, \tfrac{\sigma}{\sqrt{n}}\right)\) and hence, since convergence of CFs implies convergence of the associated distributions, the distribution of the average tends to it too.

Note that the final limit is one of the definitions of the exponential function
\[ e^x = \lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n \]
as can be confirmed by considering the limit of the derivative of the right hand side.
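
If you'd rather see this limit than prove it, a couple of lines of JavaScript, purely illustrative rather than library code, will confirm the convergence numerically

var x = -0.5, n;
for(n=10;n<=1000000;n*=10) {
 // tends to Math.exp(-0.5), roughly 0.60653, as n grows
 console.log(n, Math.pow(1+x/n, n));
}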

This is the reason why the normal distribution is found in so many situations: if the randomness in a measurement or process results from the contribution of lots of small sources then, provided that they satisfy the conditions of the central limit theorem (which can be loosened to allow differently distributed sources, so long as they satisfy Lindeberg's condition), it must be approximately normally distributed.
Of course, this only holds if the randomness accumulates additively. If it accumulates multiplicatively then we must take the logarithm to recover a normal distribution, since by the properties of logarithms
\[ \ln \left(x_0 \times x_1 \times x_2 \times \dots \right) = \ln x_0 + \ln x_1 + \ln x_2 + \dots \]
In such cases the distribution of the original value is known as the log normal distribution and we shall return to it in a future post.
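
To see the logarithm at work, consider a value subjected to a large number of small multiplicative shocks; its logarithm is then a sum of independent terms to which the central limit theorem applies, as sketched by the following illustrative function, which is not part of the ak library

// each shock multiplies the value by a uniform factor between 0.9 and 1.1 so
// the log of the final value is the sum of n independent identically
// distributed terms and is consequently approximately normal for large n
function logShockedValue(n) {
 var ln = 0, i;
 for(i=0;i<n;++i) ln += Math.log(0.9 + 0.2*Math.random());
 return ln;
}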

To demonstrate the central limit theorem in action we'll need an implementation of the normal PDF.

ak.normalPDF

As we did for the uniform distribution[1], we shall implement the normal distribution functions as functions, albeit with added methods to allow access to their parameters. The function representing the normal PDF, ak.normalPDF, is given in listing 1.

Listing 1: ak.normalPDF
var RT2PI = Math.sqrt(2*ak.PI);

function pdf(m, s, x) {
 var y = (x-m)/s;
 return Math.exp(-y*y/2) / (RT2PI*s);
}

ak.normalPDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(x){return pdf(state.mu, state.sigma, x);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

var constructors = {};

The function itself simply forwards to pdf which is a straightforward implementation of the normal PDF.
As usual we initialise the state by dispatching to a constructors object according to the arguments passed to ak.normalPDF. We shall again reuse constructors for all of the normal distribution functions and so must support initialisation with a source of random numbers for the normal random number generator, as shown for the default parameter constructors in listing 2.

Listing 2: Normal Distribution Default Parameter Constructors
constructors[ak.UNDEFINED_T] = function() {
};

constructors[ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};

These leave the parameters with the default values defined in ak.normalPDF of \(\mu\) equal to zero and \(\sigma\) equal to one. To parameterise the PDF we can either pass just \(\sigma\), leaving \(\mu\) at its default of zero, or both \(\mu\) and \(\sigma\), as illustrated in listing 3.

Listing 3: Normal Distribution Parameterised Constructors
constructors[ak.NUMBER_T] = function(state, x, args) {
 var arg1 = args[1];
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, x, arg1, args);

 if(!isFinite(state.mu)) throw new Error('invalid mu in ak.normal distribution');
 if(state.sigma<=0 || !isFinite(state.sigma)) {
  throw new Error('invalid sigma in ak.normal distribution');
 }
};

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state, x) {
 state.sigma = Number(x);
};

constructors[ak.NUMBER_T][ak.FUNCTION_T] = function(state, x, rnd) {
 state.sigma = Number(x);
 state.rnd = rnd;
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, x0, x1, args) {
 var arg2 = args[2];

 state.mu = Number(x0);
 state.sigma = Number(x1);

 constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, arg2);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function() {
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};
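
To make the dispatch concrete, here are a few examples of the calls that these constructors support

var pdf0 = ak.normalPDF();     // mu = 0, sigma = 1
var pdf1 = ak.normalPDF(2);    // mu = 0, sigma = 2
var pdf2 = ak.normalPDF(1, 2); // mu = 1, sigma = 2

pdf2.mu();    // 1
pdf2.sigma(); // 2
pdf2(1);      // the density at x = 1, roughly 0.1995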

To demonstrate the central limit theorem we shall compare a histogram of the sum of \(n\) uniform random numbers, provided by Math.random, to a normal PDF. This sum is trivially equal to \(n\) times the mean and so, by the result of derivation 1, must also be normally distributed. A standard uniform random variable \(U\) has a mean and variance of
\[ \begin{align*} E[U] &= \tfrac{1}{2}\\ E\left[\left(U - E[U]\right)^2\right] &= \tfrac{1}{12} \end{align*} \]
and consequently the sum must have a normal distribution with parameters
\[ \begin{align*} \mu &= n \times \tfrac{1}{2} = \frac{n}{2}\\ \sigma &= n \times \sqrt{\tfrac{1}{12}} \div \sqrt{n} = \sqrt{\frac{n}{12}} \end{align*} \]
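
As an aside, and not something that we shall rely upon, setting \(n\) to twelve yields \(\mu\) equal to six and \(\sigma\) equal to one, which is the basis of the old quick and dirty standard normal generator

// summing twelve standard uniforms and subtracting six yields an
// approximately standard normal random number by the central limit theorem
function crudeNormal() {
 var sum = 0, i;
 for(i=0;i<12;++i) sum += Math.random();
 return sum - 6;
}
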
Note that since the area under the histogram equals the number of observations and that under the PDF equals one, we must multiply the latter by the number of observations before we compare it to the former, as is done in program 1.

Program 1: Demonstrating The Central Limit Theorem
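
If you're reading this away from the browser, the gist of program 1 is captured by the following sketch, with the plotting left out

var n = 12, obs = 100000, bins = 50;
var width = n/bins;
var pdf = ak.normalPDF(n/2, Math.sqrt(n/12));
var counts = [], i, j, sum;

for(i=0;i<bins;++i) counts[i] = 0;

// bin the sums of n uniform random numbers
for(i=0;i<obs;++i) {
 sum = 0;
 for(j=0;j<n;++j) sum += Math.random();
 ++counts[Math.floor(sum/width)];
}

// each counts[i] should be close to the PDF at the bin's midpoint scaled by
// the number of observations and the width of the bins, i.e.
// obs * width * pdf((i+0.5)*width)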

A pretty effective demonstration, I'm sure you'll agree!

ak.normalCF

The next distribution function is the normal CF, which we showed above is given by
\[ \hat{\phi}_{\mu,\sigma}(t) = e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \]
and its implementation is given in listing 4.

Listing 4: ak.normalCF
function cf(m, s, t) {
 var ts = t*s;
 var r  = Math.exp(-ts*ts/2);
 var mt = m*t;
 return r===0 ? ak.complex(0) : ak.complex(r*Math.cos(mt), r*Math.sin(mt));
}

ak.normalCF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(t){return cf(state.mu, state.sigma, t);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

Once again we are taking care to avoid returning NaN for infinite arguments by returning zero if r is equal to zero; for infinite t the magnitude r vanishes whilst Math.cos and Math.sin return NaN, and the product of the two would be NaN rather than the correct limit of zero.

Finally, as noted above, the constructors object is used for all of the normal distribution function implementations. The CF function forwards to cf, which explicitly constructs the result from its polar representation with magnitude r and argument mt since, by Euler's formula, \(e^{i \mu t -\frac{1}{2} \sigma^2 t^2} = e^{-\frac{1}{2} \sigma^2 t^2}\left(\cos \mu t + i \sin \mu t\right)\).

ak.normalCDF

Things get a smidgeon more complicated when it comes to implementing the normal CDF since, as noted above, there's no explicit formula for it. Instead, we shall have to rely upon numerical approximation, as shown in listing 5.

Listing 5: ak.normalCDF
var SPLIT = 7.07106781186547;

var N0 = 220.206867912376;
var N1 = 221.213596169931;
var N2 = 112.079291497871;
var N3 = 33.912866078383;
var N4 = 6.37396220353165;
var N5 = 0.700383064443688;
var N6 = 3.52624965998911e-02;
var M0 = 440.413735824752;
var M1 = 793.826512519948;
var M2 = 637.333633378831;
var M3 = 296.564248779674;
var M4 = 86.7807322029461;
var M5 = 16.064177579207;
var M6 = 1.75566716318264;
var M7 = 8.83883476483184e-02;

function cdf(m, s, x) {
 var z = Math.abs(x-m)/s;
 var c = 0;
 var e, n, d, f;

 if(isNaN(z)) return ak.NaN;

 if(z<=37) {
  e = Math.exp(-z*z/2);
  if(z<SPLIT) {
   n = (((((N6*z + N5)*z + N4)*z + N3)*z + N2)*z + N1)*z + N0;
   d = ((((((M7*z + M6)*z + M5)*z + M4)*z + M3)*z + M2)*z + M1)*z + M0;
   c = e*n/d;
  }
  else {
   f = z + 1/(z + 2/(z + 3/(z + 4/(z + 13/20))));
   c = e/(RT2PI*f);
  }
 }
 return x<=m ? c : 1-c;
}

ak.normalCDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(x){return cdf(state.mu, state.sigma, x);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

The cdf function that does all of the work is an implementation of Hart's approximation[2], transcribed from West's Visual Basic implementation[3] into FORTRANScript. It computes the upper tail probability c of the standardised absolute deviation z and exploits the symmetry of the normal PDF about its mean to recover the CDF on either side of it.

Sorry! I meant JavaScript!

Functions like this are unfortunately a common sight in numerical computing. All too often they're implemented in FORTRAN (hence my choice of coding "style") and are riddled with magic numbers. Personally I try not to worry too much and instead rely upon extensive unit tests to satisfy myself that I haven't mistyped any of them. That and performing the magic dance around my desk during compilation, of course.
Whilst the constants are rather opaque, the structure of the approximation formulae is pretty standard. The first includes a term that can be thought of as the truncation of the ratio of two Taylor series
\[ \frac{a_0 + a_1 x + a_2 x^2 + \dots}{b_0 + b_1 x + b_2 x^2 + \dots} \]
and the second a term that is the truncation of a continued fraction representation of a function
\[ b_0 x + \dfrac{a_0}{b_1 x + \dfrac{a_1}{b_2 x + \dfrac{a_2}{b_3 x + \dots}}} \]
Approximations of these forms are often more accurate than simple truncated Taylor series approximations and we shall no doubt come across them again.

To demonstrate ak.normalCDF we first note that, for a CDF \(P\) of a random variable \(X\), we have
\[ \Pr(a \leqslant X \leqslant b) = P(b) - P(a) \]
We can use this to compare a histogram of the probabilities of a set of intervals to the PDF, as shown in program 2. Note that this time we multiply the PDF by the width of the intervals so that each point approximately equals the probability that an observation falls within an interval of that width centred at that point. We should consequently expect the graph to pass through the top of each bar close to its midpoint.

Program 2: ak.normalCDF
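
A quick non-graphical check in the same spirit is to confirm the familiar rule of thumb for the probabilities of falling within one, two and three standard deviations of the mean

var cdf = ak.normalCDF(0, 1);

cdf(1) - cdf(-1); // roughly 0.6827
cdf(2) - cdf(-2); // roughly 0.9545
cdf(3) - cdf(-3); // roughly 0.9973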

Once again, we have a pretty damn fine match confirming that ak.normalCDF is behaving as it should, at least for the parameters in question.

ak.normalInvCDF

Applying a numerical function inversion algorithm, none of which we have yet implemented, to the approximate CDF would not be a particularly efficient solution compared to approximating the inverse CDF in the same fashion as we did the CDF itself. Specifically, we shall use Wichura's approximation[4], as shown in listing 6.

Listing 6: ak.normalInvCDF
var SPLIT1 = 0.425e0;
var SPLIT2 = 5.0e0;
var CONST1 = 0.180625E0;
var CONST2 = 1.6e0;

var A0 = 3.3871328727963666080e0;
var A1 = 1.3314166789178437745e+2;
var A2 = 1.9715909503065514427e+3;
var A3 = 1.3731693765509461125e+4;
var A4 = 4.5921953931549871457e+4;
var A5 = 6.7265770927008700853e+4;
var A6 = 3.3430575583588128105e+4;
var A7 = 2.5090809287301226727e+3;
var B1 = 4.2313330701600911252e+1;
var B2 = 6.8718700749205790830e+2;
var B3 = 5.3941960214247511077e+3;
var B4 = 2.1213794301586595867e+4;
var B5 = 3.9307895800092710610e+4;
var B6 = 2.8729085735721942674e+4;
var B7 = 5.2264952788528545610e+3;

var C0 = 1.42343711074968357734e0;
var C1 = 4.63033784615654529590e0;
var C2 = 5.76949722146069140550e0;
var C3 = 3.64784832476320460504e0;
var C4 = 1.27045825245236838258e0;
var C5 = 2.41780725177450611770e-1;
var C6 = 2.27238449892691845833e-2;
var C7 = 7.74545014278341407640e-4;
var D1 = 2.05319162663775882187e0;
var D2 = 1.67638483018380384940e0;
var D3 = 6.89767334985100004550e-1;
var D4 = 1.48103976427480074590e-1;
var D5 = 1.51986665636164571966e-2;
var D6 = 5.47593808499534494600e-4;
var D7 = 1.05075007164441684324e-9;

var E0 = 6.65790464350110377720e0;
var E1 = 5.46378491116411436990e0;
var E2 = 1.78482653991729133580e0;
var E3 = 2.96560571828504891230e-1;
var E4 = 2.65321895265761230930e-2;
var E5 = 1.24266094738807843860e-3;
var E6 = 2.71155556874348757815e-5;
var E7 = 2.01033439929228813265e-7;
var F1 = 5.99832206555887937690e-1;
var F2 = 1.36929880922735805310e-1;
var F3 = 1.48753612908506148525e-2;
var F4 = 7.86869131145613259100e-4;
var F5 = 1.84631831751005468180e-5;
var F6 = 1.42151175831644588870e-7;
var F7 = 2.04426310338993978564e-15;

function invCDF(m, s, c) {
 var q = c-0.5;
 var r, n, d, z;

 if(isNaN(q)) return ak.NaN;
 if(c<=0) return -ak.INFINITY;
 if(c>=1) return  ak.INFINITY;

 if(Math.abs(q) <= SPLIT1) {
  r = CONST1 - q*q;
  n = ((((((A7*r+A6)*r+A5)*r+A4)*r+A3)*r+A2)*r+A1)*r+A0;
  d = ((((((B7*r+B6)*r+B5)*r+B4)*r+B3)*r+B2)*r+B1)*r+1;
  z = q*n/d;
 }
 else {
  r = q<0 ? c : 1-c;
  r = Math.sqrt(-Math.log(r));

  if(r <= SPLIT2) {
   r -= CONST2;
   n = ((((((C7*r+C6)*r+C5)*r+C4)*r+C3)*r+C2)*r+C1)*r+C0;
   d = ((((((D7*r+D6)*r+D5)*r+D4)*r+D3)*r+D2)*r+D1)*r+1;
  }
  else {
   r -= SPLIT2;
   n = ((((((E7*r+E6)*r+E5)*r+E4)*r+E3)*r+E2)*r+E1)*r+E0;
   d = ((((((F7*r+F6)*r+F5)*r+F4)*r+F3)*r+F2)*r+F1)*r+1;
  }
  z = (q<0) ? -n/d : n/d;
 }
 return m + z*s;
}

ak.normalInvCDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(c){return invCDF(state.mu, state.sigma, c);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

We didn't have to wait so very long for some more examples of approximations based on ratios of polynomials!

To demonstrate ak.normalInvCDF we can simply plot the CDF of the inverse CDF over the interval \([0, 1]\) to confirm that it's a straight line from \((0, 0)\) to \((1, 1)\), as is done in program 3.

Program 3: ak.normalInvCDF
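
The same confirmation can be made non-graphically with a simple round trip, as sketched below

var cdf = ak.normalCDF(1, 2);
var inv = ak.normalInvCDF(1, 2);
var c, err = 0;

// the CDF of the inverse CDF should recover its argument
for(c=0.01;c<1;c+=0.01) err = Math.max(err, Math.abs(cdf(inv(c))-c));
console.log(err); // should be of the order of rounding error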

ak.normalRnd

We could just use the inverse CDF trick to turn a standard uniform random number into a normally distributed random number but, given the number of operations in ak.normalInvCDF, it wouldn't be particularly efficient. Fortunately there's a much more efficient approach, as illustrated in listing 7.

Listing 7: ak.normalRnd
function rnd(m, s, r, b) {
 var u0, u1, r2, c;

 if(b.full) {
  b.full = false;
  return b.z;
 }

 do {
  u0 = 2*r() - 1;
  u1 = 2*r() - 1;
  r2 = u0*u0 + u1*u1;
 }
 while(r2===0 || r2>=1);

 c = s*Math.sqrt(-2*Math.log(r2)/r2);
 b.full = true;
 b.z = m + u0*c;
 return m + u1*c;
}

ak.normalRnd = function() {
 var state = {mu: 0, sigma: 1, rnd: Math.random, buffer: {full: false, z: 0}};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(){return rnd(state.mu, state.sigma, state.rnd, state.buffer);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 f.rnd = function(){return state.rnd;};
 return Object.freeze(f);
};

Now, whilst ak.normalRnd pretty much follows our usual scheme for implementing distribution functions (state.buffer aside), it might not be immediately obvious what the rnd function upon which it depends is actually doing.
To understand it we first need the Box-Muller transform[5], which shows how to transform two independent standard uniform random numbers
\[ u_0 \sim U(0, 1)\\ u_1 \sim U(0, 1) \]
into two independent standard normal random numbers
\[ z_0 = \sqrt{-2 \ln u_0} \cos \left(2\pi u_1\right)\sim N(0, 1)\\ z_1 = \sqrt{-2 \ln u_0} \sin \left(2\pi u_1\right)\sim N(0, 1) \]
as proven in derivation 4.

Derivation 4: The Box-Muller Transform
First note that
\[ \begin{align*} z_0^2 + z_1^2 &= -2 \ln u_0 \left(\cos^2 \left(2\pi u_1\right) + \sin^2 \left(2\pi u_1\right)\right) = -2 \ln u_0\\ \frac{z_1}{z_0} &= \frac{\sin\left(2\pi u_1\right)}{\cos\left(2\pi u_1\right)} = \tan\left(2\pi u_1\right) \end{align*} \]
and hence
\[ \begin{align*} u_0 &= e^{-\tfrac{z_0^2 + z_1^2}{2}}\\ u_1 &= \frac{1}{2\pi}\arctan \frac{z_1}{z_0} \end{align*} \]
Applying integration by substitution to the product of the uniform PDFs we have
\[ \int_\mathbf{U} 1\,\mathrm{d}u_0 \mathrm{d}u_1 = \int_\mathbf{Z}1\,\left|\frac{\partial\left(u_0,u_1\right)}{\partial\left(z_0,z_1\right)}\right| \mathrm{d}z_0 \mathrm{d}z_1 = \int_\mathbf{Z}\left|\frac{\partial u_0}{\partial z_0}\frac{\partial u_1}{\partial z_1} - \frac{\partial u_0}{\partial z_1}\frac{\partial u_1}{\partial z_0}\right| \mathrm{d}z_0 \mathrm{d}z_1 \]
The partial derivatives of \(u_0\) and \(u_1\) with respect to \(z_0\) and \(z_1\) are
\[ \begin{array}{lllll} \frac{\partial u_0}{\partial z_0} = -z_0 e^{-\tfrac{z_0^2 + z_1^2}{2}}\\ \frac{\partial u_0}{\partial z_1} = -z_1 e^{-\tfrac{z_0^2 + z_1^2}{2}}\\ \frac{\partial u_1}{\partial z_0} = \frac{1}{2\pi} \frac{1}{1+\left(\frac{z_1}{z_0}\right)^2} \times -\frac{z_1}{z_0^2} &=& \frac{1}{2\pi} \frac{z_0^2}{z_0^2+z_1^2} \times -\frac{z_1}{z_0^2} &=& -\frac{1}{2\pi} \frac{z_1}{z_0^2+z_1^2}\\ \frac{\partial u_1}{\partial z_1} = \frac{1}{2\pi} \frac{1}{1+\left(\frac{z_1}{z_0}\right)^2} \times \frac{1}{z_0} &=& \frac{1}{2\pi} \frac{z_0^2}{z_0^2+z_1^2} \times \frac{1}{z_0} &=& \phantom{-}\frac{1}{2\pi} \frac{z_0}{z_0^2+z_1^2} \end{array} \]
and hence
\[ \begin{align*} \frac{\partial u_0}{\partial z_0}\frac{\partial u_1}{\partial z_1} - \frac{\partial u_0}{\partial z_1}\frac{\partial u_1}{\partial z_0} &= -z_0 e^{-\tfrac{z_0^2 + z_1^2}{2}} \times \frac{1}{2\pi} \frac{z_0}{z_0^2+z_1^2} - z_1 e^{-\tfrac{z_0^2 + z_1^2}{2}} \times \frac{1}{2\pi} \frac{z_1}{z_0^2+z_1^2}\\ &= -\frac{1}{2\pi} \frac{z_0^2 + z_1^2}{z_0^2+z_1^2} e^{-\tfrac{z_0^2 + z_1^2}{2}}\\ &= -\frac{1}{\sqrt{2\pi}} e^{-\tfrac{1}{2}z_0^2} \times \frac{1}{\sqrt{2\pi}}e^{-\tfrac{1}{2}z_1^2} \end{align*} \]
yielding
\[ \int_\mathbf{Z}\left|\frac{\partial u_0}{\partial z_0}\frac{\partial u_1}{\partial z_1} - \frac{\partial u_0}{\partial z_1}\frac{\partial u_1}{\partial z_0}\right| \mathrm{d}z_0 \mathrm{d}z_1 =\int_\mathbf{Z}\frac{1}{\sqrt{2\pi}} e^{-\tfrac{1}{2}z_0^2} \mathrm{d}z_0 \times \frac{1}{\sqrt{2\pi}}e^{-\tfrac{1}{2}z_1^2} \mathrm{d}z_1 \]
the product of two independent standard normal PDFs.
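
For comparison, a direct, if less efficient, transcription of the transform might look something like the following sketch, which is emphatically not how ak.normalRnd goes about it

// a literal application of the Box-Muller transform; u0 is redrawn if zero
// to avoid taking the logarithm of zero
function boxMuller() {
 var u0, u1, r;
 do { u0 = Math.random(); } while(u0===0);
 u1 = Math.random();
 r = Math.sqrt(-2*Math.log(u0));
 return [r*Math.cos(2*Math.PI*u1), r*Math.sin(2*Math.PI*u1)];
}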

On the face of it the implementation of rnd doesn't seem to rely upon this result, but in fact it is based on Bell's algorithm[6] for efficiently applying the transform. To see how, first assume that we have a uniform source of random points in the unit circle, in the sense that the probability that a point will fall within a region of it is proportional to the area of that region.

Figure 2: A Point In The Unit Circle
For such a point \((x, y)\) we label its angle to the positive \(x\) axis with \(\theta\) and its distance to the origin with \(r\), as illustrated in figure 2. Now \(\theta\) is uniformly distributed over \([0, 2\pi]\) and \(r^2\) is uniformly distributed over \([0, 1]\) (since the area of a circle of radius \(r\) is equal to \(\pi r^2\)).
Noting that by the definitions of trigonometric functions
\[ \begin{align*} \cos \theta &= \frac{x}{r}\\ \sin \theta &= \frac{y}{r} \end{align*} \]
we can rewrite the Box-Muller transform as
\[ \begin{align*} z_0 &= \sqrt{-2 \ln r^2} \frac{x}{r} = \sqrt{-2 \frac{\ln r^2}{r^2}} x\\ z_1 &= \sqrt{-2 \ln r^2} \frac{y}{r} = \sqrt{-2 \frac{\ln r^2}{r^2}} y \end{align*} \]
and replace the comparatively costly trigonometric function calls with a single division. All that remains to be done is to work out how to uniformly draw points from within the unit circle. Doing so directly would require trigonometric functions, rather defeating the point of the algorithm. Instead we draw points from the square enclosing it, trivially done with a pair of observations of \(U(-1,1)\), and throw away any that do not fall within the circle, as is done in the do loop in rnd.
Finally, note that since we generate normal random numbers in pairs we save the spare one in the buffer member of the state object so we can return it on the next call.

Program 4 compares a histogram of values returned by ak.normalRnd with a suitably scaled graph of the normal PDF and will hopefully satisfy you that they are indeed normally distributed.

Program 4: Using ak.normalRnd
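
In the same spirit, a non-graphical sketch of such a check is to compare the sample moments of a large number of draws with the distribution's parameters

var rnd = ak.normalRnd(1, 2);
var n = 100000, i, x, sum = 0, sum2 = 0;

for(i=0;i<n;++i) { x = rnd(); sum += x; sum2 += x*x; }

var mean = sum/n;
var sd = Math.sqrt(sum2/n - mean*mean);
console.log(mean, sd); // should be close to 1 and 2 respectively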

Note that all of these normal distribution functions can be found in NormalDistribution.js.

References

[1] What Are The Chances Of That?, www.thusspakeak.com, 2014.

[2] Hart, J., Algorithm 5666 For The Error Function, Computer Approximations, Wiley, 1968.

[3] West, G., Better Approximations To Cumulative Normal Functions, Wilmott Magazine, pp. 70-76, 2009.

[4] Wichura, M., Algorithm AS 241: The Percentage Points Of The Normal Distribution, Applied Statistics, Vol. 37, No. 3, pp. 477-484, Royal Statistical Society, 1988.

[5] Box, G. & Muller, M., A Note On The Generation Of Random Normal Deviates, Annals Of Mathematical Statistics, Vol. 29, No. 2, pp. 610-611, Institute Of Mathematical Statistics, 1958.

[6] Bell, J., Algorithm 334: Normal Random Deviates, Communications Of The ACM, Vol. 11, No. 7, ACM, 1968.
