Characteristically Tricksy


In the last post[1] I introduced the uniform distribution and some of the functions associated with it. I specifically advised you to ignore the ak.uniformCF function in UniformDistribution.js, deferring an explanation to this post.
So here we go...

The function in question is an implementation of the characteristic function of the uniform distribution. The characteristic function, or CF, of a distribution is the expected value of \(e^{itX}\) for a random variable \(X\) with that distribution and is given by
\[ \hat{f}(t) = \mathrm{E}\left[e^{itX}\right] = \int_{-\infty}^{+\infty} e^{itx} f(x) \mathrm{d}x \]
where \(f\) is the probability density function, or PDF, of the distribution and \(i\) is the square root of minus one. For the uniform distribution it equals
\[ \begin{align*} \hat{u}_{a,b}(t) &= \int_{-\infty}^{+\infty} e^{itx} u_{a,b}(x) \mathrm{d}x\\ &= \int_{a}^{b} e^{itx} u_{a,b}(x) \mathrm{d}x\\ &= \int_{a}^{b} e^{itx} \frac{1}{b-a} \mathrm{d}x\\ &= \left[\frac{1}{it} e^{itx} \frac{1}{b-a}\right]_a^b\\ &= \frac{e^{itb} - e^{ita}}{it(b-a)} \end{align*} \]
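For the standard uniform distribution \(U(0,1)\), for example, this reduces to
\[ \hat{u}_{0,1}(t) = \frac{e^{it} - 1}{it} \]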
Derivation 1: The Limiting Case
Given
\[ \begin{align*} \mu &= \frac{ta+tb}{2}\\ \delta &= \frac{tb-ta}{2} \end{align*} \]
then
\[ \begin{align*} ta &= \mu-\delta\\ tb &= \mu+\delta \end{align*} \]
and
\[ \hat{u}_{a,b}(t) = \frac{e^{itb} - e^{ita}}{it(b-a)} = \frac{e^{i\mu + i\delta} - e^{i\mu - i\delta}}{2i\delta} = e^{i\mu} \frac{e^{i\delta} - e^{-i\delta}}{2i\delta} \]
Applying L'Hôpital's rule as \(\delta\) tends to zero yields
\[ \begin{align*} \lim_{\delta \to 0} \; \hat{u}_{a,b}(t) &= e^{i\mu} \frac{ie^{i\delta} + ie^{-i\delta}}{2i} = e^{i\mu} \frac{e^{i\delta} + e^{-i\delta}}{2}\\ &= \frac{e^{i\mu+i\delta} + e^{i\mu-i\delta}}{2} = \frac{e^{itb} + e^{ita}}{2} \end{align*} \]
In this form it will yield NaN when \(t\) equals zero, where it should instead yield one. This can easily be fixed by treating zero as a special case but, unfortunately, when \(ta\) and \(tb\) are almost equal the more significant figures of the floating point numbers representing the exponentials in the numerator will cancel out, introducing a potentially significant loss of precision.
Fortunately we can fix this too by deriving another formula for the characteristic function that's only accurate when \(ta\) and \(tb\) are almost equal. Specifically, if we define
\[ \delta = \frac{tb-ta}{2} \]
then as it tends to zero we have
\[ \lim_{\delta \to 0} \; \hat{u}_{a,b}(t) = \frac{e^{itb} + e^{ita}}{2} \]
as proven in derivation 1 and used in listing 1 when the difference between \(ta\) and \(tb\) is small enough to require it.
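To get a feel for that cancellation, the following fragment, a minimal sketch in plain JavaScript with values chosen purely for illustration, compares the real part of the naive formula with that of the limiting formula when \(tb-ta\) is just \(10^{-10}\)

var t = 1;
var a = 1000;
var b = 1000 + 1e-10;
var di = t*b - t*a;

// real part of the naive formula: (sin(tb)-sin(ta))/(t(b-a))
var naive = (Math.sin(t*b) - Math.sin(t*a))/di;

// real part of the limiting formula: (cos(tb)+cos(ta))/2
var limit = (Math.cos(t*b) + Math.cos(t*a))/2;

console.log(naive, limit);

Here the naive result loses a good many of its significant digits to the cancellation in its numerator whilst the limiting one is accurate to within a rounding error or so, and it is the latter branch that listing 1 takes for such arguments since the difference falls below its threshold.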

Listing 1: ak.uniformCF
var cf_eps = Math.sqrt(ak.EPSILON);

function cf(a, b, t) {
 var ta = t*a;
 var tb = t*b;
 var di = tb-ta;
 
 if(!isFinite(di)) return isNaN(t) ? ak.complex(ak.NaN, ak.NaN) : ak.complex(0);
 return Math.abs(di)>cf_eps ? ak.complex((Math.sin(tb)-Math.sin(ta))/di,
                                         (Math.cos(ta)-Math.cos(tb))/di)
                            : ak.complex((Math.cos(tb)+Math.cos(ta))/2,
                                         (Math.sin(tb)+Math.sin(ta))/2);
}

ak.uniformCF = function() {
 var state = {a: 0, b: 1};
 var arg0  = arguments[0];
 var a, b, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(t){return cf(state.a, state.b, t);};
 f.a = function(){return state.a;};
 f.b = function(){return state.b;};

 return Object.freeze(f);
};

Note that we're exploiting the fact that the exponentials of purely imaginary numbers can be defined in terms of trigonometric functions with
\[ e^{i\theta} = \cos \theta + i \sin \theta \]
Furthermore, for the non-limiting case, the denominator is also purely imaginary and so we must rearrange the sines and cosines along the lines of
\[ \frac{e^{i\theta}}{i} = \frac{\cos \theta + i \sin \theta}{i} = \frac{i}{i^2} \cos \theta + \sin \theta = \sin \theta - i \cos \theta \]
We are also taking care to treat infinite arguments as a special case since both Math.cos and Math.sin will return NaN for them, which would yield a result of NaN rather than the correct result of zero.

Finally, we're once again using the ak.uniformRnd constructors to initialise the state of ak.uniformCF.
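As a quick sanity check, and a minimal sketch of its use assuming that ak.js has been loaded, note that a CF must equal one at zero and have magnitude no greater than one everywhere else

var cf = ak.uniformCF(0, 1);
var z  = cf(10);

console.log(cf(0).re());                             // one
console.log(Math.sqrt(z.re()*z.re()+z.im()*z.im())); // no greater than one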

Program 1 plots the magnitude of the results of the uniform CF for a range of arguments.

Program 1: ak.uniformCF

Now this is all well and good but it doesn't really explain why we should be interested in such functions. If you think back a few posts[2] you'll recognise that the CF is rather similar to the Fourier transform. If we have a pair of random variables \(X\) and \(Y\), then the PDF of their sum is given by a convolution of their PDFs, \(f_X\) and \(f_Y\)
\[ \int_{-\infty}^{+\infty} f_X(x-y) f_Y(y) \mathrm{d}y \]
and by the properties of the Fourier transform, the CF of the sum of a pair of random variables is therefore equal to the product of their CFs, as demonstrated by program 2.
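For example, assuming that ak.js has been loaded, we might sketch out the CF of the sum of observations of \(U(0,1)\) and \(U(1,3)\) with

var cf0 = ak.uniformCF(0, 1);
var cf1 = ak.uniformCF(1, 3);

// the CF of the sum is the pointwise product of the CFs
var cfSum = function(t) {return ak.mul(cf0(t), cf1(t));};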

Program 2: CFs Of Sums

Approximate PDFs From CFs

Given that the CF \(\hat{f}\) takes the form of a Fourier transform, a tempting idea is to use a discrete Fourier transform, or DFT, to recover a sample of the PDF \(f\) which we can use as a discrete approximation of it.
Specifically, if for some sample midpoint \(\bar{x}\), number of samples \(N\) and distance between them \(\delta\) we choose
\[ \begin{align*} x_j &= \bar{x} + \delta\left(j-\tfrac{1}{2}(N-1)\right)\\ z_k &= e^{-2 \pi i \left(\tfrac{\bar{x}}{\delta}-\tfrac{1}{2}(N-1)\right) k/N} \hat{f}\left(\frac{2\pi}{N\delta}k\right)\\ f^\prime_j &= \Re\left(\mathrm{DFT}(z)_j\right)\\ f_j &= \frac{f^\prime_j}{\sum_{k=0}^{N-1} f^\prime_k} \end{align*} \]
where \(\sum\) is the summation sign and \(\Re\) is the real part of its argument, we have
\[ \forall x \in \left[x_j - \tfrac{1}{2}\delta, x_j + \tfrac{1}{2}\delta\right) \quad f(x) \approx \frac{f_j}{\delta} \]
where \(\forall\) means for all and \(\in\) means the value to its left is in the interval to its right, as shown in derivation 2.

Derivation 2: Approximating The PDF
By the properties of the Fourier transform we have
\[ f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-itx} \hat{f}(t) \mathrm{d}t \]
By definition \(f(x)\) is real valued, so the imaginary part of the integral is zero and the real part of the integrand is symmetric about zero and consequently
\[ f(x) = \frac{1}{\pi}\Re\left(\int_{0}^{+\infty} e^{-itx} \hat{f}(t) \mathrm{d}t\right) \approx \frac{1}{\pi}\Re\left(\sum_{k=0}^{N-1} e^{-i k \Delta x} \hat{f}(k \Delta) \Delta\right) = \frac{\Delta}{\pi}\Re\left(\sum_{k=0}^{N-1} e^{-i k \Delta x} \hat{f}(k \Delta)\right) \]
Defining
\[ \begin{align*} x_j &= \bar{x} + \delta\left(j-\tfrac{1}{2}(N-1)\right)\\ f^\prime_j &= f\left(x_j\right) \approx \frac{\Delta}{\pi}\Re\left(\sum_{k=0}^{N-1} e^{-i k \Delta \left(\bar{x} + \delta\left(j-\tfrac{1}{2}(N-1)\right)\right)} \hat{f}(k \Delta)\right)\\ &\phantom{= f\left(x_j\right)} = \frac{\Delta}{\pi}\Re\left(\sum_{k=0}^{N-1} e^{-i k j \Delta \delta} e^{-i k \Delta \left(\bar{x} - \tfrac{1}{2}(N-1)\delta\right)} \hat{f}(k \Delta)\right) \end{align*} \]
If we then choose
\[ \begin{align*} \Delta &= \frac{2\pi}{N\delta}\\ z_k &= e^{-ik\Delta \left(\bar{x} - \tfrac{1}{2}(N-1)\delta\right)}\hat{f}(k\Delta) = e^{-2 \pi i \left(\tfrac{\bar{x}}{\delta} - \tfrac{1}{2}(N-1)\right) k/N}\hat{f}\left(\frac{2\pi}{N\delta}k\right) \end{align*} \]
we have
\[ f^\prime_j = \frac{2}{N\delta}\Re\left(\sum_{k=0}^{N-1} e^{-2 \pi i j k /N}z_k\right) = \frac{2}{N\delta}\Re\left(\mathrm{DFT}(z)_j\right) \]
Treating the distribution as if it has constant density within a width \(\delta\) interval centred on an \(x_j\), we find that the probability that an observation will fall within that interval is approximately
\[ f^\prime_j \times \delta = \frac{2}{N}\Re\left(\mathrm{DFT}(z)_j\right) \]
If the intervals are to represent a valid distribution their probabilities must sum to one and we must consequently normalise the terms, effectively ignoring observations that fall outside of the sample range and allowing us to drop the shared constant factor in \(f^\prime_j\)
\[ \begin{align*} f^\prime_j &= \Re\left(\mathrm{DFT}(z)_j\right)\\ f_j &= \frac{f^\prime_j}{\sum_{k=0}^{N-1} f^\prime_k} \end{align*} \]
This yields the approximate probability that an observation will fall within the interval, conditional upon it falling within the sample range, from which the result trivially follows.

Strictly speaking, the \(f_j\) are samples of the distribution that results from ignoring observations outside of the sample range, known as a conditional distribution. Note that there is an implicit trade-off between the width of the intervals, \(\delta\), and the accuracy of the approximate probability of each interval, controlled by the step width of the integral approximation in derivation 2, \(\frac{2\pi}{N\delta}\).
Furthermore, since we are effectively discarding values that fall outside of the sample range we also want to minimise the probability that such values would have been observed.
The values of \(\bar{x}\), \(\delta\) and \(N\) must be chosen carefully so as to minimise the errors that result from these approximations. We should most certainly want \(\delta\) and \(\Delta\) to be roughly the same order of magnitude, so it will be convenient to choose
\[ \delta = \frac{w}{\sqrt{N}} \]
for some \(w\) related to the width of the range of the PDF that we wish to recover.
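For example, with \(w=4\) and \(N=256\) we have
\[ \delta = \frac{4}{\sqrt{256}} = \frac{1}{4} \quad \text{and} \quad \Delta = \frac{2\pi}{N\delta} = \frac{2\pi}{64} \approx 0.098 \]
so that both the sample spacing and the integration step shrink in proportion to \(\frac{1}{\sqrt{N}}\) as \(N\) increases.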

Approximate CDFs From CFs

Now that we have an approximation of the PDF \(f\), approximating the associated cumulative distribution function, or CDF, \(F\) is simply a matter of integrating it. The integral is particularly straightforward since the approximate PDF is constant within each interval.
\[ F(x) \approx \begin{cases} 0\quad & x < x_0-\tfrac{1}{2}\delta\\ \sum_{j=0}^{i-1}f_j + \left(x-\left(x_i-\tfrac{1}{2}\delta\right)\right)\frac{f_i}{\delta}\quad & x_i-\tfrac{1}{2}\delta \leqslant x < x_i+\tfrac{1}{2}\delta\\ 1\quad & x \geqslant x_{N-1}+\tfrac{1}{2}\delta \end{cases} \]

ak Fourier Distribution Functions

Having derived approximations for the PDF and CDF in terms of the CF it is a relatively simple task to implement functions to calculate them. The first thing we need is a function to compute the terms of \(f\), given in listing 2.

Listing 2: ak Fourier Distribution Constructor
function constructor(cf, x, w, n) {
 var state = {cf: cf, x: Number(x), w: Number(w), n: ak.floor(n)};
 var N = Math.pow(2, state.n);
 var z = new Array(N);
 var eps = 1/Math.sqrt(N);
 var dx, dt, df, k, t, y, s, m, l, u, f;

 if(ak.nativeType(cf)!==ak.FUNCTION_T) {
  throw new TypeError('invalid CF in ak fourier distribution');
 }

 if(!isFinite(state.x)) {
  throw new TypeError('invalid mid point in ak fourier distribution');
 }

 if(!isFinite(state.w) || state.w<=0) {
  throw new TypeError('invalid range in ak fourier distribution');
 }

 if(!isFinite(N) || N<2) {
  throw new TypeError('invalid bit count in ak fourier distribution');
 }

 x  = state.x;
 dx = state.w * eps;

 dt = -2*ak.PI*(x/dx - (N-1)/2)/N;
 df =  2*ak.PI/(N*dx);

 for(k=0;k<N;++k) {
  t = k*dt;
  y = ak.complex(Math.cos(t), Math.sin(t));
  z[k] = ak.mul(y, cf(df*k));
 }
 f = ak.fft(z);

 m = 0;
 for(k=0;k<N;++k) {
  y = Math.max(f[k].re(), 0);
  f[k] = y;
  if(y>m) m = y;
 }

 s = 0;
 for(k=0;k<N;++k) {
  y = f[k];
  if(y<m*eps) f[k] = 0;
  else        s += y;
 }

 for(k=0;k<N;++k) f[k] /= s;

 for(l=0;l<N-1 && f[l]===0;++l);
 for(u=N-1;u>l && f[u]===0;--u);

 if(l===u) throw new TypeError('empty terms in ak fourier distribution');

 state.dx = dx;
 state.f  = f.slice(l, u+1);
 state.o  = x - N*dx/2 + l*dx;

 return state;
}

This function acts as the core constructor for the distribution functions, populating the state object with the input parameters and the terms of \(f\). Since we will always need the CF \(\hat{f}\), the midpoint \(\bar{x}\), the range width \(w\) and the number of samples \(N\), there's no need to use our usual dispatch mechanism.
The first three are given by the arguments cf, x and w, but the last is defined as two raised to the power of the integer part of n so that the DFT can be calculated efficiently with ak.fft. Once again we are explicitly calculating the exponential term in \(z_k\) with trigonometric functions since its argument is purely imaginary.
Note that we're performing an additional operation on the result of ak.fft in which we truncate to zero all terms smaller than eps times the maximum term. The reason for this is that approximation error will result in regions that should have zero, or extremely small, probabilities instead having small spurious probabilities which, after normalisation, are effectively subtracted from the regions that have relatively large probabilities. Since the latter are those that we are actually interested in, a pragmatic approach is to zero out those terms that might be purely a result of approximation error.
Having done this we then strip leading and trailing zeros by slicing from the first to the last non-zero term, which both saves memory and simplifies the implementation of the distribution functions, after which we store the lower bound of the new first term, given by \(\bar{x} - \tfrac{1}{2}N\delta + l\delta\), in state.o.

ak.fourierPDF

Having so computed the terms of \(f\) the implementation of the PDF is trivial, as shown in listing 3.

Listing 3: ak.fourierPDF
function pdf(state, x) {
 var i = ak.floor((x-state.o)/state.dx);
 return i>=0 && i<state.f.length ? state.f[i] : 0;
}

ak.fourierPDF = function(cf, x, w, n) {
 var state = constructor(cf, x, w, n);
 var dx = state.dx;
 var f;

 state.f.forEach(function(x, i, a){a[i] = x/dx;});

 f = function(x){return pdf(state, x);};
 f.cf = function(){return state.cf;};
 f.x  = function(){return state.x;};
 f.w  = function(){return state.w;};
 f.n  = function(){return state.n;};
 return Object.freeze(f);
};

Note that after the state is initialised with the terms of \(f\) we divide them by \(\delta\) so that we don't need to do so in the pdf function. Also, much as we did with the uniform distribution functions, we provide access to the properties of the PDF via methods added to the function f.

To illustrate ak.fourierPDF we'll use it to approximate the PDF of the sum of two uniform random numbers drawn from \(U(a_0,b_0)\) and \(U(a_1,b_1)\) given by
\[ p(x) = \begin{cases} 0\quad & x < a_0+a_1\\ \dfrac{\min(x-a_0, b_1) - \max(x-b_0, a_1)}{\left(b_0-a_0\right)\left(b_1-a_1\right)}\quad & a_0+a_1\leqslant x \leqslant b_0+b_1\\ 0\quad & x > b_0+b_1 \end{cases} \]
as demonstrated by derivation 3.

Derivation 3: The PDF Of The Sum Of Two Uniforms
The PDF of the sum of a pair of random variables drawn from \(U(a_0,b_0)\) and \(U(a_1,b_1)\) is trivially zero for values outside the interval \(\left[a_0+a_1, b_0+b_1\right]\). Within this interval, it is given by the convolution of the PDFs of the summed variables
\[ p(x) = \int_{-\infty}^{+\infty} u_{a_0,b_0}(x-y) u_{a_1,b_1}(y) \mathrm{d}y \]
Since the uniform PDF is zero outside of its range, the integrand must be zero unless
\[ \begin{array}{ccccc} a_0 &\leqslant &x-y &\leqslant &b_0\\ a_1 &\leqslant &y &\leqslant &b_1 \end{array} \]
or, equivalently
\[ \begin{array}{ccccc} x-b_0 &\leqslant &y &\leqslant &x-a_0\\ a_1 &\leqslant &y &\leqslant &b_1 \end{array} \]
and therefore
\[ \max(x-b_0, a_1) \leqslant y \leqslant \min(x-a_0, b_1) \]
giving
\[ \begin{align*} p(x) &= \int_{\max(x-b_0, a_1)}^{\min(x-a_0, b_1)} \frac{1}{b_0-a_0} \frac{1}{b_1-a_1} \mathrm{d}y =\left[\frac{y}{\left(b_0-a_0\right)\left(b_1-a_1\right)}\right]_{\max(x-b_0, a_1)}^{\min(x-a_0, b_1)}\\ &=\frac{\min(x-a_0, b_1) - \max(x-b_0, a_1)}{\left(b_0-a_0\right)\left(b_1-a_1\right)} \end{align*} \]

Armed with this formula we can compare it with an ak.fourierPDF initialised with the product of the CFs of the two uniform distributions, as is done in program 3.
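A minimal sketch of such a comparison, assuming that ak.js has been loaded and with bounds chosen purely for illustration, might take the form

// the exact PDF of the sum of observations of U(0,1) and U(0,2)
function sumPDF(x) {
 if(x<0 || x>3) return 0;
 return (Math.min(x, 2) - Math.max(x-1, 0))/2;
}

var cf0 = ak.uniformCF(0, 1);
var cf1 = ak.uniformCF(0, 2);
var pdf = ak.fourierPDF(function(t){return ak.mul(cf0(t), cf1(t));}, 1.5, 8, 10);

console.log(pdf(1.5), sumPDF(1.5)); // approximately equal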

Program 3: ak.fourierPDF

Clearly the ak.fourierPDF approximates the actual PDF. If you have the patience, try setting n to 16 to see just how good an approximation it yields.

ak.fourierCDF

Recall that to approximate the CDF from the CF we simply integrate the approximate PDF. In the formulation above this involves summing the probabilities of the intervals strictly below the argument and adding the fraction of the interval containing it that lies below it. We can improve the performance of this by calculating the partial sums in advance and using the difference between the sums for an interval and the one before it to recover the former's probability, as shown in listing 4.

Listing 4: ak.fourierCDF
function accumulate(state) {
 var f  = state.f;
 var n1 = f.length-1;
 var s  = 0;
 var i;

 for(i=0;i<n1;++i) {
  s += f[i];
  f[i] = s;
 }
 f[n1] = 1;

 return state;
}

function cdf(state, x) {
 var i = ak.floor((x-state.o)/state.dx);
 var x0, f0, f1;

 if(i<0) return 0;
 if(i>=state.f.length) return 1;

 x0 = state.o + i*state.dx;
 f0 = i>0 ? state.f[i-1] : 0;
 f1 = state.f[i];

 return f0 + (x-x0)*(f1-f0)/state.dx;
}

ak.fourierCDF = function(cf, x, w, n) {
 var state = accumulate(constructor(cf, x, w, n));
 var f = function(x){return cdf(state, x);};

 f.cf = function(){return state.cf;};
 f.x  = function(){return state.x;};
 f.w  = function(){return state.w;};
 f.n  = function(){return state.n;};
 return Object.freeze(f);
};

Here the accumulate function computes the partial sums and the cdf function calculates the approximation as described above.
Program 4 compares the ak.fourierCDF of an ak.uniformCF with the associated ak.uniformCDF.
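In the same vein as before, and again assuming that ak.js has been loaded, such a comparison might be as simple as

var cdf0 = ak.fourierCDF(ak.uniformCF(0, 1), 0.5, 4, 12);
var cdf1 = ak.uniformCDF(0, 1);

console.log(cdf0(0.25), cdf1(0.25)); // both approximately one quarter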

Program 4: ak.fourierCDF

Once again, not too shabby!

ak.fourierInvCDF

Next up is the inverse CDF. We could numerically invert the approximate CDF, but this would be rather computationally expensive. Instead, we can exploit the fact that for a linear function the inverse is also linear; specifically if
\[ f(x) = ax + b \]
then
\[ f^{-1}(y) = \frac{1}{a}y - \frac{b}{a} \]
as is trivially demonstrated with
\[ \begin{align*} f^{-1}(f(x)) &= \frac{1}{a}(ax+b) - \frac{b}{a}\\ &= x + \frac{b}{a} - \frac{b}{a}\\ &= x \end{align*} \]
We can therefore find the approximate inverse of the CDF for a given probability by working out how far into a particular interval we must go and then linearly interpolating. This enables us to equally space the intervals of the inverse CDF for fast lookup, although we must of course accept that the approximate inverse CDF will consequently not be identically equal to the inverse of the approximate CDF, since the latter will not generally have equally spaced results.
Listing 5 illustrates the calculation of the samples of the approximate inverse CDF.

Listing 5: Inverting The Approximate CDF
function invert(state) {
 var f  = state.f;
 var n1 = f.length-1;
 var o  = state.o;
 var dx = state.dx;
 var c0 = 0;
 var c1 = f[0];
 var x  = new Array(n1+1);
 var i, j, c;

 x[0] = state.o;

 for(i=1,j=0;i<n1;++i) {
  c = i/n1;
  while(c1<=c) {
   c0 = c1;
   c1 = f[++j];
  }
  x[i] = c1!==c0 ? o+j*dx + dx*(c-c0)/(c1-c0) : o+j*dx + dx/2;
 }

 x[n1] = state.o + (n1+1)*dx;

 state.f = x;
 return state;
}

Note that here the first element of x is set to the lower bound and the last to the upper bound. As we initialise the intermediate elements we iterate over the approximate CDF, stored in state.f, searching for the interval that contains the probability in question, and then set the approximate inverse at that probability to the point lying the appropriate fraction of the way through that interval.
Having replaced the approximate CDF with its approximate inverse, we can compute ak.fourierInvCDF in much the same fashion as we did for ak.fourierCDF, as shown in listing 6.

Listing 6: ak.fourierInvCDF
function invCDF(state, c) {
 var f = state.f;
 var n = f.length;
 var d = c*(n-1);
 var j = ak.floor(d);

 d -= j;

 if(c<=0) return f[0];
 if(c>=1) return f[n-1];

 return (1-d)*f[j] + d*f[j+1];
}

ak.fourierInvCDF = function(cf, x, w, n) {
 var state = invert(accumulate(constructor(cf, x, w, n)));
 var f = function(c){return invCDF(state, c);};

 f.cf = function(){return state.cf;};
 f.x  = function(){return state.x;};
 f.w  = function(){return state.w;};
 f.n  = function(){return state.n;};
 return Object.freeze(f);
};

Program 5 compares the ak.fourierInvCDF of an ak.uniformCF with the associated ak.uniformInvCDF.

Program 5: ak.fourierInvCDF

Three for three, I'd say!

ak.fourierCF

Um. The CF is an input and therefore must already be defined.
Moving on...

ak.fourierRnd

The final distribution function that we need to implement is a random number generator, and we shall do so with a trick that is one of the most useful applications of the inverse CDF. Since it returns the value below which an observation will lie with a given probability, we can simply pass it a random probability to get a correctly distributed random number and, since such probabilities must lie between zero and one, we can generate them with a standard uniform generator like Math.random.
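This works because, for a random variable \(U\) that is uniformly distributed between zero and one,
\[ \Pr\left(F^{-1}(U) \leqslant x\right) = \Pr\left(U \leqslant F(x)\right) = F(x) \]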
Listing 7 gives just such an implementation.

Listing 7: ak.fourierRnd
ak.fourierRnd = function(cf, x, w, n, rnd) {
 var inv = ak.fourierInvCDF(cf, x, w, n);
 var f;

 if(ak.nativeType(rnd)===ak.UNDEFINED_T) rnd = Math.random;
 if(ak.nativeType(rnd)!==ak.FUNCTION_T) {
  throw new TypeError('invalid RNG in ak.fourierRnd');
 }

 f = function(){return inv(rnd());};
 f.cf  = inv.cf;
 f.x   = inv.x;
 f.w   = inv.w;
 f.n   = inv.n;
 f.rnd = function(){return rnd;};
 return Object.freeze(f);
};

Note that if the rnd argument is missing we replace it with Math.random and that otherwise it is assumed to be a source of uniform random numbers between zero and one.
The methods that return the parameters of ak.fourierRnd are trivially identical to those of the inverse CDF with which it's implemented, apart from rnd which simply returns the source of the uniform random numbers.
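For example, assuming once more that ak.js has been loaded, we might draw a handful of approximately standard uniformly distributed observations with

var rnd = ak.fourierRnd(ak.uniformCF(0, 1), 0.5, 4, 12);
var i;

for(i=0;i<5;++i) console.log(rnd()); // approximately U(0,1) observations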

Now these functions, which can be found in FourierDistribution.js, are a supremely neat use of characteristic functions, but they are by no means the most important, as we shall see in the next post.

References

[1] What Are The Chances Of That?, www.thusspakeak.com, 2014.

[2] What's The Frequency, Joseph?, www.thusspakeak.com, 2014.
