Kth Time's The Charm

Last time[1] we proved that if the waiting time for an event does not depend upon how long we have already been waiting then it must be exponentially distributed. I promised that there was a lot more to say about such memoryless waiting time processes and in this post we shall begin to investigate them further. Specifically, we shall ask how long must we wait for more than one event to occur?

First off, note that the waiting time for a second exponentially distributed event is simply the sum of two independent observations of an exponential random variable.
$X_0 + X_1 | X_0 \sim Exp(\lambda), X_1 \sim Exp(\lambda)$
where the vertical bar means given and the tilde means drawn from. The PDF of the sum of two independent observations of a random variable is given by the convolution
Derivation 1: The Waiting Time For Two Events
Firstly, we have
$p_{2,\lambda}(x) = \int_{-\infty}^{+\infty} p_\lambda(y) \; p_\lambda(x-y) \;\mathrm{d}y$
The exponential PDF is zero for negative arguments, so the integrand is zero when $$y$$ is less than zero or greater than $$x$$ and the integral can be simplified to
\begin{align*} p_{2,\lambda}(x) &= \int_0^x p_\lambda(y) \; p_\lambda(x-y) \;\mathrm{d}y\\ &= \int_0^x \lambda e^{-\lambda y} \times \lambda e^{-\lambda(x-y)}\;\mathrm{d}y\\ &= \lambda^2 e^{-\lambda x}\int_0^x 1\;\mathrm{d}y\\ &= \lambda^2 x e^{-\lambda x} \end{align*}

Derivation 2: The Waiting Time For Three Events
Since neither waiting time can be negative, both PDFs must be zero for negative arguments and so the bounds of the convolution integral are again from zero to $$x$$

\begin{align*} p_{3,\lambda}(x) &= \int_0^x p_{2,\lambda}(y) \; p_\lambda(x-y) \;\mathrm{d}y\\ &= \int_0^x \lambda^2 y e^{-\lambda y} \times \lambda e^{-\lambda (x-y)} \;\mathrm{d}y\\ &= \lambda^3 e^{-\lambda x} \int_0^x y \;\mathrm{d}y\\ &= \tfrac{1}{2}\lambda^3 x^2 e^{-\lambda x} \end{align*}

Derivation 3: The General Formula
For the same reason as before, the bounds of the convolution integral are zero and $$x$$
\begin{align*} p_{k+1,\lambda}(x) &= \int_0^x p_{k,\lambda}(y) \; p_\lambda(x-y) \;\mathrm{d}y\\ &= \int_0^x \frac{\lambda^k y^{k-1} e^{-\lambda y}}{(k-1)!} \times \lambda e^{-\lambda(x-y)} \;\mathrm{d}y\\ &= \frac{\lambda^{k+1} e^{-\lambda x}}{(k-1)!} \int_0^x y^{k-1} \;\mathrm{d}y\\ &= \frac{\lambda^{k+1} e^{-\lambda x}}{(k-1)!} \times \frac{x^k}{k}\\ &= \frac{\lambda^{k+1} x^k e^{-\lambda x}}{k!} \end{align*}
$p_2(x) = \int_{-\infty}^{+\infty} p(y) \times p(x-y) \;\mathrm{d}y$
where $$p$$ is the PDF of that variable. For the exponential distribution, this yields
$p_{2,\lambda}(x) = \lambda^2 x e^{-\lambda x}$
as shown by derivation 1.

To find the distribution for the waiting time for three events, we simply note that its random variable is the sum of the two event random variable and another exponential random variable and so its PDF is given by the convolution
\begin{align*} p_{3,\lambda}(x) &= \int_0^x p_{2,\lambda}(y) \times p_\lambda(x-y) \;\mathrm{d}y\\ &= \tfrac{1}{2} \lambda^3 x^2 e^{-\lambda x} \end{align*}
as proven in derivation 2.

If we carry on in the same fashion we'll notice a pattern emerge
$p_{k,\lambda}(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{(k-1)!}$
where the exclamation mark represents the factorial.
For a single event we recover the exponential PDF
$p_{1,\lambda}(x) = \frac{\lambda x^{0} e^{-\lambda x}}{0!} = \lambda e^{-\lambda x} = p_\lambda(x)$
since both the zeroth power and the zeroth factorial equal one, and, given that derivation 3 proves that if it is true for $$k$$ events then it is true for $$k+1$$, it must be true for any positive number of events.

This is known as the Erlang distribution and is a special case of the gamma distribution, $$Gamma(k, \lambda)$$, for integer $$k$$.

The Gamma Distribution

If we replace the factorial in the PDF with the gamma function[2] we have
$\gamma_{k,\lambda}(x) = \begin{cases} \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)} & x \geqslant 0\\ 0 & x < 0 \end{cases}$
which is a valid PDF for positive $$\lambda$$ and any positive $$k$$, integer or not. We can prove this by defining
$\Gamma_{k,\lambda}(x) = \begin{cases} \frac{\gamma(k, \lambda x)}{\Gamma(k)} & x \geqslant 0\\ 0 & x < 0 \end{cases}$
where $$\gamma(s, x)$$ is the lower incomplete gamma function, satisfying
\begin{align*} \gamma(s, x) &= \int_0^x t^{s-1}e^{-t} \;\mathrm{d}t\\ \gamma(s, 0) &= 0\\ \gamma(s, \infty) &= \Gamma(s)\\ x_2 \geqslant x_1 &\implies \gamma(s, x_2) \geqslant \gamma(s, x_1) \end{align*}
The last three properties mean that $$\Gamma_{k,\lambda}(x)$$ meets the requirements for a CDF and if we differentiate it with respect to $$x$$ we'll get its associated PDF
\begin{align*} \frac{\mathrm{d}}{\mathrm{d}x} \Gamma_{k,\lambda}(x) &= \frac{1}{\Gamma(k)} \frac{\mathrm{d}}{\mathrm{d}x} \int_0^{\lambda x} t^{k-1}e^{-t} \;\mathrm{d}t = \frac{1}{\Gamma(k)} \frac{\mathrm{d \lambda x}}{\mathrm{d}x} \frac{\mathrm{d}}{\mathrm{d}\lambda x} \int_0^{\lambda x} t^{k-1}e^{-t} \;\mathrm{d}t\\ &= \frac{1}{\Gamma(k)} \lambda (\lambda x)^{k-1}e^{-\lambda x} = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)} = \gamma_{k,\lambda}(x) \end{align*}
Note that the CDF is simply a regularised lower incomplete gamma function, which we have already implemented with ak.gammaP, as used by ak.gammaCDF in listing 1.

Listing 1: ak.gammaCDF
ak.gammaCDF = function() {
var state = {k: 1, lambda: 1};
var arg0  = arguments[0];
var f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);

f = function(x){return x<=0 ? 0 : ak.gammaP(state.k, state.lambda*x);};
f.k = function(){return state.k;};
f.lambda = function(){return state.lambda;};
return Object.freeze(f);
};

var constructors = {};


As usual, we initialise the state object containing the parameters of the distribution by dispatching to a constructors object, the relevant parts of which for the CDF are given in listing 2.

Listing 2: The constructors Object
constructors[ak.UNDEFINED_T] = function() {
};

constructors[ak.NUMBER_T] = function(state, k, args) {
var arg1 = args[1];
constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1, args);

state.k = Number(k);
if(state.k<=0 || !isFinite(state.k)) {
throw new Error('invalid k in ak.gamma distribution');
}
};

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, lambda, args) {
var arg2 = args[2];
constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, arg2);

state.lambda = Number(lambda);
if(state.lambda<=0 || !isFinite(state.lambda)) {
throw new Error('invalid lambda in ak.gamma distribution');
}
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
};


Here we default to the initial state of $$k$$ and $$\lambda$$ both equal to one, with one argument defining $$k$$ and two defining first $$k$$ and then $$\lambda$$.

With the PDF
$\gamma_{k,\lambda}(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}$
we run the risk, for large $$k$$, of both the numerator and the denominator overflowing when they would have cancelled out to something small enough to fit in a floating point number. It was for just such situations that we implemented the logarithm of the gamma function with ak.logGamma, which we can exploit by rewriting the PDF as
$\gamma_{k,\lambda}(x) = e^{k \ln \lambda + (k-1) \ln x -\lambda x - \ln \Gamma(k)}$
so that such terms cancel out on a logarithmic scale, well before overflow sets in. We have to be slightly careful when $$k$$ equals one since, when $$x$$ equals zero, the term
$(k-1) \ln x = 0 \times -\infty;$
is undefined or, in the language of IEEE floating point, NaN. We therefore treat it as a special case, implementing the PDF with
$\gamma_{k,\lambda}(x) = \lambda e^{-\lambda x}$
instead, as illustrated by listing 3.

Listing 3: ak.gammaPDF
ak.gammaPDF = function() {
var state = {k: 1, lambda: 1};
var arg0  = arguments[0];
var ck, f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);

if(state.k===1) {
f = function(x){return x<0 ? 0 : state.lambda * Math.exp(-state.lambda*x);};
}
else {
ck = state.k * Math.log(state.lambda) - ak.logGamma(state.k);
f = function(x) {
return x<0 ? 0 : Math.exp((state.k-1)*Math.log(x) - state.lambda*x + ck);
};
}

f.k = function(){return state.k;};
f.lambda = function(){return state.lambda;};
return Object.freeze(f);
};


Note that this also means that the PDF at zero switches from $$\infty$$ to zero as $$k$$ goes from less than one to more than one, as illustrated by program 1.

Program 1: The Gamma PDF

The Characteristic Function

The gamma distribution's characteristic function is
$\hat\gamma_{k,\lambda}(t) = \left(1-\frac{it}{\lambda}\right)^{-k}$
which, for integer $$k$$, follows trivially from the exponential distribution's CF
$\hat{p}_\lambda(t) = \left(1-\frac{it}{\lambda}\right)^{-1}$
if we note that the CF of a sum of random variables is simply the product of their CFs. Proving that it holds for non-integer $$k$$ is rather more convoluted and I shall ask that you take it as read, or at least read up on it if you don't trust me!

The implementation of the CF is unsurprisingly straightforward, as shown by listing 4.

Listing 4: ak.gammaCF
function cf(k, l, t) {
var tl = t/l;
var den = Math.pow(1+tl*tl, k);

if(!isFinite(den)) return isNaN(t) ? ak.complex(ak.NaN, ak.NaN) : ak.complex(0);
return ak.pow(ak.complex(1, -tl), -k);
}

ak.gammaCF = function() {
var state = {k: 1, lambda: 1};
var arg0  = arguments[0];
var f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);

f = function(t){return cf(state.k, state.lambda, t);};
f.k = function(){return state.k;};
f.lambda = function(){return state.lambda;};
return Object.freeze(f);
};


Note that we are again using the identity
$\frac{1}{x + iy} = \frac{x - iy}{x^2 + y^2}$
to compute whether the denominator of the result overflows by checking that
$\left(1 + \left(\frac{t}{\lambda}\right)^2\right)^k$
is finite.

Random Number Generation

The simplest way to generate random numbers from a given distribution is to pass a standard uniform random number to its inverse CDF, a trick we used for the exponential distribution.
Unfortunately, the inverse of the gamma distribution cannot be expressed with a simple formula and we must instead numerically approximate it. We do at least have its derivative with the PDF and so can use our implementation of Newton's method[3], ak.newtonInverse, as shown in listing 5.

Listing 5: ak.gammaInvCDF
function invCDF(inv, cdf, b1, c) {
var b0 = 0;

if(c<=0) return 0;
if(c>=1) return ak.INFINITY;
if(isNaN(c)) return ak.NaN;

while(cdf(b1)<c) {
b0 = b1;
b1 *= 2;
}
return isFinite(b1) ? inv(c, [b0, b1]) : b1;
}

ak.gammaInvCDF = function() {
var state = {k: 1, lambda: 1, eps: Math.pow(ak.EPSILON, 0.75)};
var arg0  = arguments[0];
var pdf, cdf, fdf, inv, mean, f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);

pdf = ak.gammaPDF(state.k, state.lambda);
cdf = ak.gammaCDF(state.k, state.lambda);
fdf = function(x){return [cdf(x), pdf(x)];};
inv = ak.newtonInverse(fdf, state.eps);
mean = state.k/state.lambda;

f = function(c){return invCDF(inv, cdf, mean, c);};
f.k = function(){return state.k;};
f.lambda = function(){return state.lambda;};
return Object.freeze(f);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.NUMBER_T] = function(state, eps) {
state.eps = Number(eps);
if(isNaN(state.eps)) {
throw new Error('invalid convergence threshold in ak.gamma distribution');
}
};


Here we are exploiting the fact that the distribution is only non-zero for positive arguments when we search for an initial bracket for the inverse, b, rather than defer to the default scheme used by ak.newtonInverse. We also allow the user to supply a convergence threshold for the inversion as a third argument to the constructor.
Using this to generate random numbers would be horribly inefficient, but fortunately there's a better way.

Rejection Sampling

Derivation 4: Rejection Sampling
The probability that an accepted observation of $$Q$$ is less than $$x$$ is given by
\begin{align*} &\Pr\left(X \leqslant x \;\bigg|\; U \leqslant \frac{p(x)}{cq(x)}\right)\\ &\quad= \frac{\Pr\left(X \leqslant x \wedge U \leqslant \frac{p(x)}{cq(x)}\right)}{\Pr\left(U \leqslant \frac{p(x)}{cq(x)}\right)}\\ &\quad= \frac{\int_{-\infty}^x q(y) \times \frac{p(y)}{cq(y)} \;\mathrm{d}y}{\int_{-\infty}^{+\infty} q(y) \times \frac{p(y)}{cq(y)} \;\mathrm{d}y}\\ &\quad= \frac{\int_{-\infty}^x \frac{p(y)}{c} \;\mathrm{d}y}{\int_{-\infty}^{+\infty} \frac{p(y)}{c} \;\mathrm{d}y}\\ &\quad= \frac{\frac{1}{c}\int_{-\infty}^x p(y) \;\mathrm{d}y}{\frac{1}{c} \int_{-\infty}^{+\infty} p(y) \;\mathrm{d}y} = \int_{-\infty}^x p(y) \;\mathrm{d}y \end{align*}
and so $$X$$ is distributed as $$P$$.
The trick is to generate random numbers from an easier distribution and throw away a selected few to leave a sample that is gamma distributed. We've already seen an example of this with the Box-Muller transform[4] for generating normally distributed random numbers[5].
A general approach for generating random numbers with a distribution $$P$$, having a PDF $$p$$, using ones from a distribution $$Q$$, having a PDF $$q$$, is to first find a finite constant $$c$$ such that
$\frac{p(x)}{q(x)} \leqslant c$
for all $$x$$ for which either PDF is non-zero. If we then generate random numbers from $$Q$$ but only keep them with probability
$\frac{p(x)}{c q(x)}$
Figure 1: The Gap Between p(x) And cq(x)
the resulting sample will be distributed as $$P$$. This is proven in derivation 4, but a simpler way to get a feel for this remarkable result is to plot the graphs of $$p(x)$$ and $$cq(x)$$ and note that the area between them represents the observations that we reject, leaving behind the PDF of $$P$$, as illustrated by figure 1 in which areas of rejection and acceptance are coloured red and green respectively. If we draw a vertical line from any given $$x$$ to the top of the red region, then the proportion of it that falls in the green region is exactly our probability of acceptance.

Derivation 5: The Expected Number Of Qs
First note that the expected probability of acceptance, $$p$$, is given by
\begin{align*} p &= \int_{-\infty}^{+\infty} q(x) \times \frac{p(x)}{cq(x)} \, \mathrm{d}x\\ &= \frac{1}{c} \int_{-\infty}^{+\infty} p(x) \, \mathrm{d}x\\ &= \frac{1}{c} \end{align*}
The expected number of $$Q$$ distributed random numbers per $$P$$ distributed random number is therefore
$\frac{1}{p} = c$
Derivation 5 shows that we should expect to generate $$c$$ random numbers from $$Q$$ for each one that we accept as being distributed as $$P$$. So, whilst this approach is simple enough in principle, in practice we should like $$c$$ to be as close to one as possible so as to minimise the number of rejections, somewhat complicating the matter.
Note that it cannot be less than one since that would imply that $$q(x)$$ is everywhere greater than $$p(x)$$ and would therefore integrate to more than one and wouldn't be a valid PDF.

Kundu and Gupta have found a rejection algorithm for the gamma distribution with a very low rejection rate for $$\lambda$$ equal to one and $$k$$ less than one[6]. We can exploit the properties of the gamma distribution to use it to generate gamma distributed random numbers with any parameters by firstly observing that if
\begin{align*} X_1 &\sim Gamma(k_1, \lambda)\\ X_2 &\sim Gamma(k_2, \lambda)\\ \end{align*}
then
$X_1 + X_2 \sim Gamma\left(k_1+k_2, \lambda\right)$
which follows from the CF
$\hat{\gamma}_{k_1,\lambda}(t) \times \hat{\gamma}_{k_2,\lambda}(t) = \left(1-\frac{it}{\lambda}\right)^{-k_1} \times \left(1-\frac{it}{\lambda}\right)^{-k_2} = \left(1-\frac{it}{\lambda}\right)^{-\left(k_1+k_2\right)} = \hat{\gamma}_{k_1+k_2,\lambda}(t)$
For integer $$n$$ and $$k$$ strictly less than one, this means that
\begin{align*} X_i &\sim Exp(\lambda)\\ Y &\sim Gamma(k, \lambda)\\ Y + \sum_{i=1}^n X_i &\sim Gamma(n+k, \lambda) \end{align*}
where $$\sum$$ is the summation sign. Secondly, we have
\begin{align*} X &\sim Gamma(k, \lambda_1)\\ \frac{X}{\lambda_2} &\sim Gamma(k, \lambda_1 \lambda_2)\\ \end{align*}
which follows from the CDF
$\Pr\left(\frac{X}{\lambda_2} \leqslant x\right) = \Pr\left(X \leqslant \lambda_2 x\right) = \Gamma_{k, \lambda_1}\left(\lambda_2 x\right) = \frac{\gamma\left(k, \lambda_1 \lambda_2 x\right)}{\Gamma(k)} = \Gamma_{k, \lambda_1 \lambda_2}(x)$
Listing 6 implements Kundu and Gupta's algorithm with the addition of exponential random numbers to extend it to $$k$$ greater than or equal to one.

Listing 6: Kundu And Gupta's Algorithm
function kunduRnd(k, d, a, b, rnd) {
var c = a+b;
var o = 0;
var u, v, x;

while(k>=1) {
o -= Math.log(1-rnd());
--k;
}
if(k===0) return o;

while(1) {
u = rnd();
v = rnd();

x = u<=a/c ? -2*Math.log(1-Math.pow(c*u, 1/k)/2)
: -Math.log(c*(1-u)/(k*Math.pow(d, k-1)));

if(x<=d && v<=Math.pow(x/(2*(1-Math.exp(-x/2))), k-1)*Math.exp(-x/2)) return o+x;
f(x>d && v<=Math.pow(d/x, 1-k)) return o+x;
}
}


Note that d, a and b are the constants
\begin{align*} d &= 1.0334 - 0.0766\,e^{2.2942\,k^\ast}\\ a &= \left(2-2\,e^{-\frac{d}{2}}\right)^{k^\ast}\\ b &= k^\ast\,d^{k^\ast-1}e^{-d} \end{align*}
where $$k^\ast$$ is the fractional part of $$k$$ and which, seeing as they only depend on $$k$$, will be calculated in the constructor of our random number generator. Note also that it will be responsible for dividing the random numbers by the distribution's $$\lambda$$.

Now this works, but for large $$k$$ will be even less efficient than numerically inverting the CDF. Fortunately, there are also a number of rejection algorithms for the gamma distribution with $$k$$ greater than or equal to one. We shall use Best's algorithm[7], as implemented in listing 7.

Listing 7: Best's Algorithm
function bestRnd(k, a, b, rnd) {
var u, v, w, x, y, z;

while(1) {
u = rnd();
v = rnd();
w = u*(1-u);
y = Math.sqrt(b/w)*(u-0.5);
x = a+y;

if(x>=0) {
z = 64*w*w*w*v*v;
if(z<=1-2*y*y/x || Math.log(z)<=2*(a*Math.log(x/a)-y)) return x;
}
}
}


This time a and b are the constants
\begin{align*} a &= k-1\\ b &= 3k - \tfrac{3}{4} \end{align*}
deferred again to the constructor. Now this doesn't have as good a rejection rate as Kundu and Gupta's algorithm, so we shall only use it for $$k$$ greater than two, as shown by listing 8.

Listing 8: ak.gammaRnd
ak.gammaRnd = function() {
var state = {k: 1, lambda: 1, rnd: Math.random};
var arg0  = arguments[0];
var k, d, a, b, f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);

if(state.k<=2) {
if(k!==1 && k!==2) {
k = state.k>1 ? state.k-1 : state.k;
d = 1.0334 - 0.0766 * Math.exp(2.2942*k);
a = Math.pow(2-2*Math.exp(-d/2), k);
b = k * Math.pow(d, k-1) * Math.exp(-d);
}

f = function(){return kunduRnd(state.k, d, a, b, state.rnd)/state.lambda;};
}
else {
a = state.k-1;
b = 3*state.k-0.75;

f = function(){return bestRnd(state.k, a, b, state.rnd)/state.lambda;};
}

f.k = function(){return state.k;};
f.lambda = function(){return state.lambda;};
f.rnd = function(){return state.rnd;};
return Object.freeze(f);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
state.rnd = rnd;
};


Note that we use Math.random as the default uniform random number generator and add a new constructor to allow the user to pass another as the third argument after $$k$$ and $$\lambda$$ should they wish to use it instead.
As described above, we switch between Kundu and Gupta's algorithm for $$k$$ less than or equal to two and Best's for $$k$$ greater than two, calculating the constants in advance and dividing by $$\lambda$$ in f in both cases. For $$k$$ equal to one or two we will bail out of kunduRnd before we need its constants, so we leave them undefined for those cases.

Program 2 compares a histogram of observations of a gamma distributed random variable with a graph of what the CDF predicts that we should expect to observe.

Program 2: ak.gammaRnd

As usual we have a pretty good match, but you should of course try it out for some different values of k and lambda, if for no other reason than to test kunduRnd as well as bestRnd.

There is yet more to say about memoryless processes, but in the meantime these gamma distribution functions can be found in GammaDistribution.js.

References

[1] How Long Must I Wait?, www.thusspakeak.com, 2015.

[2] Gamma Gamma Hey!, www.thusspakeak.com, 2015.

[3] On The Shoulders Of Gradients, www.thusspakeak.com, 2014.

[4] Box, G. & Muller, M., A Note On The Generation Of Random Normal Deviates, Annals Of Mathematical Statistics, Vol. 29, No. 2, pp. 610-611, Institute Of Mathematical Statistics, 1958.

[5] Define Normal, www.thusspakeak.com, 2014.

[6] Kundu, D. and Gupta, R., A Convenient Way of Generating Gamma Random Variables Using Generalized Exponential Distribution.

[7] Best, D., Letter to the Editors, Applied Statistics, Vol. 29, pp. 181-182, 1978.