The Longest Wait


We have seen how the waiting time for an event in a memoryless process, one in which the probability that we must wait some given time doesn't change no matter how long we've already been waiting, must be exponentially distributed[1], that the waiting time for the \(k^\mathrm{th}\) such event must be gamma distributed[2] and that the number of such events occurring in one unit of time must be Poisson distributed[3].
This time I'd like to ask how long we must wait for the first and the last of several such processes that are happening at the same time to trigger an event.

And The Winner Is...

Finding the distribution of the waiting time for the first event turns out to be rather easy if we note that for a single process the probability that we must wait longer than \(x\) is given by
\[ \Pr(X > x) = 1 - \Pr(X \leqslant x) = 1 - P_\lambda(x) = e^{-\lambda x} \]
where \(P_\lambda\) is the exponential CDF. Now, if we have \(n\) independent processes, each having rate \(\lambda_i\) then the probability that we must wait longer than \(x\) before any of them trigger an event is simply
\[ \begin{align*} \Pr\left(\min_{1 \leqslant i \leqslant n} X_i > x\right) &= \Pr(X_1 > x \wedge X_2 > x \wedge \dots \wedge X_n > x)\\ &= \Pr(X_1 > x) \times \Pr(X_2 > x) \times \dots \times \Pr(X_n > x)\\ &= e^{-\lambda_1 x} \times e^{-\lambda_2 x} \times \dots \times e^{-\lambda_n x}\\ &= e^{-\sum_{i=1}^n \lambda_i x} \end{align*} \]
where \(\wedge\) is the logical and operator and \(\sum\) is the summation sign. The probability that at least one of them will trigger an event no later than \(x\) is therefore
\[ \Pr\left(\min_{1 \leqslant i \leqslant n} X_i \leqslant x\right) = 1 - \Pr\left(\min_{1 \leqslant i \leqslant n} X_i > x\right) = 1 - e^{-\sum_{i=1}^n \lambda_i x} \]
and so is also exponentially distributed with rate
\[ \lambda = \sum_{i=1}^n \lambda_i \]
as demonstrated by program 1.

Program 1: The Minimum Waiting Time
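
By way of illustration, the following stand-alone sketch performs the kind of experiment that program 1 demonstrates, drawing the minimum of several exponential waiting times many times over and comparing the observed probability that it falls below a point \(x\) with the exponential CDF at the summed rate; the rates, sample count, test point and the expRnd helper are illustrative choices of mine rather than anything taken from the program.

var lambdas = [0.5, 1.0, 1.5];
var total = lambdas.reduce(function(s, l){return s+l;}, 0);
var samples = 100000;
var x = 0.5;
var count = 0;
var i, j, min;

// draw an exponential waiting time by inverting its CDF at a standard uniform
function expRnd(lambda) {
 return -Math.log(1-Math.random())/lambda;
}

for(i=0;i<samples;++i) {
 min = Infinity;
 for(j=0;j<lambdas.length;++j) min = Math.min(min, expRnd(lambdas[j]));
 if(min<=x) ++count;
}

console.log(count/samples);        // observed Pr(min X_i <= x)
console.log(1-Math.exp(-total*x)); // 1 - e^(-lambda x) with lambda the sum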

This is a useful result for reasoning about systems in which queues of jobs are handled by multiple servers, such as you might find at a supermarket when the first customer in the queue goes to the first free till.

The Devil Take The Hindmost

Unfortunately, figuring out the distribution of the waiting time until all \(n\) of the memoryless processes have triggered an event is rather more complicated. By which I mean that there is no simpler equation than
\[ \begin{align*} \Pr\left(\max_{1 \leqslant i \leqslant n} X_i \leqslant x\right) &= \Pr(X_1 \leqslant x \wedge X_2 \leqslant x \wedge \dots \wedge X_n \leqslant x)\\ &= \Pr(X_1 \leqslant x) \times \Pr(X_2 \leqslant x) \times \dots \times \Pr(X_n \leqslant x)\\ &= \prod_{i=1}^n \left(1 - e^{-\lambda_i x}\right) \end{align*} \]
where \(\prod\) is the product sign.

What we can do, however, is find a limiting distribution for the maximum waiting time for \(n\) independent processes, all having the same rate \(\lambda\), as \(n\) tends to infinity by first noting that
\[ \Pr\left(\max_{1 \leqslant i \leqslant n} X_i \leqslant x\right) = \left(1 - e^{-\lambda x}\right)^n = \left(1 - \frac{n \, e^{-\lambda x}}{n}\right)^n = \left(1 - \frac{e^{-\lambda x + \ln n}}{n}\right)^n \]
Now, the exponential function can be defined as
\[ e^x = \lim_{n \to \infty} \left(1+\frac{x}{n}\right)^n \]
and so for large \(n\) we have the approximation
\[ \Pr\left(\max_{1 \leqslant i \leqslant n} X_i \leqslant x\right) \approx \exp\left(-e^{-\lambda x + \ln n}\right) \]
where \(\exp(x)\) is an alternative way of writing \(e^x\).

Program 2 demonstrates that this is a reasonable approximation for large, but still finite, \(n\).

Program 2: The Maximum Waiting Time

Note that we're setting the upper bound by explicitly inverting the formula for a probability of 99.9%.
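
For the avoidance of doubt, solving \(\exp\left(-e^{-\lambda x + \ln n}\right) = c\) for \(x\) yields \(x = \left(\ln n - \ln(-\ln c)\right) / \lambda\), which the following snippet applies for a probability of 99.9%; the values of \(\lambda\) and \(n\) are simply examples.

var lambda = 1;
var n = 100;
var c = 0.999;

// x satisfying exp(-e^(-lambda*x + ln n)) = c
var upper = (Math.log(n) - Math.log(-Math.log(c)))/lambda;
console.log(upper);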

Taking It To The Extreme!

This is an example of the Gumbel distribution, \(Gumbel(\mu, \sigma)\), having the CDF
\[ P_{\mu,\sigma}(x) = \exp\left(-e^{-(x-\mu)/\sigma}\right) \]
for positive \(\sigma\). The maximum of \(n\) observations of the exponential distribution is therefore approximately distributed as \(Gumbel\left(\frac{\ln n}{\lambda}, \frac{1}{\lambda}\right)\), for sufficiently large \(n\).
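
To get a sense of how good the approximation is, the following sketch compares the exact CDF of the maximum, \(\left(1 - e^{-\lambda x}\right)^n\), with the CDF of \(Gumbel\left(\frac{\ln n}{\lambda}, \frac{1}{\lambda}\right)\) at a handful of points; the values of \(\lambda\), \(n\) and the points themselves are arbitrary choices.

var lambda = 2;
var n = 1000;
var mu = Math.log(n)/lambda;
var sigma = 1/lambda;

function exact(x)  {return Math.pow(1-Math.exp(-lambda*x), n);}
function gumbel(x) {return Math.exp(-Math.exp(-(x-mu)/sigma));}

// the two CDFs agree closely at these points
[3, 3.5, 4, 4.5].forEach(function(x) {
 console.log(x, exact(x).toFixed(6), gumbel(x).toFixed(6));
});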

Derivation 1: The Maximum Of n Gumbel Observations
Given
\[ X_i \sim Gumbel(\mu, \sigma) \]
then
\[ \begin{align*} \Pr\left(\max_{1 \leqslant i \leqslant n} X_i \leqslant x\right) &= P_{\mu,\sigma}(x)^n\\ &= \exp\left(-e^{-(x-\mu)/\sigma}\right)^n\\ &= \exp\left(-n\,e^{-(x-\mu)/\sigma}\right)\\ &= \exp\left(-e^{\ln n - (x-\mu)/\sigma}\right)\\ &= \exp\left(-e^{-(x-(\mu + \sigma \ln n))/\sigma}\right)\\ &= P_{\mu + \sigma \ln n,\sigma}(x) \end{align*} \]
An important property of the Gumbel distribution is that the maximum of \(n\) observations of it is also Gumbel distributed, but with a different value for \(\mu\), as shown by derivation 1.
Given that the maximum of \(n\) observations of the maximum of \(m\) observations of a random variable is trivially the maximum of \(n \times m\) observations of that random variable, this similarity property is exactly the sort of thing that we should expect of a distribution of maximal, or extreme, values.

Noting that if \(X\) is distributed as \(Gumbel(\mu, \sigma)\) then \(X+c\) is distributed as \(Gumbel(\mu+c, \sigma)\), we have
\[ \begin{align*} X_i &\sim Gumbel(\mu, \sigma)\\ \max_{1 \leqslant i \leqslant n} X_i &\sim Gumbel(\mu, \sigma) + \sigma \ln n \end{align*} \]
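
A quick numerical check of this property, assuming nothing more than Math.random, is to compare the sample median of the maximum of \(n\) standard Gumbel observations with the median of \(Gumbel(\ln n, 1)\), which equals \(\ln n - \ln \ln 2\).

// draw a Gumbel(0,1) observation by inverting its CDF at a standard uniform
function gumbelRnd() {return -Math.log(-Math.log(Math.random()));}

var n = 16;
var samples = 100000;
var maxima = [];
var i, j, max;

for(i=0;i<samples;++i) {
 max = -Infinity;
 for(j=0;j<n;++j) max = Math.max(max, gumbelRnd());
 maxima.push(max);
}
maxima.sort(function(a, b){return a-b;});

console.log(maxima[samples/2]);              // sample median of the maxima
console.log(Math.log(n)-Math.log(Math.LN2)); // ln n - ln ln 2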
The Gumbel distribution is unique in this respect; if the maximum of \(n\) independent random variables with the same distribution can be simulated by adding a constant to another one, then those random variables must be Gumbel distributed, as proven in derivation 2.

Derivation 2: The Implication Of A Constant Offset
If the \(n\) independent random variables \(X_i\) and a single independent random variable \(X\) all have the same CDF, \(P(x)\), then the maximum observation of the \(X_i\) having the same distribution as an observation of \(X\) plus a constant \(c_n\) implies that
\[ \begin{align*} \Pr\left(\max_{1 \leqslant i \leqslant n} X_i \leqslant x\right) &= \Pr\left(X+c_n \leqslant x\right) = \Pr\left(X \leqslant x-c_n\right)\\ P(x)^n &= P(x-c_n) \end{align*} \]
Similarly, the maximum of \(m\) observations of that maximum must satisfy both of
\[ \begin{align*} P(x)^{nm} &= P(x-c_{nm})\\ P(x)^{nm} &= \left(P(x)^n\right)^m = P(x-c_n)^m = P(x-c_n-c_m) \end{align*} \]
and so \(c_{nm}\) must equal \(c_n + c_m\). To turn a product into a sum requires logarithms, so this implies that \(c_n = \sigma \ln n\) for some constant \(\sigma\).

Next, we have
\[ \begin{align*} P(x)^n &= P(x - \sigma \ln n)\\ - \ln P(x)^n &= - \ln P(x - \sigma \ln n)\\ - n \ln P(x) &= - \ln P(x - \sigma \ln n)\\ \ln n + \ln \left(-\ln P(x)\right) &= \ln \left(-\ln P(x - \sigma \ln n)\right)\\ \ln \left(-\ln P(x)\right) &= \ln \left(-\ln P(x - \sigma \ln n)\right) - \ln n \end{align*} \]
This must hold for all \(x\), including \(x = \sigma \ln n\) at which
\[ \ln (-\ln P(x)) = \ln(-\ln P(\sigma \ln n - \sigma \ln n)) - \ln n = \ln(-\ln P(0)) - \frac{x}{\sigma} \]
Finally, defining \(\mu = \sigma \ln(-\ln P(0))\) yields
\[ \begin{align*} \ln (-\ln P(x)) &= -\frac{x-\mu}{\sigma}\\ -\ln P(x) &= e^{-(x-\mu)/\sigma}\\ P(x) &= \exp\left(-e^{-(x-\mu)/\sigma}\right) \end{align*} \]

As a result, it is often a reasonable choice for the distribution of the extreme values of a random variable and is consequently also known as the type 1 extreme value distribution.
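
Before moving on, it's easy to confirm numerically that the Gumbel CDF satisfies the defining identity of derivation 2; for the standard case of \(\mu\) equal to zero and \(\sigma\) equal to one it requires that \(P(x)^n = P(x - \ln n)\).

function P(x) {return Math.exp(-Math.exp(-x));}

var x = 2;
var n = 10;
console.log(Math.pow(P(x), n)); // P(x)^n
console.log(P(x-Math.log(n)));  // P(x - ln n), which should match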

The Gumbel Suite

Derivation 3: The Gumbel PDF
Applying the chain rule, we have
\[ \begin{align*} p_{\mu,\sigma}(x) &= \frac{\mathrm{d}}{\mathrm{d}x} \exp\left(-e^{-(x-\mu)/\sigma}\right)\\ &= \exp\left(-e^{-(x-\mu)/\sigma}\right) \times \frac{\mathrm{d}}{\mathrm{d}x}\left(-e^{-(x-\mu)/\sigma}\right)\\ &= \exp\left(-e^{-(x-\mu)/\sigma}\right) \times -e^{-(x-\mu)/\sigma} \times \frac{\mathrm{d}}{\mathrm{d}x}\left(-\frac{x-\mu}{\sigma}\right)\\ &= \exp\left(-e^{-(x-\mu)/\sigma}\right) \times -e^{-(x-\mu)/\sigma} \times -\frac{1}{\sigma}\\ &= \frac{1}{\sigma} \exp\left(-\tfrac{x-\mu}{\sigma} - e^{-(x-\mu)/\sigma}\right) \end{align*} \]

Derivation 4: The Gumbel CF
Noting that if \(X\) is distributed as \(Gumbel(0,1)\) then \(\mu + \sigma X\) is distributed as \(Gumbel(\mu,\sigma)\), the CF is given by the expectation
\[ \begin{align*} \hat{p}_{\mu,\sigma}(t) &= \mathrm{E}\left[e^{it(\mu + \sigma X)}\right] = e^{it\mu} \, \mathrm{E}\left[e^{it\sigma X}\right]\\ &= e^{it\mu} \, \int_{-\infty}^{+\infty}e^{it\sigma x} \times \exp\left(-x - e^{-x}\right) \, \mathrm{d}x \end{align*} \]
Substituting \(z=e^{-x}\), so that \(\mathrm{d}z = -z \, \mathrm{d}x\), yields
\[ \begin{align*} \hat{p}_{\mu,\sigma}(t) &= e^{it\mu} \, \int_{\infty}^{0}z^{-it\sigma} \times z \, \exp(-z) \times -\frac{\mathrm{d}z}{z}\\ &= e^{it\mu} \, \int_{0}^{\infty}z^{-it\sigma} e^{-z} \, \mathrm{d}z = e^{it\mu} \, \Gamma(1 - it\sigma) \end{align*} \]
To find the inverse of the CDF, we simply take logarithms
\[ \begin{align*} P_{\mu,\sigma}(x) &= \exp\left(-e^{-(x-\mu)/\sigma}\right) = c\\ -e^{-(x-\mu)/\sigma} &= \ln c\\ -\frac{x-\mu}{\sigma} &= \ln (-\ln c)\\ x &= \mu - \sigma \ln (-\ln c) \end{align*} \]
Note that we can trivially generate Gumbel distributed random numbers by applying this to standard uniformly distributed random numbers.
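
Specifically, if \(U\) is a standard uniform random variable then \(\mu - \sigma \ln(-\ln U)\) is \(Gumbel(\mu, \sigma)\) distributed, as in this trivial sketch.

// a Gumbel(mu, sigma) draw from inverting the CDF at a standard uniform
function gumbelRnd(mu, sigma) {
 return mu - sigma*Math.log(-Math.log(Math.random()));
}
console.log(gumbelRnd(0, 1));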

We can also easily recover the PDF of the Gumbel distribution by differentiating the CDF to yield
\[ p_{\mu,\sigma}(x) = \frac{1}{\sigma} \exp\left(-\tfrac{x-\mu}{\sigma} - e^{-(x-\mu)/\sigma}\right) \]
as shown by derivation 3.

Finally, the CF is given by
\[ \hat{p}_{\mu,\sigma}(t) = e^{it\mu} \, \Gamma(1 - it\sigma) \]
where \(\Gamma\) is the complex gamma function, as shown by derivation 4.

ak Gumbel Functions

Given how simple the Gumbel distribution functions are, it should come as no surprise that their implementations are pretty simple too. We shall use the same dispatch to a constructors object that we always do to initialise them, as illustrated by ak.gumbelPDF in listing 1.

Listing 1: ak.gumbelPDF
ak.gumbelPDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(x) {
  var z = (x-state.mu)/state.sigma;
  return Math.exp(-z-Math.exp(-z))/state.sigma;
 };
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

var constructors = {};

The members of the constructors object that are required for ak.gumbelPDF are given in listing 2.

Listing 2: The constructors Object
constructors[ak.UNDEFINED_T] = function() {
};

constructors[ak.NUMBER_T] = function(state, x, args) {
 var arg1 = args[1];
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, x, arg1, args);

 if(!isFinite(state.mu)) {
  throw new Error('invalid mu in ak.gumbel distribution');
 }

 if(state.sigma<=0 || !isFinite(state.sigma)) {
  throw new Error('invalid sigma in ak.gumbel distribution');
 }
};

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state, x) {
 state.sigma = Number(x);
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, x0, x1, args) {
 var arg2 = args[2];

 state.mu = Number(x0);
 state.sigma = Number(x1);

 constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, arg2);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function() {
};

If a single number is passed to the constructor it defines \(\sigma\) and if two are passed they define \(\mu\) and \(\sigma\) respectively. The third argument is, as usual, a placeholder for a uniform random number generator that we shall use when it comes to implementing the Gumbel random number generator. Note how we enforce that \(\sigma\) is positive and finite and that \(\mu\) is finite, throwing exceptions if they are not.
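
By way of illustration, under these conventions we should expect calls like the following to behave as commented; the particular arguments are examples of mine rather than anything prescribed.

var pdf0 = ak.gumbelPDF();     // mu = 0, sigma = 1
var pdf1 = ak.gumbelPDF(2);    // mu = 0, sigma = 2
var pdf2 = ak.gumbelPDF(1, 2); // mu = 1, sigma = 2

console.log(pdf2.mu());    // 1
console.log(pdf2.sigma()); // 2
console.log(pdf2(1.5));    // the density at 1.5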

Program 3 shows how the Gumbel PDF changes shape with \(\sigma\).

Program 3: The Gumbel PDF
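
A rough console approximation of what program 3 illustrates is to tabulate the PDF for a couple of values of \(\sigma\), showing the density spreading out and flattening as \(\sigma\) grows; the parameters and points are arbitrary choices.

var pdf1 = ak.gumbelPDF(0, 1);
var pdf2 = ak.gumbelPDF(0, 2);
var x;

// the peak lowers and the tails thicken as sigma increases
for(x=-2;x<=6;++x) console.log(x, pdf1(x).toFixed(4), pdf2(x).toFixed(4));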

Next, the implementation of the Gumbel CDF is given in listing 3.

Listing 3: ak.gumbelCDF
ak.gumbelCDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(x){return Math.exp(-Math.exp(-(x-state.mu)/state.sigma));};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

The inverse CDF and CF are just as simple, as shown in listing 4.

Listing 4: ak.gumbelInvCDF And ak.gumbelCF
ak.gumbelInvCDF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(c){return state.mu - state.sigma*Math.log(-Math.log(c));};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

function cf(m, s, t) {
 var ts = t*s;

 if(!isFinite(ts)) return isNaN(t) ? ak.complex(ak.NaN, ak.NaN) : ak.complex(0);
 return ak.mul(ak.exp(ak.complex(0, t*m)), ak.gamma(ak.complex(1, -ts)));
}

ak.gumbelCF = function() {
 var state = {mu: 0, sigma: 1};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(t){return cf(state.mu, state.sigma, t);};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 return Object.freeze(f);
};

Note that we're using our complex overload of the gamma function[4], ak.gamma, in the implementation of the CF.
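
To see these functions in action, the following sketch rounds a value through the CDF and its inverse and then evaluates the CF at a point, assuming that ak.complex values expose their parts through re and im accessors as they do elsewhere in the ak library.

var cdf = ak.gumbelCDF(0, 1);
var inv = ak.gumbelInvCDF(0, 1);
var cf  = ak.gumbelCF(0, 1);

console.log(inv(cdf(1.5)));  // should recover 1.5
var z = cf(0.5);
console.log(z.re(), z.im()); // real and imaginary parts of the CF at 0.5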

Finally, the random number generator, ak.gumbelRnd, is given in listing 5.

Listing 5: ak.gumbelRnd
ak.gumbelRnd = function() {
 var state = {mu: 0, sigma: 1, rnd: Math.random};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(){return state.mu - state.sigma*Math.log(-Math.log(state.rnd()));};
 f.mu = function(){return state.mu;};
 f.sigma = function(){return state.sigma;};
 f.rnd = function(){return state.rnd;};
 return Object.freeze(f);
};

constructors[ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};

constructors[ak.NUMBER_T][ak.FUNCTION_T] = function(state, x, rnd) {
 state.sigma = Number(x);
 state.rnd = rnd;
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};

Here we add new functions to the constructors object to allow the user to provide their own uniform random number generator as the last constructor argument, in case they don't want to use the default choice of Math.random.
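
For example, we might pass a seeded generator so that runs are repeatable, such as the following hypothetical xorshift scheme; any function returning standard uniform observations will do.

var seed = 123456789;

// a simple seeded xorshift generator yielding values in [0,1)
function xorshift() {
 seed ^= seed << 13;
 seed ^= seed >>> 17;
 seed ^= seed << 5;
 return (seed>>>0)/4294967296;
}

var rnd = ak.gumbelRnd(1, 2, xorshift);
console.log(rnd(), rnd.mu(), rnd.sigma());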

Program 4 compares a histogram of ak.gumbelRnd generated random numbers against what we should expect from the Gumbel CDF.

Program 4: ak.gumbelRnd
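
A rough stand-in for program 4's comparison is to bucket a large number of draws from ak.gumbelRnd and set each bucket's expected count from differences of ak.gumbelCDF, as in the following sketch with arbitrary bucket choices.

var rnd = ak.gumbelRnd(0, 1);
var cdf = ak.gumbelCDF(0, 1);
var samples = 100000;
var lo = -2, w = 1;               // 8 buckets of width 1 from -2 to 6
var counts = [0,0,0,0,0,0,0,0];
var i, b, x0;

for(i=0;i<samples;++i) {
 b = Math.floor((rnd()-lo)/w);
 if(b>=0 && b<counts.length) ++counts[b];
}

for(b=0;b<counts.length;++b) {
 x0 = lo+b*w;
 // observed versus expected count in [x0, x0+w)
 console.log(x0, counts[b], Math.round(samples*(cdf(x0+w)-cdf(x0))));
}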

This is pretty much in line with the results that we saw in program 2 and demonstrates that ak.gumbelCDF and ak.gumbelRnd, both of which can be found in GumbelDistribution.js along with the other Gumbel distribution functions, are at least consistent.

And with this we are done with the statistical properties of memoryless processes; I hope that you have been as impressed as I was at how easy it was to figure them out!

References

[1] How Long Must I Wait?, www.thusspakeak.com, 2015.

[2] kth Time's The Charm, www.thusspakeak.com, 2015.

[3] ...And Then Three Came Along At Once!, www.thusspakeak.com, 2015.

[4] Gamma Gamma Hey!, www.thusspakeak.com, 2015.
