Bad Luck Comes In Ks


Lately we have been looking at Bernoulli processes[1] which are sequences of independent experiments, known as Bernoulli trials, whose successes or failures are given by observations of a Bernoulli distributed random variable. Last time[2] we saw that the number of failures before the first success was governed by the geometric distribution which is the discrete analogue of the exponential distribution and, like it, is a memoryless waiting time distribution[3] in the sense that the distribution for the number of failures before the next success is identical no matter how many failures have already occurred whilst we've been waiting.
This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution which defines the probabilities of how long we must wait for multiple exponentially distributed events to occur[4].

The Negative Binomial Distribution?

If we are waiting for \(r\) successes then the last trial must be successful. Since failures are indistinguishable from each other, as are successes, the number of ways that we can arrange \(k\) failures and \(r-1\) successes is simply the combination
\[ {}^{k+r-1}C_k = \frac{(k+r-1)!}{k! \times ((k+r-1)-(k))!} = \frac{(k+r-1)!}{k! \times (r-1)!} \]
where the exclamation mark represents the factorial of the term before it. The probability of doing so is therefore
\[ p \times \left(\frac{(k+r-1)!}{k! \times (r-1)!} \times (1-p)^k \times p^{r-1}\right) = \frac{(k+r-1)!}{k! \times (r-1)!} \times (1-p)^k \times p^r \]
Noting that, for positive integer arguments, the gamma function is related to the factorial by
\[ \Gamma(n+1) = n! \]
we can generalise this to positive real values of \(r\) to yield the probability mass function, or PMF, of the negative binomial distribution
\[ f_{r,p}\,(k) = \frac{\Gamma(k+r)}{k! \times \Gamma(r)} \times (1-p)^k \times p^r \]
Its name is derived from the binomial theorem which states that the coefficient of the \(r^\mathrm{th}\) power of \(x\) in the expansion of \((1+x)^n\) is the combination \({}^nC_r\). We might therefore suppose that the coefficient of the \(k^\mathrm{th}\) power of \(q\) in that of \((1-q)^{-r}\), where \(r\) is a positive integer, is \((-1)^k \times {}^{-r}C_k\), but in fact the \(k^\mathrm{th}\) term in its Taylor series is
\[ a_k \times q^k = \frac{(k+r-1)!}{k! \times (r-1)!} \times q^k \]
as shown by derivation 1.

Derivation 1: Negative Binomial Coefficients
\[ \begin{align*} f(x) &= x^{-r} & f(1) &= 1\\ f^\prime(x) &= -r \times x^{-r-1} & f^\prime(1) &= -r\\ f^{\prime\prime}(x) &= -r \times (-r-1) \times x^{-r-2} & f^{\prime\prime}(1) &= -r \times (-r-1)\\ \dots\\ f^{(k)}(x) &= \left(\prod_{j=-r-k+1}^{-r} j\right) \times x^{-r-k} & f^{(k)}(1) &= \prod_{j=-r-k+1}^{-r} j \end{align*} \]
where \(f^{(k)}\) is the \(k^\mathrm{th}\) derivative of \(f\) and \(\prod\) is the product sign, we have
\[ (1-q)^{-r} = \sum_{k=0}^\infty \frac{1}{k!} \times (-q)^k \times f^{(k)}(1) = \sum_{k=0}^\infty \frac{(-1)^k}{k!} \times f^{(k)}(1) \times q^k = \sum_{k=0}^\infty a_k \times q^k \]
where \(\sum\) is the summation sign.
Noting that if the lower bound of product's index is greater than its upper bound then its value is one, the \(k^\mathrm{th}\) coefficient is therefore
\[ a_k =\frac{(-1)^k}{k!} \times \prod_{j=-r-k+1}^{-r} j = \frac{1}{k!} \times \prod_{j=-r-k+1}^{-r} -j = \frac{1}{k!} \times \prod_{j=r}^{r+k-1} j = \frac{1}{k!} \times \frac{(r+k-1)!}{(r-1)!} \]

Setting \(q\) to \(1-p\), we can express the PMF in terms of the supposed negative binomial coefficient with
\[ f_{r,p}\,(k) = \frac{(k+r-1)!}{k! \times (r-1)!} \times (1-p)^k \times p^r = \left(a_k \times (1-p)^k\right) \times p^r \]
and hence the name. As it happens, we can define
\[ {}^{-r}C_k = (-1)^k \times \frac{(k+r-1)!}{k! \times (r-1)!} \]
provided that we're willing to play a little fast and loose with the concept of limits.

The characteristic function, or CF, is given by
\[ \hat{f}_{r,p}\,(t) = E\left[e^{itK}\right] = \left(\frac{p}{1-(1-p) \times e^{it}}\right)^r \]
where \(E\) is the expectation and \(i\) is the square root of minus one, as proven in derivation 2.

Derivation 2: The CF
\[ \begin{align*} f_r(x) &= (1-x)^{-r}\\ g_r(x) &= \sum_{k=0}^\infty \tfrac{\Gamma(k+r)}{k! \times \Gamma(r)} \times x^k \end{align*} \]
where we mean that the first term of the sum in \(g_r\) is a constant, we have
\[ \begin{align*} f_r(0) &= 1^{-r} = 1\\ g_r(0) &= \tfrac{\Gamma(0+r)}{0! \times \Gamma(r)} = 1 \end{align*} \]
Taking the derivative of \(f_r\) yields the recurrence
\[ f_r^\prime(x) = r \times (1-x)^{-(r+1)} = r \times f_{r+1}(x) \]
which is shared by \(g_r\)
\[ \begin{align*} g_r^\prime(x) &= \sum_{k=1}^\infty \tfrac{\Gamma(k+r)}{k! \times \Gamma(r)} \times k \times x^{k-1} = \sum_{k=0}^\infty \tfrac{\Gamma(k+1+r)}{(k+1)! \times \Gamma(r)} \times (k+1) \times x^k = \sum_{k=0}^\infty \tfrac{\Gamma(k+1+r)}{k! \times \Gamma(r)} \times x^k\\ r \times g_{r+1}(x) &= r \times \sum_{k=0}^\infty \tfrac{\Gamma(k+r+1)}{k! \times \Gamma(r+1)} \times x^k = r \times \sum_{k=0}^\infty \tfrac{\Gamma(k+r+1)}{k! \times r \times \Gamma(r)} \times x^k = \sum_{k=0}^\infty \tfrac{\Gamma(k+r+1)}{k! \times \Gamma(r)} \times x^k = g_r^\prime(x) \end{align*} \]
so that
\[ \begin{align*} f^\prime_r(0) &= r \times f_{r+1}(0) = r\\ g^\prime_r(0) &= r \times g_{r+1}(0) = r \end{align*} \]
By induction all of the derivatives of \(f_r\) and \(g_r\) must be equal at zero and consequently their Taylor series expansions about zero are identical, implying that they are too.

Finally, setting \(q = (1-p) \times e^{it}\) we have
\[ \begin{align*} \left(\tfrac{p}{1-(1-p) \times e^{it}}\right)^r &= p^r \times (1-q)^{-r} = p^r \times \sum_{k=0}^\infty \tfrac{\Gamma(k+r)}{k! \times \Gamma(r)} \times q^k = \sum_{k=0}^\infty \tfrac{\Gamma(k+r)}{k! \times \Gamma(r)} \times p^r \times \left((1-p) \times e^{it}\right)^k\\ &= \sum_{k=0}^\infty \tfrac{\Gamma(k+r)}{k! \times \Gamma(r)} \times (1-p)^k \times p^r \times e^{itk} = \sum_{k=0}^\infty f_{r,p}\,(k) \times e^{itk} = E\left[e^{itK}\right] \end{align*} \]

The cumulative distribution function, or CDF, is given by
\[ F_{r,p}\,(k) = I_p(r, \lfloor k \rfloor + 1) \]
where \(I_x(a,b)\) is the incomplete beta function[5] and the odd shaped brackets represent the floor of the term between them, being the largest integer that is no greater than it, which is proven in derivation 3.

Derivation 3: The CDF
Firstly, we have
\[ I_x(a, b) = \frac{B_x(a,b)}{B(a,b)} = \frac{\int_0^x t^{a-1} \times (1-t)^{b-1} \,\mathrm{d}t}{\tfrac{\Gamma(a) \times \Gamma(b)}{\Gamma(a+b)}} = \frac{\Gamma(a+b) \times \int_0^x t^{a-1} \times (1-t)^{b-1} \,\mathrm{d}t}{\Gamma(a) \times \Gamma(b)} \]
and so for integer \(k\) greater than zero
\[ \begin{align*} I_p(r, k+1) &= \tfrac{\Gamma(r+k+1) \times \int_0^p t^{r-1} \times (1-t)^k \,\mathrm{d}t}{\Gamma(r) \times \Gamma(k+1)} = \tfrac{\Gamma(r+k)}{k! \times \Gamma(r)} \times (r+k) \times \int_0^p t^{r-1} \times (1-t)^k \,\mathrm{d}t\\ \\ I_p(r, k) &= \tfrac{\Gamma(r+k) \times \int_0^p t^{r-1} \times (1-t)^{k-1} \,\mathrm{d}t}{\Gamma(r) \times \Gamma(k)} = \tfrac{\Gamma(r+k)}{(k-1)! \times \Gamma(r)} \times \int_0^p t^{r-1} \times (1-t)^{k-1} \,\mathrm{d}t\\ &= \tfrac{\Gamma(r+k)}{k! \times \Gamma(r)} \times k \times \int_0^p t^{r-1} \times (1-t)^{k-1} \,\mathrm{d}t \end{align*} \]
Next, we have
\[ I_p(r, k+1) - I_p(r, k) = \tfrac{\Gamma(r+k)}{k! \times \Gamma(r)} \times \int_0^p f(t) \, \mathrm{d}t \]
\[ \begin{align*} f(t) &= (r+k) \times t^{r-1} \times (1-t)^k - k \times t^{r-1} \times (1-t)^{k-1}\\ &= r \times t^{r-1} \times (1-t)^k + (k \times (1-t) - k) \times t^{r-1} \times (1-t)^{k-1}\\ &= r \times t^{r-1} \times (1-t)^k - k \times t^r \times (1-t)^{k-1} \end{align*} \]
By the chain and product rules
\[ \frac{\mathrm{d}}{\mathrm{d}p} (1-p)^k \times p^r = -k \times (1-p)^{k-1} \times p^r + (1-p)^k \times r \times p^{r-1} = f(p) \]
and so
\[ I_p(r, k + 1) - I_p(r, k) = \tfrac{\Gamma(r+k)}{k! \times \Gamma(r)} \times (1-p)^k \times p^r = f_{r,p}\,(k) \]
\[ F_{r,p}\,(0) = I_p(r, 1) = \tfrac{\Gamma(r+1)}{\Gamma(r) \times \Gamma(1)} \times \int_{t=0}^p t^{r-1} \, \mathrm{d}t = \tfrac{\Gamma(r+1)}{\Gamma(r)} \times \frac{p^r}{r} = p^r = f_{r,p}(0) \]

The ak Negative Binomial Distribution Functions

As usual we shall initialise our distribution functions by dispatching to a constructors object, as is done by ak.negativeBinomialPMF in listing 1.

Listing 1: ak.negativeBinomialPMF
function pmf(r, lnq, rlnp, rfac, k) {
 if(k>=0 && k===ak.floor(k)) {
  return Math.exp(k*lnq + rlnp + ak.logGamma(k+r) - ak.logGamma(k+1) - rfac);
 return isNaN(k) ? ak.NaN : 0;

ak.negativeBinomialPMF = function() {
 var state = {r: 1, p: 0.5};
 var arg0  = arguments[0];
 var lnq, rlnp, rfac, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 lnq = Math.log(1-state.p);
 rlnp = state.r*Math.log(state.p);
 rfac = ak.logGamma(state.r);

 f = function(k){return pmf(state.r, lnq, rlnp, rfac, k);};
 f.r = function(){return state.r;};
 f.p = function(){return state.p;};
 return Object.freeze(f);

var constructors = {};

Here the pmf helper function exponentiates the logarithm of the mass, using ak.logGamma and Math.log to calculate
\[ \begin{align*} \ln f_{r,p}\,(k) &= \ln \left(\frac{\Gamma(k+r)}{k! \times \Gamma(r)} \times (1-p)^k \times p^r\right) = \ln \left(\frac{\Gamma(k+r)}{\Gamma(k+1) \times \Gamma(r)} \times (1-p)^k \times p^r\right)\\ &= k \times \ln (1-p) + r \times \ln p + \ln \Gamma(k+r) - \ln \Gamma(k+1) - \ln \Gamma(r) \end{align*} \]
with ak.negativeBinomialPMF pre-calculating those terms that don't depend on \(k\).

The constructors, given in listing 2, allow initialisation with positive \(r\) and \(p\) greater than zero and less than one.

Listing 2: The Constructors
constructors[ak.UNDEFINED_T] = function() {

constructors[ak.NUMBER_T] = function(state, r, args) {
 var arg1 = args[1];
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1, args);

 state.r = Number(r);
 if(state.r<=0 || !isFinite(state.r)) {
  throw new Error('invalid r in ak.negativeBinomial distribution');

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, p, args) {
 var arg2 = args[2];
 constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, arg2, args);

 state.p = Number(p);
 if(!(p>0 && p<1)) throw new Error('invalid p in ak.negativeBinomial distribution');

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {

Program 1 plots the PMF for \(r\) equal to two and a half and a range of success probabilities.

Program 1: The Negative Binomial PMF

The CDF is implemented by ak.negativeBinomialCDF in listing 3 in which the cdf helper function uses ak.betaP to calculate \(I_p(r, \lfloor k \rfloor +1)\).

Listing 3: ak.negativeBinomialCDF
function cdf(r, p, k) {
 if(k>=0) return ak.betaP(r, ak.floor(k)+1, p);
 return isNaN(k) ? ak.NaN : 0;

ak.negativeBinomialCDF = function() {
 var state = {r: 1, p: 0.5};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(k){return cdf(state.r, state.p, k);};
 f.r = function(){return state.r;};
 f.p = function(){return state.p;};
 return Object.freeze(f);

Program 2 demonstrates the correctness of ak.negativeBinomialCDF by comparing the difference of its results at \(k\) and \(k-1\) to that of ak.negativeBinomialPMF at \(k\).

Program 2: The PMF Versus The CDF

Noting that
\[ e^{i\theta} = \cos \theta + i \times \sin \theta \]
\[ \frac{1}{x - i \times y} = \frac{x + i \times y}{(x + i \times y) \times (x - i \times y)} = \frac{x + i \times y}{x^2 + y^2} \]
and again setting \(q\) to \(1-p\), we have
\[ \begin{align*} \frac{p}{1-q \times e^{it}} &=\frac{p}{1-q\times \cos t - i \times q \times \sin t} =p \times \frac{1-q\times \cos t + i \times q \times \sin t}{(1-q\times \cos t)^2+(q \times \sin t)^2}\\ &=p \times \frac{1-q\times \cos t + i \times q \times \sin t}{1 - 2 \times q \times \cos t + q^2 \times \cos^2 t + q^2 \times \sin^2 t} =p \times \frac{1-q\times \cos t + i \times q \times \sin t}{1+q^2 - 2 \times q \times \cos t} \end{align*} \]
as exploited by the cf helper function in listing 4 to directly initialise an ak.complex object that is raised to the power of \(r\) to calculate the CF

Listing 4: ak.negativeBinomialCF
function cf(r, p, q, t) {
 var qcost = q*Math.cos(t);
 var d = 1+q*q-2*qcost;
 var z = ak.complex(p*(1-qcost)/d, p*q*Math.sin(t)/d);
 return ak.pow(z, r);

ak.negativeBinomialCF = function() {
 var state = {r: 1, p: 0.5};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 f = function(t){return cf(state.r, state.p, 1-state.p, t);};
 f.r = function(){return state.r;};
 f.p = function(){return state.p;};
 return Object.freeze(f);

which program 3 plots with the real part in black and the imaginary part in red.

Program 3: The Negative Binomial CF

There isn't a formula for the inverse of the CDF and so we must numerically invert it. A reasonable first guess is the mean number of failures before the \(r^\mathrm{th}\) success which we can calculate by noting that after \(N\) trials we should expect \((1-p) \times N\) failures and \(p \times N\) successes so that there are
\[ \frac{(1-p) \times N}{p \times N} = \frac{1-p}{p} \]
failures per success on average, and consequently
\[ E[K] = r \times \frac{1-p}{p} \]
The invCDF helper function defined in listing 4 begins by searching for a bracket containing the inverse by starting with a lower bound of zero and an upper bound of the ceiling of the mean, being the smallest integer than is no less than it, and then iteratively sets the lower bound to the upper bound and doubles the upper bound until the latter is not less than it. Having found a bracket it then uses a simple bisection search to narrow it down until the inverse is found.
The ak.negativeBinomialInveCDF function uses it to calculate the inverse CDF, pre-calculating the ceiling of the mean, the value of the CDF at it and at zero. Note that since invCDF uses the CDF for all values of \(k\) we must use it for zero rather than the much simpler PMF to ensure that the initial lower bound has the same approximation error as it would have had.

Listing 4: ak.negativeBinomialInvCDF
function invCDF(r, p, k1, p0, p1, c) {
 var k0 = 0;
 var m;

 if(isNaN(c)) return ak.NaN;
 if(c>=1)     return ak.INFINITY;
 if(c<=p0)    return 0;

 while(p1<c) {
  k0 = k1;
  k1 *= 2;
  p1 = ak.betaP(r, k1+1, p);
 m = ak.floor(k0/2 + k1/2);

 while(m!==k0 && m!==k1) {
  if(ak.betaP(r, m+1, p)<c) k0 = m;
  else                      k1 = m;
  m = ak.floor(k0/2 + k1/2);
 return ak.betaP(r, k0+1, p)<c ? k1 : k0;

ak.negativeBinomialInvCDF = function() {
 var state = {r: 1, p: 0.5};
 var arg0  = arguments[0];
 var k1, p0, p1, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 k1 = ak.ceil(state.r*(1-state.p)/state.p);
 p0 = ak.betaP(state.r, 1, state.p);
 p1 = ak.betaP(state.r, k1+1, state.p);

 f = function(c){return invCDF(state.r, state.p, k1, p0, p1, c);};
 f.r = function(){return state.r;};
 f.p = function(){return state.p;};
 return Object.freeze(f);

Program 4 demonstrates its use by calculating the inverse of the value of the CDF at several values of \(k\).

Program 4: The Negative Binomial Inverse CDF

We have three different ways to generate observations of a negative binomially distributed random variable. Firstly, if \(r\) is an integer then we can simply sum the results of that many observations of the geometric distribution, which is reasonably efficient if it's small. Secondly, if it's not and the mean isn't too large, then we can invert the CDF at an observation of a standard uniformly distributed random variable. Thirdly, we can make an observation of a Poisson distribution with a Gamma distributed rate
\[ \begin{align*} \lambda &\sim Gamma\left(r, \frac{p}{1-p}\right)\\ K &\sim P(\lambda) \end{align*} \]
where \(\sim\) means drawn from, as proven in derivation 4.

Derivation 4: The PMF Of The Poisson With Gamma Distributed Rate
To find the mass at \(k\) we need to integrate the across all of the possible ways that it can be observed, or in other words across all of the possible rates
\[ \Pr(K=k) = \int_0^\infty \Pr(K=k|\lambda=x) \times p_\lambda(x) \, \mathrm{d}x \]
where the vertical bar means given that and \(p_\lambda\) is the PDF of the rate. Substituting the formulae for the terms in the integrand yields
\[ \begin{align*} \Pr(K=k) &= \int_0^\infty \left(\frac{x^k}{k!} \times e^{-x}\right) \times \left(\frac{\left(\frac{p}{1-p}\right)^r \times x^{r-1} \times e^{-\frac{p}{1-p}\times x}}{\Gamma(r)}\right)\,\mathrm{d}x\\ &= \frac{\left(\frac{p}{1-p}\right)^r}{k! \times \Gamma(r)} \times \int_0^\infty x^{k+r-1} \times e^{-x -\frac{p}{1-p}\times x} \,\mathrm{d}x\\ &= \frac{\left(\frac{p}{1-p}\right)^r}{k! \times \Gamma(r)} \times \int_0^\infty x^{k+r-1} \times e^{-\frac{x}{1-p}} \,\mathrm{d}x \end{align*} \]
Substituting \((1-p) \times y\) for \(x\), we recover the negative binomial PMF
\[ \begin{align*} \Pr(K=k) &= \frac{\left(\frac{p}{1-p}\right)^r}{k! \times \Gamma(r)} \times \int_0^\infty ((1-p) \times y)^{k+r-1} \times e^{-\frac{(1-p) \times y}{1-p}} \times (1-p) \, \mathrm{d}y\\ &= \frac{\left(\frac{p}{1-p}\right)^r}{k! \times \Gamma(r)} \times (1-p)^{k+r} \times \int_0^\infty y^{k+r-1} \times e^{-y} \, \mathrm{d}y = \frac{\Gamma(k+r)}{k! \times \Gamma(r)} \times (1-p)^k \times p^r \end{align*} \]
\[ \Gamma(t) = \int_0^\infty x^{t-1} \times e^{-x} \, \mathrm{d}x \]

Listing 5 shows how the ak.negativeBinomialRnd function switches between these approaches and adds functions to the constructors object to allow the specification of uniform random number generators other than the default Math.random.

Listing 5: ak.negativeBinomialRnd
function directRnd(r, grnd) {
 var k = 0;
 var i;

 for(i=0;i<r;++i) k += grnd();
 return k;

ak.negativeBinomialRnd = function() {
 var state = {r: 1, p: 0.5, rnd: Math.random};
 var arg0  = arguments[0];
 var grnd, k1, p0, p1, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 if(state.r<=4 && state.r===ak.floor(state.r)) {
  grnd = ak.geometricRnd(state.p, state.rnd);
  f = function(){return directRnd(state.r, grnd);};
 else if(state.r*(1-state.p)<=8*state.p) {
  k1 = ak.ceil(state.r*(1-state.p)/state.p);
  p0 = ak.betaP(state.r, 1, state.p);
  p1 = ak.betaP(state.r, k1+1, state.p);
  f = function(c){return invCDF(state.r, state.p, k1, p0, p1, state.rnd());};
  grnd = ak.gammaRnd(state.r, state.p/(1-state.p), state.rnd);
  f = function(){return ak.poissonRnd(grnd())();};
 f.r = function(){return state.r;};
 f.p = function(){return state.p;};
 f.rnd = function(){return state.rnd;};
 return Object.freeze(f);

constructors[ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;

constructors[ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;

Program 5 compares a histogram of the results of ak.negativeBinomialRnd to a graph of ak.negativeBinomialPMF to demonstrate its correctness.

Program 5: ak.negativeBinomialRnd

You should try changing the values of \(p\) and \(r\) to satisfy yourself that each of the approaches works as I claim and, after you've done so, you can find the implementations of these functions in NegativeBinomialDistribution.js.


[1] One Thing Or Another,, 2020.

[2] If At First You Don't Succeed,, 2020.

[3] How Long Must I Wait?,, 2015.

[4] Kth Time's The Charm,, 2015.

[5] Beta Animals,, 2020.

[6] ...And Then Three Came Along At Once!,, 2015.

Leave a comment