A Jolly Student's Tea Party


Last time[1] we took a look at the chi-squared distribution which describes the behaviour of sums of squares of standard normally distributed random variables, having means of zero and standard deviations of one.
Tangentially related is Student's t-distribution, which we shall examine in this post, governing the deviation of the means of sets of independent observations of a normally distributed random variable from its assumed true mean when its standard deviation must be estimated from those same observations.

The Distribution Of Means Of A Normal Random Variable

For a set of observations of a normal random variable \(N(\mu, \sigma)\) having a mean of \(\mu\) and a standard deviation of \(\sigma\) we have
\[ \begin{align*} Z_i &\sim N\left(\mu, \sigma\right)\\ \mu^\prime &= \frac{1}{n} \times \sum_{i=0}^{n-1} Z_i \end{align*} \]
where \(\sim\) means drawn from, \(\sum\) is the summation sign and \(\mu^\prime\) is the sample mean. Furthermore, the standard deviation of the sample mean is given by
\[ \sigma^\prime = \frac{\sigma}{\sqrt{n}} \]
We can transform these random quantities into a standard normal distribution with
\[ \frac{\mu^\prime - \mu}{\sigma^\prime} = \frac{\mu^\prime - \mu}{n^{-\frac12} \times \sigma} \sim N(0,1) \]
Now this assumes that we know both the mean \(\mu\) and the standard deviation \(\sigma\) of the normal random variable. If we must instead approximate the former with the sample mean then the unbiased estimate of its variance and the corresponding estimate of the standard deviation of the sample mean are given by
\[ \begin{align*} s^2 &= \frac{1}{n-1} \times \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right)^2\\ s^\prime &= \frac{s}{\sqrt{n}} \end{align*} \]
as proven by derivation 1.

Derivation 1: The Sample Variance
Firstly, with the convention that a sum without explicit bounds implicitly ranges from the first to the last sample, exploiting the facts that the expected value of a sum is equal to the sum of the expected values of its terms and that the expected value of a constant multiple of a random variable is equal to the random variable's expected value multiplied by that constant we have
\[ \begin{align*} \small{\mathrm{E}\left[\frac{1}{n} \times \sum_{i} \left(Z_i - \mu^\prime\right)^2\right]} &= \small{\frac{1}{n} \times \sum_{i} \mathrm{E}\left[\left(Z_i - \mu^\prime\right)^2\right]} = \small{\frac{1}{n} \times n \times \mathrm{E}\left[\left(Z_i - \mu^\prime\right)^2\right]}\\ &= \small{\mathrm{E}\left[\left(Z_i - \mu^\prime\right)^2\right]} = \small{\mathrm{E}\left[\left(Z_i - \frac{1}{n} \times \sum_{j} Z_j\right)^2\right]}\\ &= \small{\mathrm{E}\left[Z_i^2 - \frac{2}{n} \times Z_i \times \sum_{j} Z_j + \left(\frac{1}{n} \times \sum_{j} Z_j\right)^2\right]}\\ &= \small{\mathrm{E}\left[Z_i^2 - \frac{2}{n} \times Z_i \times \sum_{j} Z_j + \frac{1}{n^2} \times \sum_{j} Z_j \times \sum_k Z_k\right]} \end{align*} \]
where \(\mathrm{E}\) stands for the expected value of the term between the square brackets following it and \(Z_i\) represents any one element of the sample.

Noting that
\[ \small{\sigma^2} = \small{\mathrm{E}\left[\left(Z_i - \mu\right)^2\right]} = \small{\mathrm{E}\left[Z_i^2 - 2 \times Z_i \times \mu + \mu^2\right]} = \small{\mathrm{E}\left[Z_i^2\right] - 2 \times \mathrm{E}\left[Z_i\right] \times \mu + \mu^2} = \small{\mathrm{E}\left[Z_i^2\right] - \mu^2} \]
splitting the expectation into two parts and using the convention that the sum over \(j \ne i\) includes every element of the sample except the \(i^\mathrm{th}\) yields
\[ \begin{align*} \small{\mathrm{E}\left[Z_i^2 - \frac{2}{n} \times Z_i \times \sum_{j} Z_j\right]} &= \small{\mathrm{E}\left[Z_i^2 - \frac{2}{n} \times Z_i^2 - \frac{2}{n} \times Z_i \times \sum_{j \ne i} Z_j\right]}\\ &= \small{\frac{n-2}{n} \times \mathrm{E}\left[Z_i^2\right] - \frac{2}{n} \times \mathrm{E}\left[Z_i\right] \times \sum_{j \ne i} \mathrm{E}\left[Z_j\right]}\\ &= \small{\frac{n-2}{n} \times \left(\sigma^2 + \mu^2\right) - \frac{2}{n} \times (n-1) \times \mu^2} = \small{\frac{n-2}{n} \times \sigma^2 - \mu^2} \end{align*} \]
and
\[ \begin{align*} \small{\mathrm{E}\left[\frac{1}{n^2} \times \sum_{j} Z_j \times \sum_k Z_k\right]} &= \small{\frac{1}{n^2} \times \left(\sum_{j} \mathrm{E}\left[Z_j^2\right] + \sum_{j} \mathrm{E}\left[Z_j\right] \times \sum_{k \ne j} \mathrm{E}\left[Z_k\right]\right)}\\ &= \small{\frac{1}{n^2} \times \left(n \times \left(\sigma^2 + \mu^2\right) + n \times (n-1) \times \mu^2\right)} = \small{\frac{1}{n} \times \sigma^2 + \mu^2} \end{align*} \]
Adding these components results in
\[ \small{\mathrm{E}\left[\frac{1}{n} \times \sum_{i} \left(Z_i - \mu^\prime\right)^2\right]} = \small{\left(\frac{n-2}{n} \times \sigma^2 - \mu^2\right) + \left(\frac{1}{n} \times \sigma^2 + \mu^2\right)} = \small{\frac{n-1}{n} \times \sigma^2} \]
and consequently
\[ \small{\sigma^2} = \small{\mathrm{E}\left[\frac{1}{n-1} \times \sum_{i} \left(Z_i - \mu^\prime\right)^2\right]} \]
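If the algebra seems a little opaque then a rough and ready Monte Carlo check can be reassuring. The following sketch, which uses a simple Box-Muller normal generator rather than anything from the library and whose sample size, parameters and trial count are entirely arbitrary, confirms both that the standard deviation of the sample mean is roughly \(\sigma \div \sqrt{n}\) and that dividing the sum of squared deviations by \(n-1\) rather than \(n\) yields an approximately unbiased estimate of the variance.

function boxMuller(mu, sigma) {
 var u1 = 1-Math.random(), u2 = Math.random();
 return mu + sigma*Math.sqrt(-2*Math.log(u1))*Math.cos(2*Math.PI*u2);
}

var mu = 1, sigma = 2, n = 5, trials = 100000;
var meanSq = 0, biased = 0, unbiased = 0;

for(var t=0;t<trials;++t) {
 var z = [], mean = 0, ss = 0, i;
 for(i=0;i<n;++i) {z.push(boxMuller(mu, sigma)); mean += z[i]/n;}
 for(i=0;i<n;++i) ss += Math.pow(z[i]-mean, 2);
 meanSq   += Math.pow(mean-mu, 2)/trials;
 biased   += ss/n/trials;
 unbiased += ss/(n-1)/trials;
}

// the observed standard deviation of the sample mean versus sigma/sqrt(n),
// both roughly 0.894 for these parameters
console.log(Math.sqrt(meanSq), sigma/Math.sqrt(n));

// the n and n-1 divisor estimates versus the true variance of 4; the former
// should come out close to (n-1)/n * 4 = 3.2 and the latter close to 4
console.log(biased, unbiased, sigma*sigma);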

The Deviation Of A Sample Normal Mean From An Assumed Normal Mean

The deviation of the sample mean \(\mu^\prime\) from an assumed true mean \(\mu\), measured in units of the estimated standard deviation \(s^\prime\), is governed by Student's t-distribution
\[ \frac{\mu^\prime - \mu}{s^\prime} = \frac{\mu^\prime - \mu}{n^{-\frac12} \times s} \sim T(n-1) \]
which, I shall assert without proof on the grounds that I don't have the skill to do so but am willing to put my faith in my textbooks, has a cumulative distribution function, or CDF, of
\[ F_\nu(x) = \begin{cases} \tfrac12 + \tfrac12 \times \left(I_1\left(\tfrac{\nu}2, \tfrac12\right) - I_{\tfrac{\nu}{\nu+x^2}}\left(\tfrac{\nu}2, \tfrac12\right)\right) & x > 0\\ \tfrac12 & x = 0\\ \tfrac12 - \tfrac12 \times \left(I_1\left(\tfrac{\nu}2, \tfrac12\right) - I_{\tfrac{\nu}{\nu+x^2}}\left(\tfrac{\nu}2, \tfrac12\right)\right) & x < 0 \end{cases} \]
where \(\nu\) is its degrees of freedom, equal to \(n-1\), and \(I_x(a, b)\) is the regularised incomplete beta function[2]. Assuming that the form of the CDF is correct, it is much simpler to show that its probability density function, or PDF, is given by
\[ f_\nu(x) = \frac{\Gamma\left(\frac{\nu+1}2\right)}{\Gamma\left(\frac{\nu}2\right) \times \left(\pi \times \nu\right)^\frac12} \times \left(\frac{\nu}{\nu + x^2}\right)^\tfrac{\nu + 1}2 \]
where \(\Gamma\) is the gamma function, as done by derivation 2.

Derivation 2: Deriving The PDF From The CDF
The regularised incomplete beta function and its derivative are given by
\[ \begin{align*} I_x(a, b) &= \frac{\mathrm{B}_x(a, b)}{\mathrm{B}(a, b)} = \frac{\int_0^x t^{a-1} \times (1-t)^{b-1} \, \mathrm{d}t}{\frac{\Gamma(a) \times \Gamma(b)}{\Gamma(a+b)}}\\ \frac{\mathrm{d}}{\mathrm{d}x} I_x(a, b) &= \frac{\Gamma(a+b) \times x^{a-1} \times (1-x)^{b-1}}{\Gamma(a) \times \Gamma(b)} \end{align*} \]
Using the chain rule to differentiate the non-constant part of the CDF yields
\[ \begin{align*} \frac{\mathrm{d}}{\mathrm{d}x} I_{\tfrac{\nu}{\nu+x^2}}\left(\tfrac{\nu}2, \tfrac12\right) &= \frac{\Gamma\left(\tfrac{\nu}2+\tfrac12\right) \times \left(\tfrac{\nu}{\nu+x^2}\right)^{\tfrac{\nu}2-1} \times \left(1-\tfrac{\nu}{\nu+x^2}\right)^{\tfrac12-1}}{\Gamma\left(\tfrac{\nu}2\right) \times \Gamma\left(\tfrac12\right)} \times \frac{\mathrm{d}}{\mathrm{d}x} \frac{\nu}{\nu+x^2}\\ &= \frac{\Gamma\left(\tfrac{\nu+1}2\right) \times \left(\tfrac{\nu}{\nu+x^2}\right)^{\tfrac{\nu}2-1} \times \left(\tfrac{x^2}{\nu+x^2}\right)^{-\tfrac12}}{\Gamma\left(\tfrac{\nu}2\right) \times \pi^\frac12} \times -2 \times x \times \frac{\nu}{\left(\nu+x^2\right)^2}\\ &= -2 \times \mathrm{sgn}(x) \times \frac{\Gamma\left(\frac{\nu+1}2\right)}{\Gamma\left(\frac{\nu}2\right) \times \left(\pi \times \nu\right)^\frac12} \times \left(\frac{\nu}{\nu+x^2}\right)^{\tfrac{\nu+1}2}\\ &\quad\quad\quad \times \nu^\tfrac12 \times \left(\frac{\nu}{\nu+x^2}\right)^{-\tfrac{3}2} \times \left(\frac{1}{\nu+x^2}\right)^{-\tfrac12} \times \frac{\nu}{\left(\nu+x^2\right)^2} \end{align*} \]
where the \(\mathrm{sgn}\) function returns the sign of its argument.

Finally, we have
\[ \begin{align*} \nu^\tfrac12 &\times \left(\frac{\nu}{\nu+x^2}\right)^{-\tfrac{3}2} \times \left(\frac{1}{\nu+x^2}\right)^{-\tfrac12} \times \frac{\nu}{\left(\nu+x^2\right)^2}\\ &= \nu^\tfrac12 \times \nu^{-\tfrac32} \times \nu \times \left(\frac1{\nu+x^2}\right)^{-\tfrac32} \times \left(\frac1{\nu+x^2}\right)^{-\tfrac12} \times \left(\frac1{\nu+x^2}\right)^2 = 1 \end{align*} \]
from which the PDF trivially follows; differentiating the CDF multiplies this result by \(-\tfrac12\) for positive \(x\) and by \(+\tfrac12\) for negative \(x\), cancelling the \(-2 \times \mathrm{sgn}(x)\) factor in both cases to leave \(f_\nu(x)\).
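As a sanity check on the result, the PDF should agree with a central finite difference approximation to the slope of the CDF, give or take the truncation error of the latter. A minimal sketch using the ak.studentsTCDF and ak.studentsTPDF functions that are defined in listings 2 and 3 below might look something like

var nu = 4, d = 1e-5;
var cdf = ak.studentsTCDF(nu);
var pdf = ak.studentsTPDF(nu);

// the PDF and the central difference of the CDF should agree to several
// decimal places at each of these points
[-2, -0.5, 0, 0.5, 2].forEach(function(x) {
 console.log(x, pdf(x), (cdf(x+d)-cdf(x-d))/(2*d));
});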

If we define
\[ \begin{align*} Z &\sim N(0, 1)\\ X^2 &\sim Chi^2(\nu) \end{align*} \]
where \(Chi^2(\nu)\) is a chi-squared distribution with \(\nu\) degrees of freedom, then we can generate observations of a Student's t random variable with \(\nu\) degrees of freedom using
\[ Z \times \left(\frac{X^2}{\nu}\right)^{-\frac12} \sim T(\nu) \]
as informally demonstrated by derivation 3.

Derivation 3: The Student's T Random Variate
Firstly, for a pair of independent chi-squared variates with \(n-1\) and \(1\) degrees of freedom respectively, we have
\[ X^2_{n-1} + X^2_1 \sim X^2_n \]
where we're using \(\sim\) to mean distributed as.

Next, the sum of the squared differences between samples of a normal variate and its true mean may be expressed as
\[ \begin{align*} \sum_{i=0}^{n-1} \left(Z_i - \mu\right)^2 &= \sum_{i=0}^{n-1} \left(\left(Z_i - \mu^\prime\right) + \left(\mu^\prime - \mu\right)\right)^2\\ &= \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right)^2 + 2 \times \left(\mu^\prime - \mu\right) \times \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right) + \sum_{i=0}^{n-1}\left(\mu^\prime - \mu\right)^2\\ &= \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right)^2 + 2 \times \left(\mu^\prime - \mu\right) \times \left(n \times \mu^\prime - n \times \mu^\prime\right) + n \times \left(\mu^\prime - \mu\right)^2\\ &= \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right)^2 + n \times \left(\mu^\prime - \mu\right)^2 \end{align*} \]
Dividing through by its true variance yields
\[ \sum_{i=0}^{n-1} \left(\frac{Z_i - \mu}\sigma\right)^2 = \sum_{i=0}^{n-1} \left(\frac{Z_i - \mu^\prime}\sigma\right)^2 + n \times \left(\frac{\mu^\prime - \mu}\sigma\right)^2 = \sum_{i=0}^{n-1} \left(\frac{Z_i - \mu^\prime}\sigma\right)^2 + \left(\frac{\mu^\prime - \mu}{n^{-\frac12} \times \sigma}\right)^2 \]
Now the terms on the right hand side are independent chi-squared variates with \(n-1\) and one degree of freedom respectively, which is a little complicated to prove formally but, informally, can be attributed to the fact that the sample mean and sample variance of a normal variate are independent statistics.

We may therefore conclude that
\[ \begin{align*} \sum_{i=0}^{n-1} \left(\frac{Z_i - \mu^\prime}\sigma\right)^2 &\sim X^2_{n-1}\\ \sum_{i=0}^{n-1} \left(Z_i - \mu^\prime\right)^2 &\sim \sigma^2 \times X^2_{n-1}\\ \mu^\prime - \mu &\sim n^{-\frac12} \times \sigma \times Z_{0,1} \end{align*} \]
which implies that
\[ \frac{\mu^\prime - \mu}{n^{-\frac12} \times s} \sim \left(n^{-\frac12} \times \sigma \times Z_{0,1}\right) \times \left(n^\frac12 \times \left(\frac{\sigma^2 \times X^2_{n-1}}{n-1}\right)^{-\frac12}\right) = Z_{0,1} \times \left(\frac{X^2_{n-1}}{n-1}\right)^{-\frac12} \]
as required.
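To see this in action we can construct the statistic directly from samples of a normal random variable. The following sketch assumes that ak.normalRnd, which listing 4 below also relies upon, accepts a mean and a standard deviation; the sample size, parameters and trial count are, once again, arbitrary.

var mu = 1, sigma = 2, n = 5, trials = 10000;
var rnd = ak.normalRnd(mu, sigma);
var count = 0;

for(var t=0;t<trials;++t) {
 var z = [], mean = 0, ss = 0, i;
 for(i=0;i<n;++i) {z.push(rnd()); mean += z[i]/n;}
 for(i=0;i<n;++i) ss += Math.pow(z[i]-mean, 2);

 // the statistic (mu' - mu)/(s/sqrt(n)) should follow T(n-1) and so, with
 // n equal to five, should fall between -2.776 and 2.776, the 0.025 and
 // 0.975 quantiles of T(4), roughly 95% of the time
 var s = Math.sqrt(ss/(n-1));
 if(Math.abs((mean-mu)/(s/Math.sqrt(n)))<2.776) ++count;
}

console.log(count/trials); // roughly 0.95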

Note that neither the characteristic function, or CF, defined by the integral expression
\[ \hat{f}_\nu(t) = \int_{x=-\infty}^{\infty} e^{i \times t \times x} \times f_\nu(x) \, \mathrm{d}x \]
where \(i\) is the square root of minus one, nor the inverse of the CDF can be expressed as simple closed form formulae and so we must instead resort to numerical approximation when calculating them.

The AK Implementations Of The Student's T-Distribution Functions

The first thing that we need when implementing the Student's t-distribution functions is a means of setting their parameters, which we shall do using our usual scheme of dispatching to a constructors object, as defined in listing 1.

Listing 1: The Constructors
var constructors = {};

constructors[ak.UNDEFINED_T] = function(state) {
};

constructors[ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};

constructors[ak.NUMBER_T] = function(state, nu, args) {
 var arg1 = args[1];
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1);

 state.nu = Number(nu);
 if(state.nu<=0 || !isFinite(state.nu) || state.nu!==ak.floor(state.nu)) {
  throw new Error('invalid nu in ak.studentsT distribution');
 }
};

constructors[ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
};

constructors[ak.NUMBER_T][ak.FUNCTION_T] = function(state, rnd) {
 state.rnd = rnd;
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, eps) {
 state.eps = Math.abs(eps);
 if(isNaN(state.eps)) {
  throw new Error('invalid convergence threshold in ak.studentsT distribution');
 }
};

Note that, in addition to optionally specifying a number of degrees of freedom other than the default of two, this allows for the definition of either a random number generator for the random variable or a convergence threshold for the numerical approximation of the CF and inverse CDF, but not both at the same time.
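By way of illustration, and assuming the usual behaviour of this dispatching scheme, the following are all valid ways to construct the distribution functions that follow

// the default of two degrees of freedom
var cdf2 = ak.studentsTCDF();

// five degrees of freedom
var pdf5 = ak.studentsTPDF(5);

// five degrees of freedom together with an explicit random number generator
var rnd5 = ak.studentsTRnd(5, Math.random);

// five degrees of freedom together with a convergence threshold
var cf5 = ak.studentsTCF(5, Math.pow(ak.EPSILON, 0.5));

// non-positive, non-integer or non-finite degrees of freedom are rejected
try { ak.studentsTPDF(-1); } catch(e) { console.log(e.message); }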

First up is ak.studentsTCDF which exploits our ak.betaP implementation of the regularised incomplete beta function, as defined in listing 2.

Listing 2: ak.studentsTCDF
ak.studentsTCDF = function() {
 var state = {nu: 2};
 var arg0  = arguments[0];
 var f, a;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 a = ak.betaP(0.5*state.nu, 0.5, 1);

 f = function(x) {
  var b;
  if(x===0) return 0.5;
  b = ak.betaP(0.5*state.nu, 0.5, state.nu/(state.nu+x*x));
  return x>0 ? 0.5 + 0.5*(a-b) : 0.5 - 0.5*(a-b);
 };
 f.nu = function(){return state.nu;};
 return Object.freeze(f);
};

Next we have ak.studentsTPDF, defined in terms of our implementation of the gamma function[3], or at least its logarithm ak.logGamma, as shown by listing 3.

Listing 3: ak.studentsTPDF
ak.studentsTPDF = function() {
 var state = {nu: 2};
 var arg0  = arguments[0];
 var f, a;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 a = ak.logGamma(0.5*state.nu+0.5) - ak.logGamma(0.5*state.nu)
   - 0.5*Math.log(state.nu) - 0.5*Math.log(ak.PI);

 f = function(x) {
      return Math.exp(a - (0.5*state.nu+0.5)*Math.log(1+x*x/state.nu));
     };
 f.nu = function(){return state.nu;};
 return Object.freeze(f);
};

Program 1 demonstrates both that the Student's t-distribution is more likely to generate extreme values than the standard normal distribution when \(\nu\) is small, since in such cases the former's density exceeds the latter's when their arguments are large in magnitude, and that it tends to the latter as \(\nu\) grows larger, plotting the former's PDF in black and the latter's in grey.

Program 1: The Student's T PDF
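If you would rather see a few numbers than a graph then a comparison of the densities at three standard deviations makes the same point, assuming that the library's ak.normalPDF accepts a mean and a standard deviation

// the Student's t PDF at x=3 for small and large nu compared with the
// standard normal PDF; the former's tails are noticeably fatter for small
// nu and approach the latter's as nu grows
console.log(ak.studentsTPDF(2)(3));   // roughly 0.0274
console.log(ak.studentsTPDF(100)(3)); // roughly 0.0051
console.log(ak.normalPDF(0, 1)(3));   // roughly 0.0044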

Listing 4 implements the Student's t random variable exactly as described above

Listing 4: ak.studentsTRnd
ak.studentsTRnd = function() {
 var state = {nu: 2, rnd: Math.random};
 var arg0  = arguments[0];
 var f, phi, chi2;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 phi = ak.normalRnd(state.rnd);
 chi2 = ak.chiSquaredRnd(state.nu, state.rnd);

 f = function() {return phi() / Math.sqrt(chi2()/state.nu)};
 f.nu = function(){return state.nu;};
 f.rnd = function(){return state.rnd;};
 return Object.freeze(f);
};

which program 2 demonstrates by plotting a histogram of its observations.

Program 2: The Student's T Variate
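For a quick, non-graphical impression of the same thing, a crude text histogram along the lines of the following sketch, with entirely arbitrary bin widths and sample counts, will do the job

// bucket a large number of observations into unit wide bins between -5 and
// 5 and print a row of asterisks per bin as a crude histogram
var rnd = ak.studentsTRnd(2);
var bins = [0,0,0,0,0,0,0,0,0,0];

for(var i=0;i<100000;++i) {
 var x = rnd();
 if(x>=-5 && x<5) ++bins[Math.floor(x)+5];
}

bins.forEach(function(n, i) {
 console.log((i-5) + '\t' + new Array(Math.round(n/500)+1).join('*'));
});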

For the CF we know that the imaginary part of its results must be zero since the PDF is symmetric. The implementation given in listing 5 consequently only bothers to integrate the real part, using our Romberg integrator; the cfBound function sets the limits of integration according to the magnitude of the PDF and the cfIntegral function implements the integrand upon which the integrator operates.

Listing 5: ak.studentsTCF
function cfBound(pdf, eps) {
 var b = 1;
 while(pdf(b)>eps) b *= 2;
 return b;
}

function cfIntegral(t, pdf, b, eps) {
 var f = function(x) {return pdf(x) * Math.cos(t*x);};
 var re = ak.rombergIntegral(f, eps);
 return ak.complex(2*re(0, b), 0);
}

ak.studentsTCF = function() {
 var state = {nu: 2, eps: Math.pow(ak.EPSILON, 0.5)};
 var arg0  = arguments[0];
 var pdf, b, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 pdf = ak.studentsTPDF(state.nu);
 b = cfBound(pdf, state.eps);

 f = function(t) {return cfIntegral(t, pdf, b, state.eps);};
 f.nu = function(){return state.nu;};
 return Object.freeze(f);
};

Program 3 plots the real part of the CF in black and its always zero imaginary part in red.

Program 3: The Student's T CF
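A couple of spot checks, assuming that the ak.complex values returned by listing 5 expose re and im accessors, show the expected behaviour

var cf = ak.studentsTCF(4);

// the CF of any PDF equals one at zero and that of a symmetric PDF is
// purely real, falling away as the magnitude of the argument grows
console.log(cf(0).re(), cf(0).im());   // roughly 1 and exactly 0
console.log(cf(2).re(), cf(2).im());   // somewhere between 0 and 1, and 0
console.log(cf(20).re(), cf(20).im()); // very close to 0, and 0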

Listing 6 implements the inverse CDF using the ak.newtonInverse inverter in conjunction with the PDF, CDF and the helper function invCDF

Listing 6: ak.studentsTInvCDF
ak.studentsTInvCDF = function() {
 var state = {nu: 2, eps: Math.pow(ak.EPSILON, 0.75)};
 var arg0  = arguments[0];
 var pdf, cdf, fdf, inv, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 pdf = ak.studentsTPDF(state.nu);
 cdf = ak.studentsTCDF(state.nu);
 fdf = function(x){return [cdf(x), pdf(x)];};
 inv = ak.newtonInverse(fdf, state.eps);

 f = function(c){return invCDF(c, inv, cdf);};
 f.nu = function(){return state.nu;};
 return Object.freeze(f);
};

which simply searches for an interval that contains the inverse before applying the inverter, as shown by listing 7.

Listing 7: The invCDF Helper Function
function invCDF(c, inv, cdf) {
 var b0, b1;

 if(isNaN(c)) return ak.NaN;
 if(c===0.5) return 0;
 if(c<=0) return -ak.INFINITY;
 if(c>=1) return  ak.INFINITY;

 if(c<0.5) {
  b0 = -1; b1 = 0;
  while(cdf(b0)>c) {b1 = b0; b0 *= 2;}
  return isFinite(b0) ? inv(c, [b0, b1]) : b0;
 }

 b0 = 0; b1 = 1;
 while(cdf(b1)<c) {b0 = b1; b1 *= 2;}
 return isFinite(b1) ? inv(c, [b0, b1]) : b1;
}

Program 4 demonstrates that the inverse recovers the argument of the CDF to a relatively high degree of accuracy.

Program 4: The Student's T Inverse CDF
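Should you wish to convince yourself of this outside of the program, a round trip through the CDF and its inverse along the lines of

var cdf = ak.studentsTCDF(4);
var inv = ak.studentsTInvCDF(4);

// applying the inverse CDF to the CDF should approximately recover the
// original argument
[-3, -1, 0, 1, 3].forEach(function(x) {
 console.log(x, inv(cdf(x)), Math.abs(inv(cdf(x))-x));
});

should recover each argument with an absolute error of no more than a few multiples of the convergence threshold.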

Finally, all of these functions can be found in StudentsTDistribution.js.

References

[1] Chi Chi Again, www.thusspakeak.com, 2022.

[2] Beta Animals, www.thusspakeak.com, 2020.

[3] Gamma Gamma Hey, www.thusspakeak.com, 2015.
