Every Which Way Is Normal


A few months ago[1] we saw how we could generalise the concept of a random variable to multiple dimensions by generating random vectors rather than numbers. Specifically we took a look at the multivariate uniform distribution which governs random vectors whose elements are independently uniformly distributed.
Whilst it demonstrated that we can find multivariate versions of distribution functions such as the probability density function, the cumulative distribution function and the characteristic function, the uniform distribution is a fairly trivial example and so this time, for something more interesting, we shall look at generalising the normal distribution[2] to multiple dimensions.

Let's Do Some Sums!

A fundamental property of the normal distribution is that sums of independent normally distributed random variables are themselves normally distributed. Specifically, if we have \(n\) normally distributed random variables \(Z_i\) having means of \(\mu_i\) and standard deviations of \(\sigma_i\), written as
\[ Z_i \sim N(\mu_i, \sigma_i) \]
where \(N(\mu,\sigma)\) represents a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) and \(\sim\) means drawn from, then their sum
\[ Z^\ast = \sum_{i=0}^{n-1} Z_i \]
where \(\sum\) is the summation sign, will be distributed as
\[ \begin{align*} Z^\ast &\sim N(\mu^\ast, \sigma^\ast)\\ \mu^\ast &= \sum_{i=0}^{n-1} \mu_i\\ \sigma^\ast &= \sqrt{\sum_{i=0}^{n-1} \sigma_i^2} \end{align*} \]
as proven in derivation 1.

Derivation 1: Sums Of Normals
Recall that the characteristic function of the normal distribution is given by
\[ \hat{\phi}_{\mu,\sigma}(t) = e^{i \mu t -\frac{1}{2} \sigma^2 t^2} \]
where \(i\) is the square root of minus one.
Now the characteristic function of the sum of a set of independent random variables is equal to the product of their individual characteristic functions and so we have
\[ \begin{align*} \hat{\phi^\ast}(t) &= \prod_{j=0}^{n-1} \hat{\phi}_{\mu_j,\sigma_j}(t) = \prod_{j=0}^{n-1} e^{i \mu_j t -\frac{1}{2} \sigma_j^2 t^2}\\ &= e^{\sum_{j=0}^{n-1} i \mu_j t -\frac{1}{2} \sigma_j^2 t^2} = e^{i \left(\sum_{j=0}^{n-1} \mu_j\right) t -\frac{1}{2} \left(\sum_{j=0}^{n-1} \sigma_j^2\right) t^2} \end{align*} \]
where \(\prod\) is the product sign. This is trivially the characteristic function of a normal distribution with the required mean and standard deviation.
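If you'd like a quick numerical check of this result, the following standalone JavaScript sketch, which makes no use of the ak library that the programs later in this post rely upon, draws a large sample of sums of three normal random variables using the Box-Muller transform and compares their sample mean and standard deviation with the above formulae; the randomNormal function and the particular means and standard deviations are simply illustrative choices.

// a standalone sketch, not the ak library code used by the programs below:
// check the sum-of-normals result by simulation

// draw a standard normal using the Box-Muller transform
function randomNormal() {
 var u1 = 1 - Math.random(); // avoid taking the logarithm of zero
 var u2 = Math.random();
 return Math.sqrt(-2*Math.log(u1)) * Math.cos(2*Math.PI*u2);
}

var mu    = [1, -2, 0.5]; // arbitrarily chosen means of the Z_i
var sigma = [0.5, 1, 2];  // arbitrarily chosen standard deviations of the Z_i
var samples = 100000;

var sum = 0, sum2 = 0;
for(var i=0;i<samples;++i) {
 var z = 0;
 for(var j=0;j<mu.length;++j) z += mu[j] + sigma[j]*randomNormal();
 sum  += z;
 sum2 += z*z;
}
var mean = sum/samples;
var sd   = Math.sqrt(sum2/samples - mean*mean);

// expect roughly -0.5 and the square root of 5.25, or about 2.29
console.log(mean, sd);

Running it should print values close to minus one half and to two point two nine.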

Another property is that linear transformations of normally distributed random variables are also normally distributed. Derivation 2 proves that if
\[ Z \sim N(\mu, \sigma) \]
then the linear transformation
\[ Z^\prime = a Z + b \]
is distributed as
\[ \begin{align*} Z^\prime &\sim N(\mu^\prime, \sigma^\prime)\\ \mu^\prime &= a \mu + b\\ \sigma^\prime & = |a| \sigma \end{align*} \]
where the vertical bars stand for the absolute value of the term between them.
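For example, if \(Z \sim N(1, 2)\) then \(-3Z + 4\) will be distributed as \(N(-3 \times 1 + 4, |-3| \times 2) = N(1, 6)\).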

Derivation 2: Linear Transformations Of Normals
Firstly, for positive \(a\) we have
\[ \Pr(aZ+b \leqslant z) = \Pr\left(Z \leqslant \frac{z-b}{a}\right) = \Phi_{\mu,\sigma}\left(\frac{z-b}{a}\right) \]
where \(\Phi\) is the normal CDF.
Next, we can recover the PDF of the linearly transformed variable by differentiating the above probability function with respect to \(z\) to yield
\[ \frac{\mathrm{d}}{\mathrm{d}z} \Pr(aZ+b \leqslant z) = \frac{\mathrm{d}}{\mathrm{d}z} \Phi_{\mu,\sigma}\left(\frac{z-b}{a}\right) = \frac1a \phi_{\mu,\sigma}\left(\frac{z-b}{a}\right) \]
by the chain rule, where
\[ \phi_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac12 \left(\tfrac{x-\mu}{\sigma}\right)^2} \]
is the normal PDF. The PDF of the transformed variable is therefore
\[ \frac1a \times \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac12 \left(\tfrac{\tfrac{z-b}{a}-\mu}{\sigma}\right)^2} = \frac{1}{\sqrt{2\pi}a\sigma} e^{-\tfrac12 \left(\tfrac{z-(a\mu+b)}{a\sigma}\right)^2} = \phi_{a\mu+b,a\sigma}(z) = \phi_{a\mu+b,|a|\sigma}(z) \]
Now, for negative \(a\) we have
\[ \Pr(aZ+b \leqslant z) = \Pr(-|a|Z+b \leqslant z) = \Pr\left(-Z \leqslant \frac{z-b}{|a|}\right) \]
Finally, the normal distribution is symmetric in the sense that
\[ \Pr(Z \geqslant \mu-z) = \Pr(Z \leqslant \mu+z) \]
Negating the terms in the left hand side's inequality yields
\[ \Pr(-Z \leqslant -\mu+z) = \Pr(Z \leqslant \mu+z) \]
and so \(-Z\) must be normally distributed with a mean of \(-\mu\) and a standard deviation of \(\sigma\). Consequently, by the above argument, the PDF of the transformed variable is
\[ \phi_{-|a|\mu + b,|a|\sigma}(x) = \phi_{a\mu + b,|a|\sigma}(x) \]

Now this is all well and good, you may be thinking, but what does it have to do with multiple dimensions?

What It Has To Do With Multiple Dimensions

If we take an independent pair of standard normally distributed random variables
\[ \begin{align*} Z_0 &\sim N(0, 1)\\ Z_1 &\sim N(0, 1) \end{align*} \]
and linearly combine them to create two new random variables with
\[ \begin{align*} Z^\prime_0 &= m_{00}Z_0 + m_{01}Z_1 + \mu_0\\ Z^\prime_1 &= m_{10}Z_0 + m_{11}Z_1 + \mu_1 \end{align*} \]
then, by the above properties, they must be normally distributed as
\[ \begin{align*} Z^\prime_0 &\sim N(\mu_0, \sigma_0)\\ Z^\prime_1 &\sim N(\mu_1, \sigma_1)\\ \sigma_0 &= \sqrt{m_{00}^2 + m_{01}^2}\\ \sigma_1 &= \sqrt{m_{10}^2 + m_{11}^2} \end{align*} \]
Clearly these random variables are not, in general, independent and we can quantify their linear dependence with the expected value of the product of their differences from their means, known as their covariance.
\[ \begin{align*} &\mathrm{E}\left[\left(Z^\prime_0-\mu_0\right) \times \left(Z^\prime_1-\mu_1\right)\right]\\ &\quad\quad= \mathrm{E}\left[\left(m_{00}Z_0 + m_{01}Z_1\right) \times \left(m_{10}Z_0 + m_{11}Z_1\right)\right]\\ &\quad\quad= \mathrm{E}\left[m_{00}Z_0 \times m_{10}Z_0\right] + \mathrm{E}\left[m_{00}Z_0 \times m_{11}Z_1\right] + \mathrm{E}\left[m_{01}Z_1 \times m_{10}Z_0\right] + \mathrm{E}\left[m_{01}Z_1 \times m_{11}Z_1\right]\\ &\quad\quad= m_{00}m_{10}\mathrm{E}\left[Z_0^2\right] + m_{00}m_{11}\mathrm{E}\left[Z_0 Z_1\right] + m_{01}m_{10}\mathrm{E}\left[Z_1 Z_0\right] + m_{01}m_{11}\mathrm{E}\left[Z_1^2\right] \end{align*} \]
Now \(Z_0\) and \(Z_1\) are independent and so we have
\[ \mathrm{E}\left[Z_0 Z_1\right] = \mathrm{E}\left[Z_0\right]\mathrm{E}\left[Z_1\right] = 0 \]
and therefore
\[ \mathrm{E}\left[\left(Z^\prime_0-\mu_0\right) \times \left(Z^\prime_1-\mu_1\right)\right] = m_{00}m_{10}\mathrm{E}\left[Z_0^2\right] + m_{01}m_{11}\mathrm{E}\left[Z_1^2\right] = m_{00}m_{10} + m_{01}m_{11} \]
since both \(Z_0\) and \(Z_1\) have means of zero and so the expectations of their squares equal their variances which, being the squares of their standard deviations, equal one.
Now the covariance is a measure of both the dependence of two random variables and their individual variability. To separate the concepts, the covariance is often divided by the product of their standard deviations to yield the correlation, which in this case equals
\[ \frac{\mathrm{E}\left[\left(Z^\prime_0-\mu_0\right) \times \left(Z^\prime_1-\mu_1\right)\right]}{\sigma_0 \times \sigma_1} = \frac{m_{00}m_{10} + m_{01}m_{11}}{\sqrt{m_{00}^2 + m_{01}^2} \times \sqrt{m_{10}^2 + m_{11}^2}} \]
Note that the correlation between any two random variables must be greater than or equal to minus one and less than or equal to one, as proven by derivation 3.

Derivation 3: Bounds For Correlations
We need only consider random variables \(X_0\) and \(X_1\) with means of zero since we can always add a constant to their observed values to transform them into those having any given mean.
For \(n\) observations \(x_{0i}\) and \(x_{1i}\) their sample covariance and variances are given by
\[ \begin{align*} \mathrm{E}_n\left[X_0 X_1\right] &= \frac1n\sum_{i=0}^{n-1} x_{0i} x_{1i}\\ \mathrm{E}_n\left[X_0^2\right] &= \frac1n\sum_{i=0}^{n-1} x_{0i}^2\\ \mathrm{E}_n\left[X_1^2\right] &= \frac1n\sum_{i=0}^{n-1} x_{1i}^2 \end{align*} \]
If we construct vectors \(\mathbf{x}_0\) and \(\mathbf{x}_1\) whose elements equal those observations then we have
\[ \begin{align*} \mathrm{E}_n\left[X_0 X_1\right] &= \frac1n \times \mathbf{x}_0 \times \mathbf{x}_1\\ \mathrm{E}_n\left[X_0^2\right] &= \frac1n \times \mathbf{x}_0 \times \mathbf{x}_0\\ \mathrm{E}_n\left[X_1^2\right] &= \frac1n \times \mathbf{x}_1 \times \mathbf{x}_1\\ \end{align*} \]
and
\[ \begin{align*} -1 \leqslant \frac{\mathrm{E}_n\left[X_0 X_1\right]}{\sqrt{\mathrm{E}_n\left[X_0^2\right]} \times \sqrt{\mathrm{E}_n\left[X_1^2\right]}} \leqslant 1 &\Leftrightarrow \left(\frac{\mathrm{E}_n\left[X_0 X_1\right]}{\sqrt{\mathrm{E}_n\left[X_0^2\right]} \times \sqrt{\mathrm{E}_n\left[X_1^2\right]}}\right)^2 \leqslant 1\\ &\Leftrightarrow \mathrm{E}_n\left[X_0 X_1\right]^2 \leqslant \mathrm{E}_n\left[X_0^2\right] \times \mathrm{E}_n\left[X_1^2\right]\\ &\Leftrightarrow \left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2 \leqslant \left(\mathbf{x}_0 \times \mathbf{x}_0\right) \times \left(\mathbf{x}_1 \times \mathbf{x}_1\right) \end{align*} \]
This is trivially true if every element of either \(\mathbf{x}_0\) or \(\mathbf{x}_1\) equals zero. If not, then it must be the case that
\[ \begin{align*} 0 &\leqslant \left(\mathbf{x}_0 - \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1\right) \times \left(\mathbf{x}_0 - \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1\right)\\ &= \mathbf{x}_0 \times \mathbf{x}_0 - \mathbf{x}_0 \times \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1 - \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1 \times \mathbf{x}_0 + \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1 \times \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \mathbf{x}_1\\ &= \mathbf{x}_0 \times \mathbf{x}_0 - \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \left(\mathbf{x}_0 \times \mathbf{x}_1\right) - \frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1} \left(\mathbf{x}_1 \times \mathbf{x}_0\right) + \left(\frac{\mathbf{x}_0 \times \mathbf{x}_1}{\mathbf{x}_1 \times \mathbf{x}_1}\right)^2 \left(\mathbf{x}_1 \times \mathbf{x}_1\right)\\ &= \mathbf{x}_0 \times \mathbf{x}_0 - \frac{\left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2}{\mathbf{x}_1 \times \mathbf{x}_1} - \frac{\left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2}{\mathbf{x}_1 \times \mathbf{x}_1} + \frac{\left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2}{\mathbf{x}_1 \times \mathbf{x}_1} = \mathbf{x}_0 \times \mathbf{x}_0 - \frac{\left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2}{\mathbf{x}_1 \times \mathbf{x}_1} \end{align*} \]
and therefore
\[ \left(\mathbf{x}_0 \times \mathbf{x}_1\right)^2 \leqslant \left(\mathbf{x}_0 \times \mathbf{x}_0\right) \times \left(\mathbf{x}_1 \times \mathbf{x}_1\right) \]
This holds for all \(n\) and hence implies bounds for the true correlation in the infinite limit.

Program 1 illustrates the difference between the distributions of points defined by \(Z_0\) and \(Z_1\) and those defined by \(Z^\prime_0\) and \(Z^\prime_1\) with means of zero by plotting the former in dark grey and the latter in black.

Program 1: Correlated Normal Random Variables

The former are spread out in a rough circle which is more densely populated at the centre whilst the latter fill a tilted ellipse in a similar fashion. This elliptical distribution is a consequence of \(Z^\prime_0\) and \(Z^\prime_1\) having different standard deviations but the tilt comes from their correlation.
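If you want to check the correlation formula for yourself, the following standalone sketch, which again doesn't use the ak library code behind program 1, draws a large number of such pairs for an arbitrarily chosen matrix of coefficients m and compares their sample correlation with the expression above.

// a standalone sketch, not the ak library code behind program 1: compare the
// sample correlation of the pairs with the formula above

function randomNormal() {
 var u1 = 1 - Math.random();
 var u2 = Math.random();
 return Math.sqrt(-2*Math.log(u1)) * Math.cos(2*Math.PI*u2);
}

var m = [[1.0, 0.5],  // m00, m01
         [0.2, 0.8]]; // m10, m11
var samples = 100000;

var s0=0, s1=0, s00=0, s11=0, s01=0;
for(var i=0;i<samples;++i) {
 var z0 = randomNormal(), z1 = randomNormal();
 var x0 = m[0][0]*z0 + m[0][1]*z1;
 var x1 = m[1][0]*z0 + m[1][1]*z1;
 s0 += x0; s1 += x1;
 s00 += x0*x0; s11 += x1*x1; s01 += x0*x1;
}

// sample covariance and variances
var c01 = s01/samples - (s0/samples)*(s1/samples);
var v0  = s00/samples - (s0/samples)*(s0/samples);
var v1  = s11/samples - (s1/samples)*(s1/samples);

var sample = c01/Math.sqrt(v0*v1);
var exact  = (m[0][0]*m[1][0] + m[0][1]*m[1][1]) /
             (Math.sqrt(m[0][0]*m[0][0] + m[0][1]*m[0][1]) *
              Math.sqrt(m[1][0]*m[1][0] + m[1][1]*m[1][1]));

console.log(sample, exact);

The two printed values should agree to within a couple of decimal places.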

Now we can easily generalise this two dimensional correlated normal distribution to any number of dimensions using a little linear algebra. Specifically, if we have a vector \(\mathbf{Z}\) whose elements \(Z_i\) are independent standard normally distributed random variables
\[ Z_i \sim N(0, 1) \]
then, given a matrix \(\mathbf{M}\) and a vector \(\mathbf{\mu}\)
\[ \mathbf{Z}^\prime = \mathbf{M} \times \mathbf{Z} + \mathbf{\mu} \]
is a correlated multivariate normal random variable whose elements are distributed as
\[ \begin{align*} Z^\prime_i &\sim N\left(\mu_i, \sigma_i\right)\\ \sigma_i &= \sqrt{\sum_j m_{ij}^2} \end{align*} \]
by the same argument that we used above since, by the rules of linear algebra
\[ Z^\prime_i = \left(\sum_j m_{ij} \times Z_j\right) + \mu_i \]
Finally, if we define
\[ \mathbf{\Sigma} = \mathbf{M} \times \mathbf{M}^\mathrm{T} \]
where \(\mathbf{M}^\mathrm{T}\) is the transpose of \(\mathbf{M}\) in which its rows and columns are switched, then the covariance between the \(i^\mathrm{th}\) and \(j^\mathrm{th}\) elements of \(\mathbf{Z}^\prime\) is given by
\[ \mathrm{E}\left[\left(Z^\prime_i - \mu_i\right) \times \left(Z^\prime_j - \mu_j\right)\right] = \Sigma_{ij} \]
as proven in derivation 4.

Derivation 4: The Covariance Matrix
From the definition of \(\mathbf{Z}^\prime\) we have
\[ Z^\prime_i - \mu_i = \sum_j m_{ij} \times Z_j \]
and so
\[ \begin{align*} &\mathrm{E}\left[\left(Z^\prime_i - \mu_i\right) \times \left(Z^\prime_j - \mu_j\right)\right]\\ &\quad= \mathrm{E}\left[\left(\sum_k m_{ik} \times Z_k\right) \times \left(\sum_l m_{jl} \times Z_l\right)\right]\\ &\quad= \mathrm{E}\left[\sum_k \sum_l m_{ik} \times Z_k \times m_{jl} \times Z_l\right]\\ &\quad= \sum_k \sum_l m_{ik} \times m_{jl} \times \mathrm{E}\left[Z_k Z_l\right] \end{align*} \]
Now \(\mathrm{E}\left[Z_k Z_l\right]\) equals one if \(k\) equals \(l\) and zero otherwise, yielding
\[ \begin{align*} \mathrm{E}\left[\left(Z^\prime_i - \mu_i\right) \times \left(Z^\prime_j - \mu_j\right)\right] &= \sum_k m_{ik} \times m_{jk}\\ &= \left(\mathbf{M} \times \mathbf{M}^\mathrm{T}\right)_{ij} \end{align*} \]
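In the two dimensional case that we started with, the covariance matrix works out as
\[ \mathbf{\Sigma} = \mathbf{M} \times \mathbf{M}^\mathrm{T} = \begin{pmatrix} m_{00} & m_{01}\\ m_{10} & m_{11} \end{pmatrix} \times \begin{pmatrix} m_{00} & m_{10}\\ m_{01} & m_{11} \end{pmatrix} = \begin{pmatrix} m_{00}^2 + m_{01}^2 & m_{00}m_{10} + m_{01}m_{11}\\ m_{00}m_{10} + m_{01}m_{11} & m_{10}^2 + m_{11}^2 \end{pmatrix} \]
recovering the variances on its diagonal and the covariance that we calculated earlier off of it.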

Note that this means that if we can find a matrix \(\mathbf{M}\) that satisfies this relation for any given covariance matrix \(\mathbf{\Sigma}\) then we can use it to transform standard normal vectors into multivariate normal vectors having those covariances. Fortunately, we have already seen how to do that using the Cholesky decomposition[3] which, given a positive definite \(\mathbf{\Sigma}\) satisfying
\[ \mathbf{v} \times \mathbf{\Sigma} \times \mathbf{v} > 0 \]
for any non-zero vector \(\mathbf{v}\), will find a lower triangular matrix \(\mathbf{L}\), having all zero elements above the leading diagonal, such that
\[ \mathbf{\Sigma} = \mathbf{L} \times \mathbf{L}^\mathrm{T} \]
Now because \(\mathbf{\Sigma}\) is the product of a matrix and its transpose it must be at least positive semidefinite since
\[ \mathbf{v} \times \mathbf{M} \times \mathbf{M}^\mathrm{T} \times \mathbf{v} = \left(\mathbf{v} \times \mathbf{M}\right) \times \left(\mathbf{M}^\mathrm{T} \times \mathbf{v}\right) = \left(\mathbf{v} \times \mathbf{M}\right) \times \left(\mathbf{v} \times \mathbf{M}\right) = \mathbf{v}^\prime \times \mathbf{v}^\prime \geqslant 0 \]
and in practice it's not so troubling to make the stricter requirement of positive definiteness since that also guarantees that it has an inverse.
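To make the recipe concrete, here is a standalone sketch of the Cholesky-Banachiewicz scheme and of how its result can be used to turn independent standard normals into a correlated multivariate normal vector; the cholesky and randomMultiNormal functions, together with the choice of covariance matrix and mean, are purely illustrative and the programs in this post use the ak library's ak.choleskyDecomposition type rather than this code.

// a standalone sketch of the Cholesky-Banachiewicz decomposition and of using
// its result to turn independent standard normals into a correlated
// multivariate normal vector

function cholesky(sigma) {
 var n = sigma.length;
 var l = [];
 for(var i=0;i<n;++i) {
  l.push([]);
  for(var j=0;j<n;++j) l[i].push(0);
 }
 for(var i=0;i<n;++i) {
  for(var j=0;j<=i;++j) {
   var s = sigma[i][j];
   for(var k=0;k<j;++k) s -= l[i][k]*l[j][k];
   if(i===j) {
    if(s<=0) throw new Error('sigma is not positive definite');
    l[i][j] = Math.sqrt(s);
   }
   else {
    l[i][j] = s/l[j][j];
   }
  }
 }
 return l;
}

// draw a standard normal using the Box-Muller transform
function randomNormal() {
 var u1 = 1 - Math.random();
 var u2 = Math.random();
 return Math.sqrt(-2*Math.log(u1)) * Math.cos(2*Math.PI*u2);
}

// draw a multivariate normal vector with mean mu and covariance l times its transpose
function randomMultiNormal(mu, l) {
 var n = mu.length;
 var z = [];
 for(var i=0;i<n;++i) z.push(randomNormal());
 var x = [];
 for(var i=0;i<n;++i) {
  var s = mu[i];
  for(var j=0;j<=i;++j) s += l[i][j]*z[j];
  x.push(s);
 }
 return x;
}

var sigma = [[2.0, 0.6], [0.6, 1.0]]; // an arbitrarily chosen covariance matrix
var mu = [1, -1];                     // an arbitrarily chosen mean
console.log(randomMultiNormal(mu, cholesky(sigma)));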

The Characteristic Function

Of the functions associated with probability distributions, the characteristic function, or CF, is perhaps the easiest to derive for the multivariate normal distribution. For a multivariate random variable \(\mathbf{X}\) and a vector argument \(\mathbf{t}\) the characteristic function \(\hat{f}_\mathbf{X}\) is defined by the expectation
\[ \hat{f}_\mathbf{X}(\mathbf{t}) = \mathrm{E}\left[e^{i\mathbf{t}\mathbf{X}}\right] \]
where \(i\) is the imaginary unit. For our multivariate normally distributed \(\mathbf{Z}^\prime\) this works out as
\[ \hat{\phi}_{\mathbf{\mu},\mathbf{\Sigma}}(\mathbf{t}) = e^{i\mathbf{\mu}\mathbf{t} - \frac12 \mathbf{t} \mathbf{\Sigma} \mathbf{t}} \]
as proven in derivation 5.

Derivation 5: The Multivariate Normal CF
By definition, the characteristic function of the multivariate normally distributed random variable \(\mathbf{Z}^\prime\) is given by
\[ \hat{\phi}_{\mathbf{\mu},\mathbf{\Sigma}}(\mathbf{t}) = \mathrm{E}\left[e^{i\mathbf{t}\mathbf{Z}^\prime}\right] = \mathrm{E}\left[e^{i\mathbf{t}(\mathbf{L} \mathbf{Z} + \mathbf{\mu})}\right] = \mathrm{E}\left[e^{i\mathbf{t}(\mathbf{L}\mathbf{Z})}\right] \times e^{i\mathbf{t}\mathbf{\mu}} = e^{i\mathbf{\mu}\mathbf{t}} \times \mathrm{E}\left[e^{i(\mathbf{t}\mathbf{L})\mathbf{Z}}\right] \]
Noting that
\[ (\mathbf{t}\mathbf{L})\mathbf{Z} = \sum_j (\mathbf{t}\mathbf{L})_j Z_j \]
and that the terms in the sum are independent normally distributed random variables with means of zero and standard deviations of \((\mathbf{t}\mathbf{L})_j\), we have
\[ \mathrm{E}\left[e^{i(\mathbf{t}\mathbf{L})\mathbf{Z}}\right] = \mathrm{E}\left[e^{i \times 1 \times ((\mathbf{t}\mathbf{L})\mathbf{Z})}\right] = \hat{f}_{\sum_j (\mathbf{t}\mathbf{L})_j Z_j}(1) = \prod_j \hat{\phi}_{0,(\mathbf{t}\mathbf{L})_j}(1) \]
where \(\hat{\phi}_{\mu,\sigma}\) is the univariate normal CF, since the characteristic function of the sum of a set of independent random variables equals the product of their individual characteristic functions.
Finally, substituting in the definition of the univariate CF
\[ \hat{\phi}_{\mu,\sigma}(t) = e^{i \mu t - \frac12 \sigma^2 t^2} \]
yields
\[ \begin{align*} \hat{\phi}_{\mathbf{\mu},\mathbf{\Sigma}}(\mathbf{t}) = e^{i\mathbf{\mu}\mathbf{t}} \times \prod_j e^{-\frac12 (\mathbf{t}\mathbf{L})_j^2} = e^{i\mathbf{\mu}\mathbf{t} - \frac12 \sum_j (\mathbf{t}\mathbf{L})_j^2} &= e^{i\mathbf{\mu}\mathbf{t} - \frac12 (\mathbf{t}\mathbf{L}) \times (\mathbf{t}\mathbf{L})} = e^{i\mathbf{\mu}\mathbf{t} - \frac12 (\mathbf{t}\mathbf{L}) \times (\mathbf{L}^\mathrm{T}\mathbf{t})}\\ &= e^{i\mathbf{\mu}\mathbf{t} - \frac12 \mathbf{t} \times \left(\mathbf{L}\mathbf{L}^\mathrm{T}\right) \times \mathbf{t}} = e^{i\mathbf{\mu}\mathbf{t} - \frac12 \mathbf{t} \mathbf{\Sigma} \mathbf{t}} \end{align*} \]

Program 2 illustrates the bivariate normal CF by plotting its real part in the upper contour chart and its imaginary part in the lower one, with red points representing negative values and green points positive ones in both cases.

Program 2: The Bivariate Normal CF

If you change l to the identity matrix with

var l = ak.matrix([[1,0],[0,1]]);

then you will see that the contour plots decay circularly rather than elliptically, reflecting the fact that the elements of the random vector are uncorrelated. Furthermore, if you set the mean u to zero with

var u = ak.vector([0,0]);

then you will see that the real parts of the CF are never negative and the imaginary parts of the CF are always equal to zero, which shouldn't come as too much of a surprise since the CF is then just the exponential of a negative real number and so must be a real number between zero and one.
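If you'd rather poke at the numbers than at the contour plots, the following standalone sketch simply evaluates the formula for the bivariate normal CF, splitting the result into its real and imaginary parts; the bivariateNormalCF function and its arguments are illustrative choices of mine rather than part of the ak library, and the second call demonstrates the zero mean case described above.

// a standalone sketch, not the ak library code behind program 2: evaluate the
// bivariate normal CF, returning its real and imaginary parts

function bivariateNormalCF(t, mu, sigma) {
 // the exponent is i*(mu.t) - (t.sigma.t)/2
 var mut = mu[0]*t[0] + mu[1]*t[1];
 var tst = t[0]*(sigma[0][0]*t[0] + sigma[0][1]*t[1])
         + t[1]*(sigma[1][0]*t[0] + sigma[1][1]*t[1]);
 var r = Math.exp(-0.5*tst);
 return {re: r*Math.cos(mut), im: r*Math.sin(mut)};
}

var mu = [1, 0.5];                    // an arbitrarily chosen mean
var sigma = [[2.0, 0.6], [0.6, 1.0]]; // an arbitrarily chosen covariance matrix

console.log(bivariateNormalCF([0.3, -0.7], mu, sigma));

// with a zero mean the imaginary part vanishes and the real part lies
// between zero and one
console.log(bivariateNormalCF([0.3, -0.7], [0, 0], sigma));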

The Probability Density Function

Now it is trivially the case that the probability that a set of independent events \(A_i\) all happen is equal to the product of the probabilities of each of them happening individually
\[ \Pr\left(A_0 \wedge A_1 \wedge \dots \wedge A_{n-1}\right) = \Pr\left(A_0\right) \times \Pr\left(A_1\right) \times \dots \times \Pr\left(A_{n-1}\right) \]
where \(\wedge\) means and.
The CDF of a multivariate random variable at a given vector argument equals the probability that every element of that variable is less than or equal to the associated element of the argument. For our vector of independent standard normally distributed elements \(\mathbf{Z}\) this equates to the product of integrals
\[ \Phi_{\mathbf{0},\mathbf{I}}(\mathbf{z}) = \prod_i \Phi_{0,1}\left(z_i\right) = \prod_i \int_{-\infty}^{z_i} \frac{1}{\sqrt{2\pi}} e^{-\frac12 z^2} \mathrm{d}z \]
where \(\mathbf{0}\) is the zero vector, \(\mathbf{I}\) is the identity matrix that leaves vectors unchanged when they are multiplied by it, and \(\Phi_{0,1}\) is the standard univariate normal CDF.
We can recover the PDF of \(\mathbf{Z}\) by differentiating the CDF with respect to each of the elements of its argument to yield
\[ \phi_{\mathbf{0},\mathbf{I}}(\mathbf{z}) = \prod_i \frac{1}{\sqrt{2\pi}} e^{-\frac12 z_i^2} = \frac{1}{(2\pi)^{\frac12 n}} e^{-\frac12 \sum_i z_i^2} = \frac{1}{(2\pi)^{\frac12 n}} e^{-\frac12 \mathbf{z} \times \mathbf{z}} \]
where \(n\) is the number of dimensions.

The correlations between the elements of \(\mathbf{Z}^\prime\) rather defeat this argument but fortunately, with a little calculus, we can demonstrate that its PDF is given by
\[ \phi_{\mathbf{\mu},\mathbf{\Sigma}}(\mathbf{z}) = \frac{1}{(2\pi)^{\frac12 n} |\mathbf{\Sigma}|^\frac12} e^{-\frac12 \left(\mathbf{z}-\mathbf{\mu}\right) \times \mathbf{\Sigma}^{-1} \times \left(\mathbf{z}-\mathbf{\mu}\right)} \]
where \(|\mathbf{\Sigma}|\) is the determinant of \(\mathbf{\Sigma}\), as shown in derivation 6.

Derivation 6: The Multivariate Normal PDF
Firstly, we'll define our correlated multivariate normal random variable with
\[ \mathbf{Z}^\prime = f(\mathbf{Z}) = \mathbf{L} \times \mathbf{Z} + \mathbf{\mu} \]
which also means that
\[ \mathbf{Z} = f^{-1}\left(\mathbf{Z}^\prime\right) = \mathbf{L}^{-1} \times \left(\mathbf{Z}^\prime - \mathbf{\mu}\right) \]
If we define \(\mathbf{V}^\prime\) as the set of points all of whose elements are less than or equal to those of a vector \(\mathbf{z}^\prime\) then we can express the CDF of \(\mathbf{Z}^\prime\) by integrating over the coordinate system of \(\mathbf{Z}\) with
\[ \Phi_{\mathbf{\mu},\mathbf{\Sigma}}\left(\mathbf{z}^\prime\right) = \int_{f^{-1}\left(\mathbf{V}^\prime\right)} \phi_{\mathbf{0},\mathbf{I}}(\mathbf{z}) \, \mathrm{d}\mathbf{z} \]
where we're using \(\mathrm{d}\mathbf{z}\) to indicate that the integral is with respect to every element of \(\mathbf{z}\).
We can solve this using integration by substitution with
\[ \Phi_{\mathbf{\mu},\mathbf{\Sigma}}\left(\mathbf{z}^\prime\right) = \int_{\mathbf{V}^\prime} \phi_{\mathbf{0},\mathbf{I}}(f^{-1}\left(\mathbf{z}^\prime\right)) \left|\frac{\partial\mathbf{z}}{\partial\mathbf{z}^\prime}\right| \mathrm{d}\mathbf{z}^\prime \]
where \(\left|\frac{\partial\mathbf{z}}{\partial\mathbf{z}^\prime}\right|\) is the determinant of the Jacobian matrix of the partial derivatives of the elements of \(\mathbf{z}\) with respect to the elements of \(\mathbf{z}^\prime\) such that
\[ \left(\frac{\partial\mathbf{z}}{\partial\mathbf{z}^\prime}\right)_{i,j} = \frac{\partial z_i}{\partial z^\prime_j} \]
which in this case is simply \(\mathbf{L}^{-1}\) and so
\[ \Phi_{\mathbf{\mu},\mathbf{\Sigma}}\left(\mathbf{z}^\prime\right) = \int_{\mathbf{V}^\prime} \phi_{\mathbf{0},\mathbf{I}}\left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right) \left|\mathbf{L}^{-1}\right| \mathrm{d}\mathbf{z}^\prime \]
Differentiating with respect to all of the elements of \(\mathbf{z}^\prime\) yields the PDF
\[ \begin{align*} \phi_{\mathbf{\mu},\mathbf{\Sigma}}\left(\mathbf{z}^\prime\right) &= \phi_{\mathbf{0},\mathbf{I}}\left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right) \left|\mathbf{L}^{-1}\right|\\ &= \frac{1}{|\mathbf{L}|} \times \phi_{\mathbf{0},\mathbf{I}}\left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right)\\ &= \frac{1}{|\mathbf{\Sigma}|^\frac12} \times \frac{1}{(2\pi)^{\frac12 n}} e^{-\frac12 \left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right) \times \left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right)}\\ &= \frac{1}{(2\pi)^{\frac12 n} |\mathbf{\Sigma}|^\frac12} e^{-\frac12 \left(\left(\mathbf{z}^\prime - \mathbf{\mu}\right) \times \mathbf{L}^{-1^\mathrm{T}}\right) \times \left(\mathbf{L}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)\right)}\\ &= \frac{1}{(2\pi)^{\frac12 n} |\mathbf{\Sigma}|^\frac12} e^{-\frac12 \left(\mathbf{z}^\prime - \mathbf{\mu}\right) \times \left(\mathbf{L}^{-1^\mathrm{T}} \times \mathbf{L}^{-1}\right) \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)}\\ &= \frac{1}{(2\pi)^{\frac12 n} |\mathbf{\Sigma}|^\frac12} e^{-\frac12 \left(\mathbf{z}^\prime - \mathbf{\mu}\right) \times \left(\mathbf{L} \times \mathbf{L}^\mathrm{T}\right)^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)}\\ &= \frac{1}{(2\pi)^{\frac12 n} |\mathbf{\Sigma}|^\frac12} e^{-\frac12 \left(\mathbf{z}^\prime - \mathbf{\mu}\right) \times \mathbf{\Sigma}^{-1} \times \left(\mathbf{z}^\prime - \mathbf{\mu}\right)} \end{align*} \]

Program 3 illustrates the shape of the bivariate normal PDF with a contour plot using our ak.choleskyDecomposition type to calculate the determinant and the inverse of the covariance matrix.

Program 3: The Bivariate Normal PDF

Somewhat reassuringly, this has the same sort of elliptical shape that we saw in the scatter plot of the correlated normal random variables in program 1, with greater values for the density near the origin decaying away to zero as we get further out.
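For a purely numerical look at that density, here is a standalone sketch of the bivariate normal PDF that computes the determinant and inverse of its two by two covariance matrix explicitly rather than with the ak.choleskyDecomposition type that program 3 uses; the bivariateNormalPDF function and its arguments are, once again, just illustrative choices.

// a standalone sketch, not the code of program 3: evaluate the bivariate
// normal PDF, inverting the two by two covariance matrix directly

function bivariateNormalPDF(z, mu, sigma) {
 var d0 = z[0] - mu[0];
 var d1 = z[1] - mu[1];
 var det = sigma[0][0]*sigma[1][1] - sigma[0][1]*sigma[1][0];
 // (z-mu) . sigma^-1 . (z-mu) for a two by two sigma
 var q = (sigma[1][1]*d0*d0 - (sigma[0][1]+sigma[1][0])*d0*d1 + sigma[0][0]*d1*d1)/det;
 return Math.exp(-0.5*q)/(2*Math.PI*Math.sqrt(det));
}

var mu = [1, -1];                     // an arbitrarily chosen mean
var sigma = [[2.0, 0.6], [0.6, 1.0]]; // an arbitrarily chosen covariance matrix

console.log(bivariateNormalPDF([1, -1], mu, sigma)); // at the peak of the density
console.log(bivariateNormalPDF([3,  1], mu, sigma)); // further out, much smaller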

The Cumulative Distribution Functions

Whilst the uncorrelated multivariate normal CDF is simple enough to calculate as the product of the CDFs of the univariate random variables from which it is constructed, the correlated multivariate normal CDF is an altogether more difficult beast to tame. Unlike for the univariate normal CDF, we don't have a convenient formula with which to numerically approximate it and so must fall back upon general purpose numerical integration algorithms. Unfortunately this approach will be very slow unless we think carefully about how we go about it and so we shall wait until after we've implemented the functions that we've defined so far before we take a look at it and its complement, the latter being the probability that every element of a multivariate normal random variable is greater than the associated element of a given vector.
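To give a flavour of the problem, the following crude standalone sketch estimates a bivariate normal CDF by simply counting the fraction of randomly drawn correlated pairs both of whose elements fall below the elements of the argument; its error shrinks only with the square root of the number of samples, which is rather why we shall want something cleverer. The coefficients, means and argument here are arbitrary choices of mine.

// a crude standalone sketch: estimate a bivariate normal CDF by counting the
// fraction of randomly drawn correlated pairs both of whose elements fall
// below the elements of the argument z

function randomNormal() {
 var u1 = 1 - Math.random();
 var u2 = Math.random();
 return Math.sqrt(-2*Math.log(u1)) * Math.cos(2*Math.PI*u2);
}

var m  = [[1.0, 0.5], [0.2, 0.8]]; // arbitrarily chosen coefficients
var mu = [1, -1];                  // arbitrarily chosen means
var z  = [1.5, -0.5];              // the argument of the CDF

var samples = 100000, hits = 0;
for(var i=0;i<samples;++i) {
 var z0 = randomNormal(), z1 = randomNormal();
 var x0 = m[0][0]*z0 + m[0][1]*z1 + mu[0];
 var x1 = m[1][0]*z0 + m[1][1]*z1 + mu[1];
 if(x0 <= z[0] && x1 <= z[1]) ++hits;
}
console.log(hits/samples);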

References

[1] Into The nth Dimension, www.thusspakeak.com, 2016.

[2] Define Normal, www.thusspakeak.com, 2014.

[3] Are You Definitely Positive, Mr Cholesky?, www.thusspakeak.com, 2015.
