Last year^{[1]} we took a look at multivariate uniformly distributed random variables, which generalise uniform random variables to multiple dimensions with random vectors whose elements are independently uniformly distributed. We have now seen^{[2][3][4]} how we can similarly generalise normally distributed random variables with the added property that the normally distributed elements of their vectors may be dependent upon each other; specifically that they may be correlated.
As it turns out, we can generalise this dependence to arbitrary sets of random variables with a fairly simple observation.
Dependent Uniforms
A basic property of the cumulative distribution function, or CDF, of a continuous random variable is that it transforms observations of it into observations of a standard uniform random variable, having equal probabilities of its observations falling within equally sized subsets of the interval from zero to one. So if we take a correlated standard multivariate normal random variable, whose elements have means and standard deviations of zero and one respectively, and calculate the standard normal CDF of each element of an observation of it, then we shall have an observation of a dependent multivariate uniform random variable.
For example, given a correlated standard bivariate normal random variable \(N(\mathbf{\mu},\mathbf{\Sigma})\) with the mean vector and covariance matrix
\[
\begin{align*}
\mathbf{\mu} &=
\begin{pmatrix}
0\\
0
\end{pmatrix}\\
\mathbf{\Sigma} &=
\begin{pmatrix}
1 & \rho\\
\rho & 1
\end{pmatrix}
\end{align*}
\]
respectively, where \(\rho\) is the correlation, and if we define
\[
\begin{align*}
\mathbf{Z} &\sim N(\mathbf{\mu}, \mathbf{\Sigma})\\
\mathbf{U} &=
\begin{pmatrix}
\Phi\left(Z_0\right)\\
\Phi\left(Z_1\right)
\end{pmatrix}
\end{align*}
\]
where \(\sim\) means drawn from and \(\Phi\) is the standard normal CDF, then \(\mathbf{U}\) is an observation of a dependent standard bivariate uniform random variable.
Program 1 demonstrates this by first plotting in dark grey independent vectors constructed from pairs of observations of a standard uniform random variable, provided by Math.random, whilst plotting every tenth of their first and second elements below and to the left respectively, and then doing the same in black for the first and second elements of the observations of the dependent variable.
Program 1: A Dependent Uniform Bivariate



Whilst the observations of the individual elements both look pretty much the same as those of the standard uniform variable, the same certainly cannot be said of their vectors, which are clearly concentrated in a lenticular region running from the top-left to the bottom-right, as opposed to a square.
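To make the construction concrete, here's a minimal self-contained sketch that doesn't rely upon the ak library; stdNormalCDF and dependentUniformPair are names chosen purely for illustration, with \(\Phi\) approximated by Abramowitz and Stegun's formula 7.1.26 and the correlated normal pair built directly rather than with a matrix decomposition.

```javascript
// Standard normal CDF, Phi, via the Abramowitz & Stegun erf approximation
// 7.1.26, accurate to around 1.5e-7.
function stdNormalCDF(z) {
 var x = Math.abs(z)/Math.SQRT2;
 var t = 1/(1+0.3275911*x);
 var erf = 1 - t*(0.254829592 + t*(-0.284496736 + t*(1.421413741
           + t*(-1.453152027 + t*1.061405429))))*Math.exp(-x*x);
 return z>=0 ? 0.5*(1+erf) : 0.5*(1-erf);
}

// An observation of a dependent standard bivariate uniform random variable:
// given independent standard normal observations z0 and w, the pair z0 and
// z1 = rho*z0 + sqrt(1-rho^2)*w has correlation rho, and mapping each
// element through Phi yields dependent standard uniforms.
function dependentUniformPair(rho, z0, w) {
 var z1 = rho*z0 + Math.sqrt(1-rho*rho)*w;
 return [stdNormalCDF(z0), stdNormalCDF(z1)];
}
```

The normal observations are taken as arguments so that the function is deterministic; feeding it normals generated from Math.random would reproduce the black points of program 1.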
Dependence Between Any Random Variables
It is also the case that if we know the inverse CDF of a particular random variable then evaluating it with an observation of a standard uniform random variable will result in an observation of that random variable. We can therefore create random vectors with dependent elements that follow whatever distributions we want, referred to as the marginal distributions, or at least we can if we have their inverse CDFs.
For example, given the observation \(\mathbf{U}\) defined above and a pair of inverse CDFs \(F_0^{-1}\) and \(F_1^{-1}\), we can construct an observation of a dependent bivariate random variable with
\[
\mathbf{X} =
\begin{pmatrix}
F_0^{-1}\left(U_0\right)\\
F_1^{-1}\left(U_1\right)
\end{pmatrix}
\]
Program 2 illustrates this by adding log normal and exponential marginal distributions to program 1.
Program 2: A Dependent Mixed Bivariate



Once again the observations of the individual elements of the independent and dependent random variables look pretty consistent, but those of the vectors themselves definitely don't.
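For instance, the exponential distribution's inverse CDF has a simple closed form, and applying it to each element of a dependent uniform pair yields a dependent pair of exponentials. The following standalone sketch assumes nothing from the ak library; invExpCDF and dependentExpPair are hypothetical names.

```javascript
// Inverse CDF of the exponential distribution with rate lambda: since
// F(x) = 1 - exp(-lambda*x), solving F(x) = u gives x = -ln(1-u)/lambda.
function invExpCDF(u, lambda) {
 return -Math.log(1-u)/lambda;
}

// Map a dependent standard bivariate uniform observation u onto a dependent
// pair of exponentially distributed observations with rates lambda0 and lambda1.
function dependentExpPair(u, lambda0, lambda1) {
 return [invExpCDF(u[0], lambda0), invExpCDF(u[1], lambda1)];
}
```

Note that the dependence comes entirely from the uniform pair; the inverse CDFs only reshape the marginals.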
Gaussian Copulas
We can use the CDFs of dependent standard multivariate uniform distributions to completely specify the dependency between the marginal distributions of any multivariate distribution, which trivially follows from the fact that the values of their elements are exactly specified by the values of their marginal CDFs, and in this context we call them copulas.
In this case we have the Gaussian copula, given by
\[
\begin{align*}
C_\mathbf{P}(\mathbf{u}) &= \Phi_\mathbf{P}(\mathbf{\Phi}^{-1}(\mathbf{u}))\\
(\mathbf{\Phi}^{-1}(\mathbf{u}))_i &= \Phi^{-1}\left(u_i\right)
\end{align*}
\]
where \(\Phi_\mathbf{P}\) is a correlated standard multivariate normal CDF, having means of zero, standard deviations of one and a correlation matrix of \(\mathbf{P}\), and \(\Phi^{-1}\) is the standard normal inverse CDF. Program 3 illustrates its shape with a contour plot of a bivariate Gaussian copula.
Program 3: A Gaussian Copula



Since the shape of the Gaussian copula only depends upon the correlation matrix of the multivariate normal CDF we can save ourselves a lot of work by deferring initialisation to our ak.multiNormalCDF function. By doing so, however, we allow the user to specify non-zero means and non-unit standard deviations and so must take care to deal with that possibility. Fortunately we can easily correct for this by ensuring that we map the arguments of the copula onto normally distributed values having the appropriate means and standard deviations with
\[
z_i = \mu_i + \sqrt{\Sigma_{i,i}} \times \Phi^{-1}\left(u_i\right)
\]
as is done by the function returned by ak.gaussianCopula given in listing 1.
Listing 1: ak.gaussianCopula
ak.gaussianCopula = function() {
 var cdf = ak.multiNormalCDF.apply(ak, arguments);
 var inv = ak.normalInvCDF();
 var mu = cdf.mu().toArray();
 var sigma = cdf.sigma().toArray();
 var rho = toRho(sigma);
 var n = mu.length;
 var i, f;

 for(i=0;i<n;++i) sigma[i] = Math.sqrt(sigma[i][i]);

 f = function(u) {
  if(ak.type(u)!==ak.VECTOR_T || u.dims()!==n) {
   throw new Error('invalid argument in ak.gaussianCopula');
  }
  u = u.toArray();
  for(i=0;i<n;++i) u[i] = mu[i] + sigma[i]*inv(u[i]);
  return cdf(ak.vector(u));
 };
 f.rho = function() {return ak.matrix(rho);};

 return Object.freeze(f);
};
Note that we're still treating the copula as being parametrised by the correlation matrix and so we calculate it from the CDF's covariance matrix with
\[
P_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii} \times \Sigma_{jj}}}
\]
as done by the toRho helper function given in listing 2.
Listing 2: The toRho Helper Function
function toRho(sigma) {
 var n = sigma.length;
 var r = new Array(n);
 var i, j;

 for(i=0;i<n;++i) {
  r[i] = new Array(n);
  r[i][i] = Math.sqrt(sigma[i][i]);
 }

 for(i=0;i<n;++i) {
  for(j=i+1;j<n;++j) r[i][j] = r[j][i] = sigma[i][j]/(r[i][i] * r[j][j]);
  r[i][i] = 1;
 }
 return r;
}
Here we're caching the standard deviations in r[i][i], which we can safely overwrite with one as we go since we're iterating over those columns of the covariance matrix whose indices are greater than those of the rows.
You can change the values for the means, u0 and u1, and the standard deviations, s0 and s1, in program 4 to confirm for yourself that they have no effect on the shape of the Gaussian copula.
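As a quick sanity check of the correlation formula, for a two-by-two covariance matrix with variances of four and one and a covariance of one, the off-diagonal correlation should be one over the square root of four times one, or one half. The snippet below works the formula through by hand, without the ak library or toRho itself:

```javascript
// P[i][j] = Sigma[i][j] / sqrt(Sigma[i][i] * Sigma[j][j]) for a 2x2 matrix
// with variances of 4 and 1 and a covariance of 1.
var sigma = [[4, 1], [1, 1]];
var rho01 = sigma[0][1]/Math.sqrt(sigma[0][0]*sigma[1][1]);
// rho01 is 1/sqrt(4) = 0.5
```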
Program 4: ak.gaussianCopula



The Copula Density
We're typically much more interested in probability density functions, or PDFs, than CDFs since we can use them to calculate the expectations of functions of random variables. If we represent the dependence between the elements of the argument of an \(n\) dimensional multivariate CDF \(F\) having marginal CDFs \(F_i\) with a copula \(C\) so that
\[
\begin{align*}
u_i &= F_i\left(x_i\right)\\
F(\mathbf{x}) &= C(\mathbf{u})
\end{align*}
\]
then we can represent its associated multivariate probability density function \(f\) with
\[
\begin{align*}
f(\mathbf{x})
&= \frac{\mathrm{d}^n}{\mathrm{d}x_0 \dots \mathrm{d}x_{n-1}} F(\mathbf{x})
= \frac{\mathrm{d}^n}{\mathrm{d}x_0 \dots \mathrm{d}x_{n-1}} C(\mathbf{u})
= \frac{\mathrm{d}u_0}{\mathrm{d}x_0} \times \dots \times \frac{\mathrm{d}u_{n-1}}{\mathrm{d}x_{n-1}} \times \frac{\mathrm{d}^n}{\mathrm{d}u_0 \dots \mathrm{d}u_{n-1}} C(\mathbf{u})\\
&= f_0\left(x_0\right) \times \dots \times f_{n-1}\left(x_{n-1}\right) \times \frac{\mathrm{d}^n}{\mathrm{d}u_0 \dots \mathrm{d}u_{n-1}} C(\mathbf{u})
= f_0\left(x_0\right) \times \dots \times f_{n-1}\left(x_{n-1}\right) \times c(\mathbf{u})
\end{align*}
\]
where the \(f_i\) are the marginal PDFs and which follows from the chain rule. The function \(c\) is known as the copula density and for the Gaussian copula is given by
\[
\begin{align*}
z_i &= \Phi^{-1}\left(u_i\right)\\
c_\mathbf{P}(\mathbf{u}) &= \frac{\phi_\mathbf{P}(\mathbf{z})}{\phi\left(z_0\right) \times \dots \times \phi\left(z_{n-1}\right)}
\end{align*}
\]
where \(\phi_\mathbf{P}\) is the correlated standard multivariate normal PDF with a correlation matrix of \(\mathbf{P}\), \(\phi\) is the standard univariate normal PDF and \(\Phi^{-1}\) is the inverse of its CDF, which once again follows from the chain rule.
We can use our ak.multiNormalPDF to manage the initialisation of our implementation of the Gaussian copula density, provided that we take care to correct for non-zero means and non-unit standard deviations again. In particular, from the definition of the univariate normal PDF
\[
\phi_{\mu,\sigma}(z) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac12 \left(\tfrac{z-\mu}{\sigma}\right)^2}
\]
we find that
\[
\phi_{\mu,\sigma}(\mu + \sigma \times z) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac12 z^2} = \tfrac{1}{\sigma} \phi(z)
\]
The ak.gaussianCopulaDensity function given in listing 3 uses this to calculate the values of the marginal PDFs with a single standard ak.normalPDF function.
Listing 3: ak.gaussianCopulaDensity
ak.gaussianCopulaDensity = function() {
 var mpdf = ak.multiNormalPDF.apply(ak, arguments);
 var pdf = ak.normalPDF();
 var inv = ak.normalInvCDF();
 var mu = mpdf.mu().toArray();
 var sigma = mpdf.sigma().toArray();
 var rho = toRho(sigma);
 var sn = 1;
 var n = mu.length;
 var i, f;

 for(i=0;i<n;++i) {
  sigma[i] = Math.sqrt(sigma[i][i]);
  sn *= sigma[i];
 }

 f = function(u) {
  var den = 1;
  var z;

  if(ak.type(u)!==ak.VECTOR_T || u.dims()!==n) {
   throw new Error('invalid argument in ak.gaussianCopulaDensity');
  }
  u = u.toArray();
  for(i=0;i<n;++i) {
   z = inv(u[i]);
   den *= pdf(z);
   u[i] = mu[i] + sigma[i]*z;
  }
  return den===0 ? 0 : sn*mpdf(ak.vector(u))/den;
 };
 f.rho = function() {return ak.matrix(rho);};

 return Object.freeze(f);
};
Note that if the denominator is equal to zero then we return zero for the density since this will only happen if one or more of the elements of its argument are equal, or at least extremely close, to zero or one, in which case the density should equal zero.
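The correction for non-standard means and deviations rests upon the scaling identity \(\phi_{\mu,\sigma}(\mu + \sigma \times z) = \tfrac{1}{\sigma}\phi(z)\) derived above, which is easy to verify numerically with a self-contained snippet; normalPDF here is a local helper rather than the ak version.

```javascript
// Univariate normal PDF with mean mu and standard deviation sigma.
function normalPDF(x, mu, sigma) {
 var z = (x-mu)/sigma;
 return Math.exp(-0.5*z*z)/(Math.sqrt(2*Math.PI)*sigma);
}

// Check phi_{mu,sigma}(mu + sigma*z) against phi(z)/sigma at an arbitrary point.
var mu = 1, sigma = 2, z = 0.7;
var lhs = normalPDF(mu + sigma*z, mu, sigma);
var rhs = normalPDF(z, 0, 1)/sigma;
```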
Program 5 plots the Gaussian copula density on a logarithmic scale, clearly revealing the lenticular shape that we saw in program 1.
Program 5: ak.gaussianCopulaDensity



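To see that the copula density formula behaves as it should, the bivariate case can be checked directly without the ak library: with a correlation of zero the multivariate PDF factorises into the product of its marginals and the density is everywhere one, whilst at the centre of the distribution a positive correlation concentrates mass and so pushes the density above one. Note that gaussianCopulaDensityAtZ is a hypothetical helper that takes \(z_i = \Phi^{-1}(u_i)\) directly so as to avoid implementing an inverse CDF.

```javascript
// Standard bivariate normal PDF with correlation rho.
function biNormalPDF(z0, z1, rho) {
 var q = (z0*z0 - 2*rho*z0*z1 + z1*z1)/(1-rho*rho);
 return Math.exp(-0.5*q)/(2*Math.PI*Math.sqrt(1-rho*rho));
}

// Standard univariate normal PDF.
function phi(z) {
 return Math.exp(-0.5*z*z)/Math.sqrt(2*Math.PI);
}

// The Gaussian copula density c_P(u) = phi_P(z)/(phi(z0)*phi(z1)),
// expressed in terms of z rather than u.
function gaussianCopulaDensityAtZ(z0, z1, rho) {
 return biNormalPDF(z0, z1, rho)/(phi(z0)*phi(z1));
}
```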
ak.gaussianCopulaRnd
Whilst we're about it we might as well add an implementation of its random variable, which is done in listing 4 by exploiting the relationship between normal and standard normal random variables
\[
\begin{align*}
Z &\sim N(\mu,\sigma)\\
\frac{Z-\mu}{\sigma} &\sim N(0,1)
\end{align*}
\]
Listing 4: ak.gaussianCopulaRnd
ak.gaussianCopulaRnd = function() {
 var rnd = ak.multiNormalRnd.apply(ak, arguments);
 var cdf = ak.normalCDF();
 var mu = rnd.mu().toArray();
 var sigma = rnd.sigma().toArray();
 var rho = toRho(sigma);
 var n = mu.length;
 var i, f;

 for(i=0;i<n;++i) sigma[i] = 1/Math.sqrt(sigma[i][i]);

 f = function() {
  var z = rnd().toArray();
  for(i=0;i<n;++i) z[i] = cdf((z[i]-mu[i])*sigma[i]);
  return ak.vector(z);
 };
 f.rho = function() {return ak.matrix(rho);};
 f.rnd = rnd.rnd;

 return Object.freeze(f);
};
Program 6 demonstrates that this behaves in the same way as the hand-rolled generator from program 1.
Program 6: ak.gaussianCopulaRnd



Note that all of these functions can be found in GaussianCopula.js.
Copula Distribution Functions
Given that an important use of copulas is to construct dependent multivariate distribution functions, specifically their CDFs, PDFs and random variables, it's convenient to implement functions to do so automatically. We have already seen how to do it and so it's really just a simple matter of programming at this point.
Let us begin with the multivariate CDF defined by a copula and an array of marginal CDFs, implemented with the ak.copulaCDF function given in listing 5.
Listing 5: ak.copulaCDF
ak.copulaCDF = function(copula, cdfs) {
 var f, n, i;

 if(ak.nativeType(copula)!==ak.FUNCTION_T) {
  throw new Error('invalid copula in ak.copulaCDF');
 }
 if(ak.nativeType(cdfs)!==ak.ARRAY_T) {
  throw new Error('invalid marginal CDFs in ak.copulaCDF');
 }

 n = cdfs.length;
 for(i=0;i<n;++i) {
  if(ak.nativeType(cdfs[i])!==ak.FUNCTION_T) {
   throw new Error('invalid marginal CDF in ak.copulaCDF');
  }
 }

 f = function(x) {
  if(ak.type(x)!==ak.VECTOR_T || x.dims()!==n) {
   throw new Error('invalid argument in ak.copulaCDF');
  }
  x = x.toArray();
  for(i=0;i<n;++i) x[i] = cdfs[i](x[i]);
  return copula(ak.vector(x));
 };
 f.copula = function(){return copula;};
 f.marginalCDFs = function(){return cdfs.slice(0);};

 return Object.freeze(f);
};
The most notable thing about this function is that we're at least trying to provide some measure of error handling by checking that the types of the arguments are what we're expecting.
Program 7 uses it to show what the CDF of the dependent multivariate random variable that we defined in program 2 looks like.
Program 7: ak.copulaCDF



As we should expect from its scatter plot, the CDF rapidly increases away from the left and the bottom of the contour diagram, indicating that the probability of observations is greatest there.
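To make the composition concrete, here's a stripped-down sketch of the same idea using plain arrays rather than ak.vector objects and omitting the argument checking, with the independence copula \(C(\mathbf{u}) = u_0 \times u_1\) chosen purely for illustration:

```javascript
// Compose a copula with an array of marginal CDFs: F(x) = C(F0(x0), F1(x1), ...).
function copulaCDF(copula, cdfs) {
 return function(x) {
  return copula(x.map(function(xi, i) {return cdfs[i](xi);}));
 };
}

// The independence copula with a standard uniform and a unit exponential
// marginal, giving F(x) = x0 * (1 - exp(-x1)) on the support.
var cdf = copulaCDF(
 function(u) {return u[0]*u[1];},
 [function(x) {return Math.max(0, Math.min(1, x));},
  function(x) {return x<0 ? 0 : 1-Math.exp(-x);}]
);
```

Swapping in the Gaussian copula and non-trivial marginals is exactly what program 7 does via ak.copulaCDF.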
Next up is ak.copulaPDF, defined with a copula density and arrays of marginal CDFs and PDFs, as shown by listing 6.
Listing 6: ak.copulaPDF
ak.copulaPDF = function(density, cdfs, pdfs) {
 var f, n, i;

 if(ak.nativeType(density)!==ak.FUNCTION_T) {
  throw new Error('invalid copula in ak.copulaPDF');
 }
 if(ak.nativeType(cdfs)!==ak.ARRAY_T) {
  throw new Error('invalid marginal CDFs in ak.copulaPDF');
 }
 if(ak.nativeType(pdfs)!==ak.ARRAY_T || pdfs.length!==cdfs.length) {
  throw new Error('invalid marginal PDFs in ak.copulaPDF');
 }

 n = cdfs.length;
 for(i=0;i<n;++i) {
  if(ak.nativeType(cdfs[i])!==ak.FUNCTION_T) {
   throw new Error('invalid marginal CDF in ak.copulaPDF');
  }
  if(ak.nativeType(pdfs[i])!==ak.FUNCTION_T) {
   throw new Error('invalid marginal PDF in ak.copulaPDF');
  }
 }

 f = function(x) {
  var px = 1;

  if(ak.type(x)!==ak.VECTOR_T || x.dims()!==n) {
   throw new Error('invalid argument in ak.copulaPDF');
  }
  x = x.toArray();
  for(i=0;i<n;++i) {
   px *= pdfs[i](x[i]);
   x[i] = cdfs[i](x[i]);
  }
  return px===0 ? 0 : px * density(ak.vector(x));
 };
 f.copulaDensity = function(){return density;};
 f.marginalCDFs = function(){return cdfs.slice(0);};
 f.marginalPDFs = function(){return pdfs.slice(0);};

 return Object.freeze(f);
};
Program 8 uses this to plot the PDF of the same random variable, which is also reassuringly consistent with its scatter plot.
Program 8: ak.copulaPDF



Finally we have ak.copulaRnd, which maps a copula random vector to a general multivariate random vector by applying the inverse CDFs of the latter's marginal distributions to the elements of the results of the former, as shown in listing 7.
Listing 7: ak.copulaRnd
ak.copulaRnd = function(copulaRnd, invs) {
 var f, n, i;

 if(ak.nativeType(copulaRnd)!==ak.FUNCTION_T) {
  throw new Error('invalid copula random variable in ak.copulaRnd');
 }
 if(ak.nativeType(invs)!==ak.ARRAY_T) {
  throw new Error('invalid marginal inverse CDFs in ak.copulaRnd');
 }

 n = invs.length;
 for(i=0;i<n;++i) {
  if(ak.nativeType(invs[i])!==ak.FUNCTION_T) {
   throw new Error('invalid marginal inverse CDF in ak.copulaRnd');
  }
 }

 f = function() {
  var x = copulaRnd();
  if(ak.type(x)!==ak.VECTOR_T || x.dims()!==n) {
   throw new Error('invalid copula random variable in ak.copulaRnd');
  }
  x = x.toArray();
  for(i=0;i<n;++i) x[i] = invs[i](x[i]);
  return ak.vector(x);
 };
 f.copulaRnd = function(){return copulaRnd;};
 f.marginalInvCDFs = function(){return invs.slice(0);};

 return Object.freeze(f);
};
Program 9 demonstrates how we can use this function to recreate that random variable by redrawing its scatter plot.
Program 9: ak.copulaRnd



To conclude, these last three functions can be found in CopulaDistribution.js.
References
[1] Into The n^{th} Dimension, www.thusspakeak.com, 2016.
[2] Every Which Way Is Normal, www.thusspakeak.com, 2017.
[3] Multiple Multiply Normal Functions, www.thusspakeak.com, 2017.
[4] The Cumulative Distribution Unction, www.thusspakeak.com, 2017.