The Product Distribution Business


Several posts ago[1] we covered the normal distribution; the statistical distribution that, by the central limit theorem, governs the behaviour of sums of random variables. We noted that by the properties of logarithms we could transform a product into a sum with
\[ \ln \left(x_0 \times x_1 \times x_2 \times \dots \right) = \ln x_0 + \ln x_1 + \ln x_2 + \dots \]
and consequently the distribution that governs the behaviour of products of observations of random variables is the normal distribution of their logarithms.
Since many sources of randomness accumulate from the multiplication, rather than the addition, of random components, this is an incredibly useful observation. So much so that the distribution of such products is important in its own right; the log normal distribution, \(LN(\mu,\sigma)\).

ak.logNormalCDF

If a random variable \(X\) is distributed as \(LN(\mu,\sigma)\) then we have
\[ \Pr\left(X \leqslant c\right) = \Pr\left(\ln X \leqslant \ln c\right) = \Phi_{\mu,\sigma}(\ln c) \]
where \(\Phi_{\mu,\sigma}\) is the CDF of a normal distribution with \(\mu\) and \(\sigma\) equal to the mean and standard deviation of the logarithm of \(X\) respectively. This is, by definition, the CDF of the log normal distribution
\[ P_{\mu,\sigma}\left(x\right) = \Phi_{\mu,\sigma}(\ln x) \]
The simplicity of this definition is reflected in the implementation of ak.logNormalCDF using ak.normalCDF, as illustrated by listing 1.

Listing 1: ak.logNormalCDF
ak.logNormalCDF = function(mu, sigma) {
 var cdf = ak.normalCDF(mu, sigma);
 var f = function(x){return x<=0 ? 0 : cdf(Math.log(x));};
 f.mu = cdf.mu;
 f.sigma = cdf.sigma;
 return Object.freeze(f);
};

The only slightly, and I mean slightly, non-trivial aspect of f is that we must take care to return zero, rather than NaN, for arguments less than zero. We must do this because asking what the probability is that an observation of a log normally distributed random variable will fall below zero is a perfectly well formed question to which the answer is zero.
Note that we add property access methods for \(\mu\) and \(\sigma\) to f by simply copying the mu and sigma methods of cdf.
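For example, assuming that the ak library has been loaded, we might use it with

var cdf = ak.logNormalCDF(0, 1);

console.log(cdf(1));   // 0.5, since ln 1 = 0 is the median of the standard normal
console.log(cdf(-1));  // 0, since negative observations are impossible
console.log(cdf.mu()); // 0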

Derivation 1: The Mean And Variance Of ln U(0, 1)
Firstly, we have
\[ \mu = \mathrm{E}[\ln U] = \int_0^1 \ln x \; \mathrm{d}x = \int_0^1 \ln x \times \frac{\mathrm{d}x}{\mathrm{d}x} \; \mathrm{d}x \]
where \(\mathrm{E}[\ln U]\) is the expected value of \(\ln U\). Applying integration by parts, we have
\[ \begin{align*} \mu &= \left[\ln x \times x\right]_0^1 - \int_0^1 \frac{1}{x} \times x \; \mathrm{d}x = 0 - \int_0^1 1 \; \mathrm{d}x\\ &= - [x]_0^1 = -1 \end{align*} \]
Next
\[ \begin{align*} \sigma^2 &= \mathrm{E}\left[(\ln U - \mu)^2\right] = \mathrm{E}\left[(\ln U)^2 - 2 \mu \ln U + \mu^2\right]\\ &= \mathrm{E}\left[(\ln U)^2\right] - 2\mu^2+\mu^2 = \mathrm{E}\left[(\ln U)^2\right] - 1 \end{align*} \]
Finally, using integration by parts again, we have
\[ \begin{align*} \mathrm{E}\left[(\ln U)^2\right] &= \int_0^1 \left(\ln x\right)^2 \times \frac{\mathrm{d}x}{\mathrm{d}x} \; \mathrm{d}x\\ &= \left[(\ln x)^2 \times x\right]_0^1 - \int_0^1 \frac{2 \ln x}{x} \times x \; \mathrm{d}x\\ &= 0 - 2 \int_0^1 \ln x \; \mathrm{d}x = -2\mu = 2 \end{align*} \]
and so \(\sigma^2=1\).
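As a quick numerical sanity check of derivation 1 we can estimate the mean and variance of the logarithm of Math.random by simulation, as sketched below for an arbitrarily chosen sample size

var n = 1000000, sum = 0, sum2 = 0;
var i, l;

for(i=0;i<n;++i) {
 l = Math.log(Math.random());
 sum += l;
 sum2 += l*l;
}

console.log(sum/n);                       // should be close to -1
console.log(sum2/n - Math.pow(sum/n, 2)); // should be close to 1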
We can use ak.logNormalCDF to demonstrate that the log normal distribution is indeed the distribution of products by comparing a histogram of products of values generated with Math.random with
\[ \Pr\left(x_i \leqslant x < x_{i+1}\right) = P_{\mu,\sigma}\left(x_{i+1}\right) - P_{\mu,\sigma}\left(x_i\right) \]
To do so, we first need the mean \(\mu\) and variance \(\sigma^2\) of the logarithm of the standard uniform distribution that Math.random implements, shown by derivation 1 to be
\[ \begin{align*} \mu &= -1\\ \sigma^2 & = 1 \end{align*} \]
Now the mean and variance of the sum of \(n\) independent observations of a random variable are equal to \(n\) times the mean and variance of that random variable respectively, so the mean and variance of the logarithm of the product of \(n\) observations of the standard uniform are
\[ \begin{align*} \mu_n &= -n\\ \sigma_n^2 & = n \end{align*} \]
and consequently the CDF of the product is \(P_{-n,\sqrt{n}}(x)\), as used in program 1, in which the line graph plots the values calculated from the CDF and the bar chart is the aforementioned histogram.

Program 1: The Distribution Of Products

A pretty good match, I'd say!
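If you can't run the interactive program then a minimal non-interactive sketch along the same lines, with arbitrarily chosen numbers of terms and samples and arbitrarily chosen thresholds, compares the observed and predicted probabilities that the product falls at or below a few cutoffs

var n = 10, samples = 100000;
var cdf = ak.logNormalCDF(-n, Math.sqrt(n));
var products = [];
var i, j, x;

for(i=0;i<samples;++i) {
 x = 1;
 for(j=0;j<n;++j) x *= Math.random();
 products.push(x);
}

[1e-7, 1e-6, 1e-5, 1e-4, 1e-3].forEach(function(c) {
 var count = products.reduce(function(k, x){return x<=c ? k+1 : k;}, 0);
 console.log(c, count/samples, cdf(c)); // observed versus predicted
});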

ak.logNormalPDF

Derivation 2: The Mean Of LN(μ, σ)
The mean of \(LN(\mu,\sigma)\) is given by
\[ \begin{align*} \int_0^\infty x p_{\mu,\sigma}(x) \mathrm{d}x &= \int_0^\infty e^{\ln x} p_{\mu,\sigma}(x) \mathrm{d}x\\ &= \int_0^\infty e^{\ln x} \times \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\tfrac{(\ln x - \mu)^2}{2\sigma^2}} \mathrm{d}x\\ &= \int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\tfrac{(\ln x - \mu)^2}{2\sigma^2} + \ln x} \mathrm{d}x \end{align*} \]
Rearranging the exponentiated term, we have
\[ \begin{align*} -\tfrac{(\ln x - \mu)^2}{2\sigma^2} + \ln x &= -\tfrac{\left((\ln x)^2 - 2 \mu \ln x - 2 \sigma^2 \ln x + \mu^2\right)}{2\sigma^2}\\ &= -\tfrac{\left(\ln x - \left(\mu + \sigma^2\right)\right)^2 - 2 \mu \sigma^2 - \sigma^4}{2\sigma^2}\\ &= -\tfrac{\left(\ln x - \left(\mu + \sigma^2\right)\right)^2}{2\sigma^2} + \left(\mu + \tfrac{1}{2}\sigma^2\right) \end{align*} \]
and hence
\[ \begin{align*} \int_0^\infty x p_{\mu,\sigma}(x) \mathrm{d}x &= \int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\tfrac{\left(\ln x - \left(\mu + \sigma^2\right)\right)^2}{2\sigma^2}} e^{\mu + \tfrac{1}{2}\sigma^2} \mathrm{d}x\\ &= e^{\mu + \tfrac{1}{2}\sigma^2}\int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\tfrac{\left(\ln x - \left(\mu + \sigma^2\right)\right)^2}{2\sigma^2}}\mathrm{d}x\\ &= e^{\mu + \tfrac{1}{2}\sigma^2} \end{align*} \]
since the integrand is also a log normal PDF and so must integrate to one.
To recover the PDF of the log normal distribution we simply take the derivative of its CDF using the chain rule
\[ \begin{align*} p_{\mu,\sigma}(x) &= \frac{\mathrm{d}}{\mathrm{d}x} P_{\mu,\sigma}\left(x\right)\\ &= \frac{\mathrm{d}}{\mathrm{d}x}\Phi_{\mu,\sigma}(\ln x)\\ &= \phi_{\mu,\sigma}(\ln x) \times \frac{\mathrm{d}}{\mathrm{d}x}\ln x\\ &= \phi_{\mu,\sigma}(\ln x) \times \frac{1}{x}\\ &= \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\tfrac{(\ln x - \mu)^2}{2\sigma^2}} \end{align*} \]
where \(\phi_{\mu,\sigma}\) is the derivative of \(\Phi_{\mu,\sigma}\) or, in other words, the PDF of the normal distribution.

Note that we can use this to prove that the mean of the log normal distribution is given by
\[ \mathrm{E}[X] = e^{\mu + \tfrac{1}{2}\sigma^2} \]
where \(\mathrm{E}\) is the expectation and \(X\) is a random variable distributed as \(LN(\mu,\sigma)\), as shown in derivation 2.

The implementation of the PDF is no less trivial than that of the CDF, as shown by ak.logNormalPDF in listing 2.

Listing 2: ak.logNormalPDF
ak.logNormalPDF = function(mu, sigma) {
 var pdf = ak.normalPDF(mu, sigma);
 var f = function(x){return x<=0 ? 0 : pdf(Math.log(x))/x;};
 f.mu = pdf.mu;
 f.sigma = pdf.sigma;
 return Object.freeze(f);
};
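As a quick check that listing 2 is consistent with listing 1 we can compare the PDF with a central finite difference of the CDF, as sketched below for arbitrarily chosen parameters, point and bandwidth

var cdf = ak.logNormalCDF(0, 1);
var pdf = ak.logNormalPDF(0, 1);
var x = 1.5, h = 1e-6;

console.log((cdf(x+h)-cdf(x-h))/(2*h)); // approximately equal to...
console.log(pdf(x));                    // ...roughly 0.245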

Program 2 uses this to show how the shape of the PDF changes with \(\sigma\) for \(\mu\) fixed at zero.

Program 2: The Log Normal PDF

Note how it goes from something rather like the normal PDF to something totally unlike it as \(\sigma\) increases.
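If you can't run the interactive program then a minimal sketch tabulating the PDF for a few arbitrarily chosen values of \(\sigma\) gives some sense of the change in shape

[0.25, 0.5, 1, 2].forEach(function(s) {
 var pdf = ak.logNormalPDF(0, s);
 var row = [];
 for(var x=0.5;x<=3;x+=0.5) row.push(pdf(x).toFixed(3));
 console.log('sigma='+s, row.join(' '));
});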

ak.logNormalInvCDF And ak.logNormalRnd

To invert the log normal CDF, or in other words to find the value \(x\) for which the CDF returns a given value \(c\), we need simply to rearrange the terms in the definition of the CDF
\[ \begin{align*} c &= P_{\mu,\sigma}\left(x\right) = \Phi_{\mu,\sigma}(\ln x)\\ \ln x &= \Phi^{-1}_{\mu,\sigma}(c)\\ x &= \exp\left(\Phi^{-1}_{\mu,\sigma}(c)\right) \end{align*} \]
as implemented in listing 3.

Listing 3: ak.logNormalInvCDF
ak.logNormalInvCDF = function(mu, sigma) {
 var inv = ak.normalInvCDF(mu, sigma);
 var f = function(c){return Math.exp(inv(c));};
 f.mu = inv.mu;
 f.sigma = inv.sigma;
 return Object.freeze(f);
};
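For example, composing the inverse with the CDF should recover its argument, at least up to the accuracy of ak.normalInvCDF

var cdf = ak.logNormalCDF(1, 0.5);
var inv = ak.logNormalInvCDF(1, 0.5);

console.log(inv(cdf(2))); // approximately 2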

By exactly the same reasoning we can draw a random number from the log normal distribution by exponentiating a normally distributed random number, as illustrated by listing 4.

Listing 4: ak.logNormalRnd
ak.logNormalRnd = function(mu, sigma, rnd) {
 var n = ak.normalRnd(mu, sigma, rnd);
 var f = function(){return Math.exp(n());};
 f.mu = n.mu;
 f.sigma = n.sigma;
 return Object.freeze(f);
};
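We can also use this for a quick numerical check of the formula for the mean given in derivation 2, as sketched below for an arbitrarily chosen sample size and assuming that ak.normalRnd falls back to Math.random when rnd is undefined

var rnd = ak.logNormalRnd(0, 1);
var n = 1000000, sum = 0;

for(var i=0;i<n;++i) sum += rnd();

console.log(sum/n);         // the sample mean
console.log(Math.exp(0.5)); // e^(mu + sigma^2/2) = 1.6487...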

Program 3 compares a bar chart of the histogram of a sample of ak.logNormalRnd with a line graph of the expected results as calculated using ak.logNormalCDF.

Program 3: ak.logNormalRnd

ak.logNormalCF

Now, given how easy it was to implement the log normal PDF, CDF, inverse CDF and random number generator in terms of the corresponding normal distribution functions, it might come as something of a surprise that the log normal characteristic function, or CF, is extremely difficult to implement.
For starters, its definition
\[ \hat p_{\mu,\sigma}(t) = \mathrm{E}\left[e^{itX}\right] = \int_0^\infty e^{itx} p_{\mu,\sigma}(x) \, \mathrm{d}x = \int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma x} e^{itx-\tfrac{(\ln x - \mu)^2}{2\sigma^2}} \, \mathrm{d}x \]
doesn't simplify to a tidy formula.

If we try to attack the expectation with Taylor's theorem we get
\[ \begin{align*} \mathrm{E}\left[e^{itX}\right] &= \mathrm{E}\left[\sum_{n=0}^\infty \frac{\left(itX\right)^n}{n!}\right]\\ &= \sum_{n=0}^\infty\mathrm{E}\left[\frac{\left(itX\right)^n}{n!}\right]\\ &= \sum_{n=0}^\infty\frac{\left(it\right)^n}{n!}\mathrm{E}\left[X^n\right] \end{align*} \]
where \(\sum\) is the summation sign. Noting that \(x^n = e^{n \ln x}\), we can easily generalise the proof given in derivation 2 to show that
\[ \mathrm{E}[X^n] = e^{n\mu + \tfrac{1}{2}n^2\sigma^2} \]
and so
\[ \mathrm{E}\left[e^{itX}\right] = \sum_{n=0}^\infty\frac{\left(it\right)^n}{n!} e^{n\mu + \tfrac{1}{2}n^2\sigma^2} \]
Unfortunately \(e^{\tfrac{1}{2}n^2\sigma^2}\) grows very quickly and, as proven in derivation 3 and demonstrated by program 4, this series doesn't converge.

Derivation 3: The Divergence Of The CF Series
Defining the \(n^{th}\) term of the series as \(s_n\)
\[ s_n = \frac{\left(it\right)^n}{n!} e^{n\mu + \tfrac{1}{2}n^2\sigma^2} \]
the ratio of \(s_{n+1}\) to \(s_n\) is
\[ \begin{align*} \frac{s_{n+1}}{s_n} &= \frac{\frac{\left(it\right)^{n+1}}{(n+1)!} e^{(n+1)\mu + \tfrac{1}{2}(n+1)^2\sigma^2}}{\frac{\left(it\right)^n}{n!} e^{n\mu + \tfrac{1}{2}n^2\sigma^2}}\\ &= \frac{it}{n+1} e^{(n+1)\mu + \tfrac{1}{2}(n+1)^2\sigma^2} \times e^{-n\mu - \tfrac{1}{2}n^2\sigma^2}\\ &= \frac{e^{n\sigma^2}}{n+1} it \, e^{\mu + \tfrac{1}{2}\sigma^2} \end{align*} \]
The magnitude of the first factor grows without limit as \(n\) increases and so, by the ratio test, the series diverges.

Program 4: The Divergence Of The CF Series
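Since the program is interactive, here is a minimal non-interactive sketch in the same spirit which prints the magnitudes of the first several terms for a \(\mu\) of zero, a \(\sigma\) of one and a \(t\) of one, with nf and r holding the factorial and exponential terms

var mu = 0, sigma = 1, t = 1;
var nf = 1;
var n, r;

for(n=0;n<=20;++n) {
 if(n>0) nf *= n; // n!
 r = Math.pow(Math.abs(t), n)*Math.exp(n*mu + 0.5*n*n*sigma*sigma);
 console.log(n, r/nf); // the magnitude of the nth term, which grows without limit
}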

Note that if you set sigma small enough then the factorial term nf will overflow to infinity before the exponential term r does, giving a rather misleading appearance of convergence.
Figure 1: The Oscillating Exponential

Finally, if we instead use numerical integration to approximate the CF we can still run into trouble. Specifically, the exponential part of the integrand
\[ e^{itx-\tfrac{(\ln x - \mu)^2}{2\sigma^2}} = \left(\cos(tx)+i\,\sin(tx)\right) \, e^{-\tfrac{(\ln x - \mu)^2}{2\sigma^2}} \]
has the unfortunate property that the imaginary part of the exponent grows in magnitude more quickly than the real part grows negative and, unless \(\sigma\) is small, it will oscillate dramatically before it decays to zero, as illustrated by figure 1 which plots the real part of the exponential term for a \(\sigma\) of two, a \(\mu\) of zero, a \(t\) of one and \(x\) from zero to five hundred!
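For concreteness, the real part of that exponential term, as plotted in figure 1, can be computed with something along the lines of the sketch below, in which the function name is merely illustrative

function oscillation(x, t, mu, sigma) {
 var d = Math.log(x) - mu;
 return Math.cos(t*x)*Math.exp(-d*d/(2*sigma*sigma));
}

console.log(oscillation(100, 1, 0, 2)); // a sample of the slowly decaying oscillation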
This behaviour is almost identical to that of the troublesome oscillatory function that stumped ak.rombergIntegral in a previous post[2]; we need a huge number of points to accurately compute the integral of the function since, as we proceed from left to right, we repeatedly overshoot with a positive area and then correct with a similarly sized negative area. Even far out on the right, the corrections derived from the relatively small oscillations are still important for numerical accuracy.
Which is rather annoying, really.

Fortunately however, much as we found with numerical differentiation[3]-[7], if we come at the problem from a different direction we can do much, much better! But you shall have to wait until next time to find out how.

References

[1] Define Normal, www.thusspakeak.com, 2014.

[2] The Tip Of The Romberg, www.thusspakeak.com, 2015.

[3] You're Going To Have To Think! Why [Insert Algorithm Here] Won't Cure Your Calculus Blues, www.thusspakeak.com, 2013.

[4] You're Going To Have To Think! Why Finite Differences Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[5] You're Going To Have To Think! Why Polynomial Approximation Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[6] You're Going To Have To Think! Why Computer Algebra Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[7] You're Going To Have To Think! Why Automatic Differentiation Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.
