Monte Carlo Or Bust


We have now taken a look at a few different ways to numerically approximate the integrals of functions[1][2][3], all of which have involved exactly integrating simple approximations of those functions over numerous small intervals. Whilst this is an appealing strategy, as is so often the case with numerical computing it is not the only one.

Probabilities And Integrals

There's a close relationship between calculus and probability theory; integrals involving the probability density functions, or PDFs, of continuous probability distributions yield both the probabilities that their observed values will fall within particular intervals and the expected values of functions evaluated at those observed values.
Of particular utility when it comes to numerically approximating integrals are expectations for the uniform distribution, which take the form
\[ \begin{align*} &U \sim U(a,b)\\ &\mathrm{E}[f(U)] = \int_a^b u_{a,b}(x) \times f(x) \; \mathrm{d}x = \int_a^b \frac{1}{b-a} \times f(x) \; \mathrm{d}x = \frac{1}{b-a} \times \int_a^b f(x) \; \mathrm{d}x \end{align*} \]
and so
\[ \int_a^b f(x) \; \mathrm{d}x = (b-a) \times \mathrm{E}[f(U)] = \mathrm{E}[(b-a) \times f(U)] \]
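For example, for \(f(x) = x\) over \([0,1]\), the expected value of a standard uniformly distributed random variable is one half and so
\[ \int_0^1 x \; \mathrm{d}x = (1-0) \times \mathrm{E}[U] = \tfrac12 \]
as expected.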
Now on the face of it we've just found a different way of expressing an integral, but we can rewrite the expectation as the limit of the mean of a sample as its size increases with
\[ \mathrm{E}[(b-a) \times f(U)] = \lim_{n\to\infty} \frac1n \times \sum_{i=1}^n (b-a) \times f\left(u_i\right) \]
where \(\sum\) is the summation sign and each \(u_i\) is an observation of a \(U(a,b)\) distributed random variable. We might therefore expect that, for sufficiently large \(n\), we can approximate the integral with
\[ \int_a^b f(x) \; \mathrm{d}x \approx \frac1n \times \sum_{i=1}^n (b-a) \times f\left(u_i\right) \]
This is known as the Monte Carlo integration algorithm since, like the famed casinos, it depends upon randomness for its operation.
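For example, a minimal sketch of the scheme in JavaScript, using Math.random to generate the observations and a purely illustrative helper function rather than anything from the ak library, might look something like

function monteCarlo(f, a, b, n) {
 var sum = 0;
 var i;

 // average n observations of (b-a) times f(U) for U uniform on [a,b]
 for(i=0;i<n;++i) sum += (b-a)*f(a+(b-a)*Math.random());
 return sum/n;
}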

Now, if \(\sigma\) is the standard deviation of those \(n\) observations then, by the central limit theorem, that approximation will be very nearly normally distributed with a standard deviation of \(\sigma\) divided by the square root of \(n\), provided that the mean and standard deviation of an infinite sample are finite. We can use that fact to estimate the error of the approximation.

Derivation 1: The Sample Mean And Variance
By definition the sample mean and variance are given by
\[ \begin{align*} \mu &= \frac1n\sum_{i=1}^n x_i\\ \sigma^2 &= \frac1n\sum_{i=1}^n \left(x_i-\mu\right)^2 \end{align*} \]
The former is identical to the value that we are calculating and the latter can be rearranged as
\[ \begin{align*} \sigma^2 &= \frac1n\sum_{i=1}^n \left(x_i-\mu\right)^2 = \frac1n\sum_{i=1}^n \left(x_i^2 - 2 \times x_i \times \mu + \mu^2\right)\\ &= \frac1n\sum_{i=1}^n x_i^2 - \frac1n\sum_{i=1}^n 2 \times x_i \times \mu + \frac1n\sum_{i=1}^n \mu^2\\ &= \frac1n\sum_{i=1}^n x_i^2 - 2 \times \mu \times \frac1n\sum_{i=1}^n x_i + \frac1n \times n \times \mu^2\\ &= \frac{s_2}{n} - 2 \times \mu \times \mu + \mu^2 = \frac{s_2}{n} - \mu^2 \end{align*} \]
Specifically, if we define
\[ \begin{align*} x_i &= (b-a) \times f\left(u_i\right)\\ s_1 &= \sum_{i=1}^n x_i\\ s_2 &= \sum_{i=1}^n x_i^2 \end{align*} \]
then we can calculate the sample mean \(\mu\) and variance \(\sigma^2\) with
\[ \begin{align*} \mu &= \frac{s_1}{n}\\ \sigma^2 &= \frac{s_2}{n} - \mu^2 \end{align*} \]
as proven by derivation 1, and the standard deviation \(\sigma\) with the square root of the variance. Now it is possible that rounding errors will cause this value for the variance to be negative if, in the infinite limit, it is extremely small compared to the square of the mean. But in such cases we have, pretty much by definition, an accurate approximation and so it's perfectly reasonable to set it to zero with
\[ \sigma^2 = \max\left(\frac{s_2}{n} - \mu^2, 0\right) \]
We can then use \(\mu\) as an approximation of the integral and \(\sigma\) divided by the square root of \(n\) to give an indication of the approximation error.
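A minimal sketch of these formulae, extending the illustrative helper above to return both the approximation and its estimated error, might look something like

function monteCarloEstimate(f, a, b, n) {
 var s1 = 0;
 var s2 = 0;
 var i, x, mu, sigma2;

 for(i=0;i<n;++i) {
  // each x_i is an observation of (b-a) times f(U)
  x = (b-a)*f(a+(b-a)*Math.random());
  s1 += x;
  s2 += x*x;
 }

 mu = s1/n;
 sigma2 = Math.max(s2/n - mu*mu, 0);

 // the mean approximates the integral, the standard error its accuracy
 return {approx: mu, err: Math.sqrt(sigma2/n)};
}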

To demonstrate this approach we shall try integrating the exponential function which, as you will no doubt recall, yields
\[ \int_a^b e^x \; \mathrm{d}x = \Big[e^x\Big]_a^b = e^b - e^a \]
Program 1 uses the above formulae for the sample mean and variance to approximate the integral and the standard deviation of the approximation, comparing them to the exact value and the true error.

Program 1: Integrating The Exponential Function
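
As a rough sketch of the comparison that program 1 is making, we might apply the monteCarloEstimate helper sketched above to the exponential function:

var result = monteCarloEstimate(Math.exp, 0, 1, 100000);
var exact = Math.exp(1)-Math.exp(0);

console.log(result.approx);                  // the approximation
console.log(result.err);                     // its estimated error
console.log(Math.abs(result.approx-exact));  // the true error

Quadrupling the number of samples should roughly halve both the estimated and the true errors, reflecting the one over the square root of \(n\) behaviour noted above.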

Whilst the standard deviation does indeed give a reasonable indication of the approximation error, it is unfortunately rather on the large side compared to the errors of the integration algorithms that we have implemented so far, as demonstrated by program 2 which compares the approximation error of the Monte Carlo algorithm to those of our ak.rombergIntegral and ak.simpsonIntegral integrators.

Program 2: Monte Carlo Versus The Rest

Despite this, the Monte Carlo algorithm has one significant advantage over its deterministic alternatives.

Multivariate Integration

Last time[4] we saw how we could recursively apply our integration algorithms to multivariate functions, being functions that take more than one argument, much as we solve them exactly by integrating over each argument in turn, treating the others as constants as we do so. Unfortunately numerically integrating numerical integrals compounds the approximation errors of these algorithms and so this approach gets rather computationally expensive as the number of dimensions increases.

Monte Carlo integration, on the other hand, can be generalised to multivariate functions with barely any modification. Specifically, we can approximate the integrals of multivariate functions with the expectations for uniformly distributed arguments of their values multiplied by the volume of the region over which we're integrating them. For example, in two dimensions, we have
\[ \int_{x_0}^{x_1} \int_{y_0}^{y_1} f(x, y) \; \mathrm{d}y \; \mathrm{d}x \approx \frac1n \times \sum_{i=1}^n \left(x_1-x_0\right) \times \left(y_1-y_0\right) \times f\left(u_i, v_i\right) \]
where each \(u_i\) is an observation of a \(U\left(x_0, x_1\right)\) distributed random variable and each \(v_i\) is an observation of a \(U\left(y_0, y_1\right)\) distributed one. This follows from precisely the same argument that we made for the univariate case, which we can summarise as the volume beneath a surface being trivially equal to its average height above the plane within a given region multiplied by the area of that region.
Now this approximation doesn't depend upon anything other than the number of samples, the values of the function for them and the area of the region over which it is being integrated and, crucially, neither does its standard deviation. This means that, when generalising it to multivariate functions, its error isn't directly affected by the number of dimensions, quite unlike the recursive application of univariate numerical integration algorithms.

To demonstrate bivariate Monte Carlo integration we shall once again integrate the square root of the sum of \(x\) and \(y\), which has an exact solution of
\[ \begin{align*} \int_{x_0}^{x_1} \int_{y_0}^{y_1} \sqrt{x + y} \; \mathrm{d}y \; \mathrm{d}x &=\int_{x_0}^{x_1} \left(\int_{y_0}^{y_1} \left(x + y\right)^\tfrac12 \; \mathrm{d}y\right) \; \mathrm{d}x\\ &=\int_{x_0}^{x_1} \bigg[\tfrac23\left(x+y\right)^\tfrac32\bigg]_{y_0}^{y_1} \; \mathrm{d}x\\ &=\int_{x_0}^{x_1} \tfrac23\left(x+y_1\right)^\tfrac32 - \tfrac23\left(x+y_0\right)^\tfrac32 \; \mathrm{d}x\\ &=\bigg[\tfrac4{15}\left(x+y_1\right)^\tfrac52 - \tfrac4{15}\left(x+y_0\right)^\tfrac52\bigg]_{x_0}^{x_1}\\ &=\left(\tfrac4{15}\left(x_1+y_1\right)^\tfrac52 - \tfrac4{15}\left(x_1+y_0\right)^\tfrac52\right) - \left(\tfrac4{15}\left(x_0+y_1\right)^\tfrac52 - \tfrac4{15}\left(x_0+y_0\right)^\tfrac52\right) \end{align*} \]
Program 3 compares the bivariate Monte Carlo approximation of this integral and its standard deviation to the exact value and error in much the same way as program 1 did, to much the same effect.

Program 3: Bivariate Monte Carlo Integration
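
A rough sketch of the kind of calculation that program 3 is performing, once again using Math.random and purely illustrative helpers and bounds rather than the ak library, is

function monteCarlo2D(f, x0, x1, y0, y1, n) {
 var area = (x1-x0)*(y1-y0);
 var sum = 0;
 var i, u, v;

 // average n observations of the area times f(U, V)
 for(i=0;i<n;++i) {
  u = x0+(x1-x0)*Math.random();
  v = y0+(y1-y0)*Math.random();
  sum += area*f(u, v);
 }
 return sum/n;
}

function exact(x0, x1, y0, y1) {
 // the closed form solution derived above
 function g(x, y) {return 4/15*Math.pow(x+y, 2.5);}
 return (g(x1, y1)-g(x1, y0)) - (g(x0, y1)-g(x0, y0));
}

var f = function(x, y) {return Math.sqrt(x+y);};

console.log(monteCarlo2D(f, 1, 3, 2, 4, 1000000));
console.log(exact(1, 3, 2, 4));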

ak.monteCarloIntegral

Just as we did for our ak.nestedIntegral integrator, we shall define multivariate functions as taking a single vector argument using our ak.vector type. By tremendous coincidence, or perhaps just mundane foresight, we have already implemented a multivariate uniform distribution[5], including the ak.multiUniformRnd vector valued random number generator which we can use to produce such arguments.

We shall use the integral function given in listing 1 to actually compute the Monte Carlo integral.

Listing 1: The integral Function
function integral(f, threshold, steps, rnd, v) {
 var i = 0;
 var s1 = 0;
 var s2 = 0;
 var j, fx, s2i2, e1i1;

 if(!isFinite(v)) throw new Error('invalid bounds in ak.monteCarloIntegral');

 do {
  for(j=0;j<32;++j) {
   fx = f(rnd())*v;
   s1 += fx;
   s2 += fx*fx;
  }
  i += j;

  s2i2 = i*s2-s1*s1;
  e1i1 = threshold*(i+Math.abs(s1));
 }
 while(i<steps && s2i2>i*e1i1*e1i1);

 if(!(s2i2<=i*e1i1*e1i1)) {
  throw new Error('failure to converge in ak.monteCarloIntegral');
 }
 return s1/i;
}

Here, f is the function to be integrated, threshold is the convergence threshold, steps is the maximum number of steps, rnd is a uniform random number generator over the bounds of the integral and v is the volume enclosed by those bounds.
Once we've confirmed that the volume is finite, we take samples of 32 values of the function at a time to estimate the mean and standard deviation until either we reach the maximum number of steps or satisfy the convergence criterion of

s2i2<=i*e1i1*e1i1

which is justified by derivation 2.

Derivation 2: The Convergence Criterion
Given a sample mean and variance after the \(i^\mathrm{th}\) step of
\[ \begin{align*} \mu &= \frac{s_1}i\\ \sigma^2 &= \frac{s_2}i - \mu^2 \end{align*} \]
respectively, then, as discussed above, we can use
\[ \sqrt{\max\left(\frac{\sigma^2}i,0\right)} \]
as a measure of the approximation error. It's therefore not unreasonable to use a normalised error to test for convergence with
\[ \frac{\sqrt{\max\left(\frac{\sigma^2}i,0\right)}}{1+|\mu|} \leqslant \epsilon \]
where the vertical bars stand for the absolute value of the term that they enclose. Rearranging this, we have
\[ \begin{align*} \sqrt{\max\left(\frac{\sigma^2}i,0\right)} &\leqslant \epsilon \times (1+|\mu|)\\ \max\left(\frac{\sigma^2}i,0\right) &\leqslant \epsilon^2 \times (1+|\mu|)^2 \end{align*} \]
Now, since the right hand side of the inequality is a square it cannot be negative and so we don't need to set the minimum value of the left hand side to zero, yielding
\[ \begin{align*} \frac{\sigma^2}i &\leqslant \epsilon^2 \times (1+|\mu|)^2\\ \sigma^2 &\leqslant i \times \epsilon^2 \times (1+|\mu|)^2 \end{align*} \]
Finally, substituting in the formulae for \(\mu\) and \(\sigma\) yields
\[ \begin{align*} \frac{s_2}i - \frac{s_1^2}{i^2} &\leqslant i \times \epsilon^2 \times \left(1+\left|\frac{s_1}i\right|\right)^2\\ i \times s_2 - s_1^2 &\leqslant i \times \epsilon^2 \times \left(i+\left|s_1\right|\right)^2 \end{align*} \]
since \(i\) is positive and hence equal to \(|i|\).
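As a quick sanity check of the final rearrangement we can confirm numerically that the two forms of the test agree, here for some entirely arbitrary illustrative values of the sums

var i = 1000;
var s1 = 512.3;
var s2 = 300.7;
var e = Math.pow(2, -9);

var mu = s1/i;
var sigma2 = Math.max(s2/i - mu*mu, 0);

// the normalised error test...
var lhs = Math.sqrt(sigma2/i) <= e*(1+Math.abs(mu));

// ...and its rearrangement as used by the integral function
var rhs = i*s2 - s1*s1 <= i*Math.pow(e*(i+Math.abs(s1)), 2);

console.log(lhs===rhs); // true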

The multivariate Monte Carlo integration algorithm is driven by the multiIntegral helper function given in listing 2 together with the volume function to calculate the volume enclosed by the bounds.

Listing 2: The multiIntegral Function
function volume(x0, x1) {
 var n = x0.dims();
 var v = 1;
 var i;

 for(i=0;i<n;++i) v *= x1.at(i)-x0.at(i);
 return v;
}

function multiIntegral(f, x0, x1, threshold, steps, rnd) {
 var n = x0.dims();

 if(n===0) throw new Error('invalid lower bound in ak.monteCarloIntegral');

 if(ak.nativeType(x1)===ak.UNDEFINED_T) {
  x1 = x0;
  x0 = ak.vector(n, 0);
 }
 else if(ak.type(x1)!==ak.VECTOR_T || x1.dims()!==n) {
  throw new Error('invalid upper bound in ak.monteCarloIntegral');
 }

 return integral(f, threshold, steps, ak.multiUniformRnd(x0, x1, rnd), volume(x0, x1));
}

Here we first ensure that x0 has at least one dimension. Then, if x1 is undefined, we set it to x0 and replace x0 with a vector of zeros; otherwise we check that it is a vector with the same number of dimensions as x0. Finally, we call the integral function to do the actual work, passing it an ak.multiUniformRnd initialised with the bounds and the random number generator rnd, together with the volume enclosed by those bounds.

We should also like to support univariate Monte Carlo integration and will do so with the uniIntegral helper function given in listing 3.

Listing 3: The uniIntegral Function
function uniIntegral(f, x0, x1, threshold, steps, rnd) {
 if(ak.nativeType(x1)===ak.UNDEFINED_T) {
  x1 = x0;
  x0 = 0;
 }

 if(ak.nativeType(x0)!==ak.NUMBER_T) {
  throw new Error('invalid lower bound in ak.monteCarloIntegral');
 }

 if(ak.nativeType(x1)!==ak.NUMBER_T) {
  throw new Error('invalid upper bound in ak.monteCarloIntegral');
 }

 return integral(f, threshold, steps, ak.uniformRnd(x0, x1, rnd), x1-x0);
}

We can get away with using the integral function for both univariate and multivariate integration because it doesn't depend upon the randomly generated arguments to the function, only upon its results for them. Beyond that, this simply sets the upper bound to the lower bound and the lower bound to zero if the former is undefined, and then checks that they are both numbers.

Finally, all of this is glued together in the ak.monteCarloIntegral function given in listing 4.

Listing 4: ak.monteCarloIntegral
var DEFAULT_STEPS = Math.pow(2, 22);
var DEFAULT_THRESHOLD = Math.pow(2, -9);

ak.monteCarloIntegral = function(f, threshold, steps, rnd) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.monteCarloIntegral');
 }

 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? DEFAULT_THRESHOLD
                                                       : Math.abs(threshold);
 if(isNaN(threshold)) {
  throw new Error('non-numeric threshold passed to ak.monteCarloIntegral');
 }

 steps = ak.nativeType(steps)===ak.UNDEFINED_T ? DEFAULT_STEPS
                                               : Math.floor(Math.abs(steps));
 if(!(steps>0)) {
  throw new Error('invalid steps passed to ak.monteCarloIntegral');
 }

 if(ak.nativeType(rnd)===ak.UNDEFINED_T) {
  rnd = Math.random;
 }
 else if(ak.nativeType(rnd)!==ak.FUNCTION_T) {
  throw new Error('invalid generator in ak.monteCarloIntegral');
 }

 return function(x0, x1) {
  return ak.type(x0)===ak.VECTOR_T ? multiIntegral(f, x0, x1, threshold, steps, rnd)
                                   : uniIntegral(f, x0, x1, threshold, steps, rnd);
 };
};

This provides defaults for the convergence threshold, the maximum number of steps and the random number generator rnd, checking that they and the function f are valid. It then returns a function that forwards to the appropriate helper so that the univariate or multivariate Monte Carlo integration algorithm is used to integrate the function between the bounds x0 and x1.

Program 4 demonstrates how we can use ak.monteCarloIntegral, with a convergence threshold of e, to approximate the integral of the bivariate function that we used as an example before.

Program 4: Using ak.monteCarloIntegral
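
As a rough sketch of the sort of thing that program 4 is doing, assuming that the relevant ak library scripts have been loaded and choosing purely illustrative bounds, we might write

var f = function(x) {return Math.sqrt(x.at(0)+x.at(1));};
var e = Math.pow(2, -9);
var integral = ak.monteCarloIntegral(f, e);

// integrate over the region [1,3]x[2,4]
console.log(integral(ak.vector([1, 2]), ak.vector([3, 4])));

noting that the same integral object can be applied to any finite rectangular region.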

This is pretty much in line with the results that we have seen so far, with the true normalised error, here calculated with ak.diff, being close to the convergence threshold most of the time. Unfortunately you can easily cause convergence failure by reducing e, say by dividing it by ten, without increasing the maximum number of steps.
Nevertheless, using expectations to approximate integrals is pretty much all that we can reasonably attempt for more than a few dimensions. We can, however, improve upon the way that we calculate those expectations.

References

[1] Trapezium Artistry, www.thusspakeak.com, 2015.

[2] The Tip Of The Romberg, www.thusspakeak.com, 2015.

[3] A Jump To The Left And Some Steps To The Right, www.thusspakeak.com, 2016.

[4] The Nesting Instinct, www.thusspakeak.com, 2016.

[5] Into The nth Dimension, www.thusspakeak.com, 2016.
