We have covered in some detail the numerical approximation of differentiation^{[1][5]} but have yet to consider the inverse operation of integration and I think that it's high time that we got around to it.
One of the puzzling things about calculus is that we can (almost) always find a formula for the derivative of a function, a fact we exploited with our computer algebra approach to numerical differentiation^{[4]}, but can easily come up with functions for which we can't find formulae for their integrals. A good example is the normal PDF
\[
\phi_{\mu,\sigma}(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\tfrac{(x-\mu)^2}{2\sigma^2}}
\]
whose integral, the CDF, we implemented with an arcane approximation filled to bursting with magic numbers, FORTRAN style^{[6]}.

Such approximation formulae are both efficient and accurate but are extremely difficult to get right. For integrals that we might use time and time again, like the normal CDF, it is undoubtedly worth the time and effort to figure them out, or at least to assume that someone else has taken the time and effort to do so, but these are exceptional cases and in general we should prefer something more, well, general purpose.
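Although its integral resists a closed form, the PDF itself is trivial to evaluate; a quick sketch in plain JavaScript (a hypothetical helper, not the ak library's implementation):

```javascript
// The normal PDF is easy to evaluate directly, even though its integral
// has no elementary closed form.
function normalPDF(mu, sigma, x) {
  return Math.exp(-(x - mu) * (x - mu) / (2 * sigma * sigma))
       / (Math.sqrt(2 * Math.PI) * sigma);
}

// the peak of the standard normal, 1/sqrt(2*pi)
var p = normalPDF(0, 1, 0);
```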
Specifically, we should like to accurately approximate the integral of a function whilst making as few assumptions about it as possible.
Areas And Integrals

Figure 1: The Area Under The Graph

Loosely speaking, the integral of a function between two points is equivalent to the area under its graph between those points, with the convention that when the graph falls below the \(x\) axis the area is negative. If we note that a vertical line between the \(x\) axis and the graph is an infinitesimal change in the area under the curve, as illustrated by figure 1, and define the area between zero and \(x\) with
\[
A(x) = \int_0^x f(y) \, \mathrm{d}y
\]
then we can demonstrate that the derivative is the inverse of the integral with
\[
\begin{align*}
\mathrm{d} A(x) &= f(x) \times \mathrm{d}x\\
\frac{\mathrm{d}}{\mathrm{d}x} A(x) &= f(x)
\end{align*}
\]
and so
\[
\frac{\mathrm{d}}{\mathrm{d}x} \int_0^x f(y) \, \mathrm{d}y = f(x)
\]
By simple substitution we also have
\[
\int_0^x \frac{\mathrm{d}}{\mathrm{d}y} A(y) \, \mathrm{d}y = \int_0^x f(y) \, \mathrm{d}y = A(x)
\]
illustrating that the converse also holds.

Now these demonstrations are rather lacking in rigour, to say the least, and I've deliberately left out an important caveat; since the derivative of a constant is zero, the integral of the derivative of a function may differ from that function by an arbitrary constant offset \(C\). I sidestepped it here by deliberately choosing the bounds of integration so that it equalled zero. A more general result is
\[
\int_c^x \frac{\mathrm{d}}{\mathrm{d}y} A(y) \, \mathrm{d}y = \left[A(y)\right]_c^x = A(x) - A(c)
\]
The square brackets contain the formula for the integral with a constant of zero and its value at the subscript is subtracted from its value at the superscript to yield the result. The advantage of this form of the integral, known as the definite integral, is that the constant cancels out
\[
\left[A(y) + C\right]_c^x = (A(x)+C) - (A(c)+C) = A(x) - A(c) = \left[A(y)\right]_c^x
\]
so we can safely assume that it is zero.

For indefinite integrals, in which we don't supply bounds, we have instead
\[
\int \frac{\mathrm{d}}{\mathrm{d}x} A(x) \, \mathrm{d}x = A(x) + C
\]
Note that the derivative of this result is still the inverse of the integral since
\[
\frac{\mathrm{d}}{\mathrm{d}x} (A(x) + C) = \frac{\mathrm{d}}{\mathrm{d}x} A(x)
\]
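This inverse relationship can be spot checked numerically; the following standalone sketch (all helpers here are hypothetical, not part of the ak library) approximates \(A(x)\) as the sum of many thin slices and then takes a symmetric finite difference of it, which should recover \(f\).

```javascript
// Approximate A(x), the area under f between zero and x, with n thin
// slices whose tops join the function's values at their edges.
function area(f, x, n) {
  var h = x / n, s = 0.5 * (f(0) + f(x)), i;
  for (i = 1; i < n; ++i) s += f(i * h);
  return s * h;
}

// A symmetric finite difference of the area should be close to f itself;
// here f is the exponential, so dA should be close to e.
var f = Math.exp, x = 1, d = 1e-3;
var dA = (area(f, x + d, 10000) - area(f, x - d, 10000)) / (2 * d);
```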
Given the relationship between areas and integrals, it is extremely tempting to use approximations of the former to yield approximations of the latter.

So that's exactly what we're going to do!
Areas And Approximations

Figure 2: A Simple Approximation

A simple scheme is to approximate the graph with a bar chart in which we divide the range over which we're integrating into a number of subdivisions which we fill with bars whose heights are the value of the function at the midpoints of their bases, as illustrated by figure 2.
Figure 3: A Better Simple Approximation
Figure 4: A Line Segment's Trapezium
All we then need do is to calculate their areas by multiplying their widths by their heights and add them up, counting on each lying partially over the curve and partially under it to cancel out some of the approximation errors.
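This bar chart scheme, with bars of equal width whose heights are the function's values at the midpoints of their bases, can be sketched in a few lines of plain JavaScript (a hypothetical helper, not part of the ak library):

```javascript
// Approximate the integral of f from a to b with n bars, each of width
// h and of height equal to f at the midpoint of its base.
function midpointRule(f, a, b, n) {
  var h = (b - a) / n, s = 0, i;
  for (i = 0; i < n; ++i) s += f(a + (i + 0.5) * h);
  return s * h;
}

// the integral of x^2 from 0 to 1 is exactly 1/3
var approxMid = midpointRule(function(x) { return x * x; }, 0, 1, 1000);
```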
Now this certainly is simple, but it is also simplistic; so much so that it really isn't worth implementing, particularly since there's another scheme, very nearly as simple, that is more accurate.
If, instead of using bars to approximate the graph, we draw a straight line from the value of the function at the left hand side of each subdivision to the value of the function at its right hand side we get a much closer approximation, as illustrated by figure 3.
The trick to using this to approximate the integral is to drop vertical lines from the endpoints of each straight line segment to the \(x\) axis, forming a trapezium, one of which is illustrated in figure 4.
Now the area of such a trapezium is simply the average of the heights of its left and right sides multiplied by its width and it's no less trivial to add them up than it was for our bar chart approximation.
Typically, we divide the range of integration into \(n\) subdivisions of equal width \(h\) (a slightly odd convention that stems from trapeziums typically being drawn with the top and bottom edges parallel) yielding an approximation of the integral of
\[
\sum_{i=1}^{n} \tfrac{1}{2}\left(f\left(x_i\right) + f\left(x_{i+1}\right)\right) \times h
\]
where \(x_i\) and \(x_{i+1}\) are the left and right endpoints of the \(i^{\mathrm{th}}\) subdivision, which we can rearrange as
\[
\tfrac{1}{2} h \bigg(f\left(x_1\right) + 2f\left(x_2\right) + 2f\left(x_3\right) + \dots + 2f\left(x_{n-1}\right) + 2f\left(x_n\right) + f\left(x_{n+1}\right)\bigg)
\]
This is known as the trapezium rule and is the most basic algorithm that we shall use for numerical integration, much as finite differences such as the symmetric finite difference
\[
\frac{\mathrm{d}f}{\mathrm{d}x}(y) \approx \frac{f(y+\delta) - f(y-\delta)}{2\delta}
\]
were the most basic algorithms that we used for numerical differentiation^{[2]}.

Unfortunately, whilst we were able to use numerical analysis to work out just how large \(\delta\) should be for the best accuracy, albeit under some rather contrived assumptions about the higher derivatives of the functions that we should like to differentiate, it is much harder to make an educated guess as to how wide the subdivisions of the trapezium rule should be. The problem stems from the fact that for the former we only needed to make assumptions about the behaviour of the function at the point at which we were calculating the derivative, whereas for the latter we need to make assumptions about the behaviour of the function over the entire range of integration.
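The per-trapezium sum and its rearrangement above are algebraically the same thing, which is easily confirmed numerically; a quick standalone check in plain JavaScript (hypothetical helpers, not the ak library's implementation):

```javascript
// Sum the areas of the n trapeziums one at a time: half the sum of the
// heights of each one's sides multiplied by its width.
function perTrapezium(f, a, b, n) {
  var h = (b - a) / n, s = 0, i;
  for (i = 0; i < n; ++i) s += 0.5 * (f(a + i * h) + f(a + (i + 1) * h)) * h;
  return s;
}

// The rearranged form: half-weight the endpoints, full-weight the
// interior points, then multiply by the subdivision width.
function rearranged(f, a, b, n) {
  var h = (b - a) / n, s = 0.5 * (f(a) + f(b)), i;
  for (i = 1; i < n; ++i) s += f(a + i * h);
  return s * h;
}

var g = function(x) { return x * x; };
var sumA = perTrapezium(g, 0, 1, 100);
var sumB = rearranged(g, 0, 1, 100);
// both should be close to the exact integral of 1/3
```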
For implementers of numerical libraries there is a design choice of last recourse that we may turn to when faced with such difficulties; force the poor benighted user to choose what they think might be an appropriate value. On the plus side, the user knows the problem that they're trying to solve and so is at least theoretically in a position to figure it out. On the minus side, it's a monumental pain in the backside for said user and, as the principal user of my own numerical libraries, I resent the blasted implementer for forcing me to do so.
ak.trapeziumIntegral
The implementation of the trapezium rule follows our familiar convention of creating a function that will compute the integral, as shown in listing 1.

Listing 1: ak.trapeziumIntegral
ak.trapeziumIntegral = function(f, h) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.trapeziumIntegral');
 }

 h = Math.abs(h);
 if(isNaN(h) || h===0) {
  throw new Error('invalid trapezium width in ak.trapeziumIntegral');
 }

 return function(x0, x1) {
  return trapeziumIntegral(f, h, x0, x1);
 };
};
As usual we first check that the arguments are valid, that f is a function and h is an appropriate trapezium width, before returning a function that forwards to trapeziumIntegral, given in listing 2, to compute the integral from x0 to x1.

Listing 2: trapeziumIntegral
function trapeziumIntegral(f, h, x0, x1) {
 var n, i, s;

 if(ak.nativeType(x1)===ak.UNDEFINED_T) {
  x1 = x0;
  x0 = 0;
 }

 if(!isFinite(x0) || !isFinite(x1)) {
  throw new Error('invalid range of integration in ak.trapeziumIntegral');
 }

 n = ak.ceil(Math.abs(x1-x0)/h);
 if(n>ak.INT_MAX) throw new Error('too many steps in ak.trapeziumIntegral');
 if(n===0) n = 1;

 h = (x1-x0)/n;
 s = 0.5*(f(x0) + f(x1));
 for(i=1;i<n;++i) s += f(x0 + i*h);

 return s*h;
}
Note that if only one argument is passed to the integral then it will calculate the integral from zero to that argument, which we do by changing the values of x0 and x1 if the latter is undefined. We then check that these bounds are finite and calculate the number of trapeziums required to fill the range, rounding up to the nearest integer. If this number is within the range of exactly representable integers we compute a new trapezium width that is therefore the closest smaller value to the user specified width that exactly divides the range.

Having got these preliminaries out of the way, the last three lines apply the trapezium rule to the function f and return its result.
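The step-count calculation is perhaps easiest seen with a concrete example (illustrative values, with the standard Math.ceil standing in for the library's ak.ceil): a requested width of 0.3 doesn't exactly divide the range from zero to one, so the number of trapeziums is rounded up and the width shrunk to compensate.

```javascript
// A requested trapezium width of 0.3 over the range [0, 1] yields
// ceil(1/0.3) = 4 trapeziums, and the adjusted width 1/4 = 0.25 is the
// closest value no greater than 0.3 that exactly divides the range.
var lo = 0, hi = 1, width = 0.3;
var steps = Math.ceil(Math.abs(hi - lo) / width);  // 4
var adjusted = (hi - lo) / steps;                  // 0.25
```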
Using ak.trapeziumIntegral
The simplest way to demonstrate ak.trapeziumIntegral, whose implementation is found in TrapeziumIntegral.js, is to apply it to a function whose integral we know how to solve, say a quadratic
\[
\begin{align*}
\int_{0}^{z} \left(ax^2 + bx + c\right) \; \mathrm{d}x &= \left[\tfrac{1}{3}ax^3 + \tfrac{1}{2}bx^2 + cx\right]_{0}^{z}\\
&= \tfrac{1}{3}az^3 + \tfrac{1}{2}bz^2 + cz
\end{align*}
\]
and program 1 does exactly that.
Program 1: Using ak.trapeziumIntegral
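The interactive program isn't reproduced here, but the comparison it performs can be sketched standalone (with a hypothetical trapeziumSketch helper standing in for ak.trapeziumIntegral):

```javascript
// A standalone sketch of the trapezium rule standing in for
// ak.trapeziumIntegral: round the number of trapeziums up and shrink
// the width to exactly divide the range, then apply the rearranged sum.
function trapeziumSketch(f, h, x0, x1) {
  var n = Math.ceil(Math.abs(x1 - x0) / h);
  if (n === 0) n = 1;
  h = (x1 - x0) / n;
  var s = 0.5 * (f(x0) + f(x1));
  for (var i = 1; i < n; ++i) s += f(x0 + i * h);
  return s * h;
}

// Integrate ax^2 + bx + c from 0 to z with the deliberately large width
// 0.1 and compare with the exact result az^3/3 + bz^2/2 + cz.
var a = 1, b = 2, c = 3, z = 2;
var quad = function(x) { return a * x * x + b * x + c; };
var approx = trapeziumSketch(quad, 0.1, 0, z);
var exact = a * z * z * z / 3 + b * z * z / 2 + c * z;
var error = Math.abs(approx - exact);
```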
Here the first column is the approximate integral, the second the exact and the third the absolute difference between them. Note that the trapezium width has deliberately been chosen to be a rather large 0.1 to emphasise approximation errors. Try setting it to 1e-4 for a demonstration of better accuracy (or smaller still if you have the patience).

I think that this effectively demonstrates that ak.trapeziumIntegral does indeed approximate the integral of a function and just how inconvenient having to choose the trapezium width is; it's not immediately obvious what width we should choose even when we know the exact formula for the integral. So in the next post we shall apply some of the lessons we learned from our numerical differentiation algorithms to implement a numerical integration algorithm in which we explicitly specify the desired accuracy of the approximation.
References
[1] You're Going To Have To Think! Why [Insert Algorithm Here] Won't Cure Your Calculus Blues, www.thusspakeak.com, 2013.

[2] You're Going To Have To Think! Why Finite Differences Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.
[3] You're Going To Have To Think! Why Polynomial Approximation Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.
[4] You're Going To Have To Think! Why Computer Algebra Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.
[5] You're Going To Have To Think! Why Automatic Differentiation Won't Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[6] Define Normal, www.thusspakeak.com, 2014.