All Your Basis Are Belong To Us

A few years ago we saw how we could approximate a function \(f\) between pairs of points \(\left(x_i, f\left(x_i\right)\right)\) and \(\left(x_{i+1}, f\left(x_{i+1}\right)\right)\) by linear[1] and cubic spline[2][3] interpolation, which connect them with straight lines and cubic polynomials respectively, the latter yielding smooth curves at the cost of somewhat arbitrary choices about their exact shapes.
An alternative approach is to construct a single function that passes through all of the points and, given that \(n^\mathrm{th}\) order polynomials are uniquely defined by \(n+1\) values at distinct \(x_i\), it's tempting to use them. To do so we need to find a set of coefficients \(a_j\) that satisfy the polynomial equations
\[ f\left(x_i\right) = a_0 + a_1 \times x_i + a_2 \times x_i^2 + \dots + a_n \times x_i^n = \sum_{j=0}^n a_j \times x_i^j \]
for each \(x_i\), where \(\sum\) is the summation sign. Now, if we define a vector \(\mathbf{f}\) with the elements
\[ f_i = f\left(x_i\right) \]
another \(\mathbf{a}\) whose elements are the coefficients \(a_j\) and a matrix \(\mathbf{X}\) having the elements
\[ x_{ij} = x_i^j \]
then we can express those polynomial equations in terms of their elements with
\[ f_i = \sum_{j=0}^n x_{ij} \times a_j \]
which, by the rules of matrix-vector multiplication, implies that
\[ \mathbf{f} = \mathbf{X} \times \mathbf{a} \]
To calculate the values of the coefficients we simply need to pre-multiply both sides by the inverse of \(\mathbf{X}\)
\[ \mathbf{a} = \mathbf{X}^{-1} \times \mathbf{f} \]
as demonstrated for the sine function by program 1 using our ak.matrix and ak.vector types, plotting it in grey and the interpolation in black.
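
For example, a minimal sketch of the calculation at the heart of program 1, using our ak.vector and ak.matrix types together with the overloaded ak.inv and ak.mul functions that we shall also use later on, might look something like the following, in which polynomialCoefficients is an illustrative helper rather than part of the ak library.

function polynomialCoefficients(x, fx) {
 var n = x.length;
 var a = new Array(n*n);
 var i = 0;
 var r, c;

 // fill the rows of the matrix X with the powers of the nodes
 for(r=0;r<n;++r) {
  for(c=0;c<n;++c) a[i++] = Math.pow(x[r], c);
 }

 // solve f = X*a for the coefficients a
 return ak.mul(ak.inv(ak.matrix(n, n, a)), ak.vector(fx)).toArray();
}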

Program 1: Polynomial Interpolation

Clearly the resulting polynomial is a good approximation of the sine between the points, diverging significantly from it only outside of them, as highlighted by program 2.

Program 2: Polynomial Extrapolation

We can easily get a feeling for the magnitude of this divergence by considering the Taylor series of a function in the region of some argument \(x\), given by
\[ f(x+\delta) = f(x) + \delta f^\prime(x) + \tfrac12 \delta^2 f^{\prime\prime}(x) + \dots + \tfrac{1}{n!} \delta^n f^{(n)}(x) + \dots = \sum_{i=0}^\infty \tfrac{1}{i!} \delta^i f^{(i)}(x) \]
where the exclamation mark stands for the factorial of the term preceding it and \(f^\prime\), \(f^{\prime\prime}\) and \(f^{(n)}\) are the first, second and \(n^\mathrm{th}\) derivatives of \(f\). For the sine function in the region of zero this equates to
\[ \sin(x) = x - \tfrac16 x^3 + \tfrac1{120} x^5 - \tfrac1{5040} x^7 + \dots \]
If we truncate the series to a fifth order polynomial then, since the factorial grows much more quickly than powers of \(x\), the most significant power that we discard is the seventh, which at the last point, \(x = \frac57\pi\), equals
\[ \frac{1}{5040} \times \left(\frac57\pi\right)^7 \approx 0.06 \]
but increases rapidly beyond it
\[ \begin{align*} \frac{1}{5040} \times \pi^7 &\approx 0.6\\ \frac{1}{5040} \times \left(\frac97\pi\right)^7 &\approx 3.5 \end{align*} \]
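
We can quickly confirm that the seventh order term dominates the truncation error at \(\pi\) with a couple of lines of JavaScript, in which sin5 is simply an illustrative helper.

// the fifth order truncation of the sine's Taylor series about zero
function sin5(x) {
 return x - Math.pow(x, 3)/6 + Math.pow(x, 5)/120;
}

// prints roughly 0.52, close to the discarded seventh order term's 0.6
console.log(sin5(Math.PI) - Math.sin(Math.PI));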
Whilst extrapolation is almost always inaccurate and is therefore forgivable, polynomial interpolation can fail in ways that are much harder to dismiss. For example, when applied to the Runge function
\[ f(x) = \frac{1}{1 + 25 \times x^2} \]
it oscillates wildly at the edges as the number of terms increases, as demonstrated by program 3.

Program 3: The Runge Function

The problem stems from the fact that the coefficients of its Taylor series grow very large very quickly, as shown by program 4 using our ak.surreal type.

Program 4: The Runge Taylor Series Coefficients

As a result, when we attempt to fit a finite polynomial to it as accurately as possible at equally spaced points, the adjustments to the coefficients that make up for the truncated terms yield ever increasing errors as we move away from zero.
Noting that the higher order terms don't have that much impact close to zero, it makes sense to concentrate the samples at the edges, sacrificing the accuracy of the interpolation for small arguments for the benefit of large ones. For example, the Chebyshev nodes
\[ x_i = \cos\left(\frac{2 \times i + 1}{2 \times n} \times \pi\right) \]
for \(i\) from zero to \(n-1\) can be particularly effective, as shown by program 5.

Program 5: Using Chebyshev Nodes

If we have the luxury of choosing values for the \(x_i\) in an interval \([a, b]\) then it is not unreasonable to use Chebyshev nodes that are shifted and scaled to it with
\[ x_i = \frac{a+b}{2} + \frac{a-b}{2} \times \cos\left(\frac{2 \times i + 1}{2 \times n} \times \pi\right) \]
although it should be noted that they are by no means guaranteed to yield the best possible approximation for any given function.
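
A sketch of how we might generate them, with chebyshevNodes being a hypothetical helper rather than an ak library function, is

function chebyshevNodes(n, a, b) {
 var nodes = new Array(n);
 var i;

 // shifted and scaled Chebyshev nodes in the interval [a, b]
 for(i=0;i<n;++i) {
  nodes[i] = 0.5*(a+b) + 0.5*(a-b)*Math.cos((2*i+1)/(2*n)*Math.PI);
 }
 return nodes;
}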

Basis Functions

Now there's nothing in our formulation that requires that we use polynomials and we can, in fact, choose any set of functions, known as basis functions, provided that they do not coincide at all of the nodes. A particularly appealing choice is a bell shaped curve such as the normal probability density function
\[ \phi(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
For example, program 6 uses instances of our ak.normalPDF with means set at equally spaced nodes \(x_i\) to fit the Runge function.

Program 6: Using Normal PDFs

This clearly doesn't result in the oscillations of polynomial interpolation, but it does unfortunately require that we choose appropriate standard deviations for the normal PDFs, as illustrated by program 7.

Program 7: Varying Standard Deviations

Whilst it would appear from this that it's best to use large standard deviations, there is a limit beyond which the PDFs become almost flat at the scale of the interpolation. Passing through the points then requires very finely balanced coefficients, generally known as weights when using functions other than polynomials, to cancel out similar values and leave multiples of their differences. Given that floating point numbers only have a finite number of digits, this inevitably breaks down into numerical instability, as demonstrated by program 8.

Program 8: Numerical Instability

Now it might not be immediately obvious what constitutes a large standard deviation for any given problem but fortunately, for equally spaced nodes \(\mu_i\), there's a relatively simple way to think about it: how large do we want the value of a normal PDF to be at its immediate neighbours' nodes?
Suppose that the difference between the nodes is \(\delta\) and that we want each curve to equal some fraction \(c\) of the values of its closest neighbours at theirs. We consequently require their common standard deviation \(\sigma\) to satisfy
\[ \begin{align*} \phi_i(\mu_i\pm\delta) &= c \times \phi_i(\mu_i)\\ \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{\delta^2}{2\sigma^2}} &= c \times \frac{1}{\sqrt{2\pi}\sigma} \end{align*} \]
Cancelling, taking logs and rearranging gives us
\[ \begin{align*} \frac{\delta^2}{2\sigma^2} &= -\ln c\\ \sigma^2 &= -\frac{\delta^2}{2 \times \ln c} \end{align*} \]
and we will typically want to choose a value fairly close to one for \(c\).
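
In JavaScript this is a one-liner, with neighbourSigma being an illustrative name rather than an ak library function.

// the common standard deviation for which each normal PDF equals a
// fraction c of its central value at its neighbours' nodes, which lie
// at a distance of delta from its own
function neighbourSigma(delta, c) {
 return Math.sqrt(-delta*delta/(2*Math.log(c)));
}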

Radial Basis Functions

Rotationally symmetric functions, known as radial basis functions, can be extremely useful for interpolation in multiple dimensions. For instance, the multivariate normal PDF with a mean vector \(\boldsymbol{\mu}\) and a covariance matrix \(\boldsymbol{\Sigma}\) in \(n\) dimensions takes the form
\[ \phi(\mathbf{x}) = \frac{1}{(2\pi)^\frac{n}2 \times |\boldsymbol{\Sigma}|^\frac12} e^{-\frac12 \times (\mathbf{x}-\boldsymbol{\mu}) \times \boldsymbol{\Sigma}^{-1} \times (\mathbf{x}-\boldsymbol{\mu})} \]
where the vertical bars enclosing \(\boldsymbol{\Sigma}\) stand for its determinant. This is rotationally symmetric if each element of its associated vector valued random variable is independent and has the same standard deviation \(\sigma\), so that \(\boldsymbol{\Sigma}\) equals \(\sigma^2\) times the identity matrix and its determinant equals \(\sigma^{2n}\), simplifying to
\[ \phi(\mathbf{x}) = \frac{1}{(2\pi)^\frac{n}2 \times \sigma^n} e^{-\frac{|\mathbf{x}-\boldsymbol{\mu}|^2}{2\sigma^2}} \]
where the vertical bars enclosing the vector \(\mathbf{x}-\boldsymbol{\mu}\) represent its length. Note that this means that we can still use the trick for choosing the standard deviation based on how large we want the values of each PDF to be at neighbouring nodes.

For example, program 9 interpolates Rosenbrock's function
\[ f\left(x_0, x_1\right) = 100 \times \left(x_1-x_0^2\right)^2 + \left(1-x_0\right)^2 \]
using two-dimensional normal PDFs laid out in a grid with a neighbour node value ratio of 0.95.

Program 9: Interpolating Rosenbrock's Function

Any univariate function \(f\) that is symmetric about zero can be used to construct a radial basis function centred at \(\boldsymbol{\mu}\) with
\[ f_\boldsymbol{\mu}(\mathbf{x}) = f(|\mathbf{x}-\boldsymbol{\mu}|) \]
For example
\[ f(x) = e^{-(\sigma x)^2} \]
yields exactly the same interpolation as the multivariate normal PDF, albeit with a different value for \(\sigma\). Another bell shaped curve that we can use is
\[ f(x) = \frac{1}{1+(\sigma x)^2} \]
Now there's nothing in our definition of radial basis functions that requires that they be bounded and one that isn't is
\[ f(x) = \sqrt{1+(\sigma x)^2} \]
However, it should be noted that not all unbounded radial basis functions yield numerically stable interpolations.
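
As an illustration of how such functions might be implemented, the following sketch defines them for points represented as plain arrays of coordinates; dist, gaussian, inverseQuadratic and multiquadric are hypothetical helpers rather than members of the ak library.

// the distance between points represented as arrays of coordinates
function dist(x, mu) {
 var s = 0;
 var i;
 for(i=0;i<x.length;++i) s += (x[i]-mu[i])*(x[i]-mu[i]);
 return Math.sqrt(s);
}

// the Gaussian, inverse quadratic and multiquadric radial basis
// functions centred at mu
function gaussian(sigma, mu) {
 return function(x) {var r = sigma*dist(x, mu); return Math.exp(-r*r);};
}

function inverseQuadratic(sigma, mu) {
 return function(x) {var r = sigma*dist(x, mu); return 1/(1+r*r);};
}

function multiquadric(sigma, mu) {
 return function(x) {var r = sigma*dist(x, mu); return Math.sqrt(1+r*r);};
}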

ak.basisFunctionInterpolate

We follow our usual scheme when adding basis function interpolation to the ak library. Specifically, we dispatch to a constructors object to initialise the interpolation's state and then calculate the weights with the makeWeights helper function. Finally, we create a function f that performs the interpolation by summing the results of the basis functions at an argument x multiplied by the weights, adding property accessors that return the nodes, the basis functions and the weights, as shown by listing 1.

Listing 1: ak.basisFunctionInterpolate
ak.basisFunctionInterpolate = function() {
 var state = {nodes:[], functions:[], weights:[]};
 var arg0  = arguments[0];
 var f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);
 makeWeights(state);

 f = function(x) {
  var n = state.functions.length;
  var y = 0;
  while(n-->0) y += state.weights[n]*state.functions[n](x);
  return y;
 };
 f.nodes = function() {return copyNodes(state.nodes);};
 f.functions = function() {return state.functions.slice(0);};
 f.weights = function() {return state.weights.slice(0);};
 return Object.freeze(f);
};

var constructors = {};

The first constructors initialise the interpolation with an array of nodes x, a function y to approximate at them and an array of basis functions f, as shown by listing 2.

Listing 2: Construction From A Function
constructors[ak.ARRAY_T] = function(state, arg0, args) {
 var arg1 = args[1];
 constructors[ak.ARRAY_T][ak.nativeType(arg1)](state, arg0, arg1, args);
};

constructors[ak.ARRAY_T][ak.FUNCTION_T] = function(state, arg0, arg1, args) {
 var arg2 = args[2];
 constructors[ak.ARRAY_T][ak.FUNCTION_T][ak.nativeType(arg2)](state, arg0, arg1, arg2);
};

constructors[ak.ARRAY_T][ak.FUNCTION_T][ak.ARRAY_T] = function(state, x, y, f) {
 var n = x.length;
 var i, xi, yi;

 if(f.length!==n) {
  throw new Error('size mismatch in ak.basisFunctionInterpolate');
 }

 state.nodes.length = n;
 for(i=0;i<n;++i) {
  if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
   throw new Error('invalid basis function in ak.basisFunctionInterpolate');
  }
  xi = x[i];
  yi = Number(y(xi));
  if(!isFinite(yi)) {
   throw new Error('invalid node value in ak.basisFunctionInterpolate');
  }
  state.nodes[i] = {x:xi, y:yi};
 }
 state.functions = f.slice(0);
};

Here we're simply evaluating the function at each node and storing both its argument and its result, as well as copying the array of basis functions and checking that they are all valid. The next set of constructors sets the value at each node to the corresponding element of an array y, as defined by listing 3.

Listing 3: Construction From A Sample
constructors[ak.ARRAY_T][ak.ARRAY_T] = function(state, arg0, arg1, args) {
 var arg2 = args[2];
 constructors[ak.ARRAY_T][ak.ARRAY_T][ak.nativeType(arg2)](state, arg0, arg1, arg2);
};

constructors[ak.ARRAY_T][ak.ARRAY_T][ak.ARRAY_T] = function(state, x, y, f) {
 var n = x.length;
 var i, yi;

 if(y.length!==n || f.length!==n) {
  throw new Error('size mismatch in ak.basisFunctionInterpolate');
 }

 state.nodes.length = n;
 for(i=0;i<n;++i) {
  if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
   throw new Error('invalid basis function in ak.basisFunctionInterpolate');
  }
  yi = Number(y[i]);
  if(!isFinite(yi)) {
   throw new Error('invalid node value in ak.basisFunctionInterpolate');
  }
  state.nodes[i] = {x:x[i], y:yi};
 }
 state.functions = f.slice(0);
};

The last constructor, given in listing 4, allows each node and value to be represented with x and y properties of objects.

Listing 4: Construction From Points
constructors[ak.ARRAY_T][ak.ARRAY_T][ak.UNDEFINED_T] = function(state, nodes, f) {
 var n = nodes.length;
 var i, x, y;

 if(f.length!==n) throw new Error('size mismatch in ak.basisFunctionInterpolate');

 state.nodes.length = n;
 for(i=0;i<n;++i) {
  if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
   throw new Error('invalid basis function in ak.basisFunctionInterpolate');
  }
  x = nodes[i].x; if(ak.nativeType(x)===ak.FUNCTION_T) x = x();
  y = nodes[i].y; if(ak.nativeType(y)===ak.FUNCTION_T) y = y();
  y = Number(y);
  if(!isFinite(y)) {
   throw new Error('invalid node value in ak.basisFunctionInterpolate');
  }
  state.nodes[i] = {x:x, y:y};
 }
 state.functions = f.slice(0);
};
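
To summarise, the constructors in listings 2 to 4 support three forms of call, sketched below with illustrative variable names; xs is an array of nodes, g a function to approximate at them, ys an array of values at them, ps an array of objects with x and y properties and fs an array of basis functions.

var f1 = ak.basisFunctionInterpolate(xs, g, fs);
var f2 = ak.basisFunctionInterpolate(xs, ys, fs);
var f3 = ak.basisFunctionInterpolate(ps, fs);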

Now the most important part of ak.basisFunctionInterpolate is the calculation of the weights, which is performed after construction by the makeWeights helper function defined in listing 5.

Listing 5: Creating The Weights
function makeWeights(state) {
 var n = state.nodes.length;
 var y = new Array(n);
 var a = new Array(n*n);
 var i = 0;
 var r, c, x;

 for(r=0;r<n;++r) {
  x = state.nodes[r].x;
  y[r] = state.nodes[r].y;
  for(c=0;c<n;++c) a[i++] = state.functions[c](x);
 }
 y = ak.vector(y);
 a = ak.matrix(n, n, a);
 state.weights = ak.mul(ak.inv(a), y).toArray();
}

This time the values at the nodes that we want to interpolate between are stored in y, the values of the basis functions at them are stored in a and the weights that we calculate from them are stored in the weights member of the state, ready for the interpolation to use.
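
This is exactly the same calculation as we used for the polynomial coefficients; defining a matrix \(\mathbf{A}\) with the elements
\[ a_{ij} = f_j\left(x_i\right) \]
for basis functions \(f_j\), the weights must satisfy
\[ \mathbf{y} = \mathbf{A} \times \mathbf{w} \]
and consequently
\[ \mathbf{w} = \mathbf{A}^{-1} \times \mathbf{y} \]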

Finally, when returning the nodes and their values we want to make sure that they can't be inadvertently changed and so we create an array of copies, as shown by listing 6.

Listing 6: Copying The Nodes And Their Values
function copyNodes(nodes) {
 var n = nodes.length;
 var copy = new Array(n);
 var i;

 for(i=0;i<n;++i) copy[i] = {x:nodes[i].x, y:nodes[i].y};
 return copy;
}

Program 10 shows how we can use ak.basisFunctionInterpolate, which is implemented in BasisFunctionInterpolate.js, to approximate Rosenbrock's function at randomly placed nodes.

Program 10: Using ak.basisFunctionInterpolate

Here we're setting each PDF's standard deviation according to its value at its nearest neighbouring node which means that, unlike with regularly placed PDFs with equal standard deviations, the matrix of basis function values won't contain lots of equal elements. It will therefore usually be much easier to invert and so we can afford to set the node value ratio a lot closer to one, in this case to 0.99.
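
Finally, a minimal stand-alone sketch of one dimensional use, assuming that BasisFunctionInterpolate.js and its dependencies have been loaded, is

// the Runge function again
function runge(x) {return 1/(1+25*x*x);}

// Gaussian bumps at nine equally spaced nodes on [-1, 1] with standard
// deviations chosen so that each equals 0.95 of its central value at
// its neighbours' nodes; the normalising constants are dropped since
// the weights will absorb them
var delta = 0.25;
var sigma = Math.sqrt(-delta*delta/(2*Math.log(0.95)));
var nodes = [];
var basis = [];
var i;

for(i=0;i<9;++i) {
 (function(mu) {
  nodes.push(mu);
  basis.push(function(x) {return Math.exp(-(x-mu)*(x-mu)/(2*sigma*sigma));});
 })(-1+delta*i);
}

var f = ak.basisFunctionInterpolate(nodes, runge, basis);

// runge(0.3) equals 1/3.25; f(0.3) should be close to it
console.log(f(0.3), runge(0.3));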

References

[1] Chalk The Lines, www.thusspakeak.com, 2018.

[2] Cubic Line Division, www.thusspakeak.com, 2018.

[3] We're Not For Turning, www.thusspakeak.com, 2018.
