Some time ago^{[1]} we saw how Newton's method used the derivative of a univariate scalar valued function to guide the search for an argument at which it took a specific value. A related problem is finding a vector at which a multivariate vector valued function takes one, or at least comes as close as possible to it. In particular, we should often like to fit an arbitrary parametrically defined scalar valued functional form to a set of points with possibly noisy values, much as we did using linear regression^{[2]} to find the best fitting weighted sum of a given set of functions, and in this post we shall see how we can generalise Newton's method to solve such problems.
For a multivariate vector valued function \(\mathbf{f}\) and a target vector value \(\mathbf{y}\) we can express the inversion problem as the search for a vector \(\mathbf{x}^\ast\) that minimises \(\left|\mathbf{f}(\mathbf{x}) - \mathbf{y}\right|\), where the vertical bars represent the length of the vector between them, being the distance between \(\mathbf{f}(\mathbf{x})\) and \(\mathbf{y}\). Equivalently, we can seek to minimise the function
\[
g(\mathbf{x}) = \frac12 \times \sum_i \left(f_i(\mathbf{x}) - y_i\right)^2
\]
which is equal to half of the square of their distance.
To see how this relates to regression, suppose that we have a set of points \(\mathbf{x}_i\) and associated scalar values \(y_i\) that we wish to approximate with a function \(f_\mathbf{p}\), where \(\mathbf{p}\) is a vector of parameters which, much like the coefficients of a polynomial, control its particular shape within the constraints of its general form. The best set of parameters \(\mathbf{p}^\ast\) minimises the sum of squared errors
\[
\sum_i \left(f_\mathbf{p}(\mathbf{x}_i) - y_i\right)^2
\]
which we can express as
\[
\sum_i \left(f_i(\mathbf{p}) - y_i\right)^2
\]
by defining
\[
f_i(\mathbf{p}) = f_\mathbf{p}(\mathbf{x}_i)
\]
The Gauss-Newton Method
Given an initial point \(\mathbf{x}\), we can find \(\mathbf{x}^\ast\) with a step \(\boldsymbol{\delta}\) such that \(g(\mathbf{x} + \boldsymbol{\delta})\) is minimised. If we approximate the result with the second order multivariate Taylor series
\[
\hat{g}(\mathbf{x}+\boldsymbol{\delta}) = g(\mathbf{x}) + \sum_j \delta_j \times \frac{\partial g}{\partial x_j}(\mathbf{x}) + \frac12 \times \sum_j \sum_k \delta_j \times \delta_k \times \frac{\partial^2 g}{\partial x_j \partial x_k}(\mathbf{x})
\]
where the fractions with \(\partial\) symbols are partial derivatives, then its minimum with respect to \(\boldsymbol{\delta}\) should yield a step that moves closer to \(\mathbf{x}^\ast\). That minimum is given by a \(\boldsymbol{\delta}\) satisfying the matrix-vector equation
\[
\left(\mathbf{Q} + \mathbf{J}^\mathrm{T} \times \mathbf{J}\right) \times \boldsymbol{\delta} = -\mathbf{J}^\mathrm{T} \times (\mathbf{f}(\mathbf{x}) - \mathbf{y})
\]
where the \(\mathrm{T}\) superscript represents the transpose of a matrix, in which the rows and columns are swapped, \(\mathbf{Q}\) equals
\[
\sum_i \mathbf{H}^i \times \left(f_i(\mathbf{x}) - y_i\right)
\]
\(\mathbf{J}\) is the Jacobian matrix of \(\mathbf{f}\) having the elements
\[
J_{i,j} = \frac{\partial f_i}{\partial x_j}(\mathbf{x})
\]
and \(\mathbf{H}^i\) is the Hessian matrix of \(f_i\) with elements
\[
H^i_{j,k} = \frac{\partial^2 f_i}{\partial x_j \partial x_k}(\mathbf{x})
\]
as proven in derivation 1.
Derivation 1: The Minimising Step
Firstly, we have
\[
\frac{\partial \hat{g}}{\partial \delta_k}(\mathbf{x}+\boldsymbol{\delta})
= \frac{\partial g}{\partial x_k}(\mathbf{x}) + \sum_j \delta_j \times \frac{\partial^2 g}{\partial x_j \partial x_k}(\mathbf{x})
\]
and
\[
\begin{align*}
\frac{\partial g}{\partial x_k}(\mathbf{x})
&= \sum_i \frac{\partial f_i}{\partial x_k}(\mathbf{x}) \times \left(f_i(\mathbf{x}) - y_i\right)\\
\frac{\partial^2 g}{\partial x_j \partial x_k}(\mathbf{x})
&= \sum_i \frac{\partial^2 f_i}{\partial x_j \partial x_k}(\mathbf{x}) \times \left(f_i(\mathbf{x}) - y_i\right)
+ \frac{\partial f_i}{\partial x_k}(\mathbf{x}) \times \frac{\partial f_i}{\partial x_j}(\mathbf{x})
\end{align*}
\]
To find the \(\boldsymbol{\delta}\) that minimises \(\hat{g}\) we require its derivatives to equal zero, so that
\[
\sum_i \frac{\partial f_i}{\partial x_k}(\mathbf{x}) \times \left(f_i(\mathbf{x}) - y_i\right)
+ \sum_j \delta_j \times \sum_i \frac{\partial^2 f_i}{\partial x_j \partial x_k}(\mathbf{x}) \times \left(f_i(\mathbf{x}) - y_i\right) + \frac{\partial f_i}{\partial x_k}(\mathbf{x}) \times \frac{\partial f_i}{\partial x_j}(\mathbf{x}) = 0
\]
Defining
\[
\begin{align*}
\mathbf{a} &= \mathbf{J}^\mathrm{T} \times (\mathbf{f}(\mathbf{x}) - \mathbf{y})\\
\mathbf{b} &= \left(\mathbf{Q} + \mathbf{J}^\mathrm{T} \times \mathbf{J}\right) \times \boldsymbol{\delta}
\end{align*}
\]
we have
\[
\begin{align*}
a_k &= \sum_i J^\mathrm{T}_{k,i} \times \left(f_i(\mathbf{x}) - y_i\right)
= \sum_i J_{i,k} \times \left(f_i(\mathbf{x}) - y_i\right)\\
b_k &= \sum_j \left(Q_{k,j} + \sum_i J^\mathrm{T}_{k,i} \times J_{i,j}\right) \times \delta_j
= \sum_j \delta_j \times \sum_i H^i_{k,j} \times \left(f_i(\mathbf{x}) - y_i\right) + J_{i,k} \times J_{i,j}
\end{align*}
\]
and so \(\boldsymbol{\delta}\) must satisfy
\[
\mathbf{b} = -\mathbf{a}
\]
If the second derivatives of the elements of \(\mathbf{f}\) are small or if \(\mathbf{x}\) is close to \(\mathbf{x}^\ast\), so that \(\mathbf{f}(\mathbf{x})\) is close to \(\mathbf{y}\), then \(\mathbf{Q}\) will be small and we can reasonably approximate the matrix-vector equation by discarding it with
\[
\mathbf{J}^\mathrm{T} \times \mathbf{J} \times \boldsymbol{\delta} = -\mathbf{J}^\mathrm{T} \times (\mathbf{f}(\mathbf{x}) - \mathbf{y})
\]
so that we only need to calculate the first derivatives, yielding
\[
\boldsymbol{\delta} = -\left(\mathbf{J}^\mathrm{T} \times \mathbf{J}\right)^{-1} \times \mathbf{J}^\mathrm{T} \times (\mathbf{f}(\mathbf{x}) - \mathbf{y})
\]
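As a concrete illustration, the following sketch computes a single Gauss-Newton step for a hypothetical two parameter problem, solving the two by two normal equations directly with plain arrays rather than the ak library's matrix types; the example f, df and y below are assumptions chosen purely for demonstration.

```javascript
// a single Gauss-Newton step for a two parameter problem, solving the 2x2
// normal equations (J^T J) delta = -J^T (f(x)-y) directly
function gaussNewtonStep(f, df, y, x) {
 var r = f(x).map(function(fi, i) { return fi - y[i]; }); // f(x) - y
 var J = df(x);                                           // J[i][j] = df_i/dx_j
 var a = 0, b = 0, d = 0, u = 0, v = 0, i;
 for(i = 0; i < J.length; ++i) {                          // accumulate J^T J and J^T r
  a += J[i][0]*J[i][0];  b += J[i][0]*J[i][1];  d += J[i][1]*J[i][1];
  u += J[i][0]*r[i];     v += J[i][1]*r[i];
 }
 var det = a*d - b*b;                                     // invert the 2x2 J^T J
 return [-(d*u - b*v)/det, -(a*v - b*u)/det];             // -(J^T J)^-1 J^T r
}

// for example, f(x) = (x_0^2, x_1^2) with target y = (4, 9) from x = (1, 1)
var f  = function(x) { return [x[0]*x[0], x[1]*x[1]]; };
var df = function(x) { return [[2*x[0], 0], [0, 2*x[1]]]; };
console.log(gaussNewtonStep(f, df, [4, 9], [1, 1]));      // [1.5, 4]
```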
Rather than take a step of \(\boldsymbol{\delta}\) we would perform a univariate search for a minimum in its direction, known as the Gauss-Newton direction. The Gauss-Newton method^{[3]} iteratively repeats this process with
\[
\begin{align*}
\boldsymbol{\delta}_i &= -\left(\mathbf{J}_i^\mathrm{T} \times \mathbf{J}_i\right)^{-1} \times \mathbf{J}_i^\mathrm{T} \times \left(\mathbf{f}\left(\mathbf{x}_i\right) - \mathbf{y}\right)\\
\mathbf{x}_{i+1} &= \mathbf{x}_i + \lambda_i^\ast \times \boldsymbol{\delta}_{i}
\end{align*}
\]
where \(\lambda_i^\ast\) is the value that minimises \(g\) in the direction of \(\boldsymbol{\delta}_i\) from \(\mathbf{x}_i\), until the difference between \(\mathbf{f}\left(\mathbf{x}_i\right)\) and \(\mathbf{y}\), or the magnitude of the step, is sufficiently small.
Recall that
\[
\left(\mathbf{J}^\mathrm{T} \times \mathbf{J}\right)^{-1} \times \mathbf{J}^\mathrm{T}
\]
is the left inverse of \(\mathbf{J}\), satisfying
\[
\left(\left(\mathbf{J}^\mathrm{T} \times \mathbf{J}\right)^{-1} \times \mathbf{J}^\mathrm{T}\right) \times \mathbf{J}
= \left(\mathbf{J}^\mathrm{T} \times \mathbf{J}\right)^{-1} \times \left(\mathbf{J}^\mathrm{T} \times \mathbf{J}\right)
= \mathbf{I}
\]
Unfortunately, it's quite possible that \(\mathbf{J}^\mathrm{T} \times \mathbf{J}\) will be singular, or nearly singular, in which case our ak.leftInv will be subject to significant numerical errors. We can go some way to dealing with such matrices by using the ak.stableInv of our ak.jacobiDecomposition or ak.spectralDecomposition implementations^{[4][5]} of the eigensystem, which ignores those directions along which the matrix is close to singular.
The Levenberg-Marquardt Method
Instead of simply removing \(\mathbf{Q}\) we can replace it with a non-negative multiple of the identity matrix
\[
\boldsymbol{\delta} = -\left(\lambda \times \mathbf{I} + \mathbf{J}^\mathrm{T} \times \mathbf{J}\right)^{-1} \times \mathbf{J}^\mathrm{T} \times (\mathbf{f}(\mathbf{x}) - \mathbf{y})
\]
to yield the Levenberg-Marquardt direction \(\boldsymbol{\delta}\). The iterative process
\[
\begin{align*}
\boldsymbol{\delta}_i &= -\left(\lambda_i \times \mathbf{I} + \mathbf{J}_i^\mathrm{T} \times \mathbf{J}_i\right)^{-1} \times \mathbf{J}_i^\mathrm{T} \times \left(\mathbf{f}\left(\mathbf{x}_i\right) - \mathbf{y}\right)\\
\mathbf{x}_{i+1} &= \mathbf{x}_i + \boldsymbol{\delta}_{i}
\end{align*}
\]
is known as the Levenberg-Marquardt method and replaces the univariate search for a minimum with adjustments to \(\lambda_i\). Specifically, noting that
\[
\lim_{\lambda \to \infty}\boldsymbol{\delta} = -\frac{1}{\lambda} \times \mathbf{J}^\mathrm{T} \times (\mathbf{f}\left(\mathbf{x}\right) - \mathbf{y})
\]
which is an infinitesimal step along the line of steepest descent, \(\lambda_i\) blends the Gauss-Newton direction with it and controls the step length. When \(\mathbf{x}\) is distant from \(\mathbf{x}^\ast\) we should expect gradient descent to be better than Gauss-Newton and when it's close we should expect the opposite to be true. We should consequently start with a relatively large value of \(\lambda\) and reduce it as we near the target.
A simple scheme is to choose values for \(\lambda_0\), a rate of decrease \(\rho_-\) for when progress is being made and a rate of increase \(\rho_+\) for when it is not. Defining
\[
\begin{align*}
a_{-1} &= \rho_-\\
a_k &= \rho_+^k
\end{align*}
\]
we can iterate from \(k\) equal to minus one, taking trial steps with \(a_k \times \lambda_i\) until the difference between the target value and the value of the function decreases, at which point we set \(\lambda_{i+1}\) to \(a_k \times \lambda_i\) and \(\mathbf{x}_{i+1}\) to its step's result.
Program 1 uses this approach to search for a solution to a pair of randomly generated simultaneous quadratic equations from a random starting point, marking the target point in yellow, unsuccessful steps in red and successful steps in green, printing the difference from the target value and the magnitude of the step in the console at the bottom and restarting if it converges on a local minimum other than the target.
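The trial step scheme can be sketched in isolation with plain JavaScript, assuming a hypothetical tryStep function that makes a trial step with a given damping value and returns the resulting difference from the target.

```javascript
// a sketch of the lambda adjustment scheme; tryStep(lambda) is assumed to make
// a trial step with that damping and return the resulting error, and the loop
// starts at k = -1, a trial decrease, increasing lambda until the error improves
function adaptLambda(tryStep, lambda, rhom, rhop, error) {
 var k = -1;
 var a = rhom;                       // a_-1 = rho-
 while(tryStep(a*lambda) >= error) { // until the difference decreases
  a = Math.pow(rhop, ++k);           // a_k = rho+^k for k >= 0
 }
 return a*lambda;                    // the accepted lambda for the next step
}

// a toy tryStep that only improves upon an error of 1 once lambda reaches 4
var tryStep = function(l) { return l >= 4 ? 0.5 : 2; };
console.log(adaptLambda(tryStep, 1, 0.5, 2, 1)); // 4
```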
Program 1: Inverting Quadratics



An example of a parametrically defined scalar valued function and its derivatives with respect to the parameters is
\[
\begin{align*}
f(x) &= \frac{1}{n} \times \sum_{i=0}^{n-1} \sin\left(a_i \times x + b_i\right)\\
\frac{\partial f}{\partial a_i}(x) &= \frac{1}{n} \times x \times \cos\left(a_i \times x + b_i\right)\\
\frac{\partial f}{\partial b_i}(x) &= \frac{1}{n} \times \cos\left(a_i \times x + b_i\right)
\end{align*}
\]
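This form and its derivatives translate directly into JavaScript; the following sketch uses plain arrays for the parameters rather than the ak library's vector type.

```javascript
// the sinusoidal form above and its derivatives with respect to the
// parameters; a and b hold the a_i and b_i
function sinusoid(a, b, x) {
 var n = a.length;
 var f = 0, dfda = new Array(n), dfdb = new Array(n), i;
 for(i = 0; i < n; ++i) {
  f += Math.sin(a[i]*x + b[i])/n;          // f(x)
  dfda[i] = x*Math.cos(a[i]*x + b[i])/n;   // df/da_i
  dfdb[i] = Math.cos(a[i]*x + b[i])/n;     // df/db_i
 }
 return {f: f, dfda: dfda, dfdb: dfdb};
}

console.log(sinusoid([0], [0], 2)); // f: 0, dfda: [2], dfdb: [1]
```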
Program 2 randomly chooses the parameters and a set of values at which to evaluate the function, using the LevenbergMarquardt method to search for a new set of parameters to fit them, starting from another random choice.
Program 2: Fitting Sinusoids



Note that whilst the fit initially gets progressively better, the search will typically get trapped in a local minimum and therefore fail to pass exactly through the points.
The Implementation Of The Levenberg-Marquardt Method
The ak library's implementation is ak.levenbergInverse, defined in listing 1, which expects the function f, its Jacobian matrix of derivatives df, an initial value for \(\lambda\) and its rates of change \(\rho_-\) and \(\rho_+\), along with an optional convergence threshold and maximum number of steps, defaulting to three quarters of the floating point digits and one thousand respectively.
Listing 1: ak.levenbergInverse
ak.levenbergInverse = function(f, df, lambda, rhom, rhop, threshold, steps) {
 if(ak.nativeType(f)!==ak.FUNCTION_T) {
  throw new Error('invalid function in ak.levenbergInverse');
 }
 if(ak.nativeType(df)!==ak.FUNCTION_T) {
  throw new Error('invalid derivative in ak.levenbergInverse');
 }
 if(ak.nativeType(lambda)!==ak.NUMBER_T || lambda<=0 || !isFinite(lambda)) {
  throw new Error('invalid lambda in ak.levenbergInverse');
 }
 if(ak.nativeType(rhom)!==ak.NUMBER_T || !(rhom>0 && rhom<1)) {
  throw new Error('invalid rho- in ak.levenbergInverse');
 }
 if(ak.nativeType(rhop)!==ak.NUMBER_T || rhop<=1 || !isFinite(rhop)) {
  throw new Error('invalid rho+ in ak.levenbergInverse');
 }

 threshold = ak.nativeType(threshold)===ak.UNDEFINED_T ? Math.pow(ak.EPSILON, 0.75)
                                                       : Math.abs(threshold);
 if(isNaN(threshold)) {
  throw new Error('invalid convergence threshold in ak.levenbergInverse');
 }

 steps = ak.nativeType(steps)===ak.UNDEFINED_T ? 1000 : ak.floor(Math.abs(steps));
 if(isNaN(steps)) throw new Error('invalid maximum steps in ak.levenbergInverse');

 return function(y, hint) {
  return solve(f, df, y, hint, lambda, rhom, rhop, threshold, steps);
 };
};

function solve(f, df, y, x0, lambda, rhom, rhop, threshold, steps) {
 if(ak.type(x0)!==ak.VECTOR_T) {
  throw new Error('invalid hint in ak.levenbergInverse');
 }
 if(ak.type(y)===ak.VECTOR_T) {
  return vectorSolve(f, df, y, x0, lambda, rhom, rhop, threshold, steps);
 }
 return scalarSolve(f, df, y, x0, lambda, rhom, rhop, threshold, steps);
}
It returns a function that expects a target value y and a hint for the starting point which defers to the solve function, which requires that the latter be an ak.vector object and itself defers to the vectorSolve and scalarSolve functions depending upon whether the former is or not.
The vectorSolve function is given in listing 2 and iteratively calls the vectorStep function until either the normalised difference between the current and target values state.ady or the normalised step length state.adx fall within an averaged threshold.
Listing 2: The vectorSolve Helper Function
function vectorSolve(f, df, y, x0, lambda, rhom, rhop, threshold, steps) {
 var y0 = f(x0);
 var state = {x:x0, y:y0, lambda:lambda, ady:ak.diff(y0, y)};
 var nx = x0.dims();
 var ny = y0.dims();
 var step = 0;
 var a, i;

 if(!(state.ady>threshold*ny)) return x0;

 a = new Array(nx);
 for(i=0;i<nx;++i) a[i] = new Array(nx);
 state.dftdf = a;

 do {
  if(step++===steps) throw new Error('failure to converge in ak.levenbergInverse');
  vectorStep(f, df, y, state, rhom, rhop, threshold, steps);
 }
 while(state.ady>threshold*ny && state.adx>threshold*nx);

 for(i=0;i<ny && !isNaN(state.y.at(i));++i);
 return i===ny ? state.x : ak.vector(nx, ak.NaN);
}
The vectorStep function, defined in listing 3, first caches those values that don't change as \(\lambda\) is modified during a step of the Levenberg-Marquardt method. Specifically, the Jacobian matrix of derivatives at state.x, expected to be an ak.matrix object, the difference between the current and target values, the product of the transpose of the former and the latter and the square of the former.
Note that our ak.vector type makes no distinction between column and row vectors and so we use the identity
\[
\mathbf{M}^\mathrm{T} \times \mathbf{x} = \mathbf{x} \times \mathbf{M}
\]
to calculate the product of the transpose of the Jacobian and the current difference in values.
It then uses the delta function to calculate the step for each value of \(\lambda\) until the difference is reduced, finally saving the updated values in the state object.
Listing 3: The vectorStep Helper Function
function vectorStep(f, df, y, state, rhom, rhop, threshold, steps) {
 var k = 0;
 var lambda, x1, y1, ady;

 state.df = df(state.x);
 state.dy = ak.sub(state.y, y);
 state.dftdy = ak.mul(state.dy, state.df);
 vectorSquare(state);

 lambda = state.lambda*rhom;
 if(lambda===0) lambda = state.lambda;
 delta(state, lambda, threshold, steps);
 x1 = ak.sub(state.x, state.dx);
 y1 = f(x1);
 ady = ak.diff(y1, y);

 while(ady>state.ady) {
  if(k===steps) throw new Error('failure to converge in ak.levenbergInverse');
  lambda = state.lambda*Math.pow(rhop, k++);
  delta(state, lambda, threshold, steps);
  x1 = ak.sub(state.x, state.dx);
  y1 = f(x1);
  ady = ak.diff(y1, y);
 }

 state.ady = ady;
 state.adx = ak.diff(x1, state.x);
 state.x = x1;
 state.y = y1;
 state.lambda = lambda;
}
The square of the Jacobian matrix is stored in the state.dftdf array which was initialised in vectorSolve and is calculated by vectorSquare in listing 4.
Listing 4: The vectorSquare Helper Function
function vectorSquare(state) {
 var df = state.df;
 var dy = state.dy;
 var dftdf = state.dftdf;
 var nr = df.rows();
 var nc = df.cols();
 var r, c, s, k;

 for(r=0;r<nc;++r) {
  for(c=r;c<nc;++c) {
   s = 0;
   for(k=0;k<nr;++k) s += df.at(k, r) * df.at(k, c);
   dftdf[r][c] = dftdf[c][r] = s;
  }
 }
}
The delta function defined in listing 5 adds \(\lambda \times \mathbf{I}\) to a copy of the square of the Jacobian matrix and uses the ak.stableDiv overload for ak.spectralDecomposition to calculate \(\boldsymbol{\delta}\) whilst ignoring directions in which the matrix is within the threshold of being singular.
Listing 5: The delta Helper Function
function delta(state, lambda, threshold, steps) {
 var dftdf = state.dftdf;
 var n = dftdf.length;
 var a = new Array(n);
 var i;

 for(i=0;i<n;++i) {
  a[i] = dftdf[i].slice(0);
  a[i][i] += lambda;
 }
 a = ak.matrix(a);
 a = ak.spectralDecomposition(a, threshold, steps);
 state.dx = ak.stableDiv(state.dftdy, a, threshold);
}
If the function to be inverted is scalar valued then the Jacobian matrix is equivalent to a vector of its derivatives. We can consequently simplify the implementation by requiring df to return an ak.vector and creating new versions of the helper functions that exploit this, as begun by scalarSolve in listing 6.
Listing 6: The scalarSolve Helper Function
function scalarSolve(f, df, y, x0, lambda, rhom, rhop, threshold, steps) {
 var y0 = f(x0);
 var state = {x:x0, y:y0, lambda:lambda, ady:ak.diff(y0, y)};
 var nx = x0.dims();
 var step = 0;
 var a, i;

 if(!(state.ady>threshold)) return x0;

 a = new Array(nx);
 for(i=0;i<nx;++i) a[i] = new Array(nx);
 state.dftdf = a;

 do {
  if(step++===steps) throw new Error('failure to converge in ak.levenbergInverse');
  scalarStep(f, df, y, state, rhom, rhop, threshold, steps);
 }
 while(state.ady>threshold && state.adx>threshold*nx);

 return !isNaN(state.y) ? state.x : ak.vector(nx, ak.NaN);
}
The scalarStep helper function follows the form of vectorStep, although dftdy is now a scalar multiple of the vector of derivatives.
Listing 7: The scalarStep Helper Function
function scalarStep(f, df, y, state, rhom, rhop, threshold, steps) {
 var k = 0;
 var lambda, x1, y1, ady;

 state.df = df(state.x);
 state.dy = ak.sub(state.y, y);
 state.dftdy = ak.mul(state.dy, state.df);
 scalarSquare(state);

 lambda = state.lambda*rhom;
 if(lambda===0) lambda = state.lambda;
 delta(state, lambda, threshold, steps);
 x1 = ak.sub(state.x, state.dx);
 y1 = f(x1);
 ady = ak.diff(y1, y);

 while(ady>state.ady) {
  if(k===steps) throw new Error('failure to converge in ak.levenbergInverse');
  lambda = state.lambda*Math.pow(rhop, k++);
  delta(state, lambda, threshold, steps);
  x1 = ak.sub(state.x, state.dx);
  y1 = f(x1);
  ady = ak.diff(y1, y);
 }

 state.ady = ady;
 state.adx = ak.diff(x1, state.x);
 state.x = x1;
 state.y = y1;
 state.lambda = lambda;
}
Finally, the scalarSquare function given in listing 8 populates dftdf with the outer product of the derivative vector with itself, defined for a pair of vectors \(\mathbf{x}\) and \(\mathbf{y}\) as
\[
\begin{align*}
\mathbf{M} &= \mathbf{x} \otimes \mathbf{y}\\
m_{ij} &= x_i \times y_j
\end{align*}
\]
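For example, the outer product of a pair of plain array vectors might be sketched as follows, with the result represented as an array of rows rather than an ak.matrix.

```javascript
// the outer product of plain-array vectors x and y, yielding a matrix
// represented as an array of rows with m[i][j] = x[i]*y[j]
function outer(x, y) {
 return x.map(function(xi) { return y.map(function(yj) { return xi*yj; }); });
}

console.log(outer([1, 2], [3, 4])); // [[3, 4], [6, 8]]
```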
Listing 8: The scalarSquare Helper Function
function scalarSquare(state) {
 var df = state.df;
 var dy = state.dy;
 var dftdf = state.dftdf;
 var n = df.dims();
 var r, c;

 for(r=0;r<n;++r) {
  for(c=r;c<n;++c) {
   dftdf[r][c] = dftdf[c][r] = df.at(r) * df.at(c);
  }
 }
}
Program 3 demonstrates how to use ak.levenbergInverse, defined in LevenbergInverse.js, by repeating the fitting of the parametrically defined function from program 2, plotting the results of the initial parameters in red and those of the final in black.
Program 3: Using ak.levenbergInverse



As before, we see that the final parameters produce a better fit than those we started with but are still prone to falling into local minima and consequently failing to pass exactly through the sampled points, which is as good an observation to conclude with as any.
References
[1] On The Shoulders Of Gradients, www.thusspakeak.com, 2014.
[2] Regressive Tendencies, www.thusspakeak.com, 2020.
[3] Gill, P., Murray, W. and Wright, M., Practical Optimization, Academic Press, 1981.
[4] Conquering The Eigen, www.thusspakeak.com, 2014.
[5] The Spectral Apparition, www.thusspakeak.com, 2020.