# Regressive Tendencies

Several months ago we saw how we could use basis functions to interpolate between points upon arbitrary curves or surfaces to approximate the values between them[1]. Related to that is linear regression which fits a straight line, or a flat plane, though points that have values that are assumed to be the results of a linear function with independent random errors, having means of zero and equal standard deviations, in order to reveal the underlying relationship between them. Specifically, we want to find the linear function that minimises the differences between its results and the values at those points.

### The Straight And Narrow

If we draw a straight line with an independent variable $$x$$, a dependent variable $$y$$ and coefficients $$a_k$$
$y = a_0 + a_1 \times x$
though a set of points $$\left(x_i, y_i\right)$$ then the total squared vertical difference between it and them is
$\Delta = \sum_{i=0}^{n-1} \left(a_0 + a_1 \times x_i - y_i\right)^2$
where $$\sum$$ is the summation sign. A reasonable definition of the best fitting line is the one that minimises $$\Delta$$ which we can find by noting that, as a quadratic function of $$a_0$$ and $$a_1$$, it has a unique minimum at which its partial derivatives with respect to them, being those that treat the other as a constant, equal zero. They are given by
\begin{align*} \frac{\partial \Delta}{\partial a_0} &= \sum_{i=0}^{n-1} 2 \times \left(a_0 + a_1 \times x_i - y_i\right) = 2 \times a_0 \times \sum_{i=0}^{n-1} 1 + 2 \times a_1 \times \sum_{i=0}^{n-1} x_i - 2 \times \sum_{i=0}^{n-1} y_i\\ \frac{\partial \Delta}{\partial a_1} &= \sum_{i=0}^{n-1} 2 \times \left(a_0 + a_1 \times x_i - y_i\right) \times x_i = 2 \times a_0 \times \sum_{i=0}^{n-1} x_i + 2 \times a_1 \times \sum_{i=0}^{n-1} x_i^2 - 2 \times \sum_{i=0}^{n-1} x_i \times y_i \end{align*}
and so, defining
\begin{align*} \Sigma_x = \sum_{i=0}^{n-1} x_i && \Sigma_{x^2} = \sum_{i=0}^{n-1} x_i^2 && \Sigma_{xy} = \sum_{i=0}^{n-1} x_i \times y_i \end{align*}
we require that
\begin{align*} a_0 \times n + a_1 \times \Sigma_x &= \Sigma_y\\ a_0 \times \Sigma_x + a_1 \times \Sigma_{x^2} &= \Sigma_{xy} \end{align*}
Expressing this as a matrix-vector equation, we have
\begin{align*} \begin{pmatrix} n & \Sigma_x\\ \Sigma_x & \Sigma_{x^2} \end{pmatrix} \times \begin{pmatrix} a_0\\ a_1 \end{pmatrix} &= \begin{pmatrix} \Sigma_y\\ \Sigma_{x y} \end{pmatrix}\\ \begin{pmatrix} a_0\\ a_1 \end{pmatrix} = \begin{pmatrix} n & \Sigma_x\\ \Sigma_x & \Sigma_{x^2} \end{pmatrix}^{-1} &\times \begin{pmatrix} \Sigma_y\\ \Sigma_{x y} \end{pmatrix} \end{align*}
which program 1 uses to fit a line through points that randomly deviate from a linear relationship.

Program 1: Univariate Linear Regression

Now, if we define
\begin{align*} \mathbf{X} = \begin{pmatrix} 1 & x_0\\ 1 & x_1\\ \vdots & \vdots\\ 1 & x_{n-1} \end{pmatrix} && \mathbf{a} = \begin{pmatrix} a_0\\ a_1 \end{pmatrix} && \mathbf{y} = \begin{pmatrix} y_0\\ y_1\\ \vdots \\ y_{n-1} \end{pmatrix} \end{align*}
then we have
\begin{align*} \boldsymbol{\delta} &= \mathbf{X} \times \mathbf{a} - \mathbf{y} = \begin{pmatrix} a_0 + a_1 \times x_0 - y_0\\ a_0 + a_1 \times x_1 - y_1\\ \vdots\\ a_0 + a_1 \times x_{n-1} - y_{n-1} \end{pmatrix}\\ \\ \Delta &= \boldsymbol{\delta} \times \boldsymbol{\delta} = \mathbf{a} \times \mathbf{X}^\mathrm{T} \times \mathbf{X} \times \mathbf{a} - \mathbf{a} \times \mathbf{X}^\mathrm{T} \times \mathbf{y} - \mathbf{y} \times \mathbf{X} \times \mathbf{a} + \mathbf{y} \times \mathbf{y} \end{align*}
Furthermore, noting that
\begin{align*} \Delta &= \sum_{i,j} a_i \times \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)_{i,j} \times a_j - \sum_{i,j} a_i \times \mathbf{X}_{i,j} \times y_j - \sum_{i,j} y_i \times \mathbf{X}_{i,j} \times a_j + \sum_i y_i \times y_i\\ \frac{\partial\Delta}{\partial a_k} &= \sum_j \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)_{k,j} \times a_j + \sum_i a_i \times \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)_{i,k} - \sum_j \mathbf{X}_{k,j} \times y_j - \sum_i y_i \times \mathbf{X}_{i,k} \end{align*}
we can create a vector of the derivatives with
\begin{align*} \frac{\partial\Delta}{\partial\mathbf{a}} &= \mathbf{X}^\mathrm{T} \times \mathbf{X} \times \mathbf{a} + \mathbf{a} \times \mathbf{X}^\mathrm{T} \times \mathbf{X} - \mathbf{X}^\mathrm{T} \times \mathbf{y} - \mathbf{y} \times \mathbf{X}\\ &= 2 \times \mathbf{X}^\mathrm{T} \times \mathbf{X} \times \mathbf{a} - 2 \times \mathbf{X}^\mathrm{T} \times \mathbf{y} \end{align*}
To set its elements to zero we require
$\mathbf{X}^\mathrm{T} \times \mathbf{X} \times \mathbf{a} = \mathbf{X}^\mathrm{T} \times \mathbf{y}$
which is solved by
$\mathbf{a} = \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)^{-1} \times \mathbf{X}^\mathrm{T} \times \mathbf{y}$
The matrix in this expression is the left inverse of $$\mathbf{X}$$, satisfying
$\left(\left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)^{-1} \times \mathbf{X}^\mathrm{T}\right) \times \mathbf{X} = \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right)^{-1} \times \left(\mathbf{X}^\mathrm{T} \times \mathbf{X}\right) = \mathbf{I}$
and program 2 uses our ak.leftInv implementation of it to fit the line from an ak.matrix x and an ak.vector y.

Program 2: Using The Left Inverse

### Generalising The Regression

We can easily extend this to higher dimensions by introducing additional independent variables and coefficients with
\begin{align*} \mathbf{X} = \begin{pmatrix} 1 & x_{0,0} & x_{0,1} & \dots & x_{0,k-1}\\ 1 & x_{1,0} & x_{1,1} & \dots & x_{1,k-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n-1,0} & x_{n-1,1} & \dots & x_{n-1,k-1} \end{pmatrix} && \mathbf{a} = \begin{pmatrix} a_0\\ a_1\\ a_2\\ \vdots\\ a_k \end{pmatrix} && \mathbf{y} = \begin{pmatrix} y_0\\ y_1\\ \vdots \\ y_{n-1} \end{pmatrix} \end{align*}
which program 3 demonstrates by solving for the coefficients of a nine dimensional linear function with random noise and comparing them to the known coefficients, represented as $$a^\ast$$.

Program 3: Higher Dimensions

This can be further generalised to functions of vector independent variables
\begin{align*} \mathbf{X} = \begin{pmatrix} f_0(\mathbf{x}_0) & f_1(\mathbf{x}_0) & f_2(\mathbf{x}_0) & \dots & f_k(\mathbf{x}_0)\\ f_0(\mathbf{x}_1) & f_1(\mathbf{x}_1) & f_2(\mathbf{x}_1) & \dots & f_k(\mathbf{x}_1)\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ f_0(\mathbf{x}_{n-1}) & f_1(\mathbf{x}_{n-1}) & f_2(\mathbf{x}_{n-1}) & \dots & f_k(\mathbf{x}_{n-1})\\ \end{pmatrix} && \mathbf{a} = \begin{pmatrix} a_0\\ a_1\\ a_2\\ \vdots\\ a_k \end{pmatrix} && \mathbf{y} = \begin{pmatrix} y_0\\ y_1\\ \vdots \\ y_{n-1} \end{pmatrix} \end{align*}
by noting that it can be recovered with
$f_i(\mathbf{x}) = \begin{cases} 1 & i=0\\ x_{i-1} & \text{otherwise} \end{cases}$
In this context these functions are known as regressors rather than basis functions and can be used to find non-linear relationships between the dependent and independent variables. For example, if we have reason to believe that their true relationship is the exponential
$y = e^{a_0} \times x^{a_1}$
then we can use logarithms to transform it into the linear equation
$\ln y = a_0 + a_1 \times \ln x$
which we can fit through multiplicatively noisy data with
\begin{align*} \mathbf{X} = \begin{pmatrix} 1 & \ln x_0\\ 1 & \ln x_1\\ \vdots & \vdots\\ 1 & \ln x_{n-1} \end{pmatrix} && \mathbf{a} = \begin{pmatrix} a_0\\ a_1 \end{pmatrix} && \mathbf{y} = \begin{pmatrix} \ln y_0\\ \ln y_1\\ \vdots \\ \ln y_{n-1} \end{pmatrix} \end{align*}
as demonstrated by program 4 which multiplies the elements of $$\mathbf{y}$$ by random exponentials.

Program 4: Using A Logarithmic Regressor

Since our solution for the regression only required linearity for the coefficients, not for the independent variables, we're free to choose any regressors that we want. In practice, however, we need to be careful not to fool ourselves into fitting the assumed random errors. For example, if we have $$n$$ points then using the regressors
$f_i(x) = x^i$
for $$i$$ from zero to $$n-1$$ is equivalent to a polynomial interpolation and so the regression will pass exactly through the values at the points, noise and all.
We shall therefore need to choose them so that they reflect a known, or at least strongly suspected, underlying relationship between the dependent and independent variables with unknown coefficients or, given that they're applied to arbitrary functions of the independent variables rather than the variables themselves, weights. If there's only one of the latter then we might try to do so by plotting the values at the points and judging by eye whether they follow a trend such as linear, polynomial or exponential. Better would be to find a theoretical justification for a relationship between them which, given how difficult it is to spot trends in more than one dimension, is just what we'll need to do when there are more than one independent variable.

### The ak Implementation

The ak.linearRegression function defined in listing 1 first dispatches to a constructors object to initialise its state, then calls the makeWeights helper function to calculate the regressor weights before defining the function to calculate the sum of the weighted regressors and adding to it property accessor methods for the nodes, the regressors and their weights.

Listing 1: ak.linearRegression
ak.linearRegression = function() {
var state = {nodes:[], regressors:[], weights:[]};
var arg0  = arguments[0];
var f;

constructors[ak.nativeType(arg0)](state, arg0, arguments);
makeWeights(state);

f = function(x) {
var n = state.regressors.length;
var y = 0;
while(n-->0) y += state.weights[n]*state.regressors[n](x);
return y;
};
f.nodes = function() {return copyNodes(state.nodes);};
f.regressors = function() {return state.regressors.slice(0);};
f.weights = function() {return state.weights.slice(0);};
return Object.freeze(f);
};

var constructors = {};

Note that we don't check the type of the function's argument since the regressors' arguments are expected to match it.

The first constructor expects an array of values of the independent variable x, an array of associated values of the dependent variable y and an array of regressors f, as shown by listing 2.

Listing 2: The Array Constructor
constructors[ak.ARRAY_T] = function(state, arg0, args) {
var arg1 = args[1];
constructors[ak.ARRAY_T][ak.nativeType(arg1)](state, arg0, arg1, args);
};

constructors[ak.ARRAY_T][ak.ARRAY_T] = function(state, arg0, arg1, args) {
var arg2 = args[2];
constructors[ak.ARRAY_T][ak.ARRAY_T][ak.nativeType(arg2)](state, arg0, arg1, arg2);
};

constructors[ak.ARRAY_T][ak.ARRAY_T][ak.ARRAY_T] = function(state, x, y, f) {
var n = x.length;
var i, yi;

if(y.length!==n || f.length>n) {
throw new Error('size mismatch in ak.linearRegression');
}

state.nodes.length = n;
for(i=0;i<n;++i) {
yi = Number(y[i]);
if(!isFinite(yi)) {
throw new Error('invalid node value in ak.linearRegression');
}
state.nodes[i] = {x:x[i], y:yi};
}

n = f.length;
for(i=0;i<n;++i) {
if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
throw new Error('invalid regressor in ak.linearRegression');
}
}
state.regressors = f.slice(0);
};

Here we're first checking that there are as many dependent values as independent points and that the regression isn't over-specified by having more regressors than them. We then initialise the state object's nodes with objects containing x and y properties representing the points and their associated values, checking tha the latter are valid numbers and that the regressors are functions. Once again we don't check the types of the independent variable since the regressors are expected to be consistent with them.
The second supports an array of node objects and another of regressors, but is otherwise equivalent to the first.

Listing 3: The Node Constructor
constructors[ak.ARRAY_T][ak.ARRAY_T][ak.UNDEFINED_T] = function(state, nodes, f) {
var n = nodes.length;
var i, x, y;

if(f.length>n) {
throw new Error('size mismatch in ak.linearRegression');
}

state.nodes.length = n;

for(i=0;i<n;++i) {
x = nodes[i].x; if(ak.nativeType(x)===ak.FUNCTION_T) x = x();
y = nodes[i].y; if(ak.nativeType(y)===ak.FUNCTION_T) y = y();
y = Number(y);
if(!isFinite(y)) {
throw new Error('invalid node value in ak.linearRegression');
}
state.nodes[i] = {x:x, y:y};
}

n = f.length;
for(i=0;i<n;++i) {
if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
throw new Error('invalid regressor in ak.linearRegression');
}
}
state.regressors = f.slice(0);
};

The final constructor requires an array of independent variable values, a function to compute the dependent variable's values from them and, as previously, an array of regressors.

Listing 4: The Function Constructor
constructors[ak.ARRAY_T][ak.FUNCTION_T] = function(state, arg0, arg1, args) {
var arg2 = args[2];
constructors[ak.ARRAY_T][ak.FUNCTION_T][ak.nativeType(arg2)](state, arg0, arg1, arg2);
};

constructors[ak.ARRAY_T][ak.FUNCTION_T][ak.ARRAY_T] = function(state, x, y, f) {
var n = x.length;
var i, xi, yi;

if(f.length>n) {
throw new Error('size mismatch in ak.linearRegression');
}

state.nodes.length = n;
for(i=0;i<n;++i) {
xi = x[i];
yi = Number(y(xi));
if(!isFinite(yi)) {
throw new Error('invalid node value in ak.linearRegression');
}
state.nodes[i] = {x:xi, y:yi};
}

n = f.length;
for(i=0;i<n;++i) {
if(ak.nativeType(f[i])!==ak.FUNCTION_T) {
throw new Error('invalid regressor in ak.linearRegression');
}
}
state.regressors = f.slice(0);
};

The makeWeights function implemented in listing 5 follows the implementation that we tested above, storing the results of the regressors in the array a before applying the left inverse of an ak.matrix initialised from it to an ak.vector initialised from y to calculate the weights.

Listing 5: The makeWeights Helper Function
function makeWeights(state) {
var rows = state.nodes.length;
var cols = state.regressors.length;
var y = new Array(rows);
var a = new Array(rows*cols);
var i = 0;
var r, c, x;

for(r=0;r<rows;++r) {
x = state.nodes[r].x;
y[r] = state.nodes[r].y;
for(c=0;c<cols;++c) a[i++] = state.regressors[c](x);
}

y = ak.vector(y);
a = ak.matrix(rows, cols, a);
state.weights = ak.mul(ak.leftInv(a), y).toArray();
}

The copyNodes function simply creates an array of new objects that have the same x and y property values as that of the state object's nodes, as shown by listing 6.

Listing 6: The copyNodes Helper Function
function copyNodes(nodes) {
var n = nodes.length;
var copy = new Array(n);
var i;

for(i=0;i<n;++i) copy[i] = {x:nodes[i].x, y:nodes[i].y};
return copy;
}

The most common use of linear regression is, rather unsurprisingly, to fit linear functions to data and so it makes sense to implement a function to create linear regressors to do so, as is done by ak.linearRegressors in listing 7.

Listing 7: Linear Regressors
ak.linearRegressors = function(n) {
var f = [];
switch(ak.nativeType(n)) {
case ak.NUMBER_T:
if(n!==ak.floor(n) || n<0) {
throw new Error('invalid argument in ak.linearRegressors');
}
f.length = n+1;
makeMultiRegressors(n-1, f);
break;

case ak.UNDEFINED_T:
f.length = 2;
f[1] = function(x) {return x;};
f[0] = function(x) {return 1;};
break;

default:
throw new Error('invalid argument in ak.linearRegressors');
}
return f;
};

If no argument is passed to it then we assume that the independent variable is a scalar and create an array of two regressors that return one and it respectively. If it's passed a number then we assume that it represents the length of a vector independent variable and forward to the makeMultiRegressors helper function defined in listing 8 to populate an array of regressors that return one and each of the vector's elements in turn.

Listing 8: Multiple Dimensions
function makeMultiRegressors(n, f) {
if(n===-1) {
f[0] = function(x) {return 1;};
}
else {
f[n+1] = function(x) {return x.at(n);};
makeMultiRegressors(n-1, f);
}
}

Note that it uses recursion to ensure that each regressor captures a unique index n whose value never changes. If we used a for loop then they would all refer to its iterator which would be equal to its terminating by the time that we tried to use them.

Program 5 demonstrates the use of ak.linearRegression and ak.linearRegressors, both of which are implemented in LinearRegression.js, for a scalar independent variable

Program 5: Using ak.linearRegression For Scalars

and program 6 demonstrates their use for vector independent variables.

Program 6: Using ak.linearRegression For Vectors

Finally, program 7 demonstrates how to use ak.linearRegression with user defined regressors by using the constant r0 and logarithmic r1 to fit an exponential relationship between the dependent and independent variables.

Program 7: User Defined Regressors

Note that we're explicitly taking logarithms of the noisy yi values of the dependent variable.

There's plenty more to cover on the subject of regression, but in the next post we shall take a look at an entirely different approach to approximating the values of functions with random errors.

### References

[1] All Your Basis Are Belong To Us, www.thusspakeak.com, 2020.