The Matrix Recoded


Last time[1] we discussed vectors, which in some sense extend the notion of number into multiple dimensions. Unfortunately they don't adequately extend the notion of arithmetic into multiple dimensions; for that, we also need matrices. Together, they comprise the building blocks of linear algebra.
Matrices represent linear transformations of vectors in the sense that multiplying a vector by a matrix results in a vector whose elements are linear combinations of those of the original vector. For example
\[ \begin{pmatrix} a & b\\ c & d \end{pmatrix} \times \begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} ax + by\\ cx + dy \end{pmatrix} \]
An important thing to note is that if the matrix and vector swap places then the result is different, unlike multiplication with numbers
\[ \begin{pmatrix} x & y \end{pmatrix} \times \begin{pmatrix} a & b\\ c & d \end{pmatrix} = \begin{pmatrix} ax + cy & bx + dy \end{pmatrix} \]
Here we're using the notational convention of representing the vector with a row of elements rather than with a column when post multiplying by a matrix. It is important to note that this is purely a notational convention; there is no mathematical difference between a row vector and a column vector.
Now, pre-multiplication of a vector by a matrix is only defined if the vector has the same number of elements as the matrix has columns. Similarly, post-multiplication requires it to have the same number of elements as the matrix has rows. This noted, if we denote the element in the \(i^\mathrm{th}\) row and \(j^\mathrm{th}\) column of a matrix \(\mathbf{M}\) as \(m_{ij}\) and the \(i^\mathrm{th}\) element of a vector \(\mathbf{v}\) as \(v_i\), we can express the rules of vector-matrix multiplication with
\[ \begin{align*} (\mathbf{M} \times \mathbf{v})_i &= \sum_j m_{ij} \times v_j\\ (\mathbf{v} \times \mathbf{M})_j &= \sum_i v_i \times m_{ij} \end{align*} \]
where \(\sum\) is the summation sign.
The rules for matrix-matrix multiplication are similarly expressed with
\[ (\mathbf{M} \times \mathbf{N})_{ij} = \sum_k m_{ik} \times n_{kj} \]
each term of which can be thought of as the vector product of a row of the left hand side with a column of the right hand side; hence the notational convention for vector-matrix multiplication. For example
\[ \left( \frac{\begin{matrix}a & b\end{matrix}}{\begin{matrix}c & d\end{matrix}} \right) \times \left( \begin{matrix} w\\ y \end{matrix} \bigg| \begin{matrix} x\\ z \end{matrix} \right) = \begin{pmatrix} aw + by & ax + bz\\ cw + dy & cx + dz \end{pmatrix} \]
Here the horizontal and vertical bars serve as a visual cue for dividing up the matrix product into a series of vector products. Note that a consequence of the definition of matrix-matrix multiplication is that it is only defined if the left hand term has the same number of columns as the right hand term has rows.
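To see the rule in action with some concrete numbers, we have, for example
\[ \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix} \times \begin{pmatrix} 5 & 6\\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8\\ 3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{pmatrix} = \begin{pmatrix} 19 & 22\\ 43 & 50 \end{pmatrix} \]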

Now the notational convention for matrix-vector multiplication is suggestive of a new way of multiplying vectors. The usual definition of vector multiplication can be represented as
\[ \begin{pmatrix} x_0 & x_1 \end{pmatrix} \times \begin{pmatrix} y_0\\ y_1 \end{pmatrix} = \begin{pmatrix} x_0 y_0 + x_1 y_1 \end{pmatrix} \]
whose only element is the expected result.
If we reverse the row and column representations, we have
\[ \begin{pmatrix} x_0\\ x_1 \end{pmatrix} \times \begin{pmatrix} y_0 & y_1 \end{pmatrix} = \begin{pmatrix} x_0 y_0 & x_0 y_1\\ x_1 y_0 & x_1 y_1 \end{pmatrix} \]
which is known as the outer product of a pair of vectors, as opposed to the normal rule which is known as the inner product, and is written \(\mathbf{v}_1 \otimes \mathbf{v}_2\).

Addition and subtraction of matrices and multiplication and division by scalars proceed in much the same fashion as for vectors; element by element
\[ \begin{align*} (\mathbf{M} + \mathbf{N})_{ij} &= m_{ij} + n_{ij}\\ (\mathbf{M} - \mathbf{N})_{ij} &= m_{ij} - n_{ij}\\ \\ (\mathbf{M} \times x)_{ij} &= m_{ij} \times x\\ (x \times \mathbf{M})_{ij} &= m_{ij} \times x\\ \\ (\mathbf{M} \div x)_{ij} &= m_{ij} \div x \end{align*} \]
For example
\[ \begin{align*} \begin{pmatrix} a & b\\ c & d \end{pmatrix} + \begin{pmatrix} w & x\\ y & z \end{pmatrix} &= \begin{pmatrix} a+w & b+x\\ c+y & d+z \end{pmatrix} \\ \begin{pmatrix} a & b\\ c & d \end{pmatrix} - \begin{pmatrix} w & x\\ y & z \end{pmatrix} &= \begin{pmatrix} a-w & b-x\\ c-y & d-z \end{pmatrix} \end{align*} \]
and
\[ \begin{align*} \begin{pmatrix} a & b\\ c & d \end{pmatrix} \times x &= \begin{pmatrix} a \times x & b \times x\\ c \times x & d \times x \end{pmatrix} \\ \begin{pmatrix} a & b\\ c & d \end{pmatrix} \div x &= \begin{pmatrix} a \div x & b \div x\\ c \div x & d \div x \end{pmatrix} \end{align*} \]

A Question Of Identity

Square matrices whose elements are all equal to zero except on the leading diagonal, the set of elements whose position in their row is the same as that in their column, where they are equal to one, are known as identity matrices. Represented as \(\mathbf{I}\), they are the multiplicative units of linear algebra. For example
\[ \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \times \begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} x & y \end{pmatrix} \times \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 \times x + 0 \times y\\ 0 \times x + 1 \times y \end{pmatrix} = \begin{pmatrix} x\\ y \end{pmatrix} \]
Now that we have a multiplicative unit, we can define the inverse of a matrix \(\mathbf{M}\) with
\[ \mathbf{M}^{-1} \times \mathbf{M} = \mathbf{M} \times \mathbf{M}^{-1} = \mathbf{I} \]
For a two by two matrix, the inverse is given by
\[ \begin{pmatrix} a & b\\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix} \]
which we can easily confirm with
\[ \begin{align*} \frac{1}{ad-bc} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix} \times \begin{pmatrix} a & b\\ c & d \end{pmatrix} &= \frac{1}{ad-bc} \begin{pmatrix} da - bc & db - bd\\ -ca + ac & -cb + ad \end{pmatrix}\\ &= \frac{1}{ad-bc} \begin{pmatrix} ad - bc & 0\\ 0 & ad - bc \end{pmatrix}\\ &= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \end{align*} \]
The denominator of the fraction, \(ad-bc\), is known as the determinant of the matrix and must be non-zero for the matrix to have an inverse. Matrices with determinants of zero are known as singular matrices and project vectors onto a lower dimensional space.
Note that matrices with very small determinants, known as near singular matrices, whilst mathematically having inverses, will present significant numerical difficulties when we try to compute them. This is because they squash vectors down to a volume that is very thin in some directions, losing many significant digits in those directions through cancellation error.
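For example, for some small value of \(\epsilon\), the matrix
\[ \begin{pmatrix} 1 & 1\\ 1 & 1+\epsilon \end{pmatrix} \]
has a determinant of \(\epsilon\) and an inverse of
\[ \frac{1}{\epsilon} \begin{pmatrix} 1+\epsilon & -1\\ -1 & 1 \end{pmatrix} \]
whose elements are enormous and, since recovering \(\epsilon\) from the elements of the matrix involves subtracting numbers that are very nearly equal, will have lost most of their significant digits in the process.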

Now, one useful application of inverses is for the solution of systems of simultaneous linear equations, which can be expressed with matrices and vectors as
\[ \mathbf{A} \times \mathbf{x} = \mathbf{b} \]
for a known matrix \(\mathbf{A}\) and vector \(\mathbf{b}\) and an unknown vector \(\mathbf{x}\), and solved with
\[ \mathbf{x} = \mathbf{A}^{-1} \times \mathbf{b} \]
For example, we can rewrite the simultaneous equations
\[ \begin{align*} 8x + 3y &= 2\\ 4x - 2y &= 8 \end{align*} \]
as
\[ \begin{pmatrix} 8 & 3\\ 4 & -2 \end{pmatrix} \times \begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 2\\ 8 \end{pmatrix} \]
and can solve them with
\[ \begin{array}{lclcl} \begin{pmatrix} x\\ y \end{pmatrix} &=& \begin{pmatrix} 8 & 3\\ 4 & -2 \end{pmatrix}^{-1} \times \begin{pmatrix} 2\\ 8 \end{pmatrix} &=& -\frac{1}{28} \begin{pmatrix} -2 & -3\\ -4 & 8 \end{pmatrix} \times \begin{pmatrix} 2\\ 8 \end{pmatrix}\\ &=& -\frac{1}{28} \begin{pmatrix} -4 - 24\\ -8 + 64 \end{pmatrix} &=& -\frac{1}{28} \begin{pmatrix} -28\\ 56 \end{pmatrix}\\ &=& \begin{pmatrix} 1\\ -2 \end{pmatrix} \end{array} \]

A Pivotal Algorithm

To invert square matrices of any size we can use a variant of Gaussian elimination, which you may recall we exploited when we used polynomials to approximate the derivative of a function[2]. To do this we manipulate the rows of a matrix using a set of operations to transform it into an identity matrix. By applying exactly the same set of operations to an identity matrix we can recover the inverse.
Specifically, the row operations that we shall use are
  1. Swap two rows
  2. Divide a row by a constant
  3. Add a constant multiple of one row to another
For an \(n\) by \(n\) matrix, the algorithm proceeds by iterating over the indices \(i\), performing the following steps
  1. Swap the \(i^\mathrm{th}\) row with the one below it, if any, that has the largest absolute value in the \(i^\mathrm{th}\) column
  2. Divide the \(i^\mathrm{th}\) row by its \(i^\mathrm{th}\) element, or pivot
  3. Subtract from the other rows the \(i^\mathrm{th}\) row multiplied by the element in their \(i^\mathrm{th}\) column
To illustrate, let's apply the algorithm to
\[ \begin{pmatrix} 2 & 2 & 1\\ 4 & 0 & -1\\ -1 & 3 & 2 \end{pmatrix} \]
First off we need to apply the same operations to an identity matrix, so we write it out to the right
\[ \begin{pmatrix} 2 & 2 & 1\\ 4 & 0 & -1\\ -1 & 3 & 2 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \]
For the first index we find that the second row has the largest absolute value, so we swap them
\[ \begin{pmatrix} 4 & 0 & -1\\ 2 & 2 & 1\\ -1 & 3 & 2 \end{pmatrix} \quad \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix} \]
Note that we do the same with the right hand side. Next we divide the new first rows by 4
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 2 & 2 & 1\\ -1 & 3 & 2 \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{pmatrix} \]
Now we subtract twice the first rows from the second rows
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 2 & 1\tfrac{1}{2}\\ -1 & 3 & 2 \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 1 & -\tfrac{1}{2} & 0\\ 0 & 0 & 1 \end{pmatrix} \]
and then add the first rows to the third rows
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 2 & 1\tfrac{1}{2}\\ 0 & 3 & 1\tfrac{3}{4} \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 1 & -\tfrac{1}{2} & 0\\ 0 & \tfrac{1}{4} & 1 \end{pmatrix} \]
For the second index we swap the second and third rows
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 3 & 1\tfrac{3}{4}\\ 0 & 2 & 1\tfrac{1}{2} \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 0 & \tfrac{1}{4} & 1\\ 1 & -\tfrac{1}{2} & 0 \end{pmatrix} \]
and divide the second rows by 3
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 1 & \tfrac{7}{12}\\ 0 & 2 & 1\tfrac{1}{2} \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 0 & \tfrac{1}{12} & \tfrac{1}{3}\\ 1 & -\tfrac{1}{2} & 0 \end{pmatrix} \]
Since the first row already has a zero in the second column we only need to subtract twice the second rows from the third rows
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 1 & \tfrac{7}{12}\\ 0 & 0 & \tfrac{1}{3} \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 0 & \tfrac{1}{12} & \tfrac{1}{3}\\ 1 & -\tfrac{2}{3} & -\tfrac{2}{3} \end{pmatrix} \]
There are no rows to swap with the third, so we begin by dividing them by \(\tfrac{1}{3}\)
\[ \begin{pmatrix} 1 & 0 & -\tfrac{1}{4}\\ 0 & 1 & \tfrac{7}{12}\\ 0 & 0 & 1 \end{pmatrix} \quad \begin{pmatrix} 0 & \tfrac{1}{4} & 0\\ 0 & \tfrac{1}{12} & \tfrac{1}{3}\\ 3 & -2 & -2 \end{pmatrix} \]
and finally add a quarter times the third rows to the first rows and subtract seven twelfths times the third from the second
\[ \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \quad \begin{pmatrix} \tfrac{3}{4} & -\tfrac{1}{4} & -\tfrac{1}{2}\\ -1\tfrac{3}{4} & 1\tfrac{1}{4} & 1\tfrac{1}{2}\\ 3 & -2 & -2 \end{pmatrix} \]
It's a simple matter to confirm that the right hand matrix is indeed the inverse of the original matrix by multiplying them and checking that the result is the identity matrix.
The reason why this algorithm works is that each of the row operations is equivalent to a matrix multiplication. By applying them to both the left and right hand sides we are effectively multiplying both by the inverse of the left hand side and since the right hand side is originally the identity it must end up as the inverse.
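For example, swapping the rows of a two by two matrix is equivalent to pre-multiplying it by an identity matrix whose rows have been swapped
\[ \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix} \times \begin{pmatrix} a & b\\ c & d \end{pmatrix} = \begin{pmatrix} c & d\\ a & b \end{pmatrix} \]
and the other row operations correspond to similarly simple matrices.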

Now, if you've got your algorithmic analysis hat planted firmly upon your head, you will have realised that we choose the row with the largest absolute value so as to minimise the effects of cancellation error; if a small value is the result of a subtraction involving two similar numbers, which may very well be the case, dividing by it will introduce significant errors, to say nothing of the possibility of dividing by zero. You may also have noticed that there's no need to actually update the columns of the left hand matrix up to and including that of the pivot since they play no further part in the calculation. We shall exploit this minor optimisation when it comes to the implementation.
Despite these numerical precautions and optimisations this is a relatively crude algorithm for matrix inversion and we shall return to the problem with more sophisticated approaches in later posts.

Note that we can also use this algorithm to calculate the determinant of a matrix by multiplying the pivots and negating the product each time we swap a row. The reason for this is that the determinant of a matrix whose non-leading diagonal elements are all zero is simply the product of the leading diagonal elements, that swapping two rows in a matrix negates its determinant, that dividing a row by a constant divides its determinant by that constant and that adding a multiple of one row to another leaves it unchanged (try it out with a two by two matrix to get a feel for these properties). Since the algorithm finishes with the identity matrix, whose determinant is one, the determinant of the original matrix must have been the product of the pivots, negated once for each swap.
For the above example this gives us a determinant of
\[ \begin{vmatrix} 2 & 2 & 1\\ 4 & 0 & -1\\ -1 & 3 & 2 \end{vmatrix} = -1 \times 4 \times -1 \times 3 \times \tfrac{1}{3} = 4 \]

Non-Square Inverses

Whilst the inverse is only defined for square matrices, we can define something similar for non-square matrices. Specifically, for a matrix \(\mathbf{M}\) with \(m\) rows and \(n\) columns we have
\[ \begin{align*} \mathbf{L} \times \mathbf{M} &= \mathbf{I}_{n \times n}\phantom{\mathbf{I}_{m \times m}} \quad m > n\\ \mathbf{M} \times \mathbf{R} &= \mathbf{I}_{m \times m}\phantom{\mathbf{I}_{n \times n}} \quad m < n \end{align*} \]
where we use \(\mathbf{I}_{k \times k}\) to denote the identity matrix with \(k\) rows and columns.
To define these left and right inverses we need the transpose of a matrix, \(\mathbf{M}^\mathrm{T}\), which we get by swapping the rows and columns of a matrix \(\mathbf{M}\)
\[ \mathbf{M}^\mathrm{T}_{ij} = \mathbf{M}_{ji} \]
For example
\[ \begin{pmatrix} a & b & c\\ d & e & f \end{pmatrix}^\mathrm{T} = \begin{pmatrix} a & d\\ b & e\\ c & f \end{pmatrix} \]
Note that the transpose and the inverse have a similar property with regard to multiplication, in that
\[ \begin{align*} \left(\mathbf{M} \times \mathbf{N}\right)^{-1} &= \mathbf{N}^{-1} \times \mathbf{M}^{-1}\\ \left(\mathbf{M} \times \mathbf{N}\right)^\mathrm{T} &= \mathbf{N}^\mathrm{T} \times \mathbf{M}^\mathrm{T} \end{align*} \]
Now the product of a matrix and its transpose is trivially square and, providing it isn't singular, can therefore be inverted. We can consequently define \(\mathbf{L}\) and \(\mathbf{R}\) as
\[ \begin{align*} \mathbf{L} &= \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right)^{-1} \times \mathbf{M}^\mathrm{T}\\ \mathbf{R} &= \mathbf{M}^\mathrm{T} \times \left(\mathbf{M} \times \mathbf{M}^\mathrm{T}\right)^{-1} \end{align*} \]
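That these formulae satisfy the definitions follows directly from that of the inverse; for the left inverse, for example, we have
\[ \mathbf{L} \times \mathbf{M} = \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right)^{-1} \times \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right) = \mathbf{I}_{n \times n} \]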
The right inverse can be used to find the set of variables with the smallest root mean square, or RMS, value that solves a system of linear equations in which there are fewer equations than unknowns. For example
\[ \begin{align*} 2x + 3y - 2z &= 2\\ 3x - 2y + 3z &= 4 \end{align*} \]
can be rewritten as
\[ \begin{pmatrix} 2 & 3 & -2\\ 3 & -2 & 3 \end{pmatrix} \times \begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 2\\ 4 \end{pmatrix} \]
and solved with
\[ \begin{align*} \begin{pmatrix} x\\ y\\ z \end{pmatrix} &= \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \times \left( \begin{pmatrix} 2 & 3 & -2\\ 3 & -2 & 3 \end{pmatrix} \times \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \right)^{-1} \times \begin{pmatrix} 2\\ 4 \end{pmatrix}\\ &= \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \times \begin{pmatrix} 17 & -6\\ -6 & 22 \end{pmatrix}^{-1} \times \begin{pmatrix} 2\\ 4 \end{pmatrix}\\ &= \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \times \frac{1}{338} \begin{pmatrix} 22 & 6\\ 6 & 17 \end{pmatrix} \times \begin{pmatrix} 2\\ 4 \end{pmatrix}\\ &= \frac{1}{169} \times \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \times \begin{pmatrix} 22 & 6\\ 6 & 17 \end{pmatrix} \times \begin{pmatrix} 1\\ 2 \end{pmatrix}\\ &= \frac{1}{169} \times \begin{pmatrix} 2 & 3\\ 3 & -2\\ -2 & 3 \end{pmatrix} \times \begin{pmatrix} 34\\ 40 \end{pmatrix}\\ &= \begin{pmatrix} \tfrac{188}{169}\\ \tfrac{22}{169}\\ \tfrac{52}{169} \end{pmatrix} \end{align*} \]
which can easily be confirmed to satisfy the linear equations. The RMS value of the variables is proportional to the magnitude of this vector, \(\sqrt{\tfrac{228}{169}}\). Now I shan't prove that this is the smallest possible solution but shall instead just show that it is smaller than one that sets \(z\) to zero, giving \(x\) and \(y\) with
\[ \begin{align*} \begin{pmatrix} x\\ y \end{pmatrix} &= \begin{pmatrix} 2 & 3\\ 3 & -2 \end{pmatrix}^{-1} \times \begin{pmatrix} 2\\ 4 \end{pmatrix}\\ &= -\frac{1}{13} \begin{pmatrix} -2 & -3\\ -3 & 2 \end{pmatrix} \times \begin{pmatrix} 2\\ 4 \end{pmatrix}\\ &= \begin{pmatrix} \tfrac{16}{13}\\ -\tfrac{2}{13} \end{pmatrix} \end{align*} \]
which can also easily be shown to satisfy the equations and has a length of \(\sqrt{\tfrac{20}{13}}\), or \(\sqrt{\tfrac{260}{169}}\).

Note that we can now define division by square matrices and matrices with more columns than rows as pre-multiplication by the inverse and right inverse respectively.

A Few Operations More

Whilst the determinant uses the same notation as the magnitude of other arithmetic types, it isn't a particularly suitable candidate for the magnitude of a matrix. Instead we shall simply take the square root of the sum of the squares of the elements, known as the Frobenius norm.
\[ \|\mathbf{M}\| = \sqrt{\sum_i \sum_j m_{ij}^2} \]
One useful application of the norm is to detect when a matrix is near singular. Specifically, the larger the product of the norm of a matrix and that of its inverse, \(\|\mathbf{M}\| \times \|\mathbf{M}^{-1}\|\), the closer that matrix is to being singular.
Note that we can trivially define the distance between two matrices \(\mathbf{M}\) and \(\mathbf{N}\) as \(\|\mathbf{M}-\mathbf{N}\|\).

Next, we have the trace of a square matrix which is simply the sum of the elements on the leading diagonal
\[ \mathrm{tr}(\mathbf{M}) = \sum_i m_{ii} \]
Now, one property of square matrices is that if we multiply them by themselves then the result has the same number of rows and columns. This means that we can do so as many times as we wish, giving us a simple notion of an integer power of a matrix
\[ \mathbf{M}^n = \prod_{i=1}^n \mathbf{M} \]
where \(\prod\) is the product sign. Trivially, we define the zeroth power of a square matrix as the identity with the same number of rows and columns and negative powers as positive powers of the inverse so that
\[ \mathbf{M}^{n+m} = \mathbf{M}^n \times \mathbf{M}^m \]
for all integer \(n\) and \(m\), just like for numbers.

Armed with integer powers of square matrices, we can generalise our old friend Taylor's theorem to them. For example, the Taylor series of the exponential function is
\[ e^x = 1 + x + \tfrac{1}{2} x^2 + \tfrac{1}{6} x^3 + \dots + \tfrac{1}{n!} x^n + \dots \]
We can consequently define the exponential of a square matrix as
\[ e^\mathbf{M} = \mathbf{I} + \mathbf{M} + \tfrac{1}{2} \mathbf{M}^2 + \tfrac{1}{6} \mathbf{M}^3 + \dots + \tfrac{1}{n!} \mathbf{M}^n + \dots \]
which converges for all square matrices.
For example, given the matrix
\[ \mathbf{M} = \begin{pmatrix} 0 & 1 & 2\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{pmatrix} \]
we have
\[ \begin{align*} \mathbf{M}^2 &= \begin{pmatrix} 0 & 1 & 2\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{pmatrix} \times \begin{pmatrix} 0 & 1 & 2\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & -1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}\\ \mathbf{M}^3 &= \begin{pmatrix} 0 & 1 & 2\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{pmatrix} \times \begin{pmatrix} 0 & 0 & -1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} \end{align*} \]
and so all powers higher than the second must have all elements equal to zero, giving a matrix exponential of
\[ e^\mathbf{M} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & 1 & 2\\ 0 & 0 & -1\\ 0 & 0 & 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 0 & 0 & -1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1\tfrac{1}{2}\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{pmatrix} \]
Note that we can consequently define a number raised to the power of a matrix with
\[ x^\mathbf{M} = e^{\ln(x) \times \mathbf{M}} \]
These are but the first of many generalisations of functions to matrices, which we shall return to in later posts.

A Matrix Class

Now that we've covered the basics of matrix arithmetic we're ready to implement our matrix class.
The typical way to store the elements of a matrix is to concatenate their rows into an array, as is done in the state member in the ak.matrix class shown in listing 1.

Listing 1: ak.matrix
ak.MATRIX_T = 'ak.matrix';

function Matrix(){}
Matrix.prototype = {TYPE: ak.MATRIX_T, valueOf: function(){return ak.NaN;}};

ak.matrix = function() {
 var m     = new Matrix();
 var state = {rows: 0, cols: 0, elems: []};
 var arg0  = arguments[0];

 constructors[ak.nativeType(arg0)](state, arg0, arguments);

 m.rows = function() {return state.rows;};
 m.cols = function() {return state.cols;};

 m.at = function(r, c) {return state.elems[Number(r)*state.cols+Number(c)];};

 m.toArray  = function() {return toArray(state.rows, state.cols, state.elems);};
 m.toString = function() {return toString(state.rows, state.cols, state.elems);};

 m.toExponential = function(d) {
  var f = function(x){return x.toExponential(d);};
  return toString(state.rows, state.cols, state.elems, f);
 };

 m.toFixed = function(d) {
  var f = function(x){return x.toFixed(d);};
  return toString(state.rows, state.cols, state.elems, f);
 };

 m.toPrecision = function(d) {
  var f = function(x){return x.toPrecision(d);};
  return toString(state.rows, state.cols, state.elems, f);
 };

 return Object.freeze(m);
};

var constructors = {};

We're using our familiar dispatch to the constructors object to initialise the matrix state. We provide access to the numbers of rows and columns with the rows and cols methods and to the elements with the at method. We also have conversion to a native array of arrays via toArray and support for the standard string conversions.

ak.matrix Constructors

A natural way to initialise a matrix is with an array of arrays. For simplicity we shall assume that all of the arrays have the same length and rely upon undefined being numerically NaN to handle the error of having arrays of differing lengths, as shown in listing 2.

Listing 2: ak.matrix Array Constructor
constructors[ak.ARRAY_T] = function(state, arr) {
 var rows = arr.length;
 var cols = rows ? arr[0].length : 0;
 var elems = new Array(rows*cols);
 var off = 0;
 var r, row, c;

 for(r=0;r<rows;++r) {
  row = arr[r];
  for(c=0;c<cols;++c) elems[off++] = Number(row[c]);
 }

 state.rows  = rows;
 state.cols  = cols;
 state.elems = elems;
};
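To give a flavour of how such a matrix might be constructed and queried, here's a minimal sketch, assuming that the ak library scripts have been loaded

var m = ak.matrix([[2, 2, 1], [4, 0, -1], [-1, 3, 2]]);

console.log(m.rows());     // 3
console.log(m.cols());     // 3
console.log(m.at(1, 2));   // -1
console.log(m.toString()); // [[2,2,1],[4,0,-1],[-1,3,2]]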

Next up we specify the number of rows and columns and populate the elements from an array representing the elements of the rows concatenated together, as done by the elems member of the matrix state, given in listing 3.

Listing 3: ak.matrix Dimensions and Elements Constructor
constructors[ak.NUMBER_T] = function(state, rows, args) {
 var arg1 = args[1];

 state.rows = rows;
 constructors[ak.NUMBER_T][ak.nativeType(arg1)](state, arg1, args);
};

constructors[ak.NUMBER_T][ak.NUMBER_T] = function(state, cols, args) {
 var arg2 = args[2];

 state.cols = cols;
 constructors[ak.NUMBER_T][ak.NUMBER_T][ak.nativeType(arg2)](state, arg2);
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.ARRAY_T] = function(state, vals) {
 var n = state.rows*state.cols;
 var elems = new Array(n);
 var i;

 for(i=0;i<n;++i) elems[i] = Number(vals[i]);
 state.elems = elems;
};

We also allow the elements to be initialised according to their row and column positions, or all to a single value, or to zero if only the rows and columns are provided, as shown in listing 4.

Listing 4: ak.matrix Dimensions Constructors
constructors[ak.NUMBER_T][ak.NUMBER_T][ak.FUNCTION_T] = function(state, f) {
 var rows = state.rows;
 var cols = state.cols;
 var elems = new Array(rows*cols);
 var off = 0;
 var r, c;

 for(r=0;r<rows;++r) {
  for(c=0;c<cols;++c) {
   elems[off++] = Number(f(r, c));
  }
 }
 state.elems = elems;
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.NUMBER_T] = function(state, val) {
 var n = state.rows*state.cols;
 var elems = new Array(n);
 var i;

 val = Number(val);
 for(i=0;i<n;++i) elems[i] = val;
 state.elems = elems;
};

constructors[ak.NUMBER_T][ak.NUMBER_T][ak.UNDEFINED_T] = function(state) {
 var n = state.rows*state.cols;
 var elems = new Array(n);
 var i;

 for(i=0;i<n;++i) elems[i] = 0;
 state.elems = elems;
};

As usual we can initialise an ak.matrix object with one that supports a similar interface, as shown in listing 5.

Listing 5: ak.matrix Object Constructor
constructors[ak.OBJECT_T] = function(state, obj) {
 var rows = obj.rows;
 var cols = obj.cols;
 var off = 0;
 var elems, r, c;

 rows = (ak.nativeType(rows)===ak.FUNCTION_T) ? Number(rows()) : Number(rows);
 cols = (ak.nativeType(cols)===ak.FUNCTION_T) ? Number(cols()) : Number(cols);

 elems = new Array(rows*cols);

 for(r=0;r<rows;++r) {
  for(c=0;c<cols;++c) {
   elems[off++] = Number(obj.at(r, c));
  }
 }

 state.rows  = rows;
 state.cols  = cols;
 state.elems = elems;
};

It's convenient to be able to construct identity matrices without explicitly specifying the elements, so we have a constructor that we use to do so, as shown in listing 6.

Listing 6: ak.matrix Identity Constructor
constructors[ak.STRING_T] = function(state, type, args) {
 constructors[ak.STRING_T][type.toUpperCase()](state, args);
};

constructors[ak.STRING_T]['IDENTITY'] = function(state, args) {
 var arg1 = args[1];
 constructors[ak.STRING_T]['IDENTITY'][ak.nativeType(arg1)](state, arg1);
};

constructors[ak.STRING_T]['IDENTITY'][ak.NUMBER_T] = function(state, n) {
 var elems = new Array(n*n);
 var off = 0;
 var r, c;

 for(r=0;r<n;++r) {
  for(c=0;c<r;++c)   elems[off++] = 0;
  elems[off++] = 1;
  for(c=r+1;c<n;++c) elems[off++] = 0;
 }

 state.rows  = n;
 state.cols  = n;
 state.elems = elems;
};

To specify that we want to construct an identity matrix, we pass the string 'identity' as the first argument and since identity matrices are necessarily square we only need to pass one number for the rows and columns.

Another common type of matrix is the diagonal matrix which, like the identity matrix, has elements equal to zero off the leading diagonal but, unlike it, can take any values on the leading diagonal. For the sake of parsimony we shall frequently represent them with vectors and it is therefore convenient to construct diagonal matrices in the same fashion as we construct vectors. To do so we pass the string 'diagonal' as the first argument and arguments compatible with the ak.vector constructors for the rest. The latter are used to construct an ak.vector via JavaScript's Function.apply method which is then used to initialise the leading diagonal, as shown in listing 7.

Listing 7: ak.matrix Diagonal Constructor
constructors[ak.STRING_T]['DIAGONAL'] = function(state, args) {
 var a = new Array(args.length-1);
 var off = 0;
 var i, v, r, c, n, elems;

 for(i=1;i<args.length;++i) a[i-1] = args[i];
 v = ak.vector.apply(null, a);
 n = v.dims();
 elems = new Array(n*n);

 for(r=0;r<n;++r) {
  for(c=0;c<r;++c)   elems[off++] = 0;
  elems[off++] = v.at(r);
  for(c=r+1;c<n;++c) elems[off++] = 0;
 }

 state.rows  = n;
 state.cols  = n;
 state.elems = elems;
};

Note that this string identifier approach will be the mechanism that we shall use whenever we need parameterised special cases of types which, unlike the imaginary unit for example, cannot simply be ak constants.
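To give a sense of these special case constructors, here's a minimal sketch, assuming that the library scripts are loaded and that, as in the previous post, ak.vector will accept an array of elements

var i = ak.matrix('identity', 3);
var d = ak.matrix('diagonal', [1, 2, 3]);

console.log(i.toString()); // [[1,0,0],[0,1,0],[0,0,1]]
console.log(d.toString()); // [[1,0,0],[0,2,0],[0,0,3]]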

ak.matrix Conversions

The toArray method simply forwards to a toArray function defined in listing 8.

Listing 8: toArray
function toArray(rows, cols, elems) {
 var a = new Array(rows);
 var off = 0;
 var r, row, c;

 for(r=0;r<rows;++r) {
  row = new Array(cols);
  for(c=0;c<cols;++c) row[c] = elems[off++];
  a[r] = row;
 }
 return a;
}

The various string conversion methods similarly forward to a toString function, passing a numeric string conversion function if necessary. Listing 9 shows the implementation of toString.

Listing 9: toString
function toString(rows, cols, elems, f) {
 var s = [];
 var off = 0;
 var r, c;

 s.push('[');

 for(r=0;r<rows;++r) {
  s.push('[');

  for(c=0;c<cols;++c) {
   s.push(f ? f(elems[off++]) : elems[off++]);
   s.push(',');
  }

  s.pop();
  s.push(']');
  s.push(',');
 }

 s.pop();
 s.push(']');

 return s.join('');
}

This creates a square bracketed comma separated list of square bracketed comma separated lists representing the rows. To keep the logic simple, entries are always followed by commas which are removed when adding closing brackets.

ak.matrix Overloads

The simplest of the ak.matrix overloads is the transpose, as illustrated by listing 10.

Listing 10: ak.matrix Transpose
function transpose(m) {
 return ak.matrix(m.cols(), m.rows(), function(r, c){return m.at(c, r);});
}
if(!ak.transpose) ak.transpose = function(x) {return ak.transpose[ak.type(x)](x)};
ak.overload(ak.transpose, ak.MATRIX_T, transpose);

Here we simply use the constructor that initialises the elements according to their row and column indices to swap the rows and columns. Note that since ak.transpose is not one of our core arithmetic operations[3] we must add it to the dispatch mechanism if another class hasn't already done so.
Only slightly more complicated are negation, the magnitude and the trace, given in listing 11.

Listing 11: ak.matrix Negation, Magnitude And Trace
function neg(m) {
 var rows = m.rows();
 var cols = m.cols();
 var a = new Array(rows*cols);
 var off = 0;
 var r, c;

 for(r=0;r<rows;++r) {
  for(c=0;c<cols;++c) {
   a[off++] = -m.at(r, c);
  }
 }
 return ak.matrix(rows, cols, a);
}
ak.overload(ak.neg, ak.MATRIX_T, neg);

function abs(m) {
 var s = 0;
 var rows = m.rows();
 var cols = m.cols();
 var r, c, x;

 for(r=0;r<rows;++r) {
  for(c=0;c<cols;++c) {
   x = m.at(r, c);
   s += x*x;
  }
 }
 return Math.sqrt(s);
}
ak.overload(ak.abs, ak.MATRIX_T, abs);

function tr(m) {
 var n = m.rows();
 var s = 0;
 var i;

 if(m.cols()!==n) throw new Error('non-square matrix in ak.matrix tr');

 for(i=0;i<n;++i) s += m.at(i, i);
 return s;
}
if(!ak.tr) ak.tr = function(x) {return ak.tr[ak.type(x)](x)};
ak.overload(ak.tr, ak.MATRIX_T, tr);

Next up are matrix addition and scalar multiplication, shown in listing 12.

Listing 12: ak.matrix Addition And Scalar Multiplication
function add(m0, m1) {
 var rows = m0.rows();
 var cols = m0.cols();
 var a = m0.toArray();
 var r, c, row;

 if(m1.rows()!==rows || m1.cols()!==cols) {
  throw new Error('dimensions mismatch in ak.matrix add');
 }

 for(r=0;r<rows;++r) {
  row = a[r];
  for(c=0;c<cols;++c) {
   row[c] += m1.at(r, c);
  }
 }
 return ak.matrix(a);
}
ak.overload(ak.add, [ak.MATRIX_T, ak.MATRIX_T], add);

function mulMR(m, s) {
 var rows = m.rows();
 var cols = m.cols();
 var a = m.toArray();
 var r, c, row;

 for(r=0;r<rows;++r) {
  row = a[r];
  for(c=0;c<cols;++c) row[c] *= s;
 }
 return ak.matrix(a);
}
ak.overload(ak.mul, [ak.MATRIX_T, ak.NUMBER_T], mulMR);

In both cases we convert the left hand side to a native array and update its row arrays before constructing a new ak.matrix with the result.

Program 1 illustrates the operations we have defined so far.

Program 1: Basic Matrix Operations
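Since the program is interactive on the original page, here's a rough static sketch of the kinds of expressions that it lets you evaluate, assuming once again that the library scripts are loaded

var m = ak.matrix([[1, 2], [3, 4]]);
var n = ak.matrix([[5, 6], [7, 8]]);

console.log(ak.add(m, n).toString());    // [[6,8],[10,12]]
console.log(ak.mul(m, 2).toString());    // [[2,4],[6,8]]
console.log(ak.neg(m).toString());       // [[-1,-2],[-3,-4]]
console.log(ak.transpose(m).toString()); // [[1,3],[2,4]]
console.log(ak.abs(m).toFixed(4));       // 5.4772
console.log(ak.tr(m));                   // 5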

For matrix-vector multiplication we must distinguish between pre and post multiplication, as shown in listing 13.

Listing 13: ak.matrix-ak.vector Multiplication
function mulMV(m, v) {
 var rows = m.rows();
 var cols = m.cols();
 var dims = v.dims();
 var a = new Array(rows);
 var r, c, s;

 if(dims!==cols) throw new Error('dimensions mismatch in ak.matrix mul');

 for(r=0;r<rows;++r) {
  s = 0;
  for(c=0;c<cols;++c) s += m.at(r, c) * v.at(c);
  a[r] = s;
 }
 return ak.vector(a);
}

function mulVM(v, m) {
 var rows = m.rows();
 var cols = m.cols();
 var dims = v.dims();
 var a = new Array(cols);
 var r, c, s;

 if(dims!==rows) throw new Error('dimensions mismatch in ak.matrix mul');

 for(c=0;c<cols;++c) {
  s = 0;
  for(r=0;r<rows;++r) s += v.at(r) * m.at(r, c);
  a[c] = s;
 }
 return ak.vector(a);
}

ak.overload(ak.mul, [ak.MATRIX_T, ak.VECTOR_T], mulMV);
ak.overload(ak.mul, [ak.VECTOR_T, ak.MATRIX_T], mulVM);

These are straightforward implementations of the rules of matrix-vector multiplication as defined above. The effect of matrix multiplication on points in the unit square is illustrated by program 2.

Program 2: Matrix-Vector Multiplication
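Again the program itself is interactive, but for a static flavour of it, and assuming once more that ak.vector accepts an array of elements, here's a sketch that pre and post multiplies a vector by an anticlockwise quarter turn

var m = ak.matrix([[0, -1], [1, 0]]); // anticlockwise quarter turn
var v = ak.vector([1, 0]);

var mv = ak.mul(m, v); // pre-multiplication
var vm = ak.mul(v, m); // post-multiplication

console.log(mv.at(0), mv.at(1)); // 0 1
console.log(vm.at(0), vm.at(1)); // 0 -1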

Turning up the complexity dial a notch, listing 14 implements integer powers of square matrices.

Listing 14: ak.matrix Integer Power
function powMR(m, r) {
 var n = m.rows();
 var p = ak.matrix('identity', n);
 var i;

 if(m.cols()!==n)    throw new Error('non-square matrix in ak.matrix pow');
 if(r!==ak.floor(r)) throw new Error('non-integer exponent in ak.matrix pow');

 if(r<0) {
  r = -r;
  m = inv(m);
 }

 while(r>0) {
  if(r%2===1) p = mul(p, m);
  m = mul(m, m);
  r = ak.floor(r/2);
 }
 return p;
}
ak.overload(ak.pow, [ak.MATRIX_T, ak.NUMBER_T], powMR);

This is perhaps the first non-trivial operation, albeit just barely. First we check that the matrix is square and that the exponent is an integer, inverting the matrix with the yet to be defined function inv if the exponent is negative. The main loop is essentially the Russian peasant multiplication algorithm applied to the calculation of integer powers. It works by starting with an identity matrix and multiplying it by repeatedly squared copies of the original matrix whenever the exponent has a bit set for the corresponding power of two, reducing the number of matrix multiplications from the exponent itself to at most about twice its base two logarithm.
For example

\(\displaystyle{ \mathbf{M}^{11} = \mathbf{I} \times \mathbf{M} \times \mathbf{M}^2 \times \mathbf{M}^8 }\)
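A quick sketch of the overload in action, using a matrix whose powers are easily verified by hand, is

var m = ak.matrix([[1, 1], [0, 1]]);

console.log(ak.pow(m, 5).toString());  // [[1,5],[0,1]]
console.log(ak.pow(m, -2).toString()); // [[1,-2],[0,1]]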

We can implement the matrix exponential by accumulating the terms of its Taylor series, truncating it when the magnitude of the latest term is negligible in comparison to one plus the magnitude of the sum so far, as shown in listing 15.

Listing 15: ak.matrix Exponential
function exp(m) {
 var eps = ak.EPSILON;
 var n   = m.rows();
 var mi  = ak.matrix('identity', n);
 var s   = mi;
 var i   = 1;

 if(m.cols()!==n) throw new Error('non-square matrix in ak.matrix exp');

 do {
  mi = divMR(mul(mi, m), i++);
  s  = add(s, mi);
 }
 while(abs(mi)>(1+abs(s))*eps);
 return s;
}
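As a quick check against the hand calculated example above, and assuming that Matrix.js has been loaded so that the overloads of listing 19 are registered, we might try

var m = ak.matrix([[0, 1, 2], [0, 0, -1], [0, 0, 0]]);
console.log(ak.exp(m).toString()); // [[1,1,1.5],[0,1,-1],[0,0,1]]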

Cranking the complexity dial to eleven, listing 16 provides the implementation of the inverse.

Listing 16: ak.matrix Inverse
function inv(m) {
 var n = m.rows();
 var a = m.toArray();
 var i = ak.matrix('identity', n).toArray();
 var r1, r2, c, arow1, irow1, abs1, arow2, irow2, abs2, x;

 if(m.cols()!==n) throw new Error('non-square matrix in ak.matrix inv');

 for(r1=0;r1<n;++r1) {
  arow1 = a[r1];
  irow1 = i[r1];
  abs1  = Math.abs(arow1[r1]);

  for(r2=r1+1;r2<n;++r2) {
   arow2 = a[r2];
   abs2  = Math.abs(arow2[r1]);

   if(abs2>abs1) {
    irow2 = i[r2];

    a[r2] = arow1; arow1 = arow2; a[r1] = arow1;
    i[r2] = irow1; irow1 = irow2; i[r1] = irow1;
    abs1  = abs2;
   }
  }

  if(abs1===0) throw new Error('singular matrix in ak.matrix inv');
  x = arow1[r1];

  for(c=r1+1;c<n;++c) arow1[c] /= x;
  for(c=0;   c<n;++c) irow1[c] /= x;

  for(r2=0;r2<n;++r2) {
   if(r2!==r1) {
    arow2 = a[r2];
    irow2 = i[r2];
    x = arow2[r1];

    for(c=r1+1;c<n;++c) arow2[c] -= x*arow1[c];
    for(c=0;   c<n;++c) irow2[c] -= x*irow1[c];
   }
  }
 }
 return ak.matrix(i);
}
ak.overload(ak.inv, ak.MATRIX_T, inv);

The first inner loop on r2 finds the element with the largest absolute value in the current column, swapping its row with the current row to form the pivot, and the second subtracts multiples of the pivot row from every other row to zero out the remaining elements in the pivot's column. Note that, as mentioned above, we only update columns of the left hand matrix to the right of the pivot to reduce the total number of operations.
The use of ak.inv is illustrated in program 3.

Program 3: The Matrix Inverse

Note that the minus signs before the zeros in the product of the matrix and its inverse are a consequence of numerical error, which you can confirm by changing the toFixed string conversion to toExponential.
To see a case of catastrophic numerical error, change the last row of m to [2,1,0]. The resulting matrix is singular and so does not have an inverse, but since we're working with floating point, cancellation errors creep into the algorithm, leaving elements that should be zero with very small values instead. These end up in a pivot position and, being non-zero, do not trigger the exception for singular matrices but rather introduce massive errors.
We see similar behaviour when the matrix is very nearly singular, as you can see if you change the last row to [2,1,1e-16]. In this case, the matrix does actually have a well defined inverse but given that we're working with floating point we simply do not have enough digits of precision to find it.
If you note how much larger the product of the absolute values of the matrix and its inverse is in these cases, you can see how we could use it to identify such errors in client code.
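To put some rough numbers to this, here's a small sketch of my own, rather than the program's matrix, comparing a comfortably invertible matrix with a nearly singular one

var ok  = ak.matrix('identity', 2);
var bad = ak.matrix([[1, 1], [1, 1+1e-12]]);

console.log(ak.abs(ok)*ak.abs(ak.inv(ok)));   // about 2
console.log(ak.abs(bad)*ak.abs(ak.inv(bad))); // around 4e12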

The right and left inverses simply use ak.transpose and ak.inv to implement their formulae, as shown in listing 17.

Listing 17: ak.matrix Left And Right Inverses
function leftInv(m) {
 var t;

 if(m.rows()===m.cols()) return inv(m);

 if(m.rows()<m.cols()) {
  throw new Error('fewer rows than columns in ak.matrix leftInv');
 }

 t = transpose(m);
 return mul(inv(mul(t, m)), t);
}
ak.overload(ak.leftInv, ak.MATRIX_T, leftInv);

function rightInv(m) {
 var t;

 if(m.rows()===m.cols()) return inv(m);

 if(m.cols()<m.rows()) {
  throw new Error('fewer columns than rows in ak.matrix rightInv');
 }

 t = transpose(m);
 return mul(t, inv(mul(m, t)));
}
ak.overload(ak.rightInv, ak.MATRIX_T, rightInv);

Note that, in both cases, if the matrix is square the inverse is returned since it takes less work and yields the same result. Using the right inverse as an example, for square matrices we have

\(\displaystyle{ \mathbf{M}^\mathrm{T} \times \left(\mathbf{M} \times \mathbf{M}^\mathrm{T}\right)^{-1} = \mathbf{M}^\mathrm{T} \times \left(\mathbf{M}^\mathrm{T}\right)^{-1} \times \mathbf{M}^{-1} = \mathbf{I} \times \mathbf{M}^{-1} = \mathbf{M}^{-1} }\)

since the transpose is also square and may therefore be inverted.

Given that we do this, we can simply define division by a matrix as pre-multiplication by its right inverse and not worry that we're doing more work than is necessary, as illustrated in listing 18.

Listing 18: ak.matrix Division
function divVM(v, m) {
 return mulMV(rightInv(m), v);
}
ak.overload(ak.div, [ak.VECTOR_T, ak.MATRIX_T], divVM);

Program 4 demonstrates the use of ak.matrix for solving systems of linear equations.

Program 4: Solving Linear Equations
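The program being interactive, a static sketch of the same idea, solving the pair of simultaneous equations that we solved by hand above and assuming once more that ak.vector accepts an array of elements, is

var a = ak.matrix([[8, 3], [4, -2]]);
var b = ak.vector([2, 8]);
var x = ak.div(b, a);

console.log(x.at(0).toFixed(4)); // 1.0000
console.log(x.at(1).toFixed(4)); // -2.0000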

Finally, the complete set of ak.matrix overloads are given in listing 19.

Listing 19: ak.matrix Overloads
ak.overload(ak.abs, ak.MATRIX_T, abs);
ak.overload(ak.det, ak.MATRIX_T, det);
ak.overload(ak.exp, ak.MATRIX_T, exp);
ak.overload(ak.inv, ak.MATRIX_T, inv);
ak.overload(ak.leftInv, ak.MATRIX_T, leftInv);
ak.overload(ak.neg, ak.MATRIX_T, neg);
ak.overload(ak.rightInv, ak.MATRIX_T, rightInv);
ak.overload(ak.tr, ak.MATRIX_T, tr);
ak.overload(ak.transpose, ak.MATRIX_T, transpose);

ak.overload(ak.add,  [ak.MATRIX_T, ak.MATRIX_T], add);
ak.overload(ak.dist, [ak.MATRIX_T, ak.MATRIX_T], dist);
ak.overload(ak.div,  [ak.MATRIX_T, ak.MATRIX_T], div);
ak.overload(ak.div,  [ak.MATRIX_T, ak.NUMBER_T], divMR);
ak.overload(ak.div,  [ak.NUMBER_T, ak.MATRIX_T], divRM);
ak.overload(ak.div,  [ak.VECTOR_T, ak.MATRIX_T], divVM);
ak.overload(ak.eq,   [ak.MATRIX_T, ak.MATRIX_T], eq);
ak.overload(ak.mul,  [ak.MATRIX_T, ak.MATRIX_T], mul);
ak.overload(ak.mul,  [ak.MATRIX_T, ak.NUMBER_T], mulMR);
ak.overload(ak.mul,  [ak.NUMBER_T, ak.MATRIX_T], mulRM);
ak.overload(ak.mul,  [ak.MATRIX_T, ak.VECTOR_T], mulMV);
ak.overload(ak.mul,  [ak.VECTOR_T, ak.MATRIX_T], mulVM);
ak.overload(ak.ne,   [ak.MATRIX_T, ak.MATRIX_T], ne);
ak.overload(ak.pow,  [ak.MATRIX_T, ak.NUMBER_T], powMR);
ak.overload(ak.pow,  [ak.NUMBER_T, ak.MATRIX_T], powRM);
ak.overload(ak.sub,  [ak.MATRIX_T, ak.MATRIX_T], sub);

ak.overload(ak.outerMul, [ak.VECTOR_T, ak.VECTOR_T], outerMul);

The definitions of these operations can be found in Matrix.js.

Armed with these basic arithmetic operations we are ready to explore the wider world of linear algebra and we shall take our first tentative steps in the next post.

References

[1] What's Our Vector, Victor?, www.thusspakeak.com, 2014.

[2] You’re Going To Have To Think! Why Polynomial Approximation Won’t Cure Your Calculus Blues, www.thusspakeak.com, 2014.

[3] Climbing Mount Impractical, www.thusspakeak.com, 2013.
