At The Foot Of The Eigen


Over the last couple of posts[1][2] we've covered the arithmetic of vectors and matrices and now we're ready to explore some more advanced aspects of linear algebra, starting with eigensystems.

So What's An Igon System?

That's eigensystem. One word. Derived from the German word akin to characteristic.

The eigensystem of a square matrix \(\mathbf{M}\) is the set of non-zero vectors \(\mathbf{v}_i\), known as eigenvectors, and associated numbers \(\lambda_i\), known as eigenvalues, that satisfy the relation
\[ \mathbf{M} \times \mathbf{v}_i = \lambda_i \times \mathbf{v}_i \]
Eigenvectors are therefore invariants of a matrix in the sense that their directions are unchanged when multiplied by it.

Now, we can rewrite the relation as
\[ \mathbf{M} \times \mathbf{v}_i = \lambda_i \times \mathbf{I} \times \mathbf{v}_i\\ \left(\mathbf{M}-\lambda_i \times \mathbf{I}\right) \times \mathbf{v}_i = \mathbf{0} \]
where \(\mathbf{I}\) and \(\mathbf{0}\) are the appropriately sized identity matrix and zero vector respectively.
If \(\mathbf{v}_i\) is non-zero then the only way that this can be satisfied is if \(\left(\mathbf{M}-\lambda_i \times \mathbf{I}\right)\) is singular, or in other words has a determinant of zero.
\[ \left|\mathbf{M}-\lambda_i \times \mathbf{I}\right| = 0 \]
This is known as the characteristic equation of the matrix \(\mathbf{M}\) and its solutions are the eigenvalues.
Given that the determinant of an \(n\) by \(n\) matrix is an \(n^{th}\) order polynomial, the characteristic equation must have \(n\) solutions albeit not necessarily distinct and possibly complex. For each of these \(n\) eigenvalues we can find non-zero solutions to simultaneous equations defined by
\[ \left(\mathbf{M}-\lambda_i \times \mathbf{I}\right) \times \mathbf{v}_i = \mathbf{0} \]
to yield the associated eigenvectors. Note that since the matrix is necessarily singular, each eigenvalue must have an infinite number of eigenvectors as can be demonstrated by noting that, for any number \(c\), we have
\[ \begin{align*} \mathbf{M} \times \left(c \times \mathbf{v}_i\right) &= c \times \left(\mathbf{M} \times \mathbf{v}_i\right)\\ &= c \times \left(\lambda_i \times \mathbf{v}_i\right)\\ &= \lambda_i \times \left(c \times \mathbf{v}_i\right) \end{align*} \]
To reign in this menagerie, we usually restrict our interest to the set of eigenvectors that have unit length
\[ \left|\mathbf{v}_i\right| = 1 \]

A Rather Two Dimensional Character...istic

The characteristic equation of the two by two matrix
\[ \begin{pmatrix} a & b\\ c & d \end{pmatrix} \]
\[ \begin{vmatrix} a-\lambda_i & b\\ c & d-\lambda_i \end{vmatrix} =(a-\lambda_i) \times (d-\lambda_i) - bc = \lambda_i^2 - (a+d)\lambda_i + (ad-bc) = 0 \]
This is a simple quadratic equation that can be solved using the formula that we learnt in school
\[ \begin{align*} \lambda_0 &= \frac{(a+d) + \sqrt{(a+d)^2 - 4(ad-bc)}}{2} = \frac{(a+d) + \sqrt{(a-d)^2 + 4bc}}{2}\\ \lambda_1 &= \frac{(a+d) - \sqrt{(a+d)^2 - 4(ad-bc)}}{2} = \frac{(a+d) - \sqrt{(a-d)^2 + 4bc}}{2} \end{align*} \]
For each of these eigenvalues we can recover the associated eigenvector by finding a solution to the simultaneous equations given by
\[ \begin{pmatrix} a-\lambda_i & b\\ c & d-\lambda_i \end{pmatrix} \times \begin{pmatrix} x_i\\ y_i \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix} \]
As noted above, the matrix must be singular and consequently one of the equations must be a constant multiple of the other, so we can simply choose the first
\[ \left(a-\lambda_i\right)x_i + by_i = 0 \]
which is trivially solved by
\[ \begin{align*} x_i &= b\\ y_i &= \lambda_i-a \end{align*} \]
Finally, we divide it by its magnitude to get our unit eigenvector
\[ \mathbf{v}_i = \frac{1}{\sqrt{b^2 + \left(\lambda_i-a\right)^2}} \begin{pmatrix}b\\\lambda_i-a\end{pmatrix} \]

A Concrete Example

Whilst all this algebra exactly defines the eigensystems of two by two matrices, it is illuminating to consider an example with actual numbers. So, choosing
\[ \mathbf{M} = \begin{pmatrix} 4 & 2\\ 3 & 3 \end{pmatrix} \]
we have
\[ \begin{align*} \lambda_0 &= \frac{7 + \sqrt{1 + 24}}{2} = 6 \\ \lambda_1 &= \frac{7 - \sqrt{1 + 24}}{2} = 1 \end{align*} \]
and therefore
\[ \begin{align*} \mathbf{v}_0 &= \frac{1}{\sqrt{8}} \begin{pmatrix}2\\2\end{pmatrix}\\ \mathbf{v}_1 &= \frac{1}{\sqrt{13}} \begin{pmatrix}2\\-3\end{pmatrix} \end{align*} \]
Multiplying these eigenvectors by \(\mathbf{M}\) confirms that their elements are indeed multiplied by their eigenvalues
\[ \begin{align*} \mathbf{M} \times \mathbf{v}_0 &= \frac{1}{\sqrt{8}} \begin{pmatrix}4 & 2\\3 & 3\end{pmatrix} \begin{pmatrix}2\\2\end{pmatrix} = \frac{1}{\sqrt{8}} \begin{pmatrix}12\\12\end{pmatrix}\\ \mathbf{M} \times \mathbf{v}_1 &= \frac{1}{\sqrt{13}} \begin{pmatrix}4 & 2\\3 & 3\end{pmatrix} \begin{pmatrix}2\\-3\end{pmatrix} = \frac{1}{\sqrt{13}} \begin{pmatrix}2\\-3\end{pmatrix} \end{align*} \]
To visualise the direction invariant nature of the eigenvectors, program 1 plots vectors pointing away from the origin over a circle of starting points in black and the new direction that results from multiplying by the matrix m in red.

Program 1: Visualising The Eigenvectors

Note that the coincident directions of the vectors lying above the top-left to bottom-right diagonal are close to \(\pm\mathbf{v}_1\) and the nearly coincident vectors lying above and below the bottom-left to top-right diagonal are close to \(\pm\mathbf{v}_0\).

Now this visualisation is quite a handy way to illustrate some special cases.
Firstly, try changing m to
\[ \begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix} \]
Notice how every vector is direction invariant? This is because both of the eigenvalues are equal and, for such matrices, given a pair of independent eigenvectors \(\mathbf{v}_0\) and \(\mathbf{v}_1\) we have
\[ \begin{align*} \mathbf{M} \times \left(a \mathbf{v}_0 + b \mathbf{v}_1\right) &=\mathbf{M} \times \left(a \mathbf{v}_0\right) + \mathbf{M} \times \left(b \mathbf{v}_1\right)\\ &=a \mathbf{M} \mathbf{v}_0 + b \mathbf{M} \mathbf{v}_1\\ &=a \lambda \mathbf{v}_0 + b \lambda \mathbf{v}_1\\ &=\lambda \left(a \mathbf{v}_0 + b \mathbf{v}_1\right) \end{align*} \]
and so every linear combination of them, and consequently every vector in the plain, is also an eigenvector.

Next, try the matrix
\[ \begin{pmatrix} 1 & -4\\ 1 & 1 \end{pmatrix} \]
This time none of the vectors have invariant directions! This is because the characteristic equation of this matrix has complex roots
\[ \begin{align*} \lambda_0 &= \frac{2 + \sqrt{0 - 16}}{2} = 1 + 2i \\ \lambda_1 &= \frac{2 - \sqrt{0 - 16}}{2} = 1 - 2i \end{align*} \]

And What Immortal Hand Or Eye...

This will be the case whenever \(4bc<-(a-d)^2\), which is impossible if \(b\) and \(c\) have the same sign or, more importantly, are equal. In the latter case the matrix is symmetric and has eigenvalues
\[ \begin{align*} \lambda_0 &= \frac{(a+d) + \sqrt{(a-d)^2 + 4b^2}}{2}\\ \lambda_1 &= \frac{(a+d) - \sqrt{(a-d)^2 + 4b^2}}{2} \end{align*} \]
Now the eigenvectors of a symmetric matrix have some important properties which can be revealed by multiplying them by each other.
\[ \mathbf{v}_0 \times \mathbf{v}_0 = \mathbf{v}_1 \times \mathbf{v}_1 = 1\\ \mathbf{v}_0 \times \mathbf{v}_1 = \mathbf{v}_1 \times \mathbf{v}_0 = 0 \]
as shown in derivation 1.

Derivation 1: Symmetric Matrix Eigenvector Properties
Firstly, since we choose our eigenvectors to have unit length, we trivially have
\[ \mathbf{v}_0 \times \mathbf{v}_0 = \mathbf{v}_1 \times \mathbf{v}_1 = 1 \]
Secondly, by the properties of the vector product, we also have
\[ \mathbf{v}_0 \times \mathbf{v}_1 = \mathbf{v}_1 \times \mathbf{v}_0 \]
Expanding the left hand side out in full yields
\[ \mathbf{v}_0 \times \mathbf{v}_1 = \frac{1}{\sqrt{b^2 + \left(\lambda_0-a\right)^2}} \times \frac{1}{\sqrt{b^2 + \left(\lambda_1-a\right)^2}} \times \begin{pmatrix}b\\\lambda_0-a\end{pmatrix} \times \begin{pmatrix}b\\\lambda_1-a\end{pmatrix} \]
The vector product here equates to
\[ \begin{align*} \begin{pmatrix}b\\\lambda_0-a\end{pmatrix} \times \begin{pmatrix}b\\\lambda_1-a\end{pmatrix} &= b^2 + \left(\lambda_0-a\right) \times \left(\lambda_1-a\right)\\ &= \lambda_0 \lambda_1 - a\left(\lambda_0 + \lambda_1\right) + \left(a^2+b^2\right)\\ &= \tfrac{1}{4}\left[(a+d)^2 - \left((a-d)^2+4b^2\right)\right] - a(a+d) + \left(a^2+b^2\right)\\ &= \tfrac{1}{4}\left[a^2+2ad+d^2 - a^2+2ad-d^2-4b^2\right] - a^2-ad + a^2+b^2\\ &= \tfrac{1}{4}\left[4ad-4b^2\right] -ad+b^2\\ &=0 \end{align*} \]
from which the result follows.

Now if we construct a matrix \(\mathbf{V}\) whose columns are the eigenvectors \(\mathbf{v}_0\) and \(\mathbf{v}_1\)
Derivation 2: The Product Of VT And V
\[ \begin{align*} \mathbf{V}^\mathrm{T} \times \mathbf{V} &= \begin{pmatrix} v_{00} & v_{01}\\ v_{10} & v_{11} \end{pmatrix} \times \begin{pmatrix} v_{00} & v_{10}\\ v_{01} & v_{11} \end{pmatrix} \\&= \begin{pmatrix} v_{00} \times v_{00} + v_{01} \times v_{01} & v_{00} \times v_{10} + v_{01} \times v_{11}\\ v_{10} \times v_{00} + v_{11} \times v_{01} & v_{10} \times v_{10} + v_{11} \times v_{11} \end{pmatrix} \end{align*} \]
Now the elements of this matrix are simply the products of the eigenvectors, so
\[ \begin{align*} \mathbf{V}^\mathrm{T} \times \mathbf{V} &= \begin{pmatrix} \mathbf{v}_0\times\mathbf{v}_0 & \mathbf{v}_0\times\mathbf{v}_1\\ \mathbf{v}_1\times\mathbf{v}_0 & \mathbf{v}_1\times\mathbf{v}_1 \end{pmatrix} \\&= \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \end{align*} \]
by the properties of the eigenvectors proven above.

\[ \begin{align*} \mathbf{v}_0 &= \begin{pmatrix} v_{00}\\ v_{01} \end{pmatrix} \\ \mathbf{v}_1 &= \begin{pmatrix} v_{10}\\ v_{11} \end{pmatrix} \\ \mathbf{V} &= \begin{pmatrix} v_{00} & v_{10}\\ v_{01} & v_{11} \end{pmatrix} \end{align*} \]
we have
\[ \mathbf{V}^\mathrm{T} \times \mathbf{V} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \]
as shown in derivation 2, and hence the transpose of the matrix is equal to its inverse.

If we also construct a diagonal matrix \(\mathbf{\Lambda}\) whose diagonal elements are the eigenvalues associated with the eigenvectors in the same columns in \(\mathbf{V}\)
Derivation 3: Recovering M
Given that the columns of \(\mathbf{V}\) are the eigenvectors of \(\mathbf{M}\), we have
\[ \mathbf{M} \times \mathbf{V} = \begin{pmatrix} \lambda_0 \times v_{00} & \lambda_1 \times v_{10}\\ \lambda_0 \times v_{01} & \lambda_1 \times v_{11} \end{pmatrix} \]
and therefore
\[ \begin{align*} &\mathbf{V}^\mathrm{T} \times \mathbf{M} \times \mathbf{V}\\ &\quad= \begin{pmatrix} v_{00} & v_{01}\\ v_{10} & v_{11} \end{pmatrix} \times \begin{pmatrix} \lambda_0 \times v_{00} & \lambda_1 \times v_{10}\\ \lambda_0 \times v_{01} & \lambda_1 \times v_{11} \end{pmatrix}\\ &\quad= \begin{pmatrix} \lambda_0 \times \mathbf{v}_0 \times \mathbf{v}_0 & \lambda_1 \times \mathbf{v}_0 \times \mathbf{v}_1\\ \lambda_0 \times \mathbf{v}_1 \times \mathbf{v}_0 & \lambda_1 \times \mathbf{v}_1 \times \mathbf{v}_1 \end{pmatrix}\\ &\quad= \begin{pmatrix} \lambda_0 & 0\\ 0 & \lambda_1 \end{pmatrix}\\ &\quad= \mathbf{\Lambda} \end{align*} \]
by the same properties we used in derivation 2.
Now, we trivially have
\[ \mathbf{V} \times \mathbf{V}^\mathrm{T} \times \mathbf{M} \times \mathbf{V} \times \mathbf{V}^\mathrm{T} = \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \]
and since the transpose of \(\mathbf{V}\) is equal to its inverse
\[ \mathbf{M}=\mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \]
\[ \mathbf{\Lambda} = \begin{pmatrix} \lambda_0 & 0\\ 0 & \lambda_1 \end{pmatrix} \]
we can recover the original matrix \(\mathbf{M}\) with
\[ \mathbf{M}=\mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \]
as proven in derivation 3.

This is known as the spectral decomposition of \(\mathbf{M}\) and is an extremely important property of the eigensystems of symmetric matrices. To understand why, first multiply \(\mathbf{M}\) by itself
\[ \begin{align*} \mathbf{M} \times \mathbf{M} &= \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \times \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{I} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \mathbf{\Lambda}^2 \times \mathbf{V}^\mathrm{T} \end{align*} \]
where, since \(\mathbf{\Lambda}\) is a diagonal matrix, \(\mathbf{\Lambda}^2\) is given by
\[ \mathbf{\Lambda}^2 = \begin{pmatrix} \lambda_0^2 & 0\\ 0 & \lambda_1^2 \end{pmatrix} \]
Next, we keep multiplying by \(\mathbf{M}\) to recursively show that
\[ \mathbf{M}^n = \mathbf{V} \times \mathbf{\Lambda}^n \times \mathbf{V}^\mathrm{T} \]
\[ \mathbf{\Lambda}^n = \begin{pmatrix} \lambda_0^n & 0\\ 0 & \lambda_1^n \end{pmatrix} \]
Finally, we wheel out our old favourite; Taylor's theorem. If we apply it to functions of matrices instead of functions of numbers, as we did when defining the exponential of a matrix[2], we have
\[ f(\mathbf{M}) = f(0) \times \mathbf{I} + f'(0) \times \mathbf{M} + \tfrac{1}{2}f''(0) \times \mathbf{M}^2 + \dots + \tfrac{1}{n!}f^{(n)}(0) \times \mathbf{M}^n + \dots \]
where the dashes and bracketed superscript represent the derivatives of the equivalent function of a number.
Noting that
\[ \mathbf{M}^0 = \mathbf{V} \times \mathbf{\Lambda}^0 \times \mathbf{V}^\mathrm{T} = \mathbf{V} \times \mathbf{I} \times \mathbf{V}^\mathrm{T} = \mathbf{V} \times \mathbf{V}^\mathrm{T} = \mathbf{I} \]
we have
\[ f(\mathbf{M}) = \sum_{i=0}^\infty \, \tfrac{1}{i!}f^{(i)}(0) \times \mathbf{M}^i \]
where \(\sum\) is the summation sign.

If we substitute \(\mathbf{M}\) with its spectral decomposition we get
\[ \begin{align*} f(\mathbf{M}) &= \sum_{i=0}^\infty \, \tfrac{1}{i!}f^{(i)}(0) \times \mathbf{V} \times \mathbf{\Lambda}^i \times \mathbf{V}^\mathrm{T}\\ &= \sum_{i=0}^\infty \, \mathbf{V} \times \tfrac{1}{i!}f^{(i)}(0) \times \begin{pmatrix} \lambda_0^i & 0\\ 0 & \lambda_1^i \end{pmatrix} \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \left( \sum_{i=0}^\infty \, \begin{pmatrix} \tfrac{1}{i!}f^{(i)}(0) \times \lambda_0^i & 0\\ 0 & \tfrac{1}{i!}f^{(i)}(0) \times \lambda_1^i \end{pmatrix} \right) \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \begin{pmatrix} \sum_{i=0}^\infty \, \tfrac{1}{i!}f^{(i)}(0) \times \lambda_0^i & 0\\ 0 & \sum_{i=0}^\infty \, \tfrac{1}{i!}f^{(i)}(0) \times \lambda_1^i \end{pmatrix}\times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \begin{pmatrix} f\left(\lambda_0\right) & 0\\ 0 & f\left(\lambda_1\right) \end{pmatrix} \times \mathbf{V}^\mathrm{T} \end{align*} \]
I don't know if you're as excited by this result as I am, although I rather suspect that if you have a life you aren't, but it means that we can apply any function to symmetric matrices simply by applying it to the eigenvalues and recovering the result from the spectral decomposition. Neato!
As if that wasn't enough, we can trivially compute the determinant of a matrix from its spectral decomposition by noting that
\[ \begin{align*} |\mathbf{M}| &= \left|\mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\right|\\ &= \left|\mathbf{V}\right| \times \left|\mathbf{\Lambda}\right| \times \left|\mathbf{V}^\mathrm{T}\right|\\ &= 1 \times \lambda_0 \times \lambda_1 \times 1\\ &= \lambda_0 \times \lambda_1 \end{align*} \]
or, in other words, the product of its eigenvalues.
And there's more! The value
\[ \kappa(\mathbf{M}) = \frac{\max\left(\left|\lambda_0\right|, \left|\lambda_1\right|\right)}{\min\left(\left|\lambda_0\right|, \left|\lambda_1\right|\right)} \]
is the condition number of \(\mathbf{M}\) and is a measure of how sensitive the inverse of \(\mathbf{M}\) is to numerical error related to the one we calculated using the Frobenius norm, \(\|\mathbf{M}\| \times \|\mathbf{M}^{-1}\|\). As before, the larger the value, the closer the matrix is to being singular.

Now, whilst this two by two case is a nice introduction to eigensystems, the question is whether we can generalise them to \(n\) by \(n\) matrices; a question which we shall address in the next post...


[1] What's Our Vector, Victor?,, 2014.

[2] The Matrix Recoded,, 2014.

Leave a comment