Ascending The Eigen


In the previous post[1] we covered eigensystems of two by two matrices, being the set of unit eigenvectors \(\mathbf{v}_i\) and eigenvalues \(\lambda_i\) that satisfy the relation
\[ \mathbf{M} \times \mathbf{v}_i = \lambda_i \times \mathbf{v}_i \]
To find the eigenvalues of a matrix \(\mathbf{M}\) we solved the characteristic equation
\[ \left|\mathbf{M} - \lambda_i \times \mathbf{I}\right| = 0 \]
Now for a two by two matrix this was a simple quadratic equation that we could solve with the formula that we learnt in school. For larger matrices the characteristic equation is much harder to solve and we shall have to take an entirely different approach.

Properties Of n By n Eigensystems

The characteristic equation of an \(n\) by \(n\) matrix is an \(n^{th}\) order polynomial and must have \(n\) roots at which it equates to zero, some of which may be equal to each other and/or complex.
As with the two by two case, the eigenvalues of real symmetric matrices are guaranteed to be real, as proven in derivation 1.

Derivation 1: The Eigenvalues Of Real Symmetric Matrices
Given a real symmetric matrix \(\mathbf{M}\) assume it has a complex eigenvalue \(\lambda\) and associated complex eigenvector \(\mathbf{v}\) given by
\[ \begin{align*} \lambda &= \lambda_r + \lambda_c i\\ \mathbf{v} &= \mathbf{v}_r + \mathbf{v}_c i \end{align*} \]
Now since this is an eigenvalue/eigenvector pair, we have
\[ \begin{align*} \mathbf{M} \times \mathbf{v} &= \mathbf{M} \times \left(\mathbf{v}_r + \mathbf{v}_c i\right) = \lambda \times \mathbf{v}\\ &= \left(\lambda_r + \lambda_c i\right) \times \left(\mathbf{v}_r + \mathbf{v}_c i\right)\\ &= \left(\lambda_r \mathbf{v}_r - \lambda_c \mathbf{v}_c\right) + \left(\lambda_c \mathbf{v}_r + \lambda_r \mathbf{v}_c\right) i \end{align*} \]
and, given that \(\mathbf{M}\) is real
\[ \begin{align*} \mathbf{M} \times \mathbf{v}_r &= \lambda_r \mathbf{v}_r - \lambda_c \mathbf{v}_c\\ \mathbf{M} \times \mathbf{v}_c &= \lambda_c \mathbf{v}_r + \lambda_r \mathbf{v}_c \end{align*} \]
and hence
\[ \begin{align*} \mathbf{M} \times \mathbf{v}^\ast &= \mathbf{M} \times \left(\mathbf{v}_r - \mathbf{v}_c i\right)\\ &= \left(\lambda_r \mathbf{v}_r - \lambda_c \mathbf{v}_c\right) - \left(\lambda_c \mathbf{v}_r + \lambda_r \mathbf{v}_c\right) i\\ &= \left(\lambda_r - \lambda_c i\right) \times \left(\mathbf{v}_r - \mathbf{v}_c i\right)\\ &= \lambda^\ast \times \mathbf{v}^\ast \end{align*} \]
Now, since \(\mathbf{M}\) is symmetric, we have \(\mathbf{M}\times\mathbf{v} = \mathbf{v}\times\mathbf{M}\) and consequently \(\mathbf{v}^\ast \times \mathbf{M} \times \mathbf{v} = \mathbf{v} \times \mathbf{M} \times \mathbf{v}^\ast\) yielding
\[ \begin{align*} 0 &= \mathbf{v}^\ast \times \mathbf{M} \times \mathbf{v} - \mathbf{v} \times \mathbf{M} \times \mathbf{v}^\ast\\ & = \mathbf{v}^\ast \times \lambda \times \mathbf{v} - \mathbf{v} \times \lambda^\ast \times \mathbf{v}^\ast\\ & = \left(\lambda - \lambda^\ast\right) \times \mathbf{v} \times \mathbf{v}^\ast \end{align*} \]
Finally, since \(\mathbf{v}\) is non-zero and \(z \times z^\ast = |z|^2\) for a complex number \(z\), the product of \(\mathbf{v}\) and \(\mathbf{v}^\ast\) cannot equal zero and so
\[ \begin{align*} 0 &= \left(\lambda - \lambda^\ast\right)\\ &= \left(\lambda_r + \lambda_c i\right) - \left(\lambda_r - \lambda_c i\right)\\ &= 2 \lambda_c i \end{align*} \]
and hence the eigenvalue has no imaginary part.

The products of eigenvectors also behave similarly to the two by two case
\[ \mathbf{v}_i \times \mathbf{v}_i = 1 \quad \forall i \\ \mathbf{v}_i \times \mathbf{v}_j = 0 \quad \forall i \neq j \]
The first equality is a trivial consequence of choosing unit eigenvectors and the second is proven in derivation 2.

Derivation 2: The Product Of \(\mathbf{v}_i\) And \(\mathbf{v}_j\)
Given eigenvectors \(\mathbf{v}_i\) and \(\mathbf{v}_j\) of a real symmetric matrix \(\mathbf{M}\) having distinct eigenvalues \(\lambda_i\) and \(\lambda_j\), we have
\[ \begin{align*} \mathbf{v}_i \times \mathbf{M} \times \mathbf{v}_j &= \left(\mathbf{v}_i \times \mathbf{M}\right) \times \mathbf{v}_j = \mathbf{v}_i \times \lambda_i \times \mathbf{v}_j\\ \mathbf{v}_i \times \mathbf{M} \times \mathbf{v}_j &= \mathbf{v}_i \times \left(\mathbf{M} \times \mathbf{v}_j\right) = \mathbf{v}_i \times \lambda_j \times \mathbf{v}_j\\ \end{align*} \]
and therefore
\[ \begin{align*} 0 &= \lambda_i \times \mathbf{v}_i \times \mathbf{v}_j - \lambda_j \times \mathbf{v}_i \times \mathbf{v}_j\\ &= \left(\lambda_i - \lambda_j\right) \times \mathbf{v}_i \times \mathbf{v}_j \end{align*} \]
Since \(\lambda_i\) and \(\lambda_j\) are distinct \(\mathbf{v}_i \times \mathbf{v}_j\) must equal zero.

Now, if we have an eigenvalue that is repeated, say \(k\) times, then it is associated with a \(k\)-dimensional subspace of vectors with invariant directions from which we are free to choose orthogonal eigenvectors which have the required properties.
If we again construct a matrix \(\mathbf{V}\) whose columns are the eigenvectors of \(\mathbf{M}\) and a diagonal matrix \(\mathbf{\Lambda}\) whose leading diagonal elements are the eigenvalues associated with the same columns of \(\mathbf{V}\) then, as for the two by two case, we have
\[ \mathbf{M} = \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \]
which we can demonstrate by noting that \(\mathbf{V} \times \mathbf{V}^\mathrm{T} = \mathbf{I}\), that the \(i^{th}\) column of \(\mathbf{M} \times \mathbf{V}\) is \(\lambda_i \times \mathbf{v}_i\), as is that of \(\mathbf{V} \times \mathbf{\Lambda}\), and that consequently
\[ \mathbf{M} = \mathbf{M} \times \mathbf{V} \times \mathbf{V}^\mathrm{T} = \mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T} \]
which is, admittedly, a good deal simpler than the proof I gave for the two by two case!

This \(n\) by \(n\) spectral decomposition has the same properties as the two by two spectral decomposition in that we can use it to easily compute the determinant by taking the product of the eigenvalues, the condition number by taking the ratio of the largest to the smallest eigenvalue, and the powers of the original matrix by raising the eigenvalues to those powers. The latter property implies that we can apply any function to a real symmetric matrix by applying it to its eigenvalues, as we can see by considering its Taylor series expansion
\[ \begin{align*} f(\mathbf{M}) &= \sum_i \tfrac{1}{i!} \times f^{(i)}(0) \times \mathbf{M}^i\\ &= \sum_i \tfrac{1}{i!} \times f^{(i)}(0) \times \left(\mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\right)^i\\ &= \sum_i \tfrac{1}{i!} \times f^{(i)}(0) \times \mathbf{V} \times \mathbf{\Lambda}^i \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times \left(\sum_i \tfrac{1}{i!} \times f^{(i)}(0) \times \mathbf{\Lambda}^i\right) \times \mathbf{V}^\mathrm{T}\\ &= \mathbf{V} \times f(\mathbf{\Lambda}) \times \mathbf{V}^\mathrm{T}\\ \end{align*} \]
where \(f(\mathbf{\Lambda})\) is the diagonal matrix with leading elements \(f\left(\lambda_i\right)\).
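
By way of illustration, here's a rough Python sketch of applying a function to a real symmetric matrix through its spectral decomposition. It uses NumPy's numpy.linalg.eigh as a stand-in for the Jacobi algorithm that we shall be implementing ourselves, and the function name is simply a placeholder.

import numpy as np

def apply_to_symmetric(f, m):
    # spectral decomposition of the real symmetric matrix m; the columns of v
    # are the unit eigenvectors and lambdas holds the associated eigenvalues
    lambdas, v = np.linalg.eigh(m)
    # apply f to the eigenvalues and reassemble as V x f(Lambda) x V^T
    return v @ np.diag(f(lambdas)) @ v.T

# for example, a square root of a symmetric positive definite matrix
m = np.array([[4.0, 1.0],
              [1.0, 3.0]])
r = apply_to_symmetric(np.sqrt, m)
print(np.allclose(r @ r, m))    # True; r times r recovers m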

The eigensystem is particularly useful for computing inverses of symmetric matrices that are nearly singular and consequently cause numerical instability in algorithms like Gaussian elimination, as we found with our implementation[2].
Such matrices have one or more eigenvalues that are many orders of magnitude smaller than the largest and consequently have reciprocals that are many orders of magnitude larger than that of the latter. This is a problem when working with the limited precision of floating point arithmetic.
The eigenvectors associated with such eigenvalues have directions that are nearly flattened down to points by the matrix, and so multiplying it by vectors that have very different components in those directions will yield numerically very similar results. It makes some sense, therefore, to choose the smallest such components when computing the inverse, which we can do by replacing the reciprocals of their eigenvalues with zero. Specifically, we define
\[ \lambda_i^{-1} = \begin{cases} \frac{1}{\lambda_i} & \left|\lambda_i\right| \geqslant \left|\epsilon \lambda_\mathrm{max}\right|\\ 0 & \mathrm{otherwise} \end{cases} \]
where \(\lambda_\mathrm{max}\) is the eigenvalue with the greatest magnitude and \(\epsilon\) is a user supplied tolerance.
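
To illustrate the scheme, here's a minimal Python sketch of such a truncated inverse, again leaning on numpy.linalg.eigh as a stand-in for the decomposition; the function name is a placeholder and eps stands for the user supplied tolerance \(\epsilon\).

import numpy as np

def truncated_inverse(m, eps=1e-12):
    # invert a real symmetric matrix via its eigensystem, replacing the
    # reciprocals of negligible eigenvalues with zero
    lambdas, v = np.linalg.eigh(m)
    recip = np.zeros_like(lambdas)
    keep = np.abs(lambdas) >= eps * np.abs(lambdas).max()
    recip[keep] = 1.0 / lambdas[keep]
    return v @ np.diag(recip) @ v.T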

We can also exploit this to improve the numerical stability of inverting asymmetric square matrices by noting that
\[ \begin{array}{lcccc} \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right)_{ij} &=& \sum_k \mathbf{M}^\mathrm{T}_{ik} \times \mathbf{M}_{kj} &=& \sum_k \mathbf{M}_{ki} \times \mathbf{M}_{kj}\\ &=& \sum_k \mathbf{M}_{kj} \times \mathbf{M}_{ki} &=& \sum_k \mathbf{M}^\mathrm{T}_{jk} \times \mathbf{M}_{ki}\\ &=& \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right)_{ji} \end{array} \]
and so \(\mathbf{M}^\mathrm{T} \times \mathbf{M}\) is a square symmetric matrix and can be stably inverted using its eigensystem. We can then recover the inverse of \(\mathbf{M}\) with the left inverse
\[ \left(\mathbf{M}^\mathrm{T} \times \mathbf{M}\right)^{-1} \times \mathbf{M}^\mathrm{T} = \mathbf{M}^{-1} \times \mathbf{M}^{\mathrm{T}^{-1}} \times \mathbf{M}^\mathrm{T} = \mathbf{M}^{-1} \]
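
As a quick sketch of that recipe, assuming the hypothetical truncated_inverse function from the previous example:

def left_inverse(m, eps=1e-12):
    # m.T @ m is symmetric, so we may invert it through its eigensystem and
    # recover the inverse of m with the left inverse (M^T x M)^-1 x M^T
    return truncated_inverse(m.T @ m, eps) @ m.T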
Finally, to put the icing on the cake, it turns out that it's much easier to construct the spectral decomposition of a real symmetric matrix than it is to solve its characteristic equation.

Let's Twist Again

The key to finding the spectral decomposition is to apply a series of matrix transformations to \(\mathbf{M}\) to reduce it to a diagonal matrix, much as we did when seeking the inverse of a matrix, although this time they take the form
\[ \begin{align*} \mathbf{M}_0 &= \mathbf{M}\\ \mathbf{M}_{i+1} &= \mathbf{R}_i^\mathrm{T} \times \mathbf{M}_i \times \mathbf{R}_i \end{align*} \]
Now, if we can choose each \(\mathbf{R}_i\) to have columns that are orthogonal unit vectors and so that for some \(n\), \(\mathbf{M}_n\) is diagonal, then
\[ \begin{align*} \mathbf{\Lambda} &= \mathbf{M}_n\\ \mathbf{V} &= \prod_{i=1}^n \mathbf{R}_i \end{align*} \]
where \(\prod\) is the product sign.
We can do this by choosing \(\mathbf{R}_i\) so as to set an opposing pair of off-diagonal elements of \(\mathbf{M}_{i+1}\) to zero. Such \(\mathbf{R}_i\) are known as Jacobi rotations[3] and take the form
\[ \mathbf{R} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots & & \vdots\\ 0 & \cdots & \cos \theta & \cdots & 0 & \cdots & \sin \theta & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & -\sin \theta & \cdots & 0 & \cdots & \cos \theta & \cdots & 0\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\ \end{pmatrix} \]
being a rotation in the plane defined by the row-column indices of the pair of elements we wish to set to zero and the identity transformation in all other dimensions. Specifically, if we wish to set the (non-zero) elements at row-column indices \((k, l)\) and \((l, k)\) to zero then, assuming \(k < l\), the elements of \(\mathbf{R}\) are given by
\[ \begin{align*} r_{ii} &= 1 && i \neq k \wedge i \neq l\\ r_{ii} &= \cos \theta && i = k \vee i = l\\ r_{ij} &= \sin \theta && i = k \wedge j = l\\ r_{ij} &= -\sin \theta && i = l \wedge j = k\\ r_{ij} &= 0 && \mathrm{otherwise} \end{align*} \]
where \(\wedge\) means and, and \(\vee\) means or.
Now the columns of the identity part of the matrix are trivially of unit length and orthogonal. That the same is true of the rotation part follows from the expressions
\[ \begin{align*} \cos \theta \times \cos \theta + \sin \theta \times \sin \theta &= 1\\ \cos \theta \times \sin \theta - \sin \theta \times \cos \theta &= 0 \end{align*} \]
Derivation 3 shows that to zero out the selected elements we can choose a value for \(\theta\) that satisfies
\[ m_{kl} \cot^2 \theta - \left(m_{ll} - m_{kk}\right) \cot \theta - m_{kl} = 0 \]
where \(\cot \theta\) is the cotangent of \(\theta\) and is given by
\[ \cot \theta = \frac{1}{\tan \theta} = \frac{\cos \theta}{\sin \theta} \]
Derivation 3: The Equation That cot θ Must Satisfy
Since the identity part of \(\mathbf{R}\) has no effect upon the result, consider just the two by two rotation part of the transformation, noting that since \(\mathbf{M}\) is symmetric then \(m_{lk}=m_{kl}\). Using the shorthand
\[ \begin{align*} c_\theta &= \cos \theta\\ s_\theta &= \sin \theta\\ \end{align*} \]
we have
\[ \begin{align*} &\begin{pmatrix} c_\theta & -s_\theta\\ s_\theta & c_\theta \end{pmatrix} \times \begin{pmatrix} m_{kk} & m_{kl}\\ m_{kl} & m_{ll} \end{pmatrix} \times \begin{pmatrix} c_\theta & s_\theta\\ -s_\theta & c_\theta \end{pmatrix}\\ &\quad=\begin{pmatrix} c_\theta & -s_\theta\\ s_\theta & c_\theta \end{pmatrix} \times \begin{pmatrix} m_{kk} c_\theta - m_{kl} s_\theta & m_{kk} s_\theta + m_{kl} c_\theta\\ m_{kl} c_\theta - m_{ll} s_\theta & m_{kl} s_\theta + m_{ll} c_\theta \end{pmatrix}\\ &\quad=\begin{pmatrix} m_{kk} c_\theta^2 - 2 m_{kl} c_\theta s_\theta + m_{ll} s_\theta^2 & m_{kk} c_\theta s_\theta + m_{kl} c_\theta^2 - m_{kl} s_\theta^2 - m_{ll} c_\theta s_\theta\\ m_{kk} c_\theta s_\theta - m_{kl} s_\theta^2 + m_{kl} c_\theta^2 - m_{ll} c_\theta s_\theta & m_{kk}s_\theta^2 + 2 m_{kl} c_\theta s_\theta + m_{ll} c_\theta^2 \end{pmatrix} \end{align*} \]
and therefore require that
\[ m_{kk} \cos \theta \sin \theta + m_{kl} \cos^2 \theta - m_{kl} \sin^2 \theta - m_{ll} \cos \theta \sin \theta = 0 \]
Rearranging yields
\[ \begin{align*} m_{kl} \left(\cos^2 \theta - \sin^2 \theta\right) &= \left(m_{ll} - m_{kk}\right) \cos \theta \sin \theta\\ \frac{\cos^2 \theta - \sin^2 \theta}{\cos \theta \sin \theta} &= \frac{m_{ll} - m_{kk}}{m_{kl}}\\ \frac{\cos \theta}{\sin \theta} - \frac{\sin \theta}{\cos \theta} &= \frac{m_{ll} - m_{kk}}{m_{kl}}\\ \cot \theta - \frac{1}{\cot \theta} &= \frac{m_{ll} - m_{kk}}{m_{kl}} \end{align*} \]
and hence
\[ m_{kl} \cot^2 \theta - \left(m_{ll} - m_{kk}\right) \cot \theta - m_{kl} = 0 \]

This is a simple quadratic equation in \(\cot \theta\) which, as you should know by now, has the solutions
\[ \cot \theta = \frac{\left(m_{ll} - m_{kk}\right) \pm \sqrt{\left(m_{ll} - m_{kk}\right)^2+4 m_{kl}^2}}{2 m_{kl}} \]
from which we can easily recover \(\tan \theta\)
\[ \tan \theta = \frac{1}{\cot \theta} = \frac{2m_{kl}}{\left(m_{ll} - m_{kk}\right) \pm \sqrt{\left(m_{ll} - m_{kk}\right)^2+4m_{kl}^2}} \]
Now as \(\theta\) tends to zero, \(\sin \theta\) tends to zero and \(\cos \theta\) tends to one, so the smaller the value of \(\theta\) the closer the rotation part of \(\mathbf{R}\), and consequently \(\mathbf{R}\) itself, is to the identity matrix. For the sake of stability we should like \(\mathbf{R}\) to have as small an effect as possible upon the elements we are not setting to zero and so, given that the tangent is the sine divided by the cosine, we should choose the smaller value of \(\tan \theta\). This means that we should choose the larger denominator and consequently the square root with the same sign as \(\left(m_{ll} - m_{kk}\right)\)
\[ \tan \theta = \begin{cases} \dfrac{2m_{kl}}{\left(m_{ll} - m_{kk}\right) + \sqrt{\left(m_{ll} - m_{kk}\right)^2+4m_{kl}^2}} & \quad m_{ll} \geqslant m_{kk}\\ \dfrac{2m_{kl}}{\left(m_{ll} - m_{kk}\right) - \sqrt{\left(m_{ll} - m_{kk}\right)^2+4m_{kl}^2}} & \quad m_{ll} < m_{kk} \end{cases} \]
The curly brace here is the mathematical notation for a conditional expression, in which the choices are followed by the conditions for selecting them.
Note that the reason we expressed the quadratic equation in terms of the cotangent instead of the tangent is that the expression for the latter requires the square root term to have the opposite sign to the term it is added to, presenting a wonderful opportunity for cancellation error! This is yet another example of how approaching the numerical solution of a mathematical problem from different directions can have significant implications for its accuracy.

Fortunately, we can efficiently calculate the values of \(\cos \theta\) and \(\sin \theta\) directly from that of \(\tan \theta\) by exploiting the identities
\[ \begin{align*} \frac{1}{\sqrt{\tan^2 \theta + 1}} &= \frac{1}{\sqrt{\frac{\sin^2 \theta}{\cos^2 \theta} + 1}} = \frac{\sqrt{\cos^2 \theta}}{\sqrt{\sin^2 \theta + \cos^2 \theta}} = \frac{\sqrt{\cos^2 \theta}}{\sqrt{1}} = \cos \theta\\ \cos \theta \times \tan \theta &= \cos \theta \times \frac{\sin \theta}{\cos \theta} = \sin \theta \end{align*} \]
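
Putting the last few formulae together, a hypothetical Python helper for computing the rotation's cosine and sine might look something like this; the sign of the square root follows the rule above and the off-diagonal element is assumed to be non-zero.

import math

def rotation_terms(m_kk, m_ll, m_kl):
    # choose the root with the same sign as m_ll - m_kk so that the
    # denominator is as large, and hence theta as small, as possible,
    # avoiding cancellation error; m_kl is assumed to be non-zero
    d = m_ll - m_kk
    root = math.sqrt(d * d + 4.0 * m_kl * m_kl)
    t = 2.0 * m_kl / (d + root) if d >= 0.0 else 2.0 * m_kl / (d - root)
    # recover cos and sin from tan using the identities above
    c = 1.0 / math.sqrt(t * t + 1.0)
    s = c * t
    return c, s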

Coming To Terms With It All

Having found values for \(\cos \theta\) and \(\sin \theta\) we're ready to perform the Jacobi rotation, but to do so we'll need some equations for its elements.
Given the result of a Jacobi rotation
\[ \mathbf{M}^\prime = \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} \]
we have for indices \(i\) and \(j\), neither of which are equal to \(k\) or \(l\)
\[ \begin{align*} m^\prime_{kk} &= m_{kk} \cos^2 \theta + m_{ll} \sin^2 \theta - 2m_{kl} \cos \theta \sin \theta\\ m^\prime_{ll} &= m_{kk} \sin^2 \theta + m_{ll} \cos^2 \theta + 2m_{kl} \cos \theta \sin \theta\\ m^\prime_{ik} &= m^\prime_{ki} = m_{ik} \cos \theta - m_{il} \sin \theta\\ m^\prime_{il} &= m^\prime_{li} = m_{ik} \sin \theta + m_{il} \cos \theta\\ m^\prime_{kl} &= m^\prime_{lk} = 0\\ m^\prime_{ij} &= m_{ij} \end{align*} \]
as shown by derivation 4.

Derivation 4: The Terms Of The Jacobi Rotation
The first thing to note is that since the only rows and columns of \(\mathbf{R}\) that differ from the identity matrix are the \(k^\mathrm{th}\) and \(l^\mathrm{th}\), only the \(k^\mathrm{th}\) and \(l^\mathrm{th}\) rows and columns of \(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\) will differ from those of \(\mathbf{M}\).
The second thing to note is that since the result is also symmetric, once we have the elements of columns \(k\) and \(l\) we also have the elements of those rows.

Having so noted, we have
\[ \begin{align*} \left(\mathbf{M} \times \mathbf{R}\right)_{ik} &= \sum_p m_{ip} r_{pk} = m_{ik} r_{kk} + m_{il} r_{lk} = m_{ik} \cos \theta - m_{il} \sin \theta\\ \left(\mathbf{M} \times \mathbf{R}\right)_{il} &= \sum_p m_{ip} r_{pl} = m_{ik} r_{kl} + m_{il} r_{ll} = m_{ik} \sin \theta + m_{il} \cos \theta \end{align*} \]
Now for \(i\) not equal to either \(k\) or \(l\) we trivially have
\[ \begin{align*} \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{ik} &= \left(\mathbf{M} \times \mathbf{R}\right)_{ik} = m_{ik} \cos \theta - m_{il} \sin \theta\\ \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{il} &= \left(\mathbf{M} \times \mathbf{R}\right)_{il} = m_{ik} \sin \theta + m_{il} \cos \theta \end{align*} \]
Next we have
\[ \begin{align*} \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{kk} &= \sum_p \mathbf{R}^\mathrm{T}_{kp} \times \left(\mathbf{M} \times \mathbf{R}\right)_{pk} = \sum_p \mathbf{R}_{pk} \times \left(\mathbf{M} \times \mathbf{R}\right)_{pk}\\ &= \mathbf{R}_{kk} \times \left(\mathbf{M} \times \mathbf{R}\right)_{kk} + \mathbf{R}_{lk} \times \left(\mathbf{M} \times \mathbf{R}\right)_{lk}\\ &= \cos \theta \times \left(m_{kk} \cos \theta - m_{kl} \sin \theta\right) - \sin \theta \times \left(m_{lk} \cos \theta - m_{ll} \sin \theta\right)\\ &= m_{kk} \cos^2 \theta + m_{ll} \sin^2 \theta - 2m_{kl} \cos \theta \sin \theta \end{align*} \]
\[ \begin{align*} \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{ll} &= \sum_p \mathbf{R}^\mathrm{T}_{lp} \times \left(\mathbf{M} \times \mathbf{R}\right)_{pl} = \sum_p \mathbf{R}_{pl} \times \left(\mathbf{M} \times \mathbf{R}\right)_{pl}\\ &= \mathbf{R}_{kl} \times \left(\mathbf{M} \times \mathbf{R}\right)_{kl} + \mathbf{R}_{ll} \times \left(\mathbf{M} \times \mathbf{R}\right)_{ll}\\ &= \sin \theta \times \left(m_{kk} \sin \theta + m_{kl} \cos \theta \right) + \cos \theta \times \left(m_{lk} \sin \theta + m_{ll} \cos \theta \right)\\ &= m_{kk} \sin^2 \theta + m_{ll} \cos^2 \theta + 2m_{kl} \cos \theta \sin \theta \end{align*} \]
And finally
\[ \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{kl} = \left(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\right)_{lk} = 0 \]
by the definition of \(\theta\).

Note that, from the first step of derivation 4, we can also see that the accumulation of the rotations
\[ \mathbf{V}^\prime = \mathbf{V} \times \mathbf{R} \]
has elements
\[ \begin{align*} v^\prime_{ik} &= v_{ik} \cos \theta - v_{il} \sin \theta\\ v^\prime_{il} &= v_{ik} \sin \theta + v_{il} \cos \theta\\ v^\prime_{ij} &= v_{ij} \end{align*} \]
for \(j\) not equal to \(k\) or \(l\).
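
As a sketch of how these update formulae might be applied in place, with the names being placeholders rather than those we'll settle upon in the implementation:

import numpy as np

def jacobi_rotate(m, v, k, l, c, s):
    # apply M' = R^T x M x R to the symmetric matrix m using the element
    # formulae above, touching only rows and columns k and l, and accumulate
    # the rotation into v with V' = V x R
    n = m.shape[0]
    m_kk, m_ll, m_kl = m[k, k], m[l, l], m[k, l]
    for i in range(n):
        if i != k and i != l:
            m_ik, m_il = m[i, k], m[i, l]
            m[i, k] = m[k, i] = m_ik * c - m_il * s
            m[i, l] = m[l, i] = m_ik * s + m_il * c
    m[k, k] = m_kk * c * c + m_ll * s * s - 2.0 * m_kl * c * s
    m[l, l] = m_kk * s * s + m_ll * c * c + 2.0 * m_kl * c * s
    m[k, l] = m[l, k] = 0.0
    for i in range(n):
        v_ik, v_il = v[i, k], v[i, l]
        v[i, k] = v_ik * c - v_il * s
        v[i, l] = v_ik * s + v_il * c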

Now, zeroing a pair of elements may disturb a pair that was zeroed in a previous step, but it will still leave the off-diagonal elements smaller overall than they were, as proven in derivation 5, so we can be certain that repeated sweeps through the matrix will ultimately converge to the spectral decomposition.

Derivation 5: The Size Of The Off-Diagonal Elements
Noting that \(m^\prime_{kl} = m^\prime_{lk} = 0\), the sum of the squares of the off-diagonal elements of the rows and columns \(k\) and \(l\) of \(\mathbf{M}^\prime\) is given by
\[ \begin{align*} 2 \sum_{i \neq k,l} m^{\prime 2}_{ik} + m^{\prime 2}_{il} &= 2 \sum_{i \neq k,l} \left(m_{ik} \cos \theta - m_{il} \sin \theta\right)^2 + \left(m_{ik} \sin \theta + m_{il} \cos \theta\right)^2\\ &= 2 \sum_{i \neq k,l} \Big( m^2_{ik} \cos^2 \theta - 2 m_{ik} m_{il} \cos \theta\ \sin \theta + m^2_{il} \sin^2 \theta\\ & \phantom{2 \sum_{i \neq k,l}} \quad \quad \quad \quad + m^2_{ik} \sin^2 \theta + 2 m_{ik} m_{il} \sin \theta\ \cos \theta + m^2_{il} \cos^2 \theta \Big)\\ &= 2 \sum_{i \neq k,l} m^2_{ik} \left(\cos^2 \theta + \sin^2 \theta\right) + m^2_{il} \left(\sin^2 \theta + \cos^2 \theta\right)\\ &= 2 \sum_{i \neq k,l} m^2_{ik} + m^2_{il} \end{align*} \]
Finally, since the elements \(m_{kl} = m_{lk} \neq 0\) and the elements of the rows and columns of \(\mathbf{M}\) and \(\mathbf{M}^\prime\) other than \(k\) and \(l\) are equal, the sum of the squares of the off-diagonal elements of \(\mathbf{M}^\prime\) must be \(2m_{kl}^2\) less than the sum of the squares of the off-diagonal elements of \(\mathbf{M}\).

Ideally then, we should zero the largest off-diagonal element of each \(\mathbf{M}_i\) so as to reduce the overall size of the off-diagonal elements by as much as possible. However, even exploiting the symmetry of the matrix, iterating over half of the off-diagonal elements to find it is an \(O\left(n^2\right)\) operation which, given that the cost of the Jacobi rotation is only \(O(n)\), is a rather unattractive proposition.
Fortunately, with a little book-keeping we can bring the cost down to a much more reasonable \(O(n)\), specifically by keeping track of the column index of the largest element to the left of the diagonal in each row. This has a one-off \(O\left(n^2\right)\) cost to set up but, crucially, it reduces the cost of finding the largest lower off-diagonal element of the matrix to \(O(n)\) and, given that the rotation only affects rows and columns \(k\) and \(l\), incurs only an \(O(n)\) cost to update after each step.
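
Drawing the threads together, here's a rough and ready Python sketch of the whole scheme, reusing the hypothetical rotation_terms and jacobi_rotate functions from the earlier examples; it is not the implementation that we'll be building in the next post, but it shows how the book-keeping might work.

import numpy as np

def jacobi_eigensystem(m, eps=1e-15, max_steps=10000):
    # a sketch of the Jacobi eigenvalue algorithm for a real symmetric matrix,
    # tracking, for each row, the column of its largest element to the left of
    # the diagonal so that the largest lower off-diagonal element of the whole
    # matrix can be found in O(n) rather than O(n^2)
    m = np.array(m, dtype=float)                 # work on a copy
    n = m.shape[0]
    v = np.eye(n)
    if n < 2:
        return np.diag(m).copy(), v
    biggest = [int(np.argmax(np.abs(m[i, :i]))) if i > 0 else 0 for i in range(n)]

    for _ in range(max_steps):
        row = max(range(1, n), key=lambda i: abs(m[i, biggest[i]]))
        col = biggest[row]
        if abs(m[row, col]) <= eps:
            break
        k, l = col, row                          # k < l, as in the text
        c, s = rotation_terms(m[k, k], m[l, l], m[k, l])
        jacobi_rotate(m, v, k, l, c, s)
        # only rows and columns k and l have changed, so updating the
        # book-keeping is, in typical cases, an O(n) affair
        for i in range(1, n):
            if i in (k, l) or biggest[i] in (k, l):
                biggest[i] = int(np.argmax(np.abs(m[i, :i])))
            else:
                for j in (k, l):
                    if j < i and abs(m[i, j]) > abs(m[i, biggest[i]]):
                        biggest[i] = j
    return np.diag(m).copy(), v

Running this on a random symmetric matrix and comparing \(\mathbf{V} \times \mathbf{\Lambda} \times \mathbf{V}^\mathrm{T}\) against the original is a reasonable sanity check of the sketch.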

So now we have a provably convergent algorithm for finding the spectral decomposition, and consequently the eigenvalues and eigenvectors, of a real symmetric matrix and in the next post we'll finally get around to implementing it.


[1] At The Foot Of The Eigen, 2014.

[2] The Matrix Recoded, 2014.

[3] Golub, G. and Van Loan, C., Matrix Computations (3rd ed.), Johns Hopkins University Press, 1996.
