Spryer Francis


Last time[1] we saw how we could use a sequence of Householder transformations to reduce a symmetric real matrix \(\mathbf{M}\) to a symmetric tridiagonal matrix, having zeros everywhere other than upon the leading, upper and lower diagonals. We could then further reduce it to a diagonal matrix \(\boldsymbol{\Lambda}\) using a sequence of Givens rotations to iteratively transform the elements upon the upper and lower diagonals to zero, so that the columns of the accumulated transformations \(\mathbf{V}\) were the unit eigenvectors of \(\mathbf{M}\) and the elements on the leading diagonal of the result were their associated eigenvalues, satisfying
\[ \mathbf{M} \times \mathbf{V} = \mathbf{V} \times \boldsymbol{\Lambda} \]
and, since the transpose of \(\mathbf{V}\) is its own inverse
\[ \mathbf{M} = \mathbf{V} \times \boldsymbol{\Lambda} \times \mathbf{V}^\mathrm{T} \]
which is known as the spectral decomposition of \(\mathbf{M}\).
Unfortunately, the way that we used Givens rotations to diagonalise tridiagonal symmetric matrices wasn't particularly efficient and I concluded by stating that it could be significantly improved with a relatively minor change. In this post we shall see what it is and why it works.
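Before moving on, here is a short standalone NumPy sketch, rather than anything from the interactive programs embedded in this post, that recovers such a decomposition for a small symmetric matrix using numpy.linalg.eigh and confirms the two identities given above.

```python
import numpy as np

# A small symmetric matrix to decompose.
M = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 5.0]])

# numpy.linalg.eigh returns the eigenvalues and an orthonormal matrix of
# eigenvectors of a symmetric matrix.
l, V = np.linalg.eigh(M)

print(np.allclose(M @ V, V @ np.diag(l)))    # M x V = V x Lambda
print(np.allclose(M, V @ np.diag(l) @ V.T))  # M = V x Lambda x V^T
print(np.allclose(V.T @ V, np.eye(3)))       # the transpose of V is its inverse
```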

Now, Givens rotations are matrices of the form
\[ \mathbf{G} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots & & \vdots\\ 0 & \cdots & \cos \theta & \cdots & 0 & \cdots & \sin \theta & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & -\sin \theta & \cdots & 0 & \cdots & \cos \theta & \cdots & 0\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\ \end{pmatrix} \]
which, for an angle \(\theta\) and indices \(k\) less than \(l\), have the elements
\[ \begin{align*} g_{ii} &= 1 && i \neq k \wedge i \neq l\\ g_{ii} &= \cos \theta && i = k \vee i = l\\ g_{ij} &= \sin \theta && i = k \wedge j = l\\ g_{ij} &= -\sin \theta && i = l \wedge j = k\\ g_{ij} &= 0 && \text{otherwise} \end{align*} \]
where \(\wedge\) means "and" and \(\vee\) means "or". Recall that these also have the property that their transposes are their inverses and have determinants equal to one
\[ \begin{align*} \mathbf{G}^\mathrm{T} \times \mathbf{G} &= \mathbf{G} \times \mathbf{G}^\mathrm{T} = \mathbf{I}\\ |\mathbf{G}| &= 1 \end{align*} \]
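To make the definition concrete, the following NumPy sketch, with zero-based indices and a helper whose name is my own invention, builds a Givens rotation for a given \(k\), \(l\) and \(\theta\) and checks both of these properties.

```python
import numpy as np

def givens(n, k, l, theta):
    # Identity everywhere except for rows and columns k and l, as defined above.
    G = np.eye(n)
    G[k, k] = G[l, l] = np.cos(theta)
    G[k, l] = np.sin(theta)
    G[l, k] = -np.sin(theta)
    return G

G = givens(5, 1, 3, 0.7)
print(np.allclose(G.T @ G, np.eye(5)))    # the transpose is the inverse
print(np.isclose(np.linalg.det(G), 1.0))  # the determinant equals one
```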
Pre-multiplying a matrix \(\mathbf{M}\) with the transpose of a Givens rotation yields
\[ \mathbf{M}^\prime = \mathbf{G}^\mathrm{T} \times \mathbf{M} \]
with the elements
\[ m^\prime_{ij} = \sum_{h=0}^{n-1} g^\mathrm{T}_{ih} \times m_{hj} = \sum_{h=0}^{n-1} g_{hi} \times m_{hj} = \begin{cases} m_{kj} \times \cos\theta - m_{lj} \times \sin\theta & i = k\\ m_{kj} \times \sin\theta + m_{lj} \times \cos\theta & i = l\\ m_{ij} & \text{otherwise} \end{cases} \]
and if we want to set \(m^\prime_{lk}\) to zero we require that
\[ \begin{align*} m_{kk} \times \sin\theta + m_{lk} \times \cos\theta &= 0\\ \tan \theta = \frac{\sin \theta}{\cos \theta} &= -\frac{m_{lk}}{m_{kk}} \end{align*} \]
We can use the same trick that we used last time to directly define the sine and cosine as
\[ \begin{align*} \sin \theta &= -\frac{m_{lk}}{\sqrt{m^2_{kk} + m^2_{lk}}}\\ \cos \theta &= \phantom{-}\frac{m_{kk}}{\sqrt{m^2_{kk} + m^2_{lk}}} \end{align*} \]
as illustrated by the right-angled triangle in figure 1.

Figure 1: The Right-Angled Triangle

If \(\mathbf{M}\) is tridiagonal and we want \(m^\prime_{10}\) to equal zero then, for \(j\) greater than two, we have
\[ \begin{align*} m^\prime_{00} &= m_{00} \times \cos \theta - m_{10} \times \sin \theta\\ m^\prime_{01} &= m_{01} \times \cos \theta - m_{11} \times \sin \theta\\ m^\prime_{02} &= m_{02} \times \cos \theta - m_{12} \times \sin \theta\\ m^\prime_{0j} &= 0 \end{align*} \]
and
\[ \begin{align*} m^\prime_{10} &= 0\\ m^\prime_{11} &= m_{01} \times \sin \theta + m_{11} \times \cos \theta\\ m^\prime_{12} &= m_{02} \times \sin \theta + m_{12} \times \cos \theta\\ m^\prime_{1j} &= 0 \end{align*} \]
since for all such \(j\) both \(m_{0j}\) and \(m_{1j}\) equal zero, with all of the remaining elements of \(\mathbf{M}^\prime\) being equal to those of \(\mathbf{M}\) since only the first and second rows are transformed, so that it takes the form
\[ \begin{pmatrix} m^\prime_{00} & m^\prime_{01} & m^\prime_{02} & 0 & 0 & \dots\\ 0 & m^\prime_{11} & m^\prime_{12} & 0 & 0 & \dots\\ 0 & m^\prime_{21} & m^\prime_{22} & m^\prime_{23} & 0 & \dots\\ 0 & 0 & m^\prime_{32} & m^\prime_{33} & m^\prime_{34} & \dots\\ 0 & 0 & 0 & m^\prime_{43} & m^\prime_{44} & \dots\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]
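As a quick illustration of this first step, the sketch below, again purely illustrative NumPy rather than anything from this post's programs, chooses \(\sin \theta\) and \(\cos \theta\) from \(m_{00}\) and \(m_{10}\) as described above and pre-multiplies a small tridiagonal matrix by \(\mathbf{G}^\mathrm{T}\), zeroing \(m^\prime_{10}\) whilst introducing a non-zero \(m^\prime_{02}\).

```python
import numpy as np

# A small symmetric tridiagonal matrix.
M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)

# Choose the angle that zeroes m'_10, as given by the triangle in figure 1.
r = np.hypot(M[0, 0], M[1, 0])
c, s = M[0, 0] / r, -M[1, 0] / r

# The Givens rotation of the first two rows and columns.
G = np.eye(4)
G[0, 0] = G[1, 1] = c
G[0, 1], G[1, 0] = s, -s

Mp = G.T @ M
print(np.round(Mp, 3))   # m'_10 is zero whilst m'_02 is no longer zero
```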
Similarly, if we apply another Givens rotation to \(\mathbf{M}^\prime\) to yield a matrix \(\mathbf{M}^{\prime\prime}\) such that \(m^{\prime\prime}_{21}\) equals zero then its second and third rows contain the elements
\[ \begin{align*} m^{\prime\prime}_{11} &= m^\prime_{11} \times \cos \theta - m^\prime_{21} \times \sin \theta\\ m^{\prime\prime}_{12} &= m^\prime_{12} \times \cos \theta - m^\prime_{22} \times \sin \theta\\ m^{\prime\prime}_{13} &= m^\prime_{13} \times \cos \theta - m^\prime_{23} \times \sin \theta\\ m^{\prime\prime}_{1j} &= 0 \end{align*} \]
and
\[ \begin{align*} m^{\prime\prime}_{21} &= 0\\ m^{\prime\prime}_{22} &= m^\prime_{12} \times \sin \theta + m^\prime_{22} \times \cos \theta\\ m^{\prime\prime}_{23} &= m^\prime_{13} \times \sin \theta + m^\prime_{23} \times \cos \theta\\ m^{\prime\prime}_{2j} &= 0 \end{align*} \]
for \(j\) less than one or greater than three since \(m^\prime_{1j}\) and \(m^\prime_{2j}\) equal zero in such cases, and the remaining rows contain the same elements as \(\mathbf{M}^\prime\). \(\mathbf{M}^{\prime\prime}\) thus takes the form
\[ \begin{pmatrix} m^{\prime\prime}_{00} & m^{\prime\prime}_{01} & m^{\prime\prime}_{02} & 0 & 0 & \cdots\\ 0 & m^{\prime\prime}_{11} & m^{\prime\prime}_{12} & m^{\prime\prime}_{13} & 0 & \cdots\\ 0 & 0 & m^{\prime\prime}_{22} & m^{\prime\prime}_{23} & 0 & \cdots\\ 0 & 0 & m^{\prime\prime}_{32} & m^{\prime\prime}_{33} & m^{\prime\prime}_{34} & \cdots\\ 0 & 0 & 0 & m^{\prime\prime}_{43} & m^{\prime\prime}_{44} & \cdots\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]
and we can consequently keep applying Givens rotations to yield a matrix whose elements are equal to zero below the leading diagonal and above the first two upper diagonals.
If we accumulate them into a matrix
\[ \mathbf{R} = \mathbf{G}_0 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2} \]
then all of the elements beneath its lower diagonal will equal zero, as proven by derivation 1.

Derivation 1: The Elements Of R
Suppose that we have a matrix \(\mathbf{M}\) whose \(k^\mathrm{th}\) row's only non-zero element is a one on the leading diagonal and whose following row has zeros up to and including the \(k^\mathrm{th}\) column so that they look like
\[ \begin{pmatrix} \cdots & 0 & 0 & 1 & 0 & 0 & \cdots\\ \cdots & 0 & 0 & 0 & m_{k+1,k+1} & m_{k+1,k+2} & \cdots \end{pmatrix} \]
If we pre-multiply \(\mathbf{M}\) by the Givens rotation \(\mathbf{G}_k\) whose \(k^\mathrm{th}\) and \((k+1)^\mathrm{th}\) rows are
\[ \begin{pmatrix} \cdots & 0 & 0 & \phantom{-}\cos \theta & \sin \theta & 0 & \cdots\\ \cdots & 0 & 0 & -\sin \theta & \cos \theta & 0 & \cdots \end{pmatrix} \]
to yield
\[ \mathbf{M}^\prime = \mathbf{G}_k \times \mathbf{M} \]
then its elements are given by
\[ m^\prime_{i,j} = \begin{cases} \cos \theta & i=k \wedge j=k\\ m_{k+1,j} \times \sin \theta & i=k \wedge j>k\\ -\sin \theta & i=k+1 \wedge j=k\\ m_{k+1,j} \times \cos \theta & i=k+1 \wedge j>k\\ m_{i,j} & \text{otherwise} \end{cases} \]
so that the first non-zero element on its \(k^\mathrm{th}\) row is on its leading diagonal and the first on its \((k+1)^\mathrm{th}\) row is on its lower diagonal.
Since \(\mathbf{G}_{n-2}\) satisfies our supposed properties of \(\mathbf{M}\) for \(k\) equal to \(n-3\), \(\mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\) will have the properties of \(\mathbf{M}^\prime\) and will consequently satisfy the former's properties for \(k\) equal to \(n-4\). This means that as we accumulate \(\mathbf{R}\) from the last product to the first we will introduce non-zero elements on and above the lower diagonal whilst leaving zeros beneath it.

As is the case for each of the component rotations \(\mathbf{G}_i\), the transpose of \(\mathbf{R}\) is equal to its inverse, so that
\[ \mathbf{R}^\mathrm{T} \times \mathbf{R} = \mathbf{R} \times \mathbf{R}^\mathrm{T} = \mathbf{I} \]
This trivially follows from the fact that the transpose of a product of matrices is the product of their transposes in reverse order and consequently
\[ \begin{align*} \mathbf{R}^\mathrm{T} \times \mathbf{R} &= \left(\mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{G}^\mathrm{T}_0\right) \times \left(\mathbf{G}_0 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\right)\\ &= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \left(\mathbf{G}^\mathrm{T}_0 \times \mathbf{G}_0\right) \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\ &= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{I} \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\ &= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\ &= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \times \mathbf{I} \times \dots \times \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\ &= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \mathbf{G}_{n-3} \times \mathbf{G}_{n-2} = \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}_{n-2} = \mathbf{I} \end{align*} \]
If we apply such rotations to a symmetric tridiagonal matrix \(\mathbf{M}\) to yield
\[ \mathbf{M}^\ast = \mathbf{R}^\mathrm{T} \times \mathbf{M} \]
then
\[ \mathbf{M}^\ast \times \mathbf{R} = \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} \]
which, as we saw last time, is also symmetric and tridiagonal.
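Putting the whole sweep together, here is a minimal NumPy sketch, assuming that no zeros turn up on the subdiagonal along the way, that accumulates \(\mathbf{R}\) whilst reducing a tridiagonal matrix, then checks that \(\mathbf{R}\) has zeros beneath its lower diagonal, that \(\mathbf{M}^\ast\) is upper triangular and that \(\mathbf{M}^\ast \times \mathbf{R}\) is symmetric and tridiagonal.

```python
import numpy as np

def sweep(M):
    """One unshifted sweep over a symmetric tridiagonal matrix M, returning
    the accumulated rotations R and M* = R^T x M."""
    M = np.array(M, dtype=float)
    n = M.shape[0]
    R = np.eye(n)
    for k in range(n - 1):
        # Choose the rotation of rows k and k+1 that zeroes the element
        # beneath the diagonal in column k, as in figure 1.
        r = np.hypot(M[k, k], M[k+1, k])
        c, s = M[k, k] / r, -M[k+1, k] / r
        G = np.eye(n)
        G[k, k] = G[k+1, k+1] = c
        G[k, k+1], G[k+1, k] = s, -s
        M = G.T @ M                    # annihilate m_{k+1,k}
        R = R @ G                      # accumulate R = G_0 x G_1 x ... x G_{n-2}
    return R, M

M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)
R, Mstar = sweep(M)
print(np.allclose(np.tril(R, -2), 0))        # zeros beneath R's lower diagonal
print(np.allclose(np.tril(Mstar, -1), 0))    # M* is upper triangular
T = Mstar @ R                                # equals R^T x M x R
print(np.allclose(T, T.T) and np.allclose(np.tril(T, -2), 0))  # symmetric and tridiagonal
```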

Get A Shift On

If we define
\[ \mathbf{M}^\dagger = \mathbf{R}^\mathrm{T} \times \left(\mathbf{M} - s \times \mathbf{I}\right) \]
for some scalar \(s\), known as a shift, then we have
\[ \begin{align*} \mathbf{M}^\dagger \times \mathbf{R} &= \mathbf{R}^\mathrm{T} \times \left(\mathbf{M} - s \times \mathbf{I}\right) \times \mathbf{R}\\ &= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - \mathbf{R}^\mathrm{T} \times s \times \mathbf{I} \times \mathbf{R}\\ &= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - s \times \mathbf{R}^\mathrm{T} \times \mathbf{R}\\ &= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - s \times \mathbf{I} \end{align*} \]
and consequently
\[ \mathbf{M}^\dagger \times \mathbf{R} + s \times \mathbf{I} = \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} \]
If \(\mathbf{v}\) is an eigenvector of \(\mathbf{M}\) with an eigenvalue of \(\lambda\) then
\[ \begin{align*} \mathbf{M} \times \mathbf{v} &= \lambda \times \mathbf{v}\\ \mathbf{M} \times \mathbf{v} &- \lambda \times \mathbf{v} = \mathbf{0} \end{align*} \]
which we can rewrite as
\[ \left(\mathbf{M} - \lambda \times \mathbf{I}\right) \times \mathbf{v} = \mathbf{0} \]
Assuming that there are no zeros on the upper and lower diagonals of \(\mathbf{M}\), if \(s\) equals \(\lambda\) then the element in the bottom right corner of \(\mathbf{M}^\dagger\) must equal zero, as proven by derivation 2.

Derivation 2: The Bottom Right Element Of M
If we define the \(n\) by \(n\) symmetric matrix
\[ \hat{\mathbf{M}}^{(0)} = \mathbf{M} - \lambda \times \mathbf{I} \]
and apply \(k\) Givens rotations to yield a matrix \(\hat{\mathbf{M}}^{(k)}\) that has zeros for the first \(k\) elements of its lower diagonal, then we can create a matrix that has zeros for its first \(k+1\) lower diagonal elements with a further Givens rotation
\[ \hat{\mathbf{M}}^{(k+1)} = \mathbf{G}_k^\mathrm{T} \times \hat{\mathbf{M}}^{(k)} \]
The \((k+1)^\mathrm{th}\) element on its leading diagonal is consequently
\[ \begin{align*} \hat{m}^{(k+1)}_{k,k} &= \hat{m}^{(k)}_{k,k} \times \cos \theta - \hat{m}^{(k)}_{k+1,k} \times \sin \theta\\ &= \hat{m}^{(k)}_{k,k} \times \frac{\hat{m}^{(k)}_{k,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}} + \hat{m}^{(k)}_{k+1,k} \times \frac{\hat{m}^{(k)}_{k+1,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}}\\ &= \frac{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}} = \sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}} > 0 \end{align*} \]
Noting that
\[ \mathbf{M}^\dagger = \hat{\mathbf{M}}^{(n-1)} \]
it follows by induction that the first \(n-1\) elements on its leading diagonal must be greater than zero.
Furthermore, the determinant of a product of matrices equals the product of their determinants and each Givens rotation has a determinant equal to one and so we have
\[ \left|\mathbf{M}^\dagger\right| = \left|\hat{\mathbf{M}}^{(0)}\right| \]
Now the eigenvalue of \(\hat{\mathbf{M}}^{(0)}\) associated with its eigenvector \(\mathbf{v}\) is equal to zero and, since the determinant of a symmetric matrix is equal to the product of its eigenvalues, the determinant of \(\mathbf{M}^\dagger\) must also equal zero.
Finally, \(\mathbf{M}^\dagger\) doesn't have any non-zero elements below its leading diagonal and so is upper triangular. Given that the determinant of a triangular matrix is the product of the elements on its leading diagonal and the first \(n-1\) of them are greater than zero, it must be the case that the last equals zero.

Since every element beneath the leading diagonal of \(\mathbf{M}^\dagger\) equals zero, this means that its last row doesn't have any non-zero elements and so neither does \(\mathbf{M}^\dagger \times \mathbf{R}\) which, being symmetric, cannot have any non-zero elements in its last column either. Adding \(\lambda \times \mathbf{I}\) to it yields a matrix that has a value of \(\lambda\) for the last element of its leading diagonal and zeros for every other element in its last row and column so that the last column of \(\mathbf{R}\) must be the eigenvector \(\mathbf{v}\).
Now we can find the eigenvector associated with another eigenvalue of \(\mathbf{M}\) by applying this process to all but the last row and column of \(\mathbf{M}^\dagger\) and consequently, by repetition, discover them all.
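To see this deflation in action, the following sketch cheats by borrowing an exact eigenvalue from numpy.linalg.eigvalsh, purely for the sake of illustration, performs a single shifted sweep and confirms that the last row of \(\mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}\) is zero apart from that eigenvalue and that the last column of \(\mathbf{R}\) is its eigenvector.

```python
import numpy as np

def shifted_sweep(M, s):
    """Reduce M - s*I with Givens rotations, returning the accumulated
    rotations R and M_dagger = R^T x (M - s*I)."""
    A = np.array(M, dtype=float) - s * np.eye(len(M))
    n = A.shape[0]
    R = np.eye(n)
    for k in range(n - 1):
        r = np.hypot(A[k, k], A[k+1, k])
        c, sn = A[k, k] / r, -A[k+1, k] / r
        G = np.eye(n)
        G[k, k] = G[k+1, k+1] = c
        G[k, k+1], G[k+1, k] = sn, -sn
        A = G.T @ A
        R = R @ G
    return R, A

M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)
lam = np.linalg.eigvalsh(M)[0]                    # an exact eigenvalue, for illustration only
R, Md = shifted_sweep(M, lam)
T = Md @ R + lam * np.eye(4)                      # equals R^T x M x R
print(np.round(T[-1], 6))                         # zero apart from lam in the last place
print(np.allclose(M @ R[:, -1], lam * R[:, -1]))  # the last column of R is the eigenvector
```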

Whilst this is a tremendously efficient scheme, it does have a minor drawback which I'm sure hasn't escaped your notice: we don't know what the eigenvalues are.
We can, however, make an educated guess at a number that might be close to one of them. For example, the eigenvalue of the one by one sub-matrix that is the bottom right corner of \(\mathbf{M}\) is trivially the value of that element and so we might try iteratively using that for \(s\), as demonstrated by program 1.

Program 1: Using A Shift

By setting shift to false you can compare the number of iterations that this scheme takes to that of our original approach to diagonalisation, with the halt button allowing you to stop it if it ends up taking too much time.
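The programs themselves aren't reproduced here, but as a rough Python stand-in for the idea behind program 1 the sketch below repeatedly shifts the active block by its bottom right element, sweeps it and deflates it once the trailing subdiagonal element becomes negligible; the function name and the convergence tolerance are my own choices rather than anything taken from the program.

```python
import numpy as np

def tridiagonal_eig(M, eps=1e-12, max_iter=1000):
    """Eigenvalues and eigenvectors of a symmetric tridiagonal matrix by
    explicitly shifted Givens sweeps, shifting by the bottom right element
    of the block that has yet to converge."""
    A = np.array(M, dtype=float)
    n = A.shape[0]
    V = np.eye(n)
    m = n                                   # size of the active block
    for _ in range(max_iter):
        if m == 1:
            break                           # everything has converged
        if abs(A[m-1, m-2]) <= eps * (abs(A[m-2, m-2]) + abs(A[m-1, m-1])):
            m -= 1                          # the last row and column have decoupled
            continue
        s = A[m-1, m-1]                     # shift by the bottom right element
        B = A[:m, :m] - s * np.eye(m)
        R = np.eye(m)
        for k in range(m - 1):
            r = np.hypot(B[k, k], B[k+1, k])
            if r == 0.0:
                continue                    # nothing to annihilate in this column
            c, sn = B[k, k] / r, -B[k+1, k] / r
            G = np.eye(m)
            G[k, k] = G[k+1, k+1] = c
            G[k, k+1], G[k+1, k] = sn, -sn
            B = G.T @ B                     # annihilate the subdiagonal element
            R = R @ G
        A[:m, :m] = B @ R + s * np.eye(m)   # R^T x A x R on the active block
        V[:, :m] = V[:, :m] @ R             # accumulate the transformations
    return np.diag(A), V

M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)
l, V = tridiagonal_eig(M)
print(np.allclose(M @ V, V @ np.diag(l)))   # M x V = V x Lambda
```

Note that the transformations applied to each block are accumulated into the columns of \(\mathbf{V}\), which consequently ends up holding approximations of the unit eigenvectors.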
We can generally do even better by choosing to shift by the eigenvalue of the bottom right two by two sub-matrix of \(\mathbf{M}\) that is closest to its bottom right element, known as a Wilkinson shift[2] and given by
\[ \begin{align*} d &= \frac{m_{n-2,n-2}-m_{n-1,n-1}}{2}\\ s &= m_{n-1,n-1} + d - \mathrm{sign}(d) \times \sqrt{d^2 + m_{n-1,n-2}^2} \end{align*} \]
as proven in derivation 3 and used in program 2.

Derivation 3: The Wilkinson Shift
The eigensystem of a two by two symmetric matrix satisfies
\[ \begin{pmatrix} a & b\\ b & c \end{pmatrix} \times \mathbf{v} = \lambda \times \mathbf{v} \]
which we can rewrite as
\[ \left( \begin{pmatrix} a & b\\ b & c \end{pmatrix} - \lambda \times \mathbf{I} \right) \times \mathbf{v} = \begin{pmatrix} a-\lambda & b\\ b & c-\lambda \end{pmatrix} \times \mathbf{v} = \mathbf{0} \]
and so
\[ \begin{vmatrix} a-\lambda & b\\ b & c-\lambda \end{vmatrix} = (a-\lambda) \times (c-\lambda) - b^2 = \lambda^2 - (a+c) \times \lambda + a \times c - b^2 = 0 \]
Solving the quadratic equation for \(\lambda\) yields
\[ \begin{align*} \lambda &= \frac{a+c \pm \sqrt{(a+c)^2 - 4 \times \left(a \times c - b^2\right)}}{2}\\ &= \frac{a+c}{2} \pm \frac{\sqrt{a^2 + 2 \times a \times c + c^2 - 4 \times a \times c + 4 \times b^2}}{2}\\ &= \frac{a+c}{2} \pm \frac{\sqrt{a^2 - 2 \times a \times c + c^2 + 4 \times b^2}}{2}\\ &= \frac{a+c}{2} \pm \frac{\sqrt{(a-c)^2 + 4 \times b^2}}{2} = c + \frac{a-c}{2} \pm \sqrt{\left(\frac{a-c}{2}\right)^2 + b^2} \end{align*} \]
which is closest to \(c\) when the square root takes the opposite sign to \(\frac{a-c}{2}\).


Program 2: Using A Wilkinson Shift
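Again as a stand-in rather than program 2 itself, the shift can be computed directly from the trailing two by two sub-matrix with a small hypothetical helper like the one below, which simply follows the formula derived above.

```python
import numpy as np

def wilkinson_shift(M):
    # The eigenvalue of the bottom right 2x2 sub-matrix closest to its bottom right element.
    a, b, c = M[-2, -2], M[-1, -2], M[-1, -1]
    d = 0.5 * (a - c)
    sgn = 1.0 if d >= 0.0 else -1.0       # treating sign(0) as positive
    return c + d - sgn * np.hypot(d, b)

M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)
print(wilkinson_shift(M))                 # approximately 1.8
print(np.linalg.eigvalsh(M[-2:, -2:]))    # approximately [1.8, 5.2]; 1.8 is closest to m_{n-1,n-1} = 2
```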

This, together with the prior Householder transformations, is a special case of Francis's QR transformation[3][4] which, with one last modification, is the preferred method for finding eigensystems of symmetric matrices. Specifically, it turns out that we don't actually need to explicitly perform the shift but can instead start with a Givens rotation of the first two rows and columns using
\[ \begin{align*} \sin \theta &= -\frac{m_{1,0}}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}}\\ \cos \theta &= \phantom{-}\frac{m_{0,0}-s}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}} \end{align*} \]
where
\[ \begin{align*} d &= \frac{m_{n-2,n-2}-m_{n-1,n-1}}{2}\\ s &= m_{n-1,n-1} - \frac{m_{n-1,n-2}^2}{d + \mathrm{sign}(d) \times \sqrt{d^2 + m_{n-1,n-2}^2}} \end{align*} \]
and then iteratively apply further rotations to the \(k^\mathrm{th}\) and \((k+1)^\mathrm{th}\) rows and columns using
\[ \begin{align*} \sin \theta &= -\frac{m_{k+1,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}}\\ \cos \theta &= \phantom{-}\frac{m_{k,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}} \end{align*} \]
for \(k\) from one to \(n-2\) to transform the elements beneath the lower and above the upper diagonals to zero. This is known as an implicit shift and is demonstrated by program 3.

Program 3: Using An Implicit Shift
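Program 3 isn't reproduced here either, but the mechanics of an implicitly shifted sweep are easily sketched in NumPy; the following is a rough illustration that uses full matrices for the sake of clarity rather than the compact tridiagonal representation that an efficient implementation would exploit, and assumes that the sub and super diagonal elements that it divides by are non-zero.

```python
import numpy as np

def implicit_shift_sweep(M):
    """One implicitly shifted sweep over a symmetric tridiagonal matrix,
    following the rotations described in the text. Assumes that none of the
    quantities that it divides by vanish."""
    M = np.array(M, dtype=float)
    n = M.shape[0]
    # The Wilkinson shift, computed from the bottom right 2x2 sub-matrix,
    # with sign(0) treated as positive to avoid a zero denominator.
    d = 0.5 * (M[n-2, n-2] - M[n-1, n-1])
    sgn = 1.0 if d >= 0.0 else -1.0
    s = M[n-1, n-1] - M[n-1, n-2]**2 / (d + sgn * np.hypot(d, M[n-1, n-2]))
    # The first rotation acts upon rows and columns 0 and 1, with its angle
    # determined by the shifted leading element; the shift itself is never
    # explicitly applied to the matrix.
    x, y = M[0, 0] - s, M[1, 0]
    for k in range(n - 1):
        r = np.hypot(x, y)
        c, sn = x / r, -y / r
        G = np.eye(n)
        G[k, k] = G[k+1, k+1] = c
        G[k, k+1], G[k+1, k] = sn, -sn
        M = G.T @ M @ G                   # rotate rows and columns k and k+1
        if k < n - 2:
            # The rotation bulges a non-zero element into position (k+2, k);
            # the next rotation is chosen to chase it back out again.
            x, y = M[k+1, k], M[k+2, k]
    return M

M = np.diag([4.0, 3.0, 5.0, 2.0]) + np.diag([1.0, 0.5, 0.8], 1) + np.diag([1.0, 0.5, 0.8], -1)
for _ in range(4):
    print(abs(M[-1, -2]))                 # the trailing subdiagonal element shrinks rapidly
    M = implicit_shift_sweep(M)
```

Repeated sweeps rapidly drive the trailing subdiagonal element towards zero, at which point the last row and column can be split off exactly as before.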

Now I must confess that I don't know exactly why this is equivalent to the original transformation and have simply accepted the word of my linear algebra textbooks. Nevertheless, I shall be using it to implement the spectral decomposition.

References

[1] FAO The Householder, www.thusspakeak.com, 2019.

[2] Golub, G. and Van Loan, C., Matrix Computations (3rd ed.), Johns Hopkins University Press, 1996.

[3] Francis, J., The QR Transformation - Part 1, The Computer Journal, Vol. 4, No. 3, Oxford University Press, 1961.

[4] Francis, J., The QR Transformation - Part 2, The Computer Journal, Vol. 4, No. 4, Oxford University Press, 1961.
