Last time^{[1]} we saw how we could use a sequence of Householder transformations to reduce a symmetric real matrix \(\mathbf{M}\) to a symmetric tridiagonal matrix, having zeros everywhere other than upon the leading, upper and lower diagonals. We could then further reduce it to a diagonal matrix \(\boldsymbol{\Lambda}\) using a sequence of Givens rotations to iteratively transform the elements upon the upper and lower diagonals to zero, so that the columns of the accumulated transformations \(\mathbf{V}\) were the unit eigenvectors of \(\mathbf{M}\) and the elements on the leading diagonal of the result were their associated eigenvalues, satisfying
\[
\mathbf{M} \times \mathbf{V} = \mathbf{V} \times \boldsymbol{\Lambda}
\]
and, since the transpose of \(\mathbf{V}\) is equal to its inverse
\[
\mathbf{M} = \mathbf{V} \times \boldsymbol{\Lambda} \times \mathbf{V}^\mathrm{T}
\]
which is known as the spectral decomposition of \(\mathbf{M}\).

Unfortunately, the way that we used Givens rotations to diagonalise tridiagonal symmetric matrices wasn't particularly efficient and I concluded by stating that it could be significantly improved with a relatively minor change. In this post we shall see what it is and why it works.
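As a concrete reminder of what we're computing, the decomposition is easy to check numerically. This is a minimal sketch using numpy's `eigh` rather than the interactive programs accompanying this post, with an arbitrary example matrix:

```python
import numpy as np

# An arbitrary small symmetric real matrix.
M = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])

# eigh returns the eigenvalues in ascending order and the unit
# eigenvectors as the columns of V.
lam, V = np.linalg.eigh(M)
Lambda = np.diag(lam)

# M x V = V x Lambda and, since V's transpose equals its inverse,
# M = V x Lambda x V^T: the spectral decomposition.
assert np.allclose(M @ V, V @ Lambda)
assert np.allclose(V.T @ V, np.eye(3))
assert np.allclose(V @ Lambda @ V.T, M)
```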
Now, Givens rotations are matrices of the form
\[
\mathbf{G} =
\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & \cdots & \cos \theta & \cdots & 0 & \cdots & \sin \theta & \cdots & 0\\
\vdots & & \vdots & \ddots & \vdots & & \vdots & & \vdots\\
0 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\
\vdots & & \vdots & & \vdots & \ddots & \vdots & & \vdots\\
0 & \cdots & -\sin \theta & \cdots & 0 & \cdots & \cos \theta & \cdots & 0\\
\vdots & & \vdots & & \vdots & & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\
\end{pmatrix}
\]
which, for an angle \(\theta\) and indices \(k\) less than \(l\), have the elements
\[
\begin{align*}
g_{ii} &= 1 && i \neq k \wedge i \neq l\\
g_{ii} &= \cos \theta && i = k \vee i = l\\
g_{ij} &= \sin \theta && i = k \wedge j = l\\
g_{ij} &= -\sin \theta && i = l \wedge j = k\\
g_{ij} &= 0 && \text{otherwise}
\end{align*}
\]
where \(\wedge\) means "and" and \(\vee\) means "or". Recall that these also have the property that their transposes are their inverses and have determinants equal to one
\[
\begin{align*}
\mathbf{G}^\mathrm{T} \times \mathbf{G} &= \mathbf{G} \times \mathbf{G}^\mathrm{T} = \mathbf{I}\\
\left|\mathbf{G}\right| &= 1
\end{align*}
\]
Premultiplying a matrix \(\mathbf{M}\) with the transpose of a Givens rotation yields
\[
\mathbf{M}^\prime = \mathbf{G}^\mathrm{T} \times \mathbf{M}
\]
with the elements
\[
m^\prime_{ij} = \sum_{h=0}^{n-1} g^\mathrm{T}_{ih} \times m_{hj}
= \sum_{h=0}^{n-1} g_{hi} \times m_{hj}
=
\begin{cases}
m_{kj} \times \cos\theta - m_{lj} \times \sin\theta & i = k\\
m_{kj} \times \sin\theta + m_{lj} \times \cos\theta & i = l\\
m_{ij} & \text{otherwise}
\end{cases}
\]
and if we want to set \(m^\prime_{lk}\) to zero we require that
Figure 1: The Right-Angled Triangle
\[
\begin{align*}
m_{kk} \times \sin\theta + m_{lk} \times \cos\theta &= 0\\
\tan \theta = \frac{\sin \theta}{\cos \theta} &= -\frac{m_{lk}}{m_{kk}}
\end{align*}
\]
We can use the same trick that we used last time to directly define the sine and cosine as
\[
\begin{align*}
\sin \theta &= -\frac{m_{lk}}{\sqrt{m^2_{kk} + m^2_{lk}}}\\
\cos \theta &= \phantom{-}\frac{m_{kk}}{\sqrt{m^2_{kk} + m^2_{lk}}}
\end{align*}
\]
as illustrated by the right-angled triangle in figure 1.

If \(\mathbf{M}\) is tridiagonal and we want \(m^\prime_{10}\) to equal zero then, for \(j\) greater than two, we have
\[
\begin{align*}
m^\prime_{00} &= m_{00} \times \cos \theta - m_{10} \times \sin \theta\\
m^\prime_{01} &= m_{01} \times \cos \theta - m_{11} \times \sin \theta\\
m^\prime_{02} &= m_{02} \times \cos \theta - m_{12} \times \sin \theta\\
m^\prime_{0j} &= 0
\end{align*}
\]
and
\[
\begin{align*}
m^\prime_{10} &= 0\\
m^\prime_{11} &= m_{01} \times \sin \theta + m_{11} \times \cos \theta\\
m^\prime_{12} &= m_{02} \times \sin \theta + m_{12} \times \cos \theta\\
m^\prime_{1j} &= 0
\end{align*}
\]
since for all such \(j\) both \(m_{0j}\) and \(m_{1j}\) equal zero, with all of the remaining elements of \(\mathbf{M}^\prime\) being equal to those of \(\mathbf{M}\) since only the first and second rows are transformed, so that it takes the form
\[
\begin{pmatrix}
m^\prime_{00} & m^\prime_{01} & m^\prime_{02} & 0 & 0 & \dots\\
0 & m^\prime_{11} & m^\prime_{12} & 0 & 0 & \dots\\
0 & m^\prime_{21} & m^\prime_{22} & m^\prime_{23} & 0 & \dots\\
0 & 0 & m^\prime_{23} & m^\prime_{33} & m^\prime_{34} & \dots\\
0 & 0 & 0 & m^\prime_{34} & m^\prime_{44} & \dots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]
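To make the bookkeeping concrete, here is a rough numpy sketch, not taken from this post's programs, of that first rotation applied to an example tridiagonal matrix, choosing the sine and cosine as above so that \(m^\prime_{10}\) vanishes:

```python
import numpy as np

# An example symmetric tridiagonal matrix.
M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 5.0]])

# The sine and cosine from the right-angled triangle, with k=0 and l=1.
r = np.hypot(M[0, 0], M[1, 0])
s, c = -M[1, 0] / r, M[0, 0] / r

# The Givens rotation with g_kl = sin(theta) and g_lk = -sin(theta).
G = np.eye(4)
G[0, 0] = G[1, 1] = c
G[0, 1], G[1, 0] = s, -s

Mp = G.T @ M
assert abs(Mp[1, 0]) < 1e-12   # the subdiagonal element is annihilated
assert Mp[0, 2] != 0.0         # but fill-in appears at m'_02, as claimed
```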
Similarly, if we apply another Givens rotation to \(\mathbf{M}^\prime\) to yield a matrix \(\mathbf{M}^{\prime\prime}\) such that \(m^{\prime\prime}_{12}\) equals zero then its second and third rows contain the elements
\[
\begin{align*}
m^{\prime\prime}_{11} &= m^\prime_{11} \times \cos \theta - m^\prime_{21} \times \sin \theta\\
m^{\prime\prime}_{12} &= m^\prime_{12} \times \cos \theta - m^\prime_{22} \times \sin \theta\\
m^{\prime\prime}_{13} &= m^\prime_{13} \times \cos \theta - m^\prime_{23} \times \sin \theta\\
m^{\prime\prime}_{1j} &= 0
\end{align*}
\]
and
Derivation 1: The Elements Of R
Suppose that we have a matrix \(\mathbf{M}\) whose \(k^\mathrm{th}\) row's only nonzero element is a one on the leading diagonal and whose following row has zeros up to and including the \(k^\mathrm{th}\) column so that they look like
\[
\begin{pmatrix}
\cdots & 0 & 0 & 1 & 0 & 0 & \cdots\\
\cdots & 0 & 0 & 0 & m_{k+1,k+1} & m_{k+1,k+2} & \cdots
\end{pmatrix}
\]
If we premultiply \(\mathbf{M}\) by the Givens rotation \(\mathbf{G}_k\) whose \(k^\mathrm{th}\) and \((k+1)^\mathrm{th}\) rows are
\[
\begin{pmatrix}
\cdots & 0 & 0 & \phantom{-}\cos \theta & \sin \theta & 0 & \cdots\\
\cdots & 0 & 0 & -\sin \theta & \cos \theta & 0 & \cdots
\end{pmatrix}
\]
to yield
\[
\mathbf{M}^\prime = \mathbf{G}_k \times \mathbf{M}
\]
then its elements are given by
\[
m^\prime_{i,j} =
\begin{cases}
\cos \theta & i=k \wedge j=k\\
m_{k+1,j} \times \sin \theta & i=k \wedge j>k\\
-\sin \theta & i=k+1 \wedge j=k\\
m_{k+1,j} \times \cos \theta & i=k+1 \wedge j>k\\
m_{i,j} & \text{otherwise}
\end{cases}
\]
so that the first nonzero element on its \(k^\mathrm{th}\) row is on its leading diagonal and the first on its \((k+1)^\mathrm{th}\) row is on its lower diagonal.

Since \(\mathbf{G}_{n-2}\) satisfies our supposed properties of \(\mathbf{M}\) for \(k\) equal to \(n-2\), \(\mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\) will have the properties of \(\mathbf{M}^\prime\) and will consequently satisfy the former's properties for \(k\) equal to \(n-3\). This means that as we accumulate \(\mathbf{R}\) from the last product to the first we will introduce nonzero elements on and above the lower diagonal whilst leaving zeros beneath it.
\[
\begin{align*}
m^{\prime\prime}_{21} &= 0\\
m^{\prime\prime}_{22} &= m^\prime_{12} \times \sin \theta + m^\prime_{22} \times \cos \theta\\
m^{\prime\prime}_{23} &= m^\prime_{13} \times \sin \theta + m^\prime_{23} \times \cos \theta\\
m^{\prime\prime}_{2j} &= 0
\end{align*}
\]
for \(j\) less than one or greater than three since \(m^\prime_{1j}\) and \(m^\prime_{2j}\) equal zero in such cases, and the remaining rows contain the same elements as \(\mathbf{M}^\prime\). \(\mathbf{M}^{\prime\prime}\) thus takes the form
\[
\begin{pmatrix}
m^{\prime\prime}_{00} & m^{\prime\prime}_{01} & m^{\prime\prime}_{02} & 0 & 0 & \cdots\\
0 & m^{\prime\prime}_{11} & m^{\prime\prime}_{12} & m^{\prime\prime}_{13} & 0 & \cdots\\
0 & 0 & m^{\prime\prime}_{22} & m^{\prime\prime}_{23} & 0 & \cdots\\
0 & 0 & m^{\prime\prime}_{32} & m^{\prime\prime}_{33} & m^{\prime\prime}_{34} & \cdots\\
0 & 0 & 0 & m^{\prime\prime}_{34} & m^{\prime\prime}_{44} & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\]
and we can consequently keep applying Givens rotations to yield a matrix whose elements are equal to zero below the leading diagonal and above the first two upper diagonals.

If we accumulate them into a matrix
\[
\mathbf{R} = \mathbf{G}_0 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}
\]
then all of the elements beneath its lower diagonal will equal zero, as proven by derivation 1.

As is the case for each of the component rotations \(\mathbf{G}_i\), the transpose of \(\mathbf{R}\) is equal to its inverse so that
\[
\mathbf{R}^\mathrm{T} \times \mathbf{R} = \mathbf{R} \times \mathbf{R}^\mathrm{T} = \mathbf{I}
\]
This trivially follows from the fact that the transpose of a product of matrices is the product of their transposes in reverse order and consequently
\[
\begin{align*}
\mathbf{R}^\mathrm{T} \times \mathbf{R}
&= \left(\mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{G}^\mathrm{T}_0\right) \times \left(\mathbf{G}_0 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\right)\\
&= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \left(\mathbf{G}^\mathrm{T}_0 \times \mathbf{G}_0\right) \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\
&= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{I} \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\
&= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \mathbf{G}^\mathrm{T}_1 \times \mathbf{G}_1 \times \dots \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\
&= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \dots \times \mathbf{I} \times \dots \times \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}\\
&= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}^\mathrm{T}_{n-3} \times \mathbf{G}_{n-3} \times \mathbf{G}_{n-2}
= \mathbf{G}^\mathrm{T}_{n-2} \times \mathbf{G}_{n-2}
= \mathbf{I}
= \mathbf{I}
\end{align*}
\]
If we apply such rotations to a symmetric tridiagonal matrix \(\mathbf{M}\) to yield
\[
\mathbf{M}^\ast = \mathbf{R}^\mathrm{T} \times \mathbf{M}
\]
then
\[
\mathbf{M}^\ast \times \mathbf{R} = \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}
\]
which, as we saw last time, is also symmetric and tridiagonal.
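The whole sweep can be sketched in numpy; this is an illustration with hypothetical helper names rather than the post's own implementation, and it checks that the result of one step is indeed orthogonal in \(\mathbf{R}\) and similar to \(\mathbf{M}\):

```python
import numpy as np

def givens(n, k, c, s):
    """The n-by-n Givens rotation of rows k and k+1 with g[k,k+1] = sin(theta)."""
    G = np.eye(n)
    G[k, k] = G[k + 1, k + 1] = c
    G[k, k + 1], G[k + 1, k] = s, -s
    return G

def qr_step(M):
    """One unshifted step: returns R^T x M x R and the accumulated R."""
    n = M.shape[0]
    A = M.copy()
    R = np.eye(n)
    for k in range(n - 1):
        r = np.hypot(A[k, k], A[k + 1, k])
        G = givens(n, k, A[k, k] / r, -A[k + 1, k] / r)
        A = G.T @ A      # annihilate the subdiagonal element a_{k+1,k}
        R = R @ G        # accumulate R = G_0 x G_1 x ... x G_{n-2}
    return A @ R, R      # M* x R = R^T x M x R

M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 5.0]])
B, R = qr_step(M)
assert np.allclose(R.T @ R, np.eye(4))       # R's transpose is its inverse
assert np.allclose(B, B.T)                   # still symmetric
assert np.allclose(np.triu(B, 2), 0)         # and still tridiagonal
assert np.allclose(np.tril(B, -2), 0)
assert np.allclose(np.linalg.eigvalsh(B), np.linalg.eigvalsh(M))
```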
Get A Shift On
If we define
\[
\mathbf{M}^\dagger = \mathbf{R}^\mathrm{T} \times \left(\mathbf{M} - s \times \mathbf{I}\right)
\]
for some scalar \(s\), known as a shift, then we have
\[
\begin{align*}
\mathbf{M}^\dagger \times \mathbf{R}
&= \mathbf{R}^\mathrm{T} \times \left(\mathbf{M} - s \times \mathbf{I}\right) \times \mathbf{R}\\
&= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - \mathbf{R}^\mathrm{T} \times s \times \mathbf{I} \times \mathbf{R}\\
&= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - s \times \mathbf{R}^\mathrm{T} \times \mathbf{R}\\
&= \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R} - s \times \mathbf{I}
\end{align*}
\]
and consequently
\[
\mathbf{M}^\dagger \times \mathbf{R} + s \times \mathbf{I} = \mathbf{R}^\mathrm{T} \times \mathbf{M} \times \mathbf{R}
\]
If \(\mathbf{v}\) is an eigenvector of \(\mathbf{M}\) with an eigenvalue of \(\lambda\) then
\[
\begin{align*}
\mathbf{M} \times \mathbf{v} &= \lambda \times \mathbf{v}\\
\mathbf{M} \times \mathbf{v} - \lambda \times \mathbf{v} &= \mathbf{0}
\end{align*}
\]
which we can rewrite as
\[
\left(\mathbf{M} - \lambda \times \mathbf{I}\right) \times \mathbf{v} = \mathbf{0}
\]
Assuming that there are no zeros on the upper and lower diagonals of \(\mathbf{M}\) then, if \(s\) equals \(\lambda\), the element in the bottom right corner of \(\mathbf{M}^\dagger\) must equal zero, as proven by derivation 2.
Derivation 2: The Bottom Right Element Of M^{†}
If we define the \(n\) by \(n\) symmetric matrix
\[
\hat{\mathbf{M}}^{(0)} = \mathbf{M} - \lambda \times \mathbf{I}
\]
and apply \(k\) Givens rotations to yield a matrix \(\hat{\mathbf{M}}^{(k)}\) that has zeros for the first \(k\) elements of its lower diagonal, then we can create a matrix that has zeros for its first \(k+1\) lower diagonal elements with a further Givens rotation
\[
\hat{\mathbf{M}}^{(k+1)} = \mathbf{G}_k^\mathrm{T} \times \hat{\mathbf{M}}^{(k)}
\]
The \((k+1)^\mathrm{th}\) element on its leading diagonal is consequently
\[
\begin{align*}
\hat{m}^{(k+1)}_{k,k} &= \hat{m}^{(k)}_{k,k} \times \cos \theta - \hat{m}^{(k)}_{k+1,k} \times \sin \theta\\
&= \hat{m}^{(k)}_{k,k} \times \frac{\hat{m}^{(k)}_{k,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}}
+ \hat{m}^{(k)}_{k+1,k} \times \frac{\hat{m}^{(k)}_{k+1,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}}\\
&= \frac{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}{\sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}}}
= \sqrt{\hat{m}^{(k)\,2}_{k,k}+\hat{m}^{(k)\,2}_{k+1,k}} > 0
\end{align*}
\]
Noting that
\[
\mathbf{M}^\dagger = \hat{\mathbf{M}}^{(n-1)}
\]
it follows by induction that the first \(n-1\) elements on its leading diagonal must be greater than zero.

Furthermore, the determinant of a product of matrices equals the product of their determinants and each Givens rotation has a determinant equal to one and so we have
\[
\left|\mathbf{M}^\dagger\right| = \left|\hat{\mathbf{M}}^{(0)}\right|
\]
Now the eigenvalue of \(\hat{\mathbf{M}}^{(0)}\) associated with its eigenvector \(\mathbf{v}\) is equal to zero and, since the determinant of a symmetric matrix is equal to the product of its eigenvalues, the determinant of \(\mathbf{M}^\dagger\) must also equal zero.

Finally, \(\mathbf{M}^\dagger\) doesn't have any nonzero elements below its leading diagonal and so is upper triangular. Given that the determinant of a triangular matrix is the product of the elements on its leading diagonal and the first \(n-1\) of them are greater than zero, it must be the case that the last equals zero.
Since every element beneath the leading diagonal of \(\mathbf{M}^\dagger\) equals zero, this means that its last row doesn't have any nonzero elements and so neither does \(\mathbf{M}^\dagger \times \mathbf{R}\) which, being symmetric, cannot have any nonzero elements in its last column either. Adding \(\lambda \times \mathbf{I}\) to it yields a matrix that has a value of \(\lambda\) for the last element of its leading diagonal and zeros for every other element in its last row and column so that the last column of \(\mathbf{R}\) must be the eigenvector \(\mathbf{v}\).
Now we can find the eigenvector associated with another eigenvalue of \(\mathbf{M}\) by applying this process to all but the last row and column of \(\mathbf{M}^\dagger\) and consequently, by repetition, discover them all.
Whilst this is a tremendously efficient scheme, it does have a minor drawback which I'm sure hasn't escaped your notice: we don't know what the eigenvalues are.
We can, however, make an educated guess at a number that might be close to one of them. For example, the eigenvalue of the one by one submatrix that is the bottom right corner of \(\mathbf{M}\) is trivially the value of that element and so we might try iteratively using that for \(s\), as demonstrated by program 1.
Program 1: Using A Shift
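The interactive program isn't reproduced here; as a rough stand-in, the following sketch uses numpy's QR factorisation in place of the explicit Givens sweep, with a hypothetical function name, shifting by the bottom right element and deflating once the last off-diagonal element is negligible:

```python
import numpy as np

def eigenvalues_by_shifted_qr(M, eps=1e-12, max_steps=500):
    """A sketch of the scheme: shift by the bottom right element,
    transform, add the shift back and deflate upon convergence."""
    A = np.array(M, dtype=float)
    found = []
    while A.shape[0] > 1:
        n = A.shape[0]
        for _ in range(max_steps):
            if abs(A[n - 1, n - 2]) < eps:
                break
            s = A[n - 1, n - 1]                     # the trivial shift
            Q, U = np.linalg.qr(A - s * np.eye(n))  # Q plays the role of R above
            A = U @ Q + s * np.eye(n)               # = Q^T x (A - sI) x Q + sI
        found.append(A[n - 1, n - 1])               # a converged eigenvalue
        A = A[:n - 1, :n - 1]                       # drop the last row and column
    found.append(A[0, 0])
    return np.array(found)

M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 5.0]])
assert np.allclose(np.sort(eigenvalues_by_shifted_qr(M)), np.linalg.eigvalsh(M))
```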



By setting shift to false you can compare the number of iterations that this scheme takes to that of our original approach to diagonalisation, with the halt button allowing you to stop it if it ends up taking too much time.

We can generally do even better by choosing to shift by the eigenvalue of the bottom right two by two submatrix of \(\mathbf{M}\) that is closest to its bottom right element, known as a Wilkinson shift^{[2]} and given by
\[
\begin{align*}
d &= \frac{m_{n-2,n-2}-m_{n-1,n-1}}{2}\\
s &= m_{n-1,n-1} + d - \mathrm{sign}(d) \times \sqrt{d^2 + m_{n-1,n-2}^2}
\end{align*}
\]
as proven in derivation 3
Derivation 3: The Wilkinson Shift
The eigensystem of a two by two symmetric matrix satisfies
\[
\begin{pmatrix}
a & b\\
b & c
\end{pmatrix}
\times \mathbf{v} = \lambda \times \mathbf{v}
\]
which we can rewrite as
\[
\left(
\begin{pmatrix}
a & b\\
b & c
\end{pmatrix}
 \lambda \times \mathbf{I}
\right)
\times \mathbf{v}
=
\begin{pmatrix}
a\lambda & b\\
b & c\lambda
\end{pmatrix}
\times \mathbf{v}
= \mathbf{0}
\]
and so
\[
\begin{vmatrix}
a\lambda & b\\
b & c\lambda
\end{vmatrix}
= (a\lambda) \times (c\lambda)  b^2
= \lambda^2  (a+c) \times \lambda + a \times c  b^2 = 0
\]
Solving the quadratic equation for \(\lambda\) yields
\[
\begin{align*}
\lambda &= \frac{a+c \pm \sqrt{(a+c)^2 - 4 \times \left(a \times c - b^2\right)}}{2}\\
&= \frac{a+c}{2} \pm \frac{\sqrt{a^2 + 2 \times a \times c + c^2 - 4 \times a \times c + 4 \times b^2}}{2}\\
&= \frac{a+c}{2} \pm \frac{\sqrt{a^2 - 2 \times a \times c + c^2 + 4 \times b^2}}{2}\\
&= \frac{a+c}{2} \pm \frac{\sqrt{(a-c)^2 + 4 \times b^2}}{2}
= c + \frac{a-c}{2} \pm \sqrt{\left(\frac{a-c}{2}\right)^2 + b^2}
\end{align*}
\]
which is closest to \(c\) when the square root takes the opposite sign to \(\frac{a-c}{2}\).
and used in program 2.
Program 2: Using A Wilkinson Shift
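Again the interactive program isn't reproduced here, but the shift calculation itself is tiny; this sketch, with a hypothetical function name, implements the formula above and confirms on a two by two example that it picks out the eigenvalue nearest the bottom right element:

```python
import numpy as np

def wilkinson_shift(A):
    """The eigenvalue of the bottom right two by two submatrix of A
    that is closest to its bottom right element."""
    d = (A[-2, -2] - A[-1, -1]) / 2.0
    b = A[-1, -2]
    sgn = 1.0 if d >= 0.0 else -1.0
    return A[-1, -1] + d - sgn * np.hypot(d, b)

# For a two by two matrix the shift is exactly the eigenvalue
# nearest the bottom right element.
A = np.array([[4.0, 1.0],
              [1.0, 1.0]])
eigs = np.linalg.eigvalsh(A)
nearest = eigs[np.argmin(np.abs(eigs - A[-1, -1]))]
assert np.isclose(wilkinson_shift(A), nearest)
```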



This, together with the prior Householder transformations, is a special case of Francis's QR transformation^{[3][4]} which, with one last modification, is the preferred method for finding eigensystems of symmetric matrices. Specifically, it turns out that we don't actually need to explicitly perform the shift but can instead start with a Givens rotation of the first two rows and columns using
\[
\begin{align*}
\sin \theta &= -\frac{m_{1,0}}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}}\\
\cos \theta &= \phantom{-}\frac{m_{0,0}-s}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}}
\end{align*}
\]
where
\[
\begin{align*}
d &= \frac{m_{n-2,n-2}-m_{n-1,n-1}}{2}\\
s &= m_{n-1,n-1} - \frac{m_{n-1,n-2}^2}{d + \mathrm{sign}(d) \times \sqrt{d^2 + m_{n-1,n-2}^2}}
\end{align*}
\]
and then iteratively apply further rotations to the \(k^\mathrm{th}\) and \((k+1)^\mathrm{th}\) rows and columns using
\[
\begin{align*}
\sin \theta &= -\frac{m_{k+1,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}}\\
\cos \theta &= \phantom{-}\frac{m_{k,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}}
\end{align*}
\]
for \(k\) from one to \(n-2\) to transform the elements beneath the lower and above the upper diagonals to zero. This is known as an implicit shift and is demonstrated by program 3.
Program 3: Using An Implicit Shift
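In place of the interactive program, here is a sketch of a single implicitly shifted sweep under the formulas above, with a hypothetical function name: the first rotation is built from the Wilkinson shift and the rest chase the resulting bulge down the diagonal, and we can at least verify numerically that the result is a symmetric tridiagonal matrix similar to the original:

```python
import numpy as np

def implicit_qr_step(M):
    """One implicitly shifted sweep: a first rotation built from the
    Wilkinson shift s, then bulge-chasing rotations down the diagonal."""
    A = np.array(M, dtype=float)
    n = A.shape[0]
    d = (A[-2, -2] - A[-1, -1]) / 2.0
    b = A[-1, -2]
    sgn = 1.0 if d >= 0.0 else -1.0
    s = A[-1, -1] - b * b / (d + sgn * np.hypot(d, b))
    x, y = A[0, 0] - s, A[1, 0]          # drives the first rotation
    for k in range(n - 1):
        r = np.hypot(x, y)
        c, sn = x / r, -y / r            # cos and sin as defined above
        G = np.eye(n)
        G[k, k] = G[k + 1, k + 1] = c
        G[k, k + 1], G[k + 1, k] = sn, -sn
        A = G.T @ A @ G                  # rotate rows and columns k, k+1
        if k < n - 2:
            x, y = A[k + 1, k], A[k + 2, k]  # the bulge to be chased
    return A

M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 5.0]])
A1 = implicit_qr_step(M)
assert np.allclose(A1, A1.T)                  # still symmetric
assert np.allclose(np.triu(A1, 2), 0)         # still tridiagonal
assert np.allclose(np.linalg.eigvalsh(A1), np.linalg.eigvalsh(M))
```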



Now I must confess that I don't know exactly why this is equivalent to the original transformation and have simply accepted the word of my linear algebra textbooks. Nevertheless, I shall be using it to implement the spectral decomposition.
References
[1] FAO The Householder, www.thusspakeak.com, 2019.
[2] Golub, G. and Van Loan, C., Matrix Computations (3rd ed.), Johns Hopkins University Press, 1996.
[3] Francis, J., The QR Transformation, Part 1, The Computer Journal, Vol. 4, No. 3, Oxford University Press, 1961.
[4] Francis, J., The QR Transformation, Part 2, The Computer Journal, Vol. 4, No. 4, Oxford University Press, 1961.