FAO The Householder


Some years ago[1][2][3] we saw how we could use the Jacobi algorithm to find the eigensystem of a real-valued symmetric matrix \(\mathbf{M}\), which is defined as the set of pairs of non-zero vectors \(\mathbf{v}_i\) and scalars \(\lambda_i\) that satisfy
\[ \mathbf{M} \times \mathbf{v}_i = \lambda_i \times \mathbf{v}_i \]
known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length in which case we can define its spectral decomposition as the product
\[ \mathbf{M} = \mathbf{V} \times \boldsymbol{\Lambda} \times \mathbf{V}^\mathrm{T} \]
where the columns of \(\mathbf{V}\) are the unit eigenvectors, \(\boldsymbol{\Lambda}\) is a diagonal matrix whose \(i^\mathrm{th}\) diagonal element is the eigenvalue associated with the \(i^\mathrm{th}\) column of \(\mathbf{V}\) and the T superscript denotes the transpose, in which the rows and columns of the matrix are swapped.
You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function \(f\) to the matrix with
\[ f\left(\mathbf{M}\right) = \mathbf{V} \times f\left(\boldsymbol{\Lambda}\right) \times \mathbf{V}^\mathrm{T} \]
where \(f\left(\boldsymbol{\Lambda}\right)\) is the diagonal matrix whose \(i^\mathrm{th}\) diagonal element is the result of applying \(f\) to the \(i^\mathrm{th}\) diagonal element of \(\boldsymbol{\Lambda}\).
You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it.

The Householder Transformation

The key operation in this scheme is the Householder transformation[4] which reflects an \(n\) dimensional vector through an \(n-1\) dimensional hyperplane that passes through the origin, being the point that lies at zero on every axis and is the implied starting point from which that vector is drawn.

The simplest way to define a hyperplane is as the set of vectors \(\mathbf{x}\) whose elements \(x_i\) are solutions to a linear equation of the form
\[ \sum_{i=0}^{n-1} a_i \times x_i = b \]
where \(\sum\) is the summation sign and at least one of the \(a_i\) is not equal to zero. For example, a two dimensional plane in three dimensions is the set of points \(\left(x_0,x_1,x_2\right)\) that satisfy
\[ a_0 \times x_0 + a_1 \times x_1 + a_2 \times x_2 = b \]
which, provided that \(a_2\) is non-zero, we can arrange into the familiar form of a dependent variable on the left and independent variables on the right with
\[ \begin{align*} x_2 &= -\frac{a_0}{a_2} \times x_0 - \frac{a_1}{a_2} \times x_1 + \frac{b}{a_2}\\ &= a_0^\prime \times x_0 + a_1^\prime \times x_1 + b^\prime \end{align*} \]
For a hyperplane to pass through the origin we require every \(x_i\) equalling zero to be a solution and so \(b\) must also equal zero and, if we create a vector \(\mathbf{a}\) whose elements are the coefficients \(a_i\), we have
\[ \mathbf{a} \times \mathbf{x} = 0 \]
for every vector \(\mathbf{x}\) in the hyperplane. Derivation 1 shows that
\[ \mathbf{a} \times \mathbf{x} = |\mathbf{a}| \times |\mathbf{x}| \times \cos \theta \]
where the vertical bars stand for the length of the vector between them and \(\theta\) is the angle between \(\mathbf{a}\) and each \(\mathbf{x}\) in the two dimensional planes that they share, and so for any of them that isn't at the origin we have
\[ \begin{align*} \cos \theta &= 0\\ \theta &= \pm \frac{\pi}2 \end{align*} \]
Derivation 1: The Relationship Between The Vector Product And The Cosine
Firstly, we trivially have
\[ \mathbf{a} \times \mathbf{x} = |\mathbf{a}| \times |\mathbf{x}| \times \left(\frac{\mathbf{a}}{|\mathbf{a}|} \times \frac{\mathbf{x}}{|\mathbf{x}|}\right) \]
and so we only need to show that
\[ \mathbf{u} \times \mathbf{v} = \cos\theta \]
for vectors \(\mathbf{u}\) and \(\mathbf{v}\) having a length of one.
Next, if we rotate them both through an angle \(\alpha_{ij}\) about the origin in the two dimensional plane defined by the \(i^\mathrm{th}\) and \(j^\mathrm{th}\) axes then their \(i^\mathrm{th}\) and \(j^\mathrm{th}\) elements are transformed into
\[ \begin{align*} u^\prime_i &= \cos \alpha_{ij} \times u_i - \sin \alpha_{ij} \times u_j & u^\prime_j &= \sin \alpha_{ij} \times u_i + \cos \alpha_{ij} \times u_j\\ v^\prime_i &= \cos \alpha_{ij} \times v_i - \sin \alpha_{ij} \times v_j & v^\prime_j &= \sin \alpha_{ij} \times v_i + \cos \alpha_{ij} \times v_j\\ \end{align*} \]
whilst their remaining elements are unchanged.
Now we have
\[ \begin{align*} u^\prime_i \times v^\prime_i &= \left(\cos \alpha_{ij} \times u_i - \sin \alpha_{ij} \times u_j\right) \times \left(\cos \alpha_{ij} \times v_i - \sin \alpha_{ij} \times v_j\right)\\ &= \small{\cos^2 \alpha_{ij} \times u_i \times v_i - \cos \alpha_{ij} \times \sin \alpha_{ij} \times u_i \times v_j - \sin \alpha_{ij} \times \cos \alpha_{ij} \times u_j \times v_i + \sin^2 \alpha_{ij} \times u_j \times v_j}\\ u^\prime_j \times v^\prime_j &= \left(\sin \alpha_{ij} \times u_i + \cos \alpha_{ij} \times u_j\right) \times \left(\sin \alpha_{ij} \times v_i + \cos \alpha_{ij} \times v_j\right)\\ &= \small{\sin^2 \alpha_{ij} \times u_i \times v_i + \sin \alpha_{ij} \times \cos \alpha_{ij} \times u_i \times v_j + \cos \alpha_{ij} \times \sin \alpha_{ij} \times u_j \times v_i + \cos^2 \alpha_{ij} \times u_j \times v_j} \end{align*} \]
and so
\[ \begin{align*} u^\prime_i \times v^\prime_i + u^\prime_j \times v^\prime_j &= \left(\cos^2 \alpha_{ij} + \sin^2 \alpha_{ij}\right) \times u_i \times v_i + \left(\sin^2 \alpha_{ij} + \cos^2 \alpha_{ij}\right) \times u_j \times v_j\\ &= u_i \times v_i + u_j \times v_j \end{align*} \]
The product is consequently unaffected by any rotation about the origin and we are free to rotate \(\mathbf{u}\) and \(\mathbf{v}\) so that the former lies along the positive half of the first axis and the latter lies in the plane formed by the first two axes, so that
\[ \begin{align*} u_0 &= 1 & u_1 &= 0\\ v_0 &= \cos \theta & v_1 &= \sin \theta \end{align*} \]
and, since all of the remaining elements equal zero, we have
\[ \mathbf{u} \times \mathbf{v} = 1 \times \cos \theta + 0 \times \sin \theta = \cos \theta \]
as required.
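As a quick numerical sanity check of derivation 1, here's a fragment of plain JavaScript, not one of this post's programs, that rotates a pair of two dimensional unit vectors through the same angle and confirms that their product still equals the cosine of the angle between them.

function rotate2(v, alpha) {
 // rotate a two dimensional vector v through the angle alpha about the origin
 return [Math.cos(alpha)*v[0] - Math.sin(alpha)*v[1],
         Math.sin(alpha)*v[0] + Math.cos(alpha)*v[1]];
}

var theta = 0.75;                              // the angle between u and v
var u = [1, 0];
var v = [Math.cos(theta), Math.sin(theta)];
var alpha = 2.1;                               // an arbitrary rotation
var ur = rotate2(u, alpha), vr = rotate2(v, alpha);

console.log(u[0]*v[0] + u[1]*v[1]);            // Math.cos(0.75)
console.log(ur[0]*vr[0] + ur[1]*vr[1]);        // the same, up to rounding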

Since the vector at the origin can be considered as lying at any angle to \(\mathbf{a}\), this means that \(\mathbf{a}\) is at right angles, or orthogonal, to all of the vectors in the hyperplane and consequently to the hyperplane itself.

If we define the vector
\[ \hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|} \]
then, for any vector \(\mathbf{y}\), we have
\[ \hat{\mathbf{a}} \times \mathbf{y} = \frac{|\mathbf{a}|}{|\mathbf{a}|} \times |\mathbf{y}| \times \cos \theta = |\mathbf{y}| \times \cos \theta \]
which is the distance that \(\mathbf{y}\) lies above or below the hyperplane, and so we can project \(\mathbf{y}\) onto the hyperplane by subtracting a vector of that length in the direction of \(\hat{\mathbf{a}}\)
\[ \mathbf{y}^\prime = \mathbf{y} - \left(\hat{\mathbf{a}} \times \mathbf{y}\right) \times \hat{\mathbf{a}} \]
To reflect \(\mathbf{y}\) through the hyperplane we simply subtract that vector again to yield
\[ \mathbf{y}^{\prime\prime} = \mathbf{y}^\prime - \left(\hat{\mathbf{a}} \times \mathbf{y}\right) \times \hat{\mathbf{a}} = \mathbf{y} - 2 \times \left(\hat{\mathbf{a}} \times \mathbf{y}\right) \times \hat{\mathbf{a}} \]
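As a hedged sketch, rather than anything from the ak library, this reflection can be written directly on arrays of numbers; reflect is a name of my own choosing.

function reflect(y, a) {
 // reflect y through the hyperplane that passes through the origin and is orthogonal to a
 var n = a.length, d = 0, p = 0, i;
 for(i=0;i<n;++i) d += a[i]*a[i];
 d = Math.sqrt(d);                                        // |a|
 for(i=0;i<n;++i) p += (a[i]/d)*y[i];                     // â × y
 return y.map(function(yi, i) {return yi - 2*p*a[i]/d;});
}

console.log(reflect([3, 4], [0, 1]));                     // reflection through the x axis: [3, -4]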
Noting that the outer product of a pair of vectors \(\mathbf{x}\) and \(\mathbf{y}\) is defined as the matrix
\[ \mathbf{M} = \mathbf{x} \otimes \mathbf{y} \]
whose elements are given by
\[ M_{ij} = x_i \times y_j \]
we can express this reflection with
\[ \mathbf{y}^{\prime\prime} = \mathbf{H} \times \mathbf{y} \]
where
\[ \mathbf{H} = \mathbf{I} - 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} \]
since \(\mathbf{I}\) is the identity matrix, satisfying
\[ \mathbf{I} \times \mathbf{y} = \mathbf{y} \]
and
\[ \left(\hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\right) \times \mathbf{y} = \left(\hat{\mathbf{a}} \times \mathbf{y}\right) \times \hat{\mathbf{a}} \]
as proven in derivation 2.

Derivation 2: In The Form Of A Matrix
Firstly, we have
\[ \left(\hat{\mathbf{a}} \times \mathbf{y}\right) \times \hat{a}_i = \left(\sum_{j=0}^{n-1} \hat{a}_j \times y_j\right) \times \hat{a}_i = \sum_{j=0}^{n-1} \hat{a}_i \times \hat{a}_j \times y_j \]
Next, given
\[ \begin{align*} \hat{\mathbf{A}} &= \hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\\ \mathbf{y}^\ast &= \hat{\mathbf{A}} \times \mathbf{y} = \left(\hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\right) \times \mathbf{y} \end{align*} \]
we have
\[ y^\ast_i = \sum_{j=0}^{n-1} \hat{A}_{ij} \times y_j = \sum_{j=0}^{n-1} \hat{a}_i \times \hat{a}_j \times y_j \]

Program 1 demonstrates this matrix representation of the Householder transformation in two dimensions, where hyperplanes are simply lines, for vectors that lie at various angles to the positive \(x\)-axis, by drawing the hyperplane in silver, the vectors in green and their reflections, as calculated by their multiplication by it, in red.

Program 1: Householder In Two Dimensions
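Since program 1 is an interactive widget that isn't reproduced here, the following is a minimal sketch of the same matrix construction in plain JavaScript; householderMatrix and matVec are hypothetical helpers of my own rather than part of the ak library.

function householderMatrix(a) {
 // build H = I - 2 × â⊗â from a vector a that is orthogonal to the hyperplane
 var n = a.length, d = 0, i, j, h = [];
 for(i=0;i<n;++i) d += a[i]*a[i];
 d = Math.sqrt(d);
 for(i=0;i<n;++i) {
  h.push([]);
  for(j=0;j<n;++j) h[i].push((i===j ? 1 : 0) - 2*(a[i]/d)*(a[j]/d));
 }
 return h;
}

function matVec(m, v) {
 // the product of a matrix, stored as an array of rows, and a vector
 return m.map(function(row) {
  return row.reduce(function(s, mij, j) {return s + mij*v[j];}, 0);
 });
}

console.log(matVec(householderMatrix([0, 1]), [3, 4]));  // [3, -4], as before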

Derivation 3: Basic Properties Of â ⊗ â
Given \(\hat{\mathbf{A}} = \hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\) we have
\[ \hat{A}_{ij} = \hat{a}_i \times \hat{a}_j = \hat{a}_j \times \hat{a}_i = \hat{A}_{ji} \]
Furthermore, given \(\hat{\mathbf{A}}^2 = \hat{\mathbf{A}} \times \hat{\mathbf{A}}\) we have
\[ \begin{align*} \hat{A}^2_{ij} &= \sum_{k=0}^{n-1} \hat{A}_{ik} \times \hat{A}_{kj} = \sum_{k=0}^{n-1} \hat{a}_i \times \hat{a}_k \times \hat{a}_k \times \hat{a}_j\\ &= \hat{a}_i \times \hat{a}_j \times \sum_{k=0}^{n-1} \hat{a}_k \times \hat{a}_k = \hat{a}_i \times \hat{a}_j = \hat{A}_{ij} \end{align*} \]

From the properties of \(\hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\) given in derivation 3 we find that it, like the identity matrix, is symmetric and so \(\mathbf{H}\) must be too. Furthermore, the square of \(\mathbf{H}\) is given by
\[ \begin{align*} \mathbf{H}^2 &= \mathbf{H} \times \mathbf{H}\\ &= \left(\mathbf{I} - 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\right) \times \left(\mathbf{I} - 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\right)\\ &= \mathbf{I}^2 - 2 \times \mathbf{I} \times 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} + 4 \times \left(\hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\right)^2\\ &= \mathbf{I} - 4 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} + 4 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}}\\ &= \mathbf{I} \end{align*} \]
and so
\[ \mathbf{H} = \mathbf{H}^\mathrm{T} = \mathbf{H}^{-1} \]
Since \(\mathbf{H}\) defines a reflection through the hyperplane that is orthogonal to \(\hat{\mathbf{a}}\) it must be the case that
\[ \mathbf{H} \times \hat{\mathbf{a}} = -\hat{\mathbf{a}} \]
and consequently \(\hat{\mathbf{a}}\) is a unit eigenvector of it with an eigenvalue of minus one. For any vector \(\mathbf{x}\) that lies upon that hyperplane we trivially have
\[ \mathbf{H} \times \mathbf{x} = \mathbf{x} \]
since there is no part of such a vector to reflect, and so we can choose a set of \(n-1\) orthogonal unit vectors that lie upon the hyperplane to yield eigenvectors that all have an eigenvalue of plus one.
Since the determinant of a real symmetric matrix is the product of its eigenvalues this means that it equals
\[ |\mathbf{H}| = -1 \]
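Reusing the hypothetical householderMatrix and matVec helpers from the sketch above, we can check these properties numerically for, say, \(\mathbf{a} = (1, 2, 2)\), for which \(\hat{\mathbf{a}} = \left(\tfrac13, \tfrac23, \tfrac23\right)\).

var h = householderMatrix([1, 2, 2]);

// symmetry
console.log(h[0][1]===h[1][0] && h[0][2]===h[2][0] && h[1][2]===h[2][1]);  // true

// H × H should recover the identity matrix, up to rounding
var h2 = h.map(function(row) {
 return row.map(function(_, j) {
  return row.reduce(function(s, hik, k) {return s + hik*h[k][j];}, 0);
 });
});
console.log(h2);

// H × â should equal -â
console.log(matVec(h, [1/3, 2/3, 2/3]));                                   // approximately [-1/3, -2/3, -2/3]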

Tridiagonalisation

The advantage of reflections over the rotations that were used in the Jacobi algorithm is that we can use them to transform an entire row or column of a matrix in a single step. In particular, if we have a reflection that leaves the first element of every vector as it is and transforms a specific vector \(\mathbf{x}\) so that its second element equals \(s_0\) and all of the remaining elements equal zero
\[ \mathbf{H}_0 \times \begin{pmatrix}x_0\\x_1\\x_2\\x_3\\\vdots\\x_{n-1}\end{pmatrix} = \begin{pmatrix}x_0\\s_0\\0\\0\\\vdots\\0\end{pmatrix} \]
then, if \(\mathbf{x}\) equals the first row and column of a symmetric matrix \(\mathbf{M}\), we have
\[ \mathbf{H}_0 \times \small \begin{pmatrix} m_{0,0} & m_{0,1} & m_{0,2} & m_{0,3} & \dots & m_{0,n-1}\\ m_{0,1} & m_{1,1} & m_{1,2} & m_{1,3} & \dots & m_{1,n-1}\\ m_{0,2} & m_{1,2} & m_{2,2} & m_{2,3} & \dots & m_{2,n-1}\\ m_{0,3} & m_{1,3} & m_{2,3} & m_{3,3} & \dots & m_{3,n-1}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ m_{0,n-1} & m_{1,n-1} & m_{2,n-1} & m_{3,n-1} & \dots & m_{n-1,n-1} \end{pmatrix} = \small \begin{pmatrix} m_{0,0} & m_{0,1} & m_{0,2} & m_{0,3} & \dots & m_{0,n-1}\\ s_0 & m^\prime_{1,1} & m^\prime_{1,2} & m^\prime_{1,3} & \dots & m^\prime_{1,n-1}\\ 0 & m^\prime_{1,2} & m^\prime_{2,2} & m^\prime_{2,3} & \dots & m^\prime_{2,n-1}\\ 0 & m^\prime_{1,3} & m^\prime_{2,3} & m^\prime_{3,3} & \dots & m^\prime_{3,n-1}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & m^\prime_{1,n-1} & m^\prime_{2,n-1} & m^\prime_{3,n-1} & \dots & m^\prime_{n-1,n-1} \end{pmatrix} \]
and
\[ \small \begin{pmatrix} m_{0,0} & m_{0,1} & m_{0,2} & m_{0,3} & \dots & m_{0,n-1}\\ s_0 & m^\prime_{1,1} & m^\prime_{1,2} & m^\prime_{1,3} & \dots & m^\prime_{1,n-1}\\ 0 & m^\prime_{1,2} & m^\prime_{2,2} & m^\prime_{2,3} & \dots & m^\prime_{2,n-1}\\ 0 & m^\prime_{1,3} & m^\prime_{2,3} & m^\prime_{3,3} & \dots & m^\prime_{3,n-1}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & m^\prime_{1,n-1} & m^\prime_{2,n-1} & m^\prime_{3,n-1} & \dots & m^\prime_{n-1,n-1} \end{pmatrix} \times \mathbf{H}_0^\mathrm{T} = \small \begin{pmatrix} m_{0,0} & s_0 & 0 & 0 & \dots & 0\\ s_0 & m^{\prime\prime}_{1,1} & m^{\prime\prime}_{1,2} & m^{\prime\prime}_{1,3} & \dots & m^{\prime\prime}_{1,n-1}\\ 0 & m^{\prime\prime}_{1,2} & m^{\prime\prime}_{2,2} & m^{\prime\prime}_{2,3} & \dots & m^{\prime\prime}_{2,n-1}\\ 0 & m^{\prime\prime}_{1,3} & m^{\prime\prime}_{2,3} & m^{\prime\prime}_{3,3} & \dots & m^{\prime\prime}_{3,n-1}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & m^{\prime\prime}_{1,n-1} & m^{\prime\prime}_{2,n-1} & m^{\prime\prime}_{3,n-1} & \dots & m^{\prime\prime}_{n-1,n-1} \end{pmatrix} \]
Now let us suppose that we have another reflection that leaves the first two elements of every vector as they are and transforms another vector \(\mathbf{x}\) to
\[ \mathbf{H}_1 \times \begin{pmatrix}x_0\\x_1\\x_2\\x_3\\\vdots\\x_{n-1}\end{pmatrix} = \begin{pmatrix}x_0\\x_1\\s_1\\0\\\vdots\\0\end{pmatrix} \]
If that \(\mathbf{x}\) equals the second row and column of the matrix
\[ \mathbf{H}_0 \times \mathbf{M} \times \mathbf{H}_0^\mathrm{T} = \mathbf{H}_0 \times \mathbf{M} \times \mathbf{H}_0 \]
that we calculated above, then
\[ \mathbf{H}_1 \times \mathbf{H}_0 \times \mathbf{M} \times \mathbf{H}_0 \times \mathbf{H}_1 = \begin{pmatrix} m_{0,0} & s_0 & 0 & 0 & \dots & 0\\ s_0 & m^{\prime\prime}_{1,1} & s_1 & 0 & \dots & 0\\ 0 & s_1 & m^{\prime\prime\prime}_{2,2} & m^{\prime\prime\prime}_{2,3} & \dots & m^{\prime\prime\prime}_{2,n-1}\\ 0 & 0 & m^{\prime\prime\prime}_{2,3} & m^{\prime\prime\prime}_{3,3} & \dots & m^{\prime\prime\prime}_{3,n-1}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & m^{\prime\prime\prime}_{2,n-1} & m^{\prime\prime\prime}_{3,n-1} & \dots & m^{\prime\prime\prime}_{n-1,n-1} \end{pmatrix} \]
If we keep going in the same fashion and define the product of the \(\mathbf{H}_i\) as
\[ \mathbf{Q} = \mathbf{H}_0 \times \mathbf{H}_1 \times \dots \times \mathbf{H}_{n-3} \]
then its transpose is given by
\[ \mathbf{Q}^\mathrm{T} = \left(\mathbf{H}_0 \times \mathbf{H}_1 \times \dots \times \mathbf{H}_{n-3}\right)^\mathrm{T} = \mathbf{H}_{n-3}^\mathrm{T} \times \dots \times \mathbf{H}_1^\mathrm{T} \times \mathbf{H}_0^\mathrm{T} = \mathbf{H}_{n-3} \times \dots \times \mathbf{H}_1 \times \mathbf{H}_0 \]
and we end up with a symmetric tridiagonal matrix with a leading diagonal of \(\mathbf{d}\) and upper and lower diagonals of \(\mathbf{s}\)
\[ \mathbf{Q}^\mathrm{T} \times \mathbf{M} \times \mathbf{Q} = \begin{pmatrix} d_0 & s_0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & 0 & 0\\ s_0 & d_1 & s_1 & 0 & 0 & \dots & 0 & 0 & 0 & 0 & 0\\ 0 & s_1 & d_2 & s_2 & 0 & \dots & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & s_2 & d_3 & s_3 & \dots & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & s_3 & d_4 & \dots & 0 & 0 & 0 & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 0 & 0 & \dots & d_{n-5} & s_{n-5} & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \dots & s_{n-5} & d_{n-4} & s_{n-4} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & \dots & 0 & s_{n-4} & d_{n-3} & s_{n-3} & 0\\ 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & s_{n-3} & d_{n-2} & s_{n-2}\\ 0 & 0 & 0 & 0 & 0 & \dots & 0 & 0 & 0 & s_{n-2} & d_{n-1} \end{pmatrix} \]
Note that we only need \(n-2\) steps since they will leave the last two rows and columns with zeros in all but their last two elements, and the remaining two by two block at the bottom right, like any two by two matrix, is already tridiagonal.

The easiest way to define an \(\mathbf{H}_i\) matrix is with the unit vector \(\hat{\mathbf{a}}\) that is orthogonal to the hyperplane through which it reflects vectors
\[ \mathbf{H}_i = \mathbf{I} - 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} \]
Now we trivially require that
\[ \sum_{j=0}^{n-1} \hat{a}_j^2 = 1 \]
and, if \(\mathbf{x}\) is the \(i^\mathrm{th}\) column of the matrix after the first \(i\) steps, counting the columns from zero, and
\[ \mathbf{x}^\prime = \mathbf{x} - 2 \times \left(\hat{\mathbf{a}} \times \mathbf{x}\right) \times \hat{\mathbf{a}} \]
that
\[ x^\prime_j = \begin{cases} x_j & j < i+1\\ s & j = i+1\\ 0 & j > i+1 \end{cases} \]
Rearranging yields
\[ \hat{\mathbf{a}} = \frac{\mathbf{x} - \mathbf{x}^\prime}{2 \times \hat{\mathbf{a}} \times \mathbf{x}} \]
and so the elements of \(\hat{\mathbf{a}}\) must also satisfy
\[ \hat{a}_j = \frac{x_j - x^\prime_j}{2 \times \hat{\mathbf{a}} \times \mathbf{x}} = \begin{cases} 0 & j < i+1\\ \\ \dfrac{x_j - s}{2 \times \hat{\mathbf{a}} \times \mathbf{x}} & j = i+1\\ \\ \dfrac{x_j}{2 \times \hat{\mathbf{a}} \times \mathbf{x}} & j > i+1 \end{cases} \]
which implies that
\[ s = \pm \left( \sum_{j=i+1}^{n-1} \; x_j^2\right)^\frac12 \]
as proven by derivation 4.

Derivation 4: The Value Of s
Firstly, we have
\[ \begin{align*} \hat{\mathbf{a}} \times \mathbf{x} &= \sum_{j=0}^{n-1} \hat{a}_j \times x_j = \frac{\left(x_{i+1} - s\right) \times x_{i+1} + \sum_{j=i+2}^{n-1} \; x_j^2}{2 \times \hat{\mathbf{a}} \times \mathbf{x}} = \frac{\sum_{j=i+1}^{n-1} \; x_j^2 - s \times x_{i+1}}{2 \times \hat{\mathbf{a}} \times \mathbf{x}}\\ \left(\hat{\mathbf{a}} \times \mathbf{x}\right)^2 &= \tfrac12 \times \sum_{j=i+1}^{n-1} \; x_j^2 - \tfrac12 \times s \times x_{i+1}\\ \hat{\mathbf{a}} \times \mathbf{x} &= \left(\tfrac12 \times \sum_{j=i+1}^{n-1} \; x_j^2 - \tfrac12 \times s \times x_{i+1}\right)^\frac12\\ 2 \times \hat{\mathbf{a}} \times \mathbf{x} &= \left(2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}\right)^\frac12 \end{align*} \]
Next, we have
\[ \begin{align*} \sum_{j=0}^{n-1} \hat{a}_j^2 &= \frac{\left(x_{i+1}-s\right)^2 + \sum_{j=i+2}^{n-1} \; x_j^2}{\left(2 \times \hat{\mathbf{a}} \times \mathbf{x}\right)^2} = \frac{x_{i+1}^2 - 2 \times s \times x_{i+1} + s^2 + \sum_{j=i+2}^{n-1} \; x_j^2}{\left(2 \times \hat{\mathbf{a}} \times \mathbf{x}\right)^2}\\ &= \frac{s^2 + \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}}{2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}} = \frac{s^2 - \sum_{j=i+1}^{n-1} \; x_j^2 + 2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}}{2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}}\\ &= \frac{s^2 - \sum_{j=i+1}^{n-1} \; x_j^2}{2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}} + 1 \end{align*} \]
and to satisfy
\[ \sum_{j=0}^{n-1} \hat{a}_j^2 = 1 \]
require that
\[ \begin{align*} s^2 &- \sum_{j=i+1}^{n-1} \; x_j^2 = 0\\ s &= \pm \left( \sum_{j=i+1}^{n-1} \; x_j^2\right)^\frac12 \end{align*} \]

During the proof we also found that
\[ 2 \times \hat{\mathbf{a}} \times \mathbf{x} = \left(2 \times \sum_{j=i+1}^{n-1} \; x_j^2 - 2 \times s \times x_{i+1}\right)^\frac12 \]
which we can rewrite as
\[ 2 \times \hat{\mathbf{a}} \times \mathbf{x} = \left(2 \times s^2 - 2 \times s \times x_{i+1}\right)^\frac12 = \left(2 \times s \times \left(s - x_{i+1}\right)\right)^\frac12 \]
To avoid cancellation error in this and \(\hat{a}_{i+1}\) it's prudent to give \(s\) the opposite sign to \(x_{i+1}\) so that
\[ s = \begin{cases} +\left( \sum_{j=i+1}^{n-1} \; x_j^2\right)^\frac12 & x_{i+1} \leqslant 0\\ -\left( \sum_{j=i+1}^{n-1} \; x_j^2\right)^\frac12 & x_{i+1} > 0 \end{cases} \]
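A hedged sketch of this choice, with householderStep being a name of my own invention rather than anything from the ak library, and assuming that the elements of \(\mathbf{x}\) from the \(i+1^\mathrm{th}\) onwards aren't all zero, is

function householderStep(x, i) {
 // choose s and â for the reflection at step i, with s taking the opposite sign to x[i+1]
 var n = x.length, ss = 0, s, norm, j, ahat = new Array(n);
 for(j=i+1;j<n;++j) ss += x[j]*x[j];
 s = x[i+1]>0 ? -Math.sqrt(ss) : Math.sqrt(ss);
 norm = Math.sqrt(2*s*(s-x[i+1]));                  // 2 × â × x
 for(j=0;j<=i;++j)  ahat[j] = 0;
 ahat[i+1] = (x[i+1]-s)/norm;
 for(j=i+2;j<n;++j) ahat[j] = x[j]/norm;
 return {s: s, ahat: ahat};
}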
For example, given the three by three symmetric matrix
\[ \mathbf{M} = \begin{pmatrix} 1 & -4 & 3\\ -4 & 2 & -1\\ 3 & -1 & 2 \end{pmatrix} \]
we have
\[ s = +\left(\left(-4\right)^2 + 3^2\right)^\frac12 = 5 \]
and consequently
\[ \begin{align*} 2 \times \hat{\mathbf{a}} \times \mathbf{x} &= \left(2 \times 5 \times \left(5 + 4\right)\right)^\frac12 = 90^\frac12\\ \hat{\mathbf{a}} &= \begin{pmatrix} 0\\ \frac{-4-5}{90^\frac12}\\ \frac{3}{90^\frac12} \end{pmatrix} = \begin{pmatrix} 0\\ -\frac{9}{90^\frac12}\\ \frac{3}{90^\frac12} \end{pmatrix} = \begin{pmatrix} 0\\ -\frac{3}{10^\frac12}\\ \frac{1}{10^\frac12} \end{pmatrix} \end{align*} \]
and an outer product of
\[ \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} = \begin{pmatrix} 0 & 0 & 0\\ 0 & \frac{9}{10} & -\frac{3}{10}\\ 0 & -\frac{3}{10} & \frac{1}{10} \end{pmatrix} \]
This gives us the Householder transformation matrix
\[ \mathbf{H}_0 = \mathbf{I} - 2 \times \hat{\mathbf{a}} \otimes \hat{\mathbf{a}} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1-\frac{9}{5} & \frac{3}{5}\\ 0 & \frac{3}{5} & 1-\frac{1}{5} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & -\frac{4}{5} & \frac{3}{5}\\ 0 & \frac{3}{5} & \frac{4}{5} \end{pmatrix} \]
Pre-multiplying \(\mathbf{M}\) by it first yields
\[ \mathbf{H}_0 \times \mathbf{M} = \begin{pmatrix} 1 & 0 & 0\\ 0 & -\frac{4}{5} & \frac{3}{5}\\ 0 & \frac{3}{5} & \frac{4}{5} \end{pmatrix} \times \begin{pmatrix} 1 & -4 & 3\\ -4 & 2 & -1\\ 3 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -4 & 3\\ 5 & -\frac{11}{5} & 2\\ 0 & \frac{2}{5} & 1 \end{pmatrix} \]
and post-multiplying the result by it then gives us a tridiagonal matrix
\[ \mathbf{H}_0 \times \mathbf{M} \times \mathbf{H}_0 = \begin{pmatrix} 1 & -4 & 3\\ 5 & -\frac{11}{5} & 2\\ 0 & \frac{2}{5} & 1 \end{pmatrix} \times \begin{pmatrix} 1 & 0 & 0\\ 0 & -\frac{4}{5} & \frac{3}{5}\\ 0 & \frac{3}{5} & \frac{4}{5} \end{pmatrix} = \begin{pmatrix} 1 & 5 & 0\\ 5 & \frac{74}{25} & \frac{7}{25}\\ 0 & \frac{7}{25} & \frac{26}{25} \end{pmatrix} \]
as expected.
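We can reproduce these numbers with the hypothetical householderStep and householderMatrix helpers from the earlier sketches.

var m = [[1, -4, 3], [-4, 2, -1], [3, -1, 2]];
var step = householderStep([1, -4, 3], 0);           // the first column of M
console.log(step.s);                                 // 5
console.log(step.ahat);                              // [0, -3/√10, 1/√10]

var h0 = householderMatrix(step.ahat);

// H0 × M, then (H0 × M) × H0, with plain loops
var hm = h0.map(function(row) {
 return [0, 1, 2].map(function(j) {
  return row.reduce(function(s, hik, k) {return s + hik*m[k][j];}, 0);
 });
});
var hmh = hm.map(function(row) {
 return [0, 1, 2].map(function(j) {
  return row.reduce(function(s, hik, k) {return s + hik*h0[k][j];}, 0);
 });
});
console.log(hmh);  // approximately [[1, 5, 0], [5, 74/25, 7/25], [0, 7/25, 26/25]]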

For another example, program 2 prints the accumulated Householder matrices above the transformed matrix at each step of the tridiagonalisation of a random seven by seven symmetric matrix.

Program 2: Tridiagonalisation Steps
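Program 2 itself is interactive, but a rough sketch of the whole process, again reusing the hypothetical householderStep and householderMatrix helpers and making no claim to be the library's implementation, might look something like

function matMul(a, b) {
 // the product of two matrices stored as arrays of rows
 return a.map(function(row) {
  return b[0].map(function(_, j) {
   return row.reduce(function(s, aik, k) {return s + aik*b[k][j];}, 0);
  });
 });
}

function tridiagonalise(m) {
 var n = m.length, i, j, x, ss, h;
 var t = m.map(function(row) {return row.slice();});
 var q = m.map(function(row, r) {return row.map(function(_, c) {return r===c ? 1 : 0;});});

 for(i=0;i<n-2;++i) {
  x = t.map(function(row) {return row[i];});          // the ith column of the current matrix
  for(ss=0,j=i+2;j<n;++j) ss += x[j]*x[j];
  if(ss===0) continue;                                // nothing below the sub-diagonal to zero
  h = householderMatrix(householderStep(x, i).ahat);  // Hi = I - 2 × â⊗â
  t = matMul(h, matMul(t, h));                        // Hi × T × Hi
  q = matMul(q, h);                                   // Q = H0 × H1 × ...
 }
 return {q: q, t: t};                                 // so that QT × M × Q = T
}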

Try Diagonalisation

To transform a tridiagonal matrix into a diagonal matrix we'll need to use the same kinds of rotations that we used in the Jacobi algorithm, although in this and every other context they're known as Givens rotations[5]. Recall that these take the form
\[ \mathbf{G} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots & & \vdots\\ 0 & \cdots & \cos \theta & \cdots & 0 & \cdots & \sin \theta & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & -\sin \theta & \cdots & 0 & \cdots & \cos \theta & \cdots & 0\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\ \end{pmatrix} \]
having the elements
\[ \begin{align*} g_{ii} &= 1 && i \neq k \wedge i \neq l\\ g_{ii} &= \cos \theta && i = k \vee i = l\\ g_{ij} &= \sin \theta && i = k \wedge j = l\\ g_{ij} &= -\sin \theta && i = l \wedge j = k\\ g_{ij} &= 0 && \text{otherwise} \end{align*} \]
for some \(\theta\) and some \(k\) less than \(l\), where \(\wedge\) means and and \(\vee\) means or, and satisfy
\[ \begin{align*} \mathbf{G}^\mathrm{T} \times \mathbf{G} &= \mathbf{G} \times \mathbf{G}^\mathrm{T} = \mathbf{I}\\ |\mathbf{G}| &= 1 \end{align*} \]
Post-multiplying a symmetric matrix \(\mathbf{M}\) by a Givens rotation and pre-multiplying the result with its transpose yields
\[ \mathbf{M}^\prime = \mathbf{G}^\mathrm{T} \times \mathbf{M} \times \mathbf{G} \]
which during our derivation of the Jacobi algorithm we showed has the elements
\[ \begin{align*} m^\prime_{kk} &= m_{kk} \times \cos^2 \theta + m_{ll} \times \sin^2 \theta - 2 \times m_{kl} \times \cos \theta \times \sin \theta\\ m^\prime_{ll} &= m_{kk} \times \sin^2 \theta + m_{ll} \times \cos^2 \theta + 2 \times m_{kl} \times \cos \theta \times \sin \theta\\ m^\prime_{kl} &= m^\prime_{lk} = \left(m_{kk}-m_{ll}\right) \times \cos \theta \times \sin \theta + m_{kl} \times \left(\cos^2 \theta - \sin^2 \theta\right)\\ m^\prime_{ik} &= m^\prime_{ki} = m_{ik} \times \cos \theta - m_{il} \times \sin \theta\\ m^\prime_{il} &= m^\prime_{li} = m_{ik} \times \sin \theta + m_{il} \times \cos \theta\\ m^\prime_{ij} &= m_{ij} \end{align*} \]
where neither \(i\) nor \(j\) are equal to \(k\) or \(l\), and consequently only the \(k^\mathrm{th}\) and \(l^\mathrm{th}\) rows and columns are affected.

If \(\mathbf{M}\) is symmetric and tridiagonal and we apply such Givens rotations to the first two rows and columns then we have
\[ \begin{align*} m^\prime_{00} &= m_{00} \times \cos^2 \theta + m_{11} \times \sin^2 \theta - 2 \times m_{01} \times \cos \theta \times \sin \theta\\ m^\prime_{11} &= m_{00} \times \sin^2 \theta + m_{11} \times \cos^2 \theta + 2 \times m_{01} \times \cos \theta \times \sin \theta\\ m^\prime_{01} &= m^\prime_{10} = \left(m_{00}-m_{11}\right) \times \cos \theta \times \sin \theta + m_{01} \times \left(\cos^2 \theta - \sin^2 \theta\right)\\ m^\prime_{i0} &= m^\prime_{0i} = m_{i0} \times \cos \theta - m_{i1} \times \sin \theta\\ m^\prime_{i1} &= m^\prime_{1i} = m_{i0} \times \sin \theta + m_{i1} \times \cos \theta \end{align*} \]
Noting that both \(m_{i0}\) and \(m_{i1}\) equal zero for \(i\) greater than two, the only elements other than \(m^\prime_{00}\), \(m^\prime_{01}\), \(m^\prime_{10}\) and \(m^\prime_{11}\) that differ from those in \(\mathbf{M}\) are
\[ \begin{align*} m^\prime_{20} &= m^\prime_{02} = m_{20} \times \cos \theta - m_{21} \times \sin \theta\\ m^\prime_{21} &= m^\prime_{12} = m_{20} \times \sin \theta + m_{21} \times \cos \theta \end{align*} \]
so that \(\mathbf{M}^\prime\) takes the form
\[ \begin{pmatrix} m^\prime_{00} & m^\prime_{01} & m^\prime_{02} & 0 & 0 & \dots\\ m^\prime_{01} & m^\prime_{11} & m^\prime_{12} & 0 & 0 & \dots\\ m^\prime_{02} & m^\prime_{12} & m^\prime_{22} & m^\prime_{23} & 0 & \dots\\ 0 & 0 & m^\prime_{23} & m^\prime_{33} & m^\prime_{34} & \dots\\ 0 & 0 & 0 & m^\prime_{34} & m^\prime_{44} & \dots\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]
Now suppose that we apply further Givens rotations to the second and third rows and columns to yield a matrix \(\mathbf{M}^{\prime\prime}\) having the elements
\[ \begin{align*} m^{\prime\prime}_{11} &= m^\prime_{11} \times \cos^2 \theta + m^\prime_{22} \times \sin^2 \theta - 2 \times m^\prime_{12} \times \cos \theta \times \sin \theta\\ m^{\prime\prime}_{22} &= m^\prime_{11} \times \sin^2 \theta + m^\prime_{22} \times \cos^2 \theta + 2 \times m^\prime_{12} \times \cos \theta \times \sin \theta\\ m^{\prime\prime}_{12} &= m^{\prime\prime}_{21} = \left(m^\prime_{11}-m^\prime_{22}\right) \times \cos \theta \times \sin \theta + m^\prime_{12} \times \left(\cos^2 \theta - \sin^2 \theta\right)\\ m^{\prime\prime}_{i1} &= m^{\prime\prime}_{1i} = m^\prime_{i1} \times \cos \theta - m^\prime_{i2} \times \sin \theta\\ m^{\prime\prime}_{i2} &= m^{\prime\prime}_{2i} = m^\prime_{i1} \times \sin \theta + m^\prime_{i2} \times \cos \theta \end{align*} \]
that differ from those of \(\mathbf{M}^\prime\). This time we have \(m^\prime_{i1}\) and \(m^\prime_{i2}\) equal to zero for \(i\) greater than three so that the only elements other than \(m^{\prime\prime}_{11}\), \(m^{\prime\prime}_{12}\), \(m^{\prime\prime}_{21}\) and \(m^{\prime\prime}_{22}\) that are changed are
\[ \begin{align*} m^{\prime\prime}_{01} &= m^{\prime\prime}_{10} = m^\prime_{01} \times \cos \theta - m^\prime_{02} \times \sin \theta\\ m^{\prime\prime}_{02} &= m^{\prime\prime}_{20} = m^\prime_{01} \times \sin \theta + m^\prime_{02} \times \cos \theta\\ m^{\prime\prime}_{13} &= m^{\prime\prime}_{31} = m^\prime_{13} \times \cos \theta - m^\prime_{23} \times \sin \theta\\ m^{\prime\prime}_{23} &= m^{\prime\prime}_{32} = m^\prime_{13} \times \sin \theta + m^\prime_{23} \times \cos \theta \end{align*} \]
We can set \(m^{\prime\prime}_{02}\) and \(m^{\prime\prime}_{20}\) to zero by choosing \(\theta\) so that
\[ \begin{align*} m^\prime_{01} \times \sin \theta &+ m^\prime_{02} \times \cos \theta = 0\\ m^\prime_{01} \times \sin \theta &= -m^\prime_{02} \times \cos \theta\\ \tan \theta = \frac{\sin \theta}{\cos \theta} &= -\frac{m^\prime_{02}}{m^\prime_{01}} \end{align*} \]
Figure 1: The Right-Angled Triangle
Note that we don't have to solve for \(\theta\) but can simply define its sine and cosine as
\[ \begin{align*} \sin \theta &= -\frac{m^\prime_{02}}{\sqrt{m^{\prime 2}_{01} + m^{\prime 2}_{02}}}\\ \cos \theta &= \phantom{-}\frac{m^\prime_{01}}{\sqrt{m^{\prime 2}_{01} + m^{\prime 2}_{02}}} \end{align*} \]
which follows from the rules of trigonometry applied to the right-angled triangle in figure 1. If we do so then \(\mathbf{M}^{\prime\prime}\) will take the form
\[ \begin{pmatrix} m^{\prime\prime}_{00} & m^{\prime\prime}_{01} & 0 & 0 & 0 & \dots\\ m^{\prime\prime}_{01} & m^{\prime\prime}_{11} & m^{\prime\prime}_{12} & m^{\prime\prime}_{13} & 0 & \dots\\ 0 & m^{\prime\prime}_{12} & m^{\prime\prime}_{22} & m^{\prime\prime}_{23} & 0 & \dots\\ 0 & m^{\prime\prime}_{13} & m^{\prime\prime}_{23} & m^{\prime\prime}_{33} & m^{\prime\prime}_{34} & \dots\\ 0 & 0 & 0 & m^{\prime\prime}_{34} & m^{\prime\prime}_{44} & \dots\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \]
An important point to note is that calculating the hypotenuse of that triangle with
\[ \sqrt{m^{\prime 2}_{01} + m^{\prime 2}_{02}} \]
isn't entirely numerically satisfactory since the term inside the square root might overflow to infinity. We can rewrite it as
\[ \sqrt{m^{\prime 2}_{01} + m^{\prime 2}_{02}} = \sqrt{m^{\prime 2}_{01} \times \left(1+\frac{m^{\prime 2}_{02}}{m^{\prime 2}_{01}}\right)} = \left|m^\prime_{01}\right| \times \sqrt{1+\left(\frac{m^{\prime}_{02}}{m^{\prime}_{01}}\right)^2} \]
where the vertical bars stand for the magnitude of the value between them, so that if the magnitude of \(m^{\prime}_{01}\) is greater than or equal to that of \(m^{\prime}_{02}\) then the squared ratio inside the square root must lie between zero and one and consequently the term inside the square root cannot overflow.
Now this will fail if both \(m^{\prime}_{01}\) and \(m^{\prime}_{02}\) equal zero, or if both are infinite, since the ratio inside the square root will then equal NaN, and so we must treat those as special cases, as well as making sure that we otherwise make the larger of the pair the denominator, as illustrated by listing 1.

Listing 1: ak.hypot
ak.hypot = Math.hypot;
if(ak.nativeType(ak.hypot)===ak.UNDEFINED_T) {
 ak.hypot = function(x, y) {
  x = Math.abs(x);
  y = Math.abs(y);

  // non-finite arguments are returned via their sum so that infinities propagate
  if(!isFinite(x) || !isFinite(y)) return x+y;

  // divide the smaller magnitude by the larger so that the ratio cannot overflow
  if(x>y) {y/=x; return x*Math.sqrt(1+y*y);}
  if(y>0) {x/=y; return y*Math.sqrt(1+x*x);}

  // both arguments are zero
  return 0;
 };
}

This function was added to the Math object in ECMAScript 6 and so ak.hypot, defined in AK.js, is an alias for it if it is available and a function that behaves equivalently otherwise.
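For example, a sketch of choosing the cosine and sine that zero the bulge element, given \(p = m^\prime_{01}\) and \(q = m^\prime_{02}\), might use ak.hypot like this; givensCosSin is a name of my own invention rather than one from the library.

function givensCosSin(p, q) {
 // cos θ = p/h and sin θ = -q/h, with ak.hypot guarding against overflow
 var h = ak.hypot(p, q);
 return h>0 ? {cos: p/h, sin: -q/h} : {cos: 1, sin: 0};
}

var cs = givensCosSin(3, 4);
console.log(cs);                     // {cos: 0.6, sin: -0.8}
console.log(3*cs.sin + 4*cs.cos);    // p × sin θ + q × cos θ, effectively zero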

With that out of the way we can keep applying Givens rotations to shift the pair of non-zero elements above and below the upper and lower diagonals down towards the bottom right hand corner of the resulting matrices until they finally drop out of the last of them, yielding another tridiagonal matrix; this is known as chasing the bulge and is demonstrated by program 3, which uses ak.hypot to calculate the length of the triangle's hypotenuse.

Program 3: Chasing The Bulge
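Program 3 is another interactive widget, so here's a rough sketch of one such sweep for a symmetric matrix t, stored in full, that is tridiagonal apart from a bulge at positions (0,2) and (2,0), following the convention of the Givens matrix given above; applyGivens and chase are hypothetical names of my own rather than the library's.

function applyGivens(t, k, c, s) {
 // apply the similarity GT × T × G in place for a rotation of rows and columns k and k+1
 var n = t.length, l = k+1, i, tk, tl;
 for(i=0;i<n;++i) {                  // post-multiply by G, updating columns k and l
  tk = t[i][k]; tl = t[i][l];
  t[i][k] = tk*c - tl*s;
  t[i][l] = tk*s + tl*c;
 }
 for(i=0;i<n;++i) {                  // pre-multiply by GT, updating rows k and l
  tk = t[k][i]; tl = t[l][i];
  t[k][i] = tk*c - tl*s;
  t[l][i] = tk*s + tl*c;
 }
}

function chase(t) {
 var n = t.length, k, p, q, h;
 for(k=1;k<n-1;++k) {
  p = t[k-1][k]; q = t[k-1][k+1];    // the element to keep and the bulge to zero
  if(q===0) break;                   // the bulge has already dropped out
  h = ak.hypot(p, q);
  applyGivens(t, k, p/h, -q/h);      // cos θ = p/h, sin θ = -q/h
 }
}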

We can consequently use Givens rotations in cycles, transforming the first non-zero upper and lower diagonal elements to zero using a step from the Jacobi algorithm and then chasing the bulge, until we converge upon a diagonal matrix \(\mathbf{D}\), as demonstrated by program 4, albeit starting from the first off-diagonal elements with magnitudes greater than or equal to that of e so that its result is only approximately diagonal.

Program 4: Diagonalising A Tridiagonal Matrix

If we accumulate those rotations into a matrix \(\mathbf{R}\) then, for our original symmetric matrix \(\mathbf{M}\) and its accumulated Householder transformations \(\mathbf{Q}\), we have
\[ \begin{align*} \mathbf{D} &= \mathbf{R}^\mathrm{T} \times \left(\mathbf{Q}^\mathrm{T} \times \mathbf{M} \times \mathbf{Q}\right) \times \mathbf{R} =\left(\mathbf{R}^\mathrm{T} \times \mathbf{Q}^\mathrm{T}\right) \times \mathbf{M} \times \left(\mathbf{Q} \times \mathbf{R}\right)\\ \mathbf{M} &= \left(\mathbf{Q} \times \mathbf{R}\right) \times \mathbf{D} \times \left(\mathbf{R}^\mathrm{T} \times \mathbf{Q}^\mathrm{T}\right) \end{align*} \]
since
\[ \begin{align*} \left(\mathbf{Q} \times \mathbf{R}\right) \times \left(\mathbf{R}^\mathrm{T} \times \mathbf{Q}^\mathrm{T}\right) = \mathbf{Q} \times \left(\mathbf{R} \times \mathbf{R}^\mathrm{T}\right) \times \mathbf{Q}^\mathrm{T} = \mathbf{Q} \times \mathbf{I} \times \mathbf{Q}^\mathrm{T} = \mathbf{Q} \times \mathbf{Q}^\mathrm{T} = \mathbf{I} \end{align*} \]
and so \(\mathbf{Q} \times \mathbf{R}\) equals the matrix \(\mathbf{V}\), whose columns are the eigenvectors of \(\mathbf{M}\), and \(\mathbf{D}\) equals \(\boldsymbol{\Lambda}\), whose diagonal elements are their associated eigenvalues.

Unfortunately this isn't a particularly efficient scheme, which you will certainly notice if you reduce the magnitude of e, but it turns out that we can significantly improve its rate of convergence with a relatively trivial modification. Showing why that modification works is a good deal less trivial, however, as we shall see in the next post.

References

[1] At The Foot Of The Eigen, www.thusspakeak.com, 2014.

[2] Ascending The Eigen, www.thusspakeak.com, 2014.

[3] Conquering The Eigen, www.thusspakeak.com, 2014.

[4] Householder, A. S., Unitary Triangularization of a Nonsymmetric Matrix, Journal of the ACM, Vol. 5, Issue 4, 1958.

[5] Golub, G. and Van Loan, C., Matrix Computations (3rd ed.), Johns Hopkins University Press, 1996.
