Funky Givens

| 0 Comments

We have recently[1][2] been looking at how we can use a special case of Francis's QR transformation[3][4] to reduce a real symmetric matrix \(\mathbf{M}\) to a diagonal matrix \(\boldsymbol{\Lambda}\) by first applying Householder transformations to put it in tridiagonal form and then using shifted Givens rotations to zero out the off diagonal elements.
The columns of the matrix of transformations \(\mathbf{V}\) and the elements on the leading diagonal of \(\boldsymbol{\Lambda}\) are the unit eigenvectors and eigenvalues of \(\mathbf{M}\) respectively and they consequently satisfy
\[ \mathbf{M} \times \mathbf{V} = \mathbf{V} \times \boldsymbol{\Lambda} \]
and, since the product of \(\mathbf{V}\) and its transpose is the identity matrix
\[ \mathbf{M} = \mathbf{V} \times \boldsymbol{\Lambda} \times \mathbf{V}^\mathrm{T} \]
which is known as the spectral decomposition of \(\mathbf{M}\).
Last time[5] we saw how we could efficiently apply the Householder transformations in-place, replacing the elements of \(\mathbf{M}\) with those of the matrix of accumulated transformations \(\mathbf{Q}\) and creating a pair of arrays to represent the leading and off diagonal elements of the tridiagonal matrix. This time we shall see how we can similarly improve the implementation of the Givens rotations.

Givens Diagonalisation

Recall that Givens rotations are matrices of the form
\[ \mathbf{G} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\ \vdots & \ddots & \vdots & & \vdots & & \vdots & & \vdots\\ 0 & \cdots & \cos \theta & \cdots & 0 & \cdots & \sin \theta & \cdots & 0\\ \vdots & & \vdots & \ddots & \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 1 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots & \ddots & \vdots & & \vdots\\ 0 & \cdots & -\sin \theta & \cdots & 0 & \cdots & \cos \theta & \cdots & 0\\ \vdots & & \vdots & & \vdots & & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1\\ \end{pmatrix} \]
which have the elements
\[ \begin{align*} g_{ii} &= 1 && i \neq k \wedge i \neq l\\ g_{ii} &= \cos \theta && i = k \vee i = l\\ g_{ij} &= \sin \theta && i = k \wedge j = l\\ g_{ij} &= -\sin \theta && i = l \wedge j = k\\ g_{ij} &= 0 && \text{otherwise} \end{align*} \]
for an angle \(\theta\) and indices \(k\) less than \(l\), where \(\wedge\) means and and \(\vee\) means or. To efficiently transform the off diagonal elements to zero we used an implicit Wilkinson shift[6], which for the first and second row and column used an angle that satisfied
\[ \begin{align*} \sin \theta &= -\frac{m_{1,0}}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}}\\ \cos \theta &= \phantom{-}\frac{m_{0,0}-s}{\sqrt{\left(m_{0,0}-s\right)^2 + m_{1,0}^2}} \end{align*} \]
where
\[ \begin{align*} d &= \frac{m_{n-2,n-2}-m_{n-1,n-1}}{2}\\ s &= m_{n-1,n-1} - \frac{m_{n-1,n-2}^2}{d + \mathrm{sign}(d) \times \sqrt{d^2 + m_{n-1,n-2}^2}} \end{align*} \]
and, for subsequent rows and columns, an angle defined by
\[ \begin{align*} \sin \theta &= -\frac{m_{k+1,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}}\\ \cos \theta &= \phantom{-}\frac{m_{k,k-1}}{\sqrt{m_{k,k-1}^2+m_{k+1,k-1}^2}} \end{align*} \]
for \(k\) from one to \(n-2\).

To set the last off diagonal elements of a tridiagonal matrix \(\mathbf{M}\) to zero we iteratively applied such sequences of rotations with
\[ \mathbf{M} \leftarrow \mathbf{G}^\mathrm{T} \times \mathbf{M} \times \mathbf{G} \]
where the left-arrow means replaced with, as demonstrated by program 1 which prints the accumulated rotations \(\mathbf{Q}\) above the transformed matrix \(\mathbf{M}\), tracking the number of iterations and the last off diagonal element, finally printing the distance between the original and transformed matrices.

Program 1: Givens Rotations

The iterative shifting of the pair of non-zero elements above and below the upper and lower diagonals down toward and off of the bottom right corner of \(\mathbf{M}\) is known as chasing the bulge.

Rotating In-Place

To directly operate upon the elements of the matrices rather than use our ak.matrix operators we should first note that, for a Givens rotation \(\mathbf{G}\) of rows and columns \(k\) and \(l\), the matrix
\[ \mathbf{M}^\prime = \mathbf{G}^\mathrm{T} \times \mathbf{M} \]
has the elements
\[ m^\prime_{ij} = \sum_{h=0}^{n-1} g^\mathrm{T}_{ih} \times m_{hj} = \sum_{h=0}^{n-1} g_{hi} \times m_{hj} = \begin{cases} m_{kj} \times \cos\theta - m_{lj} \times \sin\theta & i = k\\ m_{kj} \times \sin\theta + m_{lj} \times \cos\theta & i = l\\ m_{ij} & \text{otherwise} \end{cases} \]
and the matrix
\[ \mathbf{M}^{\prime\prime} = \mathbf{M} \times \mathbf{G} \]
has
\[ m^{\prime\prime}_{ij} = \sum_{h=0}^{n-1} m_{ih} \times g_{hj} = \begin{cases} m_{ik} \times \cos\theta - m_{il} \times \sin\theta & j = k\\ m_{ik} \times \sin\theta + m_{il} \times \cos\theta & j = l\\ m_{ij} & \text{otherwise} \end{cases} \]
Now, if we replace the the matrices \(\mathbf{M}\) and \(\mathbf{Q}\) with two-dimensional arrays we can update them in-place using the sequence of transformations
\[ \begin{align*} \mathbf{M} &\leftarrow \mathbf{G}^\mathrm{T} \times \mathbf{M}\\ \mathbf{M} &\leftarrow \mathbf{M} \times \mathbf{G}\\ \mathbf{Q} &\leftarrow \mathbf{Q} \times \mathbf{G} \end{align*} \]
instead of the matrix multiplications at each step, which we can implement with
for(cc=0;cc<n;++cc) { r0 = m[c][cc]*c0 - m[c+1][cc]*s0; r1 = m[c][cc]*s0 + m[c+1][cc]*c0; m[c][cc] = r0; m[c+1][cc] = r1; } for(rr=0;rr<n;++rr) { r0 = m[rr][c]*c0 - m[rr][c+1]*s0; r1 = m[rr][c]*s0 + m[rr][c+1]*c0; m[rr][c] = r0; m[rr][c+1] = r1; } for(rr=0;rr<n;++rr) { r0 = q[rr][c]*c0 - q[rr][c+1]*s0; r1 = q[rr][c]*s0 + q[rr][c+1]*c0; q[rr][c] = r0; q[rr][c+1] = r1; }

as demonstrated by program 2.

Program 2: In-Place Givens Rotations

Refactoring Away

Now that we're working with arrays rather than matrices we have the freedom to start refactoring the code, removing unnecessary operations. For starters, we know that the matrix \(\mathbf{M}\) is tridiagonal at each step, except for the bulge, and so we're needlessly calculating terms that must equal zero when we multiply it by the Givens rotations. To remove those terms we can simply replace the loops
for(rr=0;rr<n;++rr) { r0 = m[rr][c]*c0 - m[rr][c+1]*s0; r1 = m[rr][c]*s0 + m[rr][c+1]*c0; m[rr][c] = r0; m[rr][c+1] = r1; } for(rr=0;rr<n;++rr) { r0 = q[rr][c]*c0 - q[rr][c+1]*s0; r1 = q[rr][c]*s0 + q[rr][c+1]*c0; q[rr][c] = r0; q[rr][c+1] = r1; }

with a pair that set bounds on the iteration that only go out as far as the bulge
lb = Math.max(c-2, 0); ub = Math.min(c+3, n); for(cc=lb;cc<ub;++cc) { r0 = m[c][cc]*c0 - m[c+1][cc]*s0; r1 = m[c][cc]*s0 + m[c+1][cc]*c0; m[c][cc] = r0; m[c+1][cc] = r1; } lb = Math.max(c-2, 0); ub = Math.min(c+3, n); for(rr=lb;rr<ub;++rr) { r0 = m[rr][c]*c0 - m[rr][c+1]*s0; r1 = m[rr][c]*s0 + m[rr][c+1]*c0; m[rr][c] = r0; m[rr][c+1] = r1; }

as show by program 3.

Program 3: No Further Than The Bulge

By the rules of matrix multiplication, the elements of
\[ \mathbf{M}^{\prime} = \mathbf{G}^\mathrm{T} \times \mathbf{M} \times \mathbf{G} \]
are given by
\[ m^{\prime}_{rc} = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} g^\mathrm{T}_{ri} \times m_{ij} \times g_{jc} = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{ir} \times g_{jc} \]
We can consequently calculate the result of a Givens rotation by expanding these sums, ignoring any terms that we know must equal zero as we do so.

During the initial rotation the elements \(g_{i0}\) equal zero for \(i\) greater than one and so the top left element of \(\mathbf{M}^\prime\) is given by
\[ \begin{align*} m^{\prime}_{00} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i0} \times g_{j0} = \sum_{i=0}^1 \sum_{j=0}^1 m_{ij} \times g_{i0} \times g_{j0}\\ &= m_{00} \times g_{00} \times g_{00} + m_{01} \times g_{00} \times g_{10} + m_{10} \times g_{10} \times g_{00} + m_{11} \times g_{10} \times g_{10}\\ &= m_{00} \times \cos^2\theta - m_{01} \times \cos\theta \times \sin\theta - m_{10} \times \sin\theta \times \cos\theta + m_{11} \times \sin^2\theta\\ &= m_{00} \times \cos^2\theta + m_{11} \times \sin^2\theta - 2 \times m_{01} \times \cos\theta \times \sin\theta \end{align*} \]
Similarly, its first upper diagonal element is
\[ \begin{align*} m^{\prime}_{01} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i0} \times g_{j1} = \sum_{i=0}^1 \sum_{j=0}^1 m_{ij} \times g_{i0} \times g_{j1}\\ &= m_{00} \times g_{00} \times g_{01} + m_{01} \times g_{00} \times g_{11} + m_{10} \times g_{10} \times g_{01} + m_{11} \times g_{10} \times g_{11}\\ &= m_{00} \times \cos\theta \times \sin\theta + m_{01} \times \cos^2\theta - m_{01} \times \sin^2\theta - m_{11} \times \sin\theta \times \cos\theta\\ &= \left(m_{00}-m_{11}\right) \times \cos\theta \times \sin\theta + m_{01} \times \left(\cos^2\theta - \sin^2\theta\right) \end{align*} \]
and, by symmetry, so is its first lower diagonal element.
The only non-zero element in the third row and column of \(\mathbf{G}\) is a one on the leading diagonal and so the bulge equals
\[ \begin{align*} m^{\prime}_{02} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i0} \times g_{j2} = \sum_{i=1}^1 \sum_{j=2}^2 m_{ij} \times g_{i0} \times g_{j2}\\ &= m_{12} \times g_{10} \times g_{22}\\ &= -m_{12} \times \sin\theta \end{align*} \]
The elements \(g_{i1}\) also equal zero for \(i\) greater than one and so the second leading diagonal element comes to
\[ \begin{align*} m^{\prime}_{11} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j1} = \sum_{i=0}^1 \sum_{j=0}^1 m_{ij} \times g_{i1} \times g_{j1}\\ &= m_{00} \times g_{01} \times g_{01} + m_{01} \times g_{01} \times g_{11} + m_{10} \times g_{11} \times g_{01} + m_{11} \times g_{11} \times g_{11}\\ &= m_{00} \times \sin^2\theta + m_{01} \times \sin\theta \times \cos\theta + m_{10} \times \cos\theta \times \sin\theta + m_{11} \times \cos^2\theta\\ &= m_{00} \times \sin^2\theta + m_{11} \times \cos^2\theta + 2 \times m_{01} \times \cos\theta \times \sin\theta \end{align*} \]
Finally, the second off diagonal element is given by
\[ \begin{align*} m^{\prime}_{12} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j2} = \sum_{i=1}^1 \sum_{j=2}^2 m_{ij} \times g_{i1} \times g_{j2}\\ &= m_{12} \times g_{11} \times g_{22}\\ &= m_{12} \times \cos\theta \end{align*} \]
Noting that these are the only elements that are changed during the initial rotation, we can replace the loops for it with
m00 = m[0][0]; m01 = m[0][1]; m11 = m[1][1]; m12 = m[1][2]; m[0][0] = m00*c0*c0 + m11*s0*s0 - 2*m01*c0*s0; m[1][1] = m00*s0*s0 + m11*c0*c0 + 2*m01*c0*s0; m[0][1] = m[1][0] = (m00-m11)*c0*s0 + m01*(c0*c0-s0*s0); m[0][2] = m[2][0] = -m12*s0; m[1][2] = m[2][1] = m12*c0;

which program 4 shows yields the same results.

Program 4: Expanding Out The Initial Rotation

Expanding the sums in the second Givens rotation proceeds in much the same way, although this time the only non-zero elements in the first row and column and those after the third of \(\mathbf{G}\) are ones on the leading diagonal and the second and third rows and columns are zero everywhere except in the second and third places.
The first, second and third off diagonal elements of \(\mathbf{M}^\prime\) are therefore given by
\[ \begin{align*} m^{\prime}_{10} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j0} = \sum_{i=1}^2 \sum_{j=0}^0 m_{ij} \times g_{i1} \times g_{j0}\\ &= m_{10} \times g_{11} \times g_{00} + m_{20} \times g_{21} \times g_{00}\\ &= m_{01} \times \cos\theta - m_{02} \times \sin\theta\\ \\ m^{\prime}_{12} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j2} = \sum_{i=1}^2 \sum_{j=1}^2 m_{ij} \times g_{i1} \times g_{j2}\\ &= m_{11} \times g_{11} \times g_{12} + m_{12} \times g_{11} \times g_{22} + m_{21} \times g_{21} \times g_{12} + m_{22} \times g_{21} \times g_{22}\\ &= m_{11} \times \cos\theta \times \sin\theta + m_{12} \times \cos^2\theta - m_{21} \times \sin^2\theta - m_{22} \times \sin\theta \times \cos\theta\\ &= \left(m_{11}-m_{22}\right) \times \cos\theta \times \sin\theta + m_{12} \times \left(\cos^2\theta - \sin^2\theta\right)\\ \\ m^{\prime}_{23} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i2} \times g_{j3} = \sum_{i=2}^2 \sum_{j=3}^3 m_{ij} \times g_{i2} \times g_{j3}\\ &= m_{23} \times g_{22} \times g_{33}\\ &= m_{23} \times \cos\theta \end{align*} \]
and the second and third on the leading diagonal by
\[ \begin{align*} m^{\prime}_{11} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j1} = \sum_{i=1}^2 \sum_{j=1}^2 m_{ij} \times g_{i1} \times g_{j1}\\ &= m_{11} \times g_{11} \times g_{11} + m_{12} \times g_{11} \times g_{21} + m_{21} \times g_{21} \times g_{11} + m_{22} \times g_{21} \times g_{21}\\ &= m_{11} \times \cos^2\theta - m_{12} \times \cos\theta \times \sin\theta - m_{21} \times \sin\theta \times \cos\theta + m_{22} \times \sin^2\theta\\ &= m_{11} \times \cos^2\theta + m_{22} \times \sin^2\theta - 2 \times m_{12} \times \cos\theta \times \sin\theta\\ \\ m^{\prime}_{22} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i2} \times g_{j2} = \sum_{i=1}^2 \sum_{j=1}^2 m_{ij} \times g_{i2} \times g_{j2}\\ &= m_{11} \times g_{12} \times g_{12} + m_{12} \times g_{12} \times g_{22} + m_{21} \times g_{22} \times g_{12} + m_{22} \times g_{22} \times g_{22}\\ &= m_{11} \times \sin^2\theta + m_{12} \times \sin\theta \times \cos\theta + m_{21} \times \cos\theta \times \sin\theta + m_{22} \times \cos^2\theta\\ &= m_{11} \times \sin^2\theta + m_{22} \times \cos^2\theta + 2 \times m_{12} \times \cos\theta \times \sin\theta \end{align*} \]
Finally, the new bulge equals
\[ \begin{align*} m^{\prime}_{13} &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} m_{ij} \times g_{i1} \times g_{j3} = \sum_{i=2}^2 \sum_{j=3}^3 m_{ij} \times g_{i1} \times g_{j3}\\ &= m_{23} \times g_{21} \times g_{33}\\ &= -m_{23} \times \sin\theta \end{align*} \]
the old one is replaced with a zero and all remaining elements are the same as those in \(\mathbf{M}\).
Since all subsequent rotations take the same form we can use these expansions for them too, provided that we ignore the bulge when it falls out of the bottom of the matrix, which is done by replacing the loops for them with
m01 = m[c-1][c]; m02 = m[c-1][c+1]; m11 = m[c][c]; m12 = m[c][c+1]; m22 = m[c+1][c+1]; m[c][c] = m11*c0*c0 + m22*s0*s0 - 2*m12*c0*s0; m[c+1][c+1] = m11*s0*s0 + m22*c0*c0 + 2*m12*c0*s0; m[c][c-1] = m[c-1][c] = m01*c0 - m02*s0; m[c][c+1] = m[c+1][c] = (m11-m22)*c0*s0 + m12*(c0*c0-s0*s0); m[c-1][c+1] = m[c+1][c-1] = 0; if(c<n-2) { m23 = m[c+1][c+2]; m[c+2][c] = m[c][c+2] = -m23*s0; m[c+2][c+1] = m[c+1][c+2] = m23*c0; }

in program 5.

Program 5: Expanding Out Subsequent Rotations

Now we don't actually need to store the bulge in \(\mathbf{M}\) at each step since we know that it will always be replaced with a zero. Since we're already copying it into z before the second and subsequent rotations we might as well just keep it there, which we can do by replacing the implementation of the rotations with
if(c===0) { m00 = m[0][0]; m01 = m[0][1]; m11 = m[1][1]; m12 = m[1][2]; m[0][0] = m00*c0*c0 + m11*s0*s0 - 2*m01*c0*s0; m[1][1] = m00*s0*s0 + m11*c0*c0 + 2*m01*c0*s0; m[0][1] = m[1][0] = (m00-m11)*c0*s0 + m01*(c0*c0-s0*s0); m[1][2] = m[2][1] = m12*c0; z = -m12*s0; } else { m01 = m[c-1][c]; m11 = m[c][c]; m12 = m[c][c+1]; m22 = m[c+1][c+1]; m[c][c] = m11*c0*c0 + m22*s0*s0 - 2*m12*c0*s0; m[c+1][c+1] = m11*s0*s0 + m22*c0*c0 + 2*m12*c0*s0; m[c][c-1] = m[c-1][c] = m01*c0 - z*s0; m[c][c+1] = m[c+1][c] = (m11-m22)*c0*s0 + m12*(c0*c0-s0*s0); if(c<n-2) { m23 = m[c+1][c+2]; m[c+2][c+1] = m[c+1][c+2] = m23*c0; z = -m23*s0; } }

and removing the statement copying the bulge into z
if(c===0) { if(Math.abs(m[n-1][n-2])<=e) return false; x = m[c][c] - wilkinson(m); z = m[c+1][c]; } else { x = m[c][c-1]; }

as done in program 6.

Program 6: Keeping The Bulge Separate

Since we know that \(\mathbf{M}\) is symmetric at every step we can simply update its lower diagonal at each step with
if(c===0) { m00 = m[0][0]; m10 = m[1][0]; m11 = m[1][1]; m21 = m[2][1]; m[0][0] = m00*c0*c0 + m11*s0*s0 - 2*m10*c0*s0; m[1][1] = m00*s0*s0 + m11*c0*c0 + 2*m10*c0*s0; m[1][0] = (m00-m11)*c0*s0 + m10*(c0*c0-s0*s0); m[2][1] = m21*c0; z = -m21*s0; } else { m10 = m[c][c-1]; m11 = m[c][c]; m21 = m[c+1][c]; m22 = m[c+1][c+1]; m[c][c] = m11*c0*c0 + m22*s0*s0 - 2*m21*c0*s0; m[c+1][c+1] = m11*s0*s0 + m22*c0*c0 + 2*m21*c0*s0; m[c][c-1] = m10*c0 - z*s0; m[c+1][c] = (m11-m22)*c0*s0 + m21*(c0*c0-s0*s0); if(c<n-2) { m23 = m[c+2][c+1]; m[c+2][c+1] = m23*c0; z = -m23*s0; } }

and copy its elements into the upper diagonal at the end of the iteration with
if(c===0 && Math.abs(m[n-1][n-2])<=e) { for(cc=0;cc<n-1;++cc) m[cc][cc+1] = m[cc+1][cc]; }

as demonstrated by program 7.

Program 7: Ignoring The Upper Diagonal

We can save ourselves a few multiplication operations by calculating the values of \(\sin^2\theta\), \(\cos^2\theta\), \(\cos\theta \times \sin\theta\) and \(2 \times m_{c+1,c} \times \cos\theta \times \sin\theta\) and saving them in the local variables s0s0, c0c0, c0s0 and mcs with
h = ak.hypot(x, z); s0 = -z/h; s0s0 = s0*s0; c0 = x/h; c0c0 = c0*c0; c0s0 = c0*s0; if(c===0) { m00 = m[0][0]; m10 = m[1][0]; m11 = m[1][1]; m21 = m[2][1]; mcs = 2*m10*c0s0; m[0][0] = m00*c0c0 + m11*s0s0 - mcs; m[1][1] = m00*s0s0 + m11*c0c0 + mcs; m[1][0] = (m00-m11)*c0s0 + m10*(c0c0-s0s0); m[2][1] = m21*c0; z = -m21*s0; } else { m10 = m[c][c-1]; m11 = m[c][c]; m21 = m[c+1][c]; m22 = m[c+1][c+1]; mcs = 2*m21*c0s0; m[c][c] = m11*c0c0 + m22*s0s0 - mcs; m[c+1][c+1] = m11*s0s0 + m22*c0c0 + mcs; m[c][c-1] = m10*c0 - z*s0; m[c+1][c] = (m11-m22)*c0s0 + m21*(c0c0-s0s0); if(c<n-2) { m23 = m[c+2][c+1]; m[c+2][c+1] = m23*c0; z = -m23*s0; } }

as used by program 8.

Program 8: Fewer Multiplication Operations

Whilst this is a reasonably efficient implementation it isn't particularly robust. In particular, if the first row and column of \(\mathbf{M}\) only contain zeros then at the start of the first rotation z, and consequently s0, will equal zero and so z will still be zero at its end. When the second rotation starts x will also be set to zero and therefore s0 and c0 will be NaNs which will then propagate through \(\mathbf{M}\), as demonstrated by program 9.

Program 9: Oopsie

Note that the termination criterion has been changed to
if(!(Math.abs(m[n-1][n-2])>e)) return false;

to ensure that the program stops if the last off diagonal element is NaN.
In such cases we can simply ignore the first row and column, working instead on the sub-matrix formed from the remaining ones, since they are already diagonal and we aren't going to need to transform them. In fact, we should skip past every initial row and column that are zero on their off diagonals for the same reason.
To do so we simply replace the initial column of zero with one of o which we search for at the start of the process with
for(o=0;o<n-2 && Math.abs(m[o+1][o])<=e;++o);

although we now also have to take care not to read and write past the end of \(\mathbf{M}\) by changing the first step of each iteration to
if(c===o) { m00 = m[c][c]; m10 = m[c+1][c]; m11 = m[c+1][c+1]; mcs = 2*m10*c0s0; m[c][c] = m00*c0c0 + m11*s0s0 - mcs; m[c+1][c+1] = m00*s0s0 + m11*c0c0 + mcs; m[c+1][c] = (m00-m11)*c0s0 + m10*(c0c0-s0s0); if(c<n-2) { m21 = m[c+2][c+1]; m[c+2][c+1] = m21*c0; z = -m21*s0; } }

Program 10 shows the results of changing the initial index in this way.

Program 10: Starting At An Offset

In a similar vein, if there's a zero somewhere further along the off diagonal then we should first transform the sub-matrix in the rows and columns before it and then transform the one in the rows and columns after it, avoiding another source of NaNs. To do this we can replace the size of the matrix n with the size of the sub-matrix nn that we should be working on by searching for such a zero with
for(nn=o+1;nn<n-1 && Math.abs(m[nn+1][nn])>e;++nn); ++nn;

as done in program 11.

Program 11: Stopping At A Zero Off Diagonal

Note that we still have to update all of the rows of \(\mathbf{Q}\), not just those up to the stopping point.

Now, we're currently using the absolute termination criterion
!(Math.abs(m[nn-1][nn-2])>e)

which is unsuitable if the matrix has large elements. A more reasonable approach is to use
!(Math.abs(m[nn-1][nn-2])>e*(1+Math.abs(m[nn-2][nn-2])+Math.abs(m[nn-1][nn-1])))

which will tend to an absolute comparison when the diagonal elements are small and to a relative comparison to their magnitudes when they are large, as used in program 12.

Program 12: A Better Termination Criterion

Finally, since we're no longer using any elements of \(\mathbf{M}\) beyond the upper and lower diagonals we might as well operate directly on the arrays d0 and d1 returned by our implementation of the Householder tridiagonalisation algorithm containing its diagonal and off diagonal elements respectively.
The givens function defined in listing 1 does just that, also bringing the Wilkinson shift inline and returning the offset at the end, or NaN if the loop terminates because an element of \(\mathbf{M}\) is NaN, propagating those on the off diagonal of the latter into the associated elements on its diagonal.

Listing 1: The givens Function
function givens(q, d0, d1, o, e, steps) {
 var n = q.length;
 var step = 0;
 var nn, c, r, r0, r1, qr, qr0, qr1, d01, d02, d12, d, x, z, h;
 var s0, c0, s0s0, c0c0, c0s0;
 var m00, m10, m11, m21, m22, m23, mcs;

 while(o<n-2 && Math.abs(d1[o])<=e*(1+Math.abs(d0[o])+Math.abs(d0[o+1]))) ++o;
 c = o;
 
 for(nn=o+1;nn<n-1 && Math.abs(d1[nn])>e*(1+Math.abs(d0[nn])+Math.abs(d0[nn+1]));++nn);
 ++nn;
 
 while(c!==o || Math.abs(d1[nn-2])>e*(1+Math.abs(d0[nn-2])+Math.abs(d0[nn-1]))) {
  if(step++===steps) throw new Error('failure to converge in givens');

  if(c===o) {
   d01 = d0[nn-1]; d02 = d0[nn-2]; d12 = d1[nn-2];
   d = (d02-d01)/2;
   x = d>0 ? d0[c] - d01 + d12*d12/(d + ak.hypot(d, d12))
           : d0[c] - d01 + d12*d12/(d - ak.hypot(d, d12));
   z = d1[c];
  }
  else {
   x = d1[c-1];
  }
 
  h = ak.hypot(x, z);
  s0 = -z/h; s0s0 = s0*s0;
  c0 =  x/h; c0c0 = c0*c0;
  c0s0 = c0*s0;
 
  if(c===o) {
   m00 = d0[c]; m10 = d1[c]; m11 = d0[c+1]; 
   mcs = 2*m10*c0s0;
 
   d0[c]   = m00*c0c0 + m11*s0s0 - mcs;
   d0[c+1] = m00*s0s0 + m11*c0c0 + mcs;
   d1[c]   = (m00-m11)*c0s0 + m10*(c0c0-s0s0);
 
   if(c<nn-2) {
    m21 = d1[c+1];
    d1[c+1] = m21*c0;
    z = -m21*s0;
   }
  }
  else {
   m10 = d1[c-1]; m11 = d0[c];
   m21 = d1[c];   m22 = d0[c+1];
   mcs = 2*m21*c0s0;
 
   d0[c]   = m11*c0c0 + m22*s0s0 - mcs;
   d0[c+1] = m11*s0s0 + m22*c0c0 + mcs;
   d1[c-1] = m10*c0 - z*s0;
   d1[c]   = (m11-m22)*c0s0 + m21*(c0c0-s0s0);
 
   if(c<nn-2) {
    m23 = d1[c+1];
    d1[c+1] = m23*c0;
    z = -m23*s0;
   }
  }
 
  for(r=0;r<n;++r) {
   qr = q[r]; qr0 = qr[c]; qr1 = qr[c+1];
   qr[c]   = qr0*c0 - qr1*s0;
   qr[c+1] = qr0*s0 + qr1*c0;
  }
 
  if(++c===nn-1) c = o;
 }

 if(isNaN(d1[nn-2])) d0[nn-1] = d0[nn-2] = ak.NaN;
 if(isNaN(d0[nn-1]) || isNaN(d0[nn-2])) return ak.NaN;
 return o;
}

Note that it also sets an upper bound on the number of iterations with the steps argument, throwing an error if it is exceeded.

Program 13 applies it to a matrix with a similar form to those in programs 11 and 12, having zeros in the first row and column and an off diagonal zero before the end, printing the accumulated rotations above the arrays d0 and d1.

Program 13: Using The givens Function

To diagonalise a symmetric real matrix \(\mathbf{M}\) we repeatedly call the givens function with the updated offsets until it reaches the bottom right corner of the matrix, as demonstrated by program 14 which also prints the sum of the magnitudes of the elements of d1 at the end.

Program 14: Diagonalising M

Note that the program terminates when the offset is no longer less than n-2 since this can only happen if we encounter a NaN, if every element of d1 was zero before we called the givens function or its only non-zero element was its last, which the function will have then transformed to zero.

Wilkinson and Reinsch's[7] ALGOL procedure tred2, which is probably best known from its translations into FORTRAN[8] and C[9], goes further a little further than the givens function by removing many of the local variables and eliminating some of the arithmetic operations used to apply the Givens rotations to \(\mathbf{M}\). Since the accumulation of those rotations in \(\mathbf{Q}\) is an order of magnitude more costly, my preference is to keep them as they are and so we shall be using it as it is when we add the spectral decomposition to the ak library in the next post.

References

[1] FAO The Householder, www.thusspakeak.com, 2019.

[2] Spryer Francis, www.thusspakeak.com, 2019.

[3] Francis, J., The QR Transformation - Part 1, The Computer Journal, Vol. 4, No. 3, Oxford University Press, 1961.

[4] Francis, J., The QR Transformation - Part 2, The Computer Journal, Vol. 4, No. 4, Oxford University Press, 1961.

[5] A Well Managed Household, www.thusspakeak.com, 2020.

[6] Golub, G. and Van Loan, C., Matrix Computations (3rd ed.), John Hopkins University Press, 1996.

[7] Wilkinson, J. & Reinsch, C., Handbook for Automatic Computation, Volume II: Linear Algebra, Springer, 1971

[8] Garbow, Burton S., EISPACK — A package of matrix eigensystem routines, Computer Physics Communications, Vol. 7, 1974

[9] Press, W.H. et al, Numerical Recipes in C (2nd ed.), Cambridge University Press, 1992.

Leave a comment