Who Are You Calling Average?

| 0 Comments

Figure 1: A Total Bust!
The Victorians, for all of their advances in science and engineering, had some pretty peculiar beliefs, one of which was phrenology; the notion that you could accurately determine a person's psychological makeup from the shape of their skull. That this is utterly misguided may seem obvious these days, but keep in mind that we've had over a century in which to figure that out.
That said, this obsession with correlating physiological features with psychological ones did result in a rather interesting observation; the remarkable polymath Sir Francis Galton, who amongst his many discoveries independently came up with the very notion of correlation, hypothesised that by repeatedly exposing a photographic film to faces of criminals he might capture an image of the archetypical criminal. Armed with such an image we might have been able to do away with the tedious business of gathering evidence and putting suspects on trial by simply locking up everybody who looked a bit dodgy.

Figure 2: An Average Looking Couple
What he actually found was that such composite faces were rather attractive, as illustrated by the slightly blurry couple in figure 2 whose pixels are simply the averages of those of the faces framing them (a selection from Kennedy's face database[1]).
Now, given how much easier it is to manipulate digital images than it is to manipulate photographic negatives (it only took me a few hours to move and resize those faces to align their eyes and mouths, for example), it was only a matter of time before a couple of bright sparks suggested doing something a little more sophisticated than simply taking averages.

The Many Faces Of The Eigen

Sirovich and Kirby[2] noted that by concatenating the pixel values in the rows of a greyscale facial image with \(c\) columns we can easily transform it into a vector \(\mathbf{w}_k\) with elements
\[ \left(\mathbf{w}_k\right)_{i \times c + j} = \mathrm{face}_k[i][j] \]
and can consequently perform a principal component analysis on a collection of them.
As you will no doubt recall from a previous post[3], a principal component analysis determines the axes that best account for the variation in a sample of \(n\) vectors and is computed with the eigensystem of the covariance matrix \(\mathbf{\Sigma}\) having elements
\[ \mathbf{\Sigma}_{ij} = \frac{1}{n} \sum_{k=1}^n \left(\mathbf{w}_k - \mathbf{\bar{w}}\right)_i \times \left(\mathbf{w}_k - \mathbf{\bar{w}}\right)_j \]
where \(\mathbf{\bar{w}}\) is the sample mean and \(\sum\) is, somewhat confusingly in this case, the summation sign.
By keeping just those principal components \(\mathbf{v}_j\) that account for most of the variation in the sample we can approximate an image's vector \(\mathbf{w}_k\) with another vector \(\mathbf{y}_k\) that is much smaller
\[ \mathbf{w}_k \approx \mathbf{\bar{w}} + \sum_{j} \left(\mathbf{y}_k\right)_j \times \mathbf{v}_j \]
where the elements of \(\mathbf{y}_k\) are simply the distances that \(\mathbf{w}_k - \mathbf{\bar{w}}\) lies along the components, which we can easily find with the vector product
\[ \left(\mathbf{y}_k\right)_j = \left(\mathbf{w}_k - \mathbf{\bar{w}}\right) \times \mathbf{v}_j \]
Figure 3: A Shadow Gallery
Note that we can do exactly the same thing with faces that weren't in the original sample, enabling us to generate a relatively small ID code that directly translates to an approximate image of an individual's face.

The images that result from translating these eigenvectors back into bitmaps are known as eigenfaces and in figure 3 the shadowy images surrounding the averages of our male and female faces are the eight that account for most of their variation.

Whilst eigenfaces promise a extremely efficient way to approximate facial images, the algorithm for generating them, as currently defined, is extremely inefficient. For example, the images in figure 3 have 200 rows and 150 columns and so we must find the eigenvectors of a 30,000 by 30,000 element covariance matrix. Finding the eigenvectors of an \(n\) by \(n\) matrix is an \(O\left(n^3\right)\) operation, so this requires something along the lines of 1013 arithmetic operations.

Linear Algebra To The Rescue!

Fortunately, with a little linear algebra we can significantly reduce the computational expense of generating eigenfaces.

Recall that we can compute the covariance matrix \(\mathbf{\Sigma}\) from a matrix \(\mathbf{W}\) whose rows are the vectors \(\mathbf{w}_k - \mathbf{\bar{w}}\)
\[ \begin{align*} \mathbf{W}_{ki} &= \left(\mathbf{w}_k - \mathbf{\bar{w}}\right)_i\\ \mathbf{\Sigma} &= \frac{1}{n} \left(\mathbf{W}^\mathrm{T} \times \mathbf{W}\right) \end{align*} \]
where \(\mathbf{W}^\mathrm{T}\) is the transpose of \(\mathbf{W}\) in which its rows are swapped with its columns. Now if we define a new symmetric matrix \(\mathbf{\Sigma}^\prime\) with
\[ \mathbf{\Sigma}^\prime = \frac{1}{n} \left(\mathbf{W} \times \mathbf{W}^\mathrm{T}\right) \]
it will have eigenvalues \(\lambda^\prime_i\) and eigenvectors \(\mathbf{v}^\prime_i\) defined by

Derivation 1: The Relationship Between The Eigensystems Of Σ' And Σ
Given the eigensystem of \(\mathbf{\Sigma}^\prime = \dfrac{1}{n} \left(\mathbf{W} \times \mathbf{W}^\mathrm{T}\right)\)
\[ \mathbf{\Sigma}^\prime \times \mathbf{v}^\prime_i = \lambda^\prime_i \times \mathbf{v}^\prime_i \]
we have
\[ \begin{align*} \frac{1}{n} \left(\mathbf{W} \times \mathbf{W}^\mathrm{T}\right) \times \mathbf{v}_i^\prime &= \lambda_i^\prime \times \mathbf{v}_i^\prime\\ \mathbf{W}^\mathrm{T} \times \frac{1}{n} \left(\mathbf{W} \times \mathbf{W}^\mathrm{T}\right) \times \mathbf{v}_i^\prime &= \mathbf{W}^\mathrm{T} \times \lambda_i^\prime \times \mathbf{v}_i^\prime\\ \frac{1}{n} \left(\mathbf{W}^\mathrm{T} \times \mathbf{W}\right) \times \left(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\right) &= \lambda_i^\prime \times \left(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\right)\\ \mathbf{\Sigma} \times \left(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\right) &= \lambda_i^\prime \times \left(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\right) \end{align*} \]
and consequently \(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\) are eigenvectors of \(\mathbf{\Sigma}\) with eigenvalues \(\lambda_i^\prime\).
\(\displaystyle{ \mathbf{\Sigma}^\prime \times \mathbf{v}_i^\prime = \lambda_i^\prime \times \mathbf{v}_i^\prime }\)

and \(\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\) will be eigenvectors of \(\mathbf{\Sigma}\) with eigenvalues \(\lambda^\prime_i\), as shown in derivation 1.

This is tremendously convenient since the number of rows and columns of \(\mathbf{\Sigma}^\prime\) are equal to the number of images rather than the number of pixels in those images, a much smaller number, and so computing its eigensystem is far, far easier. The eigenfaces in example 3, for example, were derived from 36 faces and so only required something of the order of 105 operations to compute, some eight decimal orders of magnitude fewer than the naive approach!

Note that by convention we choose eigenvectors that are of unit length so have
\[ \begin{align*} \lambda_i &= \lambda^\prime_i\\ \mathbf{v}_i &= \frac{\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime}{\left|\mathbf{W}^\mathrm{T} \times \mathbf{v}_i^\prime\right|} \end{align*} \]
where the vertical bars represent the length of the vector between them.

Some Helper Functions

Before we can start implementing this algorithm for generating eigenfaces, we'll need a way to convert bitmaps into the numerical vectors that it needs. Given that a bitmap is two dimensional, it made sense to me to use a matrix of pixel intensities instead.
To manipulate pixels in JavaScript, we use JavaScript's ImageData class which contains read/write pixel data for a Canvas object. The bytes representing the red, green, blue and alpha components of each pixel are contained, in that order, in its array-like data property, running four bytes at a time row-wise from the top-left to the bottom right. To transform these into a single intensity value between zero and one, we simply add the red, green and blue bytes and divide by the maximum possible result, 765. To factor in transparency we also multiply the result by the alpha byte divided by 255.
Listing 1 gives the ak.imageDataToMatrix function which implements this conversion from an ImageData object to an ak.matrix of pixel intensities.

Listing 1: ak.imageDataToMatrix
ak.imageDataToMatrix = function(img) {
 var rows = img.height;
 var cols = img.width;
 var data = img.data;
 var arr = new Array(rows);
 var i, r, c, row, x;

 i = 0;
 for(r=0;r<rows;++r) {
  row = new Array(cols);
  for(c=0;c<cols;++c) {
   x  = data[i++];
   x += data[i++];
   x += data[i++];
   x *= data[i++]/255;

   row[c] = x/765;
  }
  arr[r] = row;
 }
 return ak.matrix(arr);
};

When converting back from an intensity matrix to a bitmap, we must be satisfied with greyscale since we've thrown away the colour information. Specifically, we shall set each colour byte to the same value and make the bitmap fully opaque by setting the alpha byte to 255. By default we'll assume that the intensity ranges between zero and one, as they do in the results of ak.imageDataToMatrix, and will truncate the matrix elements to that range, but we shall also allow optional max and min arguments to allow the user to specify a different range should they wish to, as shown in listing 2.

Listing 2: ak.matrixToImageData
ak.matrixToImageData = function(matrix, max, min) {
 var canvas = document.createElement('canvas');
 var rows, cols, arr, img, data;
 var s, arr, i, r, c, row, x;

 if(ak.nativeType(max)===ak.UNDEFINED_T) max = 1.0;
 if(ak.nativeType(min)===ak.UNDEFINED_T) min = 0.0;
 s = 1/(max-min);

 if(!isFinite(s) || s<=0) {
  throw new Error('invalid range in ak.matrixToImageData');
 }

 if(ak.type(matrix)!==ak.MATRIX_T) {
  throw new Error('invalid matrix in ak.matrixToImageData');
 }

 rows = matrix.rows();
 cols = matrix.cols();
 arr  = matrix.toArray();

 img  = canvas.getContext('2d').createImageData(cols, rows);
 data = img.data;

 i = 0;
 for(r=0;r<rows;++r) {
  row = arr[r];
  for(c=0;c<cols;++c) {

   x = Math.min(Math.max((row[c]-min)*s, 0), 1);
   x = Math.round(x*255);

   data[i++] = x;
   data[i++] = x;
   data[i++] = x;
   data[i++] = 255;
  }
 }
 return img;
};

Note that to create the resulting ImageData object, we need a Canvas object which we create on the first line. After checking that the intensity range and matrix are valid, we create an ImageData object with the same dimensions as the latter and populate it with opaque greyscale pixels as described above.

In addition to converting bitmaps to and from intensity matrices, we need a reliable mechanism for loading them in the first place. We can't simply create Image objects and point them at image files since they are loaded asynchronously by the browser. We therefore need something like our ak.asynchLoop[4] to ensure that the bitmaps are ready before we try to use them.
The function ak.imageDataLoader, given in listing 3, takes the path of an image file, or an array of them, and a function f to be called with an array of ImageData objects containing their pixel data once they have all been loaded.

Listing 3: ak.imageDataLoader
ak.imageDataLoader = function(images, f) {
 if(ak.nativeType(images)!==ak.ARRAY_T) images = [images];
 else images = images.slice(0);

 imageDataLoader(0, images, f);
};

Note that we simply force the images argument to contain a unique array before forwarding to imageDataLoader, given in listing 4, to do the actual work.

Listing 4: imageDataLoader
function imageDataLoader(i, images, f) {
 var img, canvas, ctx;

 if(i<images.length) {
  img = document.createElement('img');
  img.src = images[i];
  images[i] = img;

  img.onerror = function() {images[i]=undefined; imageDataLoader(i+1, images, f);};
  img.onload  = function() {imageDataLoader(i+1, images, f);};
 }
 else {
  canvas = document.createElement('canvas');
  for(i=0;i<images.length;++i) {
   if(images[i]) {
    canvas.width  = images[i].naturalWidth;
    canvas.height = images[i].naturalHeight;

    ctx = canvas.getContext('2d');
    ctx.drawImage(images[i], 0, 0);

    try      {images[i] = ctx.getImageData(0, 0, canvas.width, canvas.height);}
    catch(e) {images[i] = undefined;}
   }
  }
  f(images);
 }
}

To be certain that an image has been completely handled by the browser before trying to load the next one, we defer doing so to the Image object's onerror and onload event handlers, setting array elements to undefined in the former. After the last image has been handled, we draw each in turn onto a Canvas object and store the ImageData containing its pixels in the images array, again setting elements to undefined in the event of errors. Finally, we pass it to the function f to be processed.

Note that the implementations of these helper functions can be found in ImageDataMatrix.js and ImageDataLoader.js and in the next post we shall use them whilst implementing our algorithm for the generation of eigenfaces and an application that exploits them.

References

[1] Kennedy, K. M., Hope, K. & Raz, N., Lifespan Adult Faces: Norms for Age, Familiarity, Memorability, Mood, and Picture Quality, Experimental Aging Research, Vol. 35, No. 2, pp. 268-275, 2009.

[2] Sirovich, L. & Kirby, M., Low-Dimensional Procedure for the Characterization of Human Faces, Journal of the Optical Society of America A, Vol. 4, No. 3, pp. 519-524, 1987.

[3] The Principals Of The Things, www.thusspakeak.com, 2014.

[4] Climbing Mount Impractical, www.thusspakeak.com, 2013.

Leave a comment