This time we shall take a look at the binomial distribution, which governs the number of successes out of a given number of independent trials that each succeed with the same fixed probability.

This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution, defining the probabilities of how long we must wait for multiple exponentially distributed events to occur.
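This is the negative binomial distribution: for trials that each succeed with probability p, the probability of exactly k failures before the rth success is C(k+r-1, k) × p^r × (1-p)^k. A minimal sketch with an illustrative function name, not anything from the `ak` library:

```javascript
// Probability of exactly k failures before the r-th success in a
// sequence of independent trials that each succeed with probability p:
//   Pr(K = k) = C(k+r-1, k) * p^r * (1-p)^k
function negativeBinomialPMF(k, r, p) {
  // compute the binomial coefficient C(k+r-1, k) iteratively so as to
  // avoid evaluating large factorials directly
  let c = 1;
  for (let i = 1; i <= k; ++i) c *= (r + i - 1) / i;
  return c * Math.pow(p, r) * Math.pow(1 - p, k);
}
```

Setting r to one recovers the geometric distribution of the number of failures before the first success.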

We have already seen that if waiting times for memoryless events with fixed average arrival rates are continuous then they must be exponentially distributed, and in this post we shall be looking at the discrete analogue.

These govern continuous memoryless processes in which events can occur at any time, but not those in which events can only occur at specified times, such as the roll of a die coming up six, which are known as Bernoulli processes. Observations of such processes are known as Bernoulli trials and their successes and failures are governed by the Bernoulli distribution, which we shall take a look at in this post.
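For example, the number of failed Bernoulli trials before the first success follows the geometric distribution, whose survival function makes the memorylessness of the process explicit. A hedged sketch, with names of my own choosing rather than the `ak` library's:

```javascript
// A Bernoulli trial succeeds with probability p; the number of failures
// before the first success then follows the geometric distribution
//   Pr(K = k) = p * (1-p)^k
// which is the discrete analogue of the exponential distribution.
function geometricPMF(k, p) { return p * Math.pow(1 - p, k); }

// Pr(K >= n); the probability that the first n trials all fail
function geometricSurvival(n, p) { return Math.pow(1 - p, n); }
```

Memorylessness shows up as the survival function satisfying Pr(K >= m+n) = Pr(K >= m) × Pr(K >= n); having already waited through m failures tells us nothing about how much longer we must wait.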

This time we shall take a look at another family of special functions derived from the beta function B.

Whilst we didn't originally derive the Cauchy distribution in this way, there are others, known as ratio distributions, that are explicitly constructed in this manner, and in this post we shall take a look at one of them.

An easy way to create rotationally symmetric functions, known as radial basis functions, is to apply univariate functions that are symmetric about zero to the distances between the interpolation's argument and their associated nodes. PDFs are a rich source of such functions and, in fact, the second bell shaped curve that we considered is related to that of the Cauchy distribution, which has some rather interesting properties.
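As a concrete illustration of the idea, the following sketch interpolates one dimensional data with a weighted sum of Gaussian radial basis functions, solving for the weights that make the sum pass through every node; it is a toy implementation for illustration rather than anything from the `ak` library.

```javascript
// Interpolate through the points (xs[i], ys[i]) with a weighted sum of
// Gaussian radial basis functions phi(r) = exp(-r*r) centred at the nodes.
function rbfInterpolator(xs, ys) {
  const n = xs.length;
  const phi = r => Math.exp(-r * r);

  // build the symmetric system A w = y with A[i][j] = phi(|x_i - x_j|)
  const a = xs.map(xi => xs.map(xj => phi(Math.abs(xi - xj))));
  const w = ys.slice();

  // solve for the weights by Gaussian elimination with partial pivoting
  for (let c = 0; c < n; ++c) {
    let p = c;
    for (let r = c + 1; r < n; ++r) if (Math.abs(a[r][c]) > Math.abs(a[p][c])) p = r;
    [a[c], a[p]] = [a[p], a[c]];
    [w[c], w[p]] = [w[p], w[c]];
    for (let r = c + 1; r < n; ++r) {
      const f = a[r][c] / a[c][c];
      for (let k = c; k < n; ++k) a[r][k] -= f * a[c][k];
      w[r] -= f * w[c];
    }
  }
  for (let c = n - 1; c >= 0; --c) {
    for (let k = c + 1; k < n; ++k) w[c] -= a[c][k] * w[k];
    w[c] /= a[c][c];
  }

  // the interpolation applies phi to the distances from its argument to the nodes
  return x => xs.reduce((s, xi, i) => s + w[i] * phi(Math.abs(x - xi)), 0);
}
```

By construction the returned function reproduces the nodes exactly, since the weights solve the very equations that demand it.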

An alternative approach is to construct a single function that passes through all of the points.

The matrix V whose columns are the unit eigenvectors and the diagonal matrix Λ of their associated eigenvalues, satisfying M = V × Λ × V^T, together are known as the spectral decomposition of M. In this post, we shall add it to the `ak` library using the `householder` and `givens` functions that we have put so much effort into optimising.
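By way of background, a Givens rotation is chosen so as to zero one element of a pair; a minimal sketch of that calculation, not the `ak` library's `givens` function itself:

```javascript
// A Givens rotation chooses c = cos(theta) and s = sin(theta) so that
// rotating the pair (a, b) maps it onto (r, 0); repeated rotations of
// this kind are used to zero off-diagonal matrix elements one at a time.
function givens(a, b) {
  const r = Math.hypot(a, b);
  return r === 0 ? {c: 1, s: 0, r: 0} : {c: a / r, s: b / r, r: r};
}
```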

The columns of the matrix of transformations V accumulate the Householder transformations and Givens rotations that we apply to the symmetric matrix M and, since the product of orthogonal matrices is itself orthogonal, they yield M = V × Λ × V^T, which is known as the spectral decomposition of M, where Λ is the diagonal matrix of eigenvalues. Last time we saw how we could efficiently apply the Householder transformations in-place, replacing the elements of the matrix with those of the tridiagonal matrix that they produce, implying that the columns of V are the unit eigenvectors of M.

From a mathematical perspective the combination of Householder transformations and shifted Givens rotations is particularly appealing, converging on the spectral decomposition after relatively few matrix multiplications, but from an implementation perspective using `ak.matrix` multiplication operations is less than satisfactory since it wastefully creates new `ak.matrix` objects at each step, and so in this post we shall start to see how we can do better.
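To give a flavour of the improvement, rather than multiplying by a rotation matrix and constructing a fresh object for the result we can overwrite the two affected rows directly; an illustrative sketch, assuming that the matrix is represented as an array of row arrays rather than as an `ak.matrix`:

```javascript
// Apply the rotation with cosine c and sine s to rows i and j of m
// in place, overwriting their elements rather than building a whole
// new matrix at each step.
function rotateRows(m, i, j, c, s) {
  const ri = m[i], rj = m[j];
  for (let k = 0; k < ri.length; ++k) {
    const a = ri[k], b = rj[k];
    ri[k] = c * a + s * b;
    rj[k] = -s * a + c * b;
  }
}
```

Only the two rows touched by the rotation are visited, so a sweep of rotations costs no allocations at all.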
The transformations are orthogonal and, since the transpose of an orthogonal matrix is its inverse, they yield M = V × Λ × V^T, which is known as the spectral decomposition of M.

Unfortunately, the way that we used Givens rotations to diagonalise tridiagonal symmetric matrices wasn't particularly efficient and I concluded by stating that it could be significantly improved with a relatively minor change. In this post we shall see what it is and why it works.

Every symmetric matrix M has associated vectors v and numbers λ satisfying M × v = λ × v, known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length, in which case we can define its spectral decomposition as the product M = V × Λ × V^T, where the columns of V are the unit eigenvectors and the corresponding diagonal elements of the diagonal matrix Λ are the eigenvalues.

You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function f to it with f(M) = V × f(Λ) × V^T, where f(Λ) is the diagonal matrix whose elements are the results of applying f to those of Λ.

You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it.
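For instance, the symmetric matrix M = [[2, 1], [1, 2]] has eigenvalues three and one with easily found unit eigenvectors, so we can generalise a scalar function such as the square root to it by applying the function to the eigenvalues alone; a sketch with illustrative names:

```javascript
// Generalise a scalar function f to the symmetric matrix M = [[2,1],[1,2]]
// via its spectral decomposition M = V * L * V^T, applying f only to the
// eigenvalues: f(M) = V * f(L) * V^T.
const rt = Math.SQRT1_2;
const V = [[rt, rt], [rt, -rt]];  // columns are the unit eigenvectors of M
const L = [3, 1];                 // their associated eigenvalues

function matrixFunction(f) {
  // (V * f(L) * V^T)[i][j] equals the sum over k of V[i][k] * f(L[k]) * V[j][k]
  return [0, 1].map(i => [0, 1].map(j =>
    V[i][0] * f(L[0]) * V[j][0] + V[i][1] * f(L[1]) * V[j][1]
  ));
}
```

Passing the identity function recovers M itself, and the result of passing `Math.sqrt` multiplied by itself does too, which is exactly what we should demand of a matrix square root.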

A simple way of constructing them is to initially place each datum in its own cluster and then iteratively merge the closest pairs of clusters in each clustering to produce the next one in the sequence, stopping when all of the data belong to a single cluster. We have considered three ways of measuring the distance between pairs of clusters: the average distance between their members, the distance between their closest members and the distance between their farthest members, known as average linkage, single linkage and complete linkage respectively. We then implemented a reasonably efficient algorithm for generating hierarchical clusterings defined with them, using a min-heap structure to cache the distances between clusters.
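The three distance measures follow directly from their definitions; for clarity the following sketch operates upon clusters of one dimensional data and is illustrative rather than the `ak` library's implementation:

```javascript
// The distance between a pair of one dimensional data
const dist = (x, y) => Math.abs(x - y);

// average linkage: the mean distance over all pairs of members
function averageLinkage(c1, c2) {
  let s = 0;
  for (const x of c1) for (const y of c2) s += dist(x, y);
  return s / (c1.length * c2.length);
}

// single linkage: the distance between the clusters' closest members
function singleLinkage(c1, c2) {
  let d = Infinity;
  for (const x of c1) for (const y of c2) d = Math.min(d, dist(x, y));
  return d;
}

// complete linkage: the distance between the clusters' farthest members
function completeLinkage(c1, c2) {
  let d = -Infinity;
  for (const x of c1) for (const y of c2) d = Math.max(d, dist(x, y));
  return d;
}
```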

Finally, I claimed that there is a more efficient algorithm for generating single linkage hierarchical clusterings that would make the sorting of clusters by size in our `ak.clustering` type too expensive, and so last time we implemented the `ak.rawClustering` type to represent clusterings without sorting their clusters, which we shall now use in the implementation of that algorithm.
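As background, it is worth noting that single linkage clusterings are intimately related to minimum spanning trees of the data: the sorted edge lengths of such a tree are exactly the distances at which single linkage clusters merge. A Prim style sketch for one dimensional data, given purely as an illustration rather than as the `ak` implementation:

```javascript
// Grow a minimum spanning tree outwards from the first datum; the
// lengths of its edges, sorted, are the merge distances of single
// linkage clustering. (A background sketch for one dimensional data.)
function mstEdgeLengths(data) {
  const out = new Set(data.keys());  // indices not yet in the tree
  out.delete(0);
  // distance from each outside datum to its nearest tree member
  const d = data.map(x => Math.abs(x - data[0]));
  const lengths = [];
  while (out.size > 0) {
    // attach the outside datum closest to the tree
    let next = -1;
    for (const i of out) if (next < 0 || d[i] < d[next]) next = i;
    lengths.push(d[next]);
    out.delete(next);
    // the new member may bring other outside data closer to the tree
    for (const i of out) d[i] = Math.min(d[i], Math.abs(data[i] - data[next]));
  }
  return lengths.sort((a, b) => a - b);
}
```

This visits every pair of data at most once per attachment, giving quadratic running time without any need to cache inter-cluster distances in a min-heap.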