Last time^{[1]} we implemented the
ak.clusterings type to store a set of ak.clustering objects^{[2]} in order to represent hierarchical clusterings, which are sequences of clusterings having the property that if a pair of data are in the same cluster in one clustering then they will be in the same cluster in the next, where clusters are subsets of a set of data that are in some sense similar to each other. We then went on to define the ak.clade type to represent hierarchical clusterings as trees, so named because that's what they're called in biology when they are used to show the relationships between species and their common ancestors.
Now that we have those structures in place we're ready to see how to create hierarchical clusterings and so in this post we shall start with a simple, general purpose, but admittedly rather inefficient, way to do so.
Cluster Similarity Measures
Recall that the basic idea for constructing hierarchical clusterings was to begin by assigning each datum to its own cluster and then progressively merge the clusters using some heuristic, or rule of thumb, to measure the distance between pairs of clusters and selecting the closest clusters according to it at each step until all of the data are assigned to a single cluster.
In order to define such similarity heuristics we first need a function \(\delta\) to measure the distance between pairs of data which typically satisfies the relations
\[
\begin{align*}
\delta\left(y, x\right) &= \delta\left(x, y\right)\\
\delta\left(x, x\right) &= 0\\
\delta\left(x, y\right) &> 0
\end{align*}
\]
for distinct \(x\) and \(y\), and
\[
\delta\left(x, z\right) \leqslant \delta\left(x, y\right) + \delta\left(y, z\right)
\]
for any \(x\), \(y\) and \(z\), although we shall only require, or more accurately assume, that the first of them holds.
Given a pair of clusters \(X_i\) and \(X_j\) in a set of data \(X\), a natural measure of the distance between them is the average distance between their members
\[
\Delta_\mu\left(X_i, X_j\right) = \frac{\sum_{x_i \in X_i} \sum_{x_j \in X_j} \delta\left(x_i,x_j\right)}{\left|X_i\right| \times \left|X_j\right|}
\]
where \(\sum\) is the summation sign, \(\in\) means within and the vertical bars stand for the size of the term between them, in this case the number of elements in the cluster.
Note that this relies upon our assumption that the first of the distance relations is satisfied since \(\delta(x_j,x_i)\) doesn't appear in the formula and we shall be identifying the smallest distance between any of \(n\) clusters with
\[
\min_{0 \leqslant i < j < n} \Delta_\mu\left(X_i, X_j\right)
\]
reducing the number of function evaluations by half.
Since the distances between the data won't change from one step of a hierarchical clustering algorithm to the next, we can save ourselves more work by using a distance matrix^{[3]} \(\mathbf{D}\) having the elements
\[
d_{ij} = d_{ji} = \delta\left(x_i, x_j\right)
\]
where \(x_i\) and \(x_j\) are the \(i^\mathrm{th}\) and \(j^\mathrm{th}\) elements of \(X\) respectively, and redefining the clusters' elements as the indices of their data in \(X\), so that
\[
\Delta_\mu\left(X_i, X_j\right) = \frac{\sum_{i^\prime \in X_i} \sum_{j^\prime \in X_j} d_{i^\prime j^\prime}}{\left|X_i\right| \times \left|X_j\right|}
\]
Hierarchical clusterings that are constructed using \(\Delta_\mu\) to calculate cluster distances are known as average linkage clusterings and listing 1 shows how we might implement it for an ak.matrix distance matrix d and ak.cluster clusters of indices xi and xj.
Listing 1: Average Linkage Distance
function averageLinkage(d) {
 var n;
 if(ak.type(d)!==ak.MATRIX_T || d.rows()!==d.cols()) {
  throw new Error('invalid distance matrix in averageLinkage');
 }
 n = d.rows();

 return function(xi, xj) {
  var ni = xi.size();
  var nj = xj.size();
  var i, j, s;

  i = 0;
  while(i<ni && xi.at(i)<n) ++i;
  if(i<ni) throw new Error('invalid distance matrix in averageLinkage');

  j = 0;
  while(j<nj && xj.at(j)<n) ++j;
  if(j<nj) throw new Error('invalid distance matrix in averageLinkage');

  s = 0;
  for(i=0;i<ni;++i) for(j=0;j<nj;++j) s += d.at(xi.at(i), xj.at(j));
  return s/(ni*nj);
 };
}
Another way that we might measure how close two clusters are to each other is by the distance between their nearest elements with
\[
\Delta_\alpha\left(X_i, X_j\right) = \min_{i{}^\prime \in X_i \wedge j{}^\prime \in X_j} d_{i{}^\prime j{}^\prime}
\]
where \(\wedge\) means and.
Hierarchical clusterings generated with this distance measure are known as single linkage clusterings and an implementation of it is given in listing 2.
Listing 2: Single Linkage Distance
function singleLinkage(d) {
 var n;
 if(ak.type(d)!==ak.MATRIX_T || d.rows()!==d.cols()) {
  throw new Error('invalid distance matrix in singleLinkage');
 }
 n = d.rows();

 return function(xi, xj) {
  var ni = xi.size();
  var nj = xj.size();
  var i, j, s;

  i = 0;
  while(i<ni && xi.at(i)<n) ++i;
  if(i<ni) throw new Error('invalid distance matrix in singleLinkage');

  j = 0;
  while(j<nj && xj.at(j)<n) ++j;
  if(j<nj) throw new Error('invalid distance matrix in singleLinkage');

  s = ak.INFINITY;
  for(i=0;i<ni;++i) for(j=0;j<nj;++j) s = Math.min(s, d.at(xi.at(i), xj.at(j)));
  return s;
 };
}
We might also measure the distance between clusters by that of their farthest elements with
\[
\Delta_\omega\left(X_i, X_j\right) = \max_{i{}^\prime \in X_i \wedge j{}^\prime \in X_j} d_{i{}^\prime j{}^\prime}
\]
which yields complete linkage clusterings and is implemented in listing 3.
Listing 3: Complete Linkage Distance
function completeLinkage(d) {
 var n;
 if(ak.type(d)!==ak.MATRIX_T || d.rows()!==d.cols()) {
  throw new Error('invalid distance matrix in completeLinkage');
 }
 n = d.rows();

 return function(xi, xj) {
  var ni = xi.size();
  var nj = xj.size();
  var i, j, s;

  i = 0;
  while(i<ni && xi.at(i)<n) ++i;
  if(i<ni) throw new Error('invalid distance matrix in completeLinkage');

  j = 0;
  while(j<nj && xj.at(j)<n) ++j;
  if(j<nj) throw new Error('invalid distance matrix in completeLinkage');

  s = -ak.INFINITY;
  for(i=0;i<ni;++i) for(j=0;j<nj;++j) s = Math.max(s, d.at(xi.at(i), xj.at(j)));
  return s;
 };
}
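To see the three measures side by side, the following sketch computes all of them for a pair of index clusters over a distance matrix held as a plain two dimensional array; it is a simplified stand-in for listings 1 to 3 rather than the ak library's implementation.

```javascript
// Compute the average, single and complete linkage distances between
// two clusters of indices over a distance matrix held as a plain 2D
// array. A simplified stand-in for listings 1 to 3, not the ak library.
function linkages(d, xi, xj) {
  var sum = 0, min = Infinity, max = -Infinity;
  for (var i = 0; i < xi.length; ++i) {
    for (var j = 0; j < xj.length; ++j) {
      var dij = d[xi[i]][xj[j]];
      sum += dij;
      min = Math.min(min, dij);
      max = Math.max(max, dij);
    }
  }
  return {average: sum/(xi.length*xj.length), single: min, complete: max};
}

// the example data with the distance delta(x,y) = |x-y|
var X = [17, 2, 8, 4, 5, 14, 10, 1];
var D = X.map(function(x) { return X.map(function(y) { return Math.abs(x-y); }); });

var result = linkages(D, [1, 3, 4, 7], [2, 6]);
// result.average is 6, result.single is 3 and result.complete is 9
```

Note how the average linkage distance necessarily lies between the single and complete linkage distances, a fact that the comparison of the clusterings below will reflect.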
Naively Generating Hierarchical Clusterings
The simplest way to generate each clustering in a hierarchical sequence of clusterings is to iterate over every pair of clusters in the current clustering, select the first that are separated by the smallest distance according to the type of linkage and merge them, as is done by the nextHierarchical function in listing 4 where the index of the first element in each pair i is strictly less than that of the second j as described above.
Listing 4: Generating The Next Hierarchical Clustering
function nextHierarchical(clustering, linkage) {
 var clusters = clustering.clusters;
 var n = clusters.size();
 var min = ak.NaN;
 var mapping;
 var members, i, j, d;

 if(n<=1) return clustering;
 members = clustering.memberships.toArray();

 if(n===2) {
  for(i=0;i<members.length;++i) members[i] = 0;
 }
 else {
  for(i=0;i<n-1;++i) for(j=i+1;j<n;++j) {
   d = linkage(clusters.at(i), clusters.at(j));
   if(isNaN(min) || d<min) {mapping = [i,j]; min = d;}
  }

  for(i=0;i<members.length;++i) {
   if(members[i]===mapping[1]) members[i] = mapping[0];
  }
 }
 return ak.clustering(members);
}
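The heart of the step can be sketched without the ak library's machinery; the following plain JavaScript stand-in takes raw membership labels rather than an ak.clustering and so skips the canonical relabelling that ak.clustering performs, meaning only the partition, not the particular labels, matches the listing above.

```javascript
// A plain JavaScript sketch of the naive merging step: find the first
// pair of clusters at the minimum linkage distance and relabel the
// second with the first's id. No canonical relabelling is performed,
// so only the partition matches the ak implementation.
function naiveNext(members, linkage) {
  var n = Math.max.apply(null, members) + 1;        // current number of clusters
  if (n <= 1) return members.slice();
  if (n === 2) return members.map(function() { return 0; });  // the special case

  var clusters = [];                                // each cluster's member indices
  for (var c = 0; c < n; ++c) clusters.push([]);
  for (var k = 0; k < members.length; ++k) clusters[members[k]].push(k);

  var min = Infinity, mapping;
  for (var i = 0; i < n-1; ++i) {
    for (var j = i+1; j < n; ++j) {
      var d = linkage(clusters[i], clusters[j]);
      if (d < min) { mapping = [i, j]; min = d; }
    }
  }
  return members.map(function(c) { return c === mapping[1] ? mapping[0] : c; });
}

// average linkage over the example data with delta(x,y) = |x-y|
var X = [17, 2, 8, 4, 5, 14, 10, 1];
var D = X.map(function(x) { return X.map(function(y) { return Math.abs(x-y); }); });
function avg(xi, xj) {
  var s = 0;
  for (var i = 0; i < xi.length; ++i)
    for (var j = 0; j < xj.length; ++j) s += D[xi[i]][xj[j]];
  return s / (xi.length * xj.length);
}

// from the step four memberships, clusters 2 ({0}) and 3 ({5}) are the
// closest pair, at an average distance of three, and so are merged
var next = naiveNext([2, 0, 1, 0, 0, 3, 1, 0], avg);
// next is [2, 0, 1, 0, 0, 2, 1, 0]
```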
Since we know that if there are just two clusters then the next clustering must have just one, we are treating it as a special case in which we set the cluster membership of each datum to zero.
Program 1 demonstrates the results of this function for the set of scalars
\[
X = \{17, 2, 8, 4, 5, 14, 10, 1\}
\]
using the averageLinkage, singleLinkage and completeLinkage cluster distance functions and calculating the distance matrix
\[
\mathbf{D} =
\begin{pmatrix}
0 & 15 & 9 & 13 & 12 & 3 & 7 & 16 \\
15 & 0 & 6 & 2 & 3 & 12 & 8 & 1 \\
9 & 6 & 0 & 4 & 3 & 6 & 2 & 7 \\
13 & 2 & 4 & 0 & 1 & 10 & 6 & 3 \\
12 & 3 & 3 & 1 & 0 & 9 & 5 & 4 \\
3 & 12 & 6 & 10 & 9 & 0 & 4 & 13 \\
7 & 8 & 2 & 6 & 5 & 4 & 0 & 9 \\
16 & 1 & 7 & 3 & 4 & 13 & 9 & 0
\end{pmatrix}
\]
with ak.distanceMatrix, which also makes the assumption that the data distance measure is symmetric.
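As a quick sanity check, the matrix can be reproduced directly from the data; the following is a hypothetical plain JavaScript stand-in for ak.distanceMatrix under the scalar distance \(\delta(x,y)=|x-y|\), for illustration only.

```javascript
// Build the distance matrix for the example data with delta(x,y) = |x-y|,
// a plain stand-in for ak.distanceMatrix for illustration only.
var X = [17, 2, 8, 4, 5, 14, 10, 1];
var D = X.map(function(x) { return X.map(function(y) { return Math.abs(x-y); }); });

// the first row holds the distances from 17 to every datum
// D[0] is [0, 15, 9, 13, 12, 3, 7, 16]
// and the matrix is symmetric, for example D[3][4] === D[4][3] === 1
```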
Program 1: Comparing Linkages



As a basic check that this correctly identifies which clusters to merge we can work through step five of the average linkage clusterings in program 1. At the end of step four it had the memberships
4: {2,0,1,0,0,3,1,0}
and consequently the clusters
\[
\begin{align*}
X_0 &= \{1,3,4,7\} & X_1 &= \{2,6\} & X_2 &= \{0\} & X_3 &= \{5\}
\end{align*}
\]
Manually calculating the average linkage distances between them yields
\[
\begin{align*}
\Delta_\mu\left(X_0,X_1\right) & = \frac{d_{12}+d_{16}+d_{32}+d_{36}+d_{42}+d_{46}+d_{72}+d_{76}}{8}\\
& = \frac{6+8+4+6+3+5+7+9}{8} = \frac{48}{8} = 6\\
\Delta_\mu\left(X_0,X_2\right) & = \frac{d_{10}+d_{30}+d_{40}+d_{70}}{4} = \frac{15+13+12+16}{4} = \frac{56}{4} = 14\\
\Delta_\mu\left(X_0,X_3\right) & = \frac{d_{15}+d_{35}+d_{45}+d_{75}}{4} = \frac{12+10+9+13}{4} = \frac{44}{4} = 11\\
\Delta_\mu\left(X_1,X_2\right) & = \frac{d_{20}+d_{60}}{2} = \frac{9+7}{2} = \frac{16}{2} = 8\\
\Delta_\mu\left(X_1,X_3\right) & = \frac{d_{25}+d_{65}}{2} = \frac{6+4}{2} = \frac{10}{2} = 5\\
\Delta_\mu\left(X_2,X_3\right) & = d_{05} = 3
\end{align*}
\]
from which we can see that we should merge \(X_2\) and \(X_3\). Now, at the end of step five we had the memberships
5: {1,0,2,0,0,1,2,0}
and therefore the clusters
\[
\begin{align*}
X_0 &= \{1,3,4,7\} & X_1 &= \{0,5\} & X_2 &= \{2,6\}
\end{align*}
\]
as expected. If you have the patience then I'd certainly recommend that you work through a few steps of all three hierarchical clusterings to satisfy yourself that ak.nextHierarchical correctly constructs them.
Note that the clusterings at the start and after steps one, two, six and seven are the same for all three linkages whereas after step four only the average and single linkage clusterings are equal and after steps three and five only the average and complete linkage clusterings are. This switching between which clusterings are equal nicely reflects the fact that the average linkage distance lies between the single and complete linkage distances but it's by no means guaranteed that any but the first and last few won't all be different in general.
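If you'd rather not do the arithmetic of the step five check by hand, it can also be verified mechanically; the following plain JavaScript sketch, independent of the ak library, recomputes the six average linkage distances between the step four clusters.

```javascript
// Recompute the average linkage distances between the step four
// clusters and confirm that X2 and X3 are the closest pair.
var X = [17, 2, 8, 4, 5, 14, 10, 1];
var D = X.map(function(x) { return X.map(function(y) { return Math.abs(x-y); }); });

var clusters = [[1, 3, 4, 7], [2, 6], [0], [5]];

function avg(xi, xj) {
  var s = 0;
  for (var i = 0; i < xi.length; ++i)
    for (var j = 0; j < xj.length; ++j) s += D[xi[i]][xj[j]];
  return s / (xi.length * xj.length);
}

var best = null;
var distances = {};
for (var i = 0; i < clusters.length-1; ++i) {
  for (var j = i+1; j < clusters.length; ++j) {
    var d = avg(clusters[i], clusters[j]);
    distances[i + ',' + j] = d;
    if (best === null || d < best.d) best = {i: i, j: j, d: d};
  }
}
// distances holds 6, 14, 11, 8, 5 and 3 for the pairs (0,1), (0,2),
// (0,3), (1,2), (1,3) and (2,3), so X2 and X3 are merged at distance 3
```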
A Slightly More Sophisticated Approach
One of the problems with this implementation is that if more than one pair of clusters are separated by the minimum distance then the order in which they are merged is somewhat arbitrary. In our example, at the first step we have the pairs
\[
\begin{align*}
X_1 &= \{1\} & X_7 &= \{7\} & \Delta\left(X_1, X_7\right) &= d_{17} = 1\\
X_3 &= \{3\} & X_4 &= \{4\} & \Delta\left(X_3, X_4\right) &= d_{34} = 1
\end{align*}
\]
of which we only merge the first to yield the clustering
\[
\begin{align*}
X_0 &= \{1,7\} & X_1 &= \{0\} & X_2 &= \{2\} & X_3 &= \{3\}\\
X_4 &= \{4\} & X_5 &= \{5\} & X_6 &= \{6\}
\end{align*}
\]
following our convention of labelling the clusters from largest to smallest and then by the lowest to highest of the smallest index of their members in \(X\).
Arbitrary results are best avoided and so it would be much better to merge every pair of clusters that are separated by the minimum distance, in this case yielding
\[
\begin{align*}
X_0 &= \{1,7\} & X_1 &= \{3,4\} & X_2 &= \{0\}\\
X_3 &= \{2\} & X_4 &= \{5\} & X_5 &= \{6\}
\end{align*}
\]
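That labelling convention can itself be sketched in plain JavaScript; the canonicalise function below is my illustration of the rule rather than the ak library's code, relabelling clusters from largest to smallest and breaking ties by the smallest index of their members.

```javascript
// Relabel memberships so that clusters are numbered from largest to
// smallest, breaking ties by the smallest index of their members, as
// per the labelling convention described in the text.
function canonicalise(members) {
  var groups = {};
  for (var k = 0; k < members.length; ++k) {
    if (!groups[members[k]]) groups[members[k]] = [];
    groups[members[k]].push(k);              // indices pushed in ascending order
  }
  var order = Object.keys(groups).map(function(label) { return groups[label]; });
  order.sort(function(a, b) {
    return b.length !== a.length ? b.length - a.length   // largest first
                                 : a[0] - b[0];          // then smallest member
  });
  var result = members.slice();
  for (var c = 0; c < order.length; ++c) {
    for (var m = 0; m < order[c].length; ++m) result[order[c][m]] = c;
  }
  return result;
}

// merging both pairs at distance one in the first step, with arbitrary
// raw labels, then canonicalising recovers the clustering given above:
// X0={1,7}, X1={3,4}, X2={0}, X3={2}, X4={5}, X5={6}
var canonical = canonicalise([0, 1, 2, 3, 3, 5, 6, 1]);
// canonical is [2, 0, 3, 1, 1, 4, 5, 0]
```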
In order to do this we need to replace the mapping of the one cluster to merge into another in nextHierarchical with a list of mappings. When it comes to applying them, however, we must be careful to deal with some corner cases. For example, if we have minimum cluster distances of either
\[
\Delta\left(X_i, X_j\right) = \Delta\left(X_j, X_k\right)
\]
or
\[
\Delta\left(X_i, X_k\right) = \Delta\left(X_j, X_k\right)
\]
for \(i < j < k\), then we must merge the three of them into a single cluster, which we can do by updating the membership mappings
\[
\begin{align*}
i &\leftarrow j & j &\leftarrow k\\
i &\leftarrow k & j &\leftarrow k
\end{align*}
\]
with
\[
\begin{align*}
i &\leftarrow j & i &\leftarrow k\\
i &\leftarrow k & j &\leftarrow i
\end{align*}
\]
respectively. It's important to note that the last of these mappings is no longer from a smaller to a larger cluster id but that it doesn't actually matter provided that we apply them in the order that they were updated. For example, applying the two sequences of mappings
\[
\begin{align*}
i &\leftarrow k & j &\leftarrow i\\
i &\leftarrow k & i &\leftarrow j
\end{align*}
\]
to the set of memberships
\[
\{i,j,k,a,b,c\}
\]
yield
\[
\{i,j,k,a,b,c\} \rightarrow \{i,j,i,a,b,c\} \rightarrow \{j,j,j,a,b,c\}\\
\{i,j,k,a,b,c\} \rightarrow \{i,j,i,a,b,c\} \rightarrow \{i,i,i,a,b,c\}
\]
respectively, which, although they have different cluster indices, represent the same clustering.
Now you may be unconvinced that such updates will work for more than three clusters so let's also look at an example with five
\[
\Delta\left(X_i, X_m\right) = \Delta\left(X_j, X_m\right) = \Delta\left(X_k, X_l\right) = \Delta\left(X_l, X_m\right)
\]
for \(i < j < k < l < m\), yielding the mappings
\[
\begin{align*}
i & \leftarrow m & j & \leftarrow m & k & \leftarrow l & l & \leftarrow m
\end{align*}
\]
Applying the updates from the first, then from the second and finally from the third to the last gives us
\[
\begin{align*}
i & \leftarrow m & j & \leftarrow i & k & \leftarrow l & l & \leftarrow i\\
i & \leftarrow m & j & \leftarrow i & k & \leftarrow l & l & \leftarrow j\\
i & \leftarrow m & j & \leftarrow i & k & \leftarrow l & k & \leftarrow j
\end{align*}
\]
which results in the sequence of memberships
\[
\begin{align*}
\{i,j,k,l,m,a,b,c\} &\rightarrow \{i,j,k,l,i,a,b,c\} \rightarrow \{j,j,k,l,j,a,b,c\}\\
&\rightarrow \{j,j,k,k,j,a,b,c\} \rightarrow \{k,k,k,k,k,a,b,c\}
\end{align*}
\]
To understand why this will always work as expected it's best to stop thinking about the identifiers as indices, but as labels instead. Provided that a label change defined by a mapping is made in every mapping that will be applied to the memberships after it, then the set of mappings must, each in their turn, update the labels consistently.
Listing 5 updates the nextHierarchical function to implement this improved approach to merging clusters.
Listing 5: Enabling Multiple Merges
function nextHierarchical(clustering, linkage) {
 var clusters = clustering.clusters;
 var n = clusters.size();
 var min = ak.NaN;
 var mappings = [];
 var members, i, j, d, c;

 if(n<=1) return clustering;
 members = clustering.memberships.toArray();

 if(n===2) {
  for(i=0;i<members.length;++i) members[i] = 0;
 }
 else {
  for(i=0;i<n-1;++i) for(j=i+1;j<n;++j) {
   d = linkage(clusters.at(i), clusters.at(j));
   c = ak.floatCompare(d, min);
   if(c<0) {mappings = [[i,j]]; min = d;}
   else if(c===0) mappings.push([i,j]);
  }

  for(i=0;i<mappings.length-1;++i) for(j=i+1;j<mappings.length;++j) {
   if(mappings[j][0]===mappings[i][1]) mappings[j][0] = mappings[i][0];
   if(mappings[j][1]===mappings[i][1]) mappings[j][1] = mappings[i][0];
  }

  for(i=0;i<mappings.length;++i) for(j=0;j<members.length;++j) {
   if(members[j]===mappings[i][1]) members[j] = mappings[i][0];
  }
 }
 return ak.clustering(members);
}
Program 2 demonstrates the difference that it makes to the hierarchical clusterings of our example data.
Program 2: The Result Of Multiple Merges



Here you can see that in all three clusterings steps one and two have been combined, as expected, and that in the average linkage clusterings steps four and five have also been combined, as have steps six and seven in the complete linkage clusterings and steps three and four and steps five and six in the single linkage clusterings.
To check that steps four and five in the average linkage clusterings were correctly combined, recall that its memberships at the end of steps three, four and five in program 1 were
3: {3,0,1,2,2,4,1,0}
4: {2,0,1,0,0,3,1,0}
5: {1,0,2,0,0,1,2,0}
meaning that in step four we merged the clusters
\[
\begin{align*}
X_0 &= \{1, 7\} & X_2 &= \{3, 4\}
\end{align*}
\]
which had an average linkage distance of
\[
\Delta_\mu\left(X_0,X_2\right) = \frac{d_{13}+d_{14}+d_{73}+d_{74}}{4} = \frac{2+3+3+4}{4} = \frac{12}{4} = 3
\]
and that in step five we merged
\[
\begin{align*}
X_2 &= \{0\} & X_3 &= \{5\}
\end{align*}
\]
at a distance of
\[
\Delta_\mu\left(X_2,X_3\right) = d_{05} = 3
\]
I leave it to you to verify that the steps in the single and complete linkage clusterings were also correctly combined.
Whilst this approach yields satisfyingly nonarbitrary sequences of clusterings, it requires an unfortunately expensive average of \(O\left(n^2\right)\) operations in the calculation of the cluster distances at each of its typical \(O(n)\) steps and so next time we shall take a look at how we can more efficiently construct hierarchical clusterings for definitions of the distance between pairs of clusters that depend only upon the distances between each of their members.
References
[1] Making The Clade, www.thusspakeak.com, 2019.
[2] Some Of These Things Are Not Like The Others, www.thusspakeak.com, 2015.
[3] In The Neighbourhood, www.thusspakeak.com, 2019.