## Nearest And Dearest

Last time we saw how we could use the list of the nearest neighbours of a datum in a set of clustered data to calculate its strengths of association with their clusters. Specifically, we used the k nearest neighbours algorithm to define those strengths as the proportions of its k nearest neighbours that were members of each cluster or with a generalisation of it that assigned weights to the neighbours according to their positions in the list.
This time we shall take a look at a clustering algorithm that uses nearest neighbours to identify clusters, contrasting it with the k means clustering algorithm that we covered about four years ago.

Full text...

## In The Neighbourhood

A little under four years ago we saw how we could use the k means algorithm to divide sets of data into distinct subsets, known as clusters, whose members are in some sense similar to each other. The interesting thing about clustering is that even though we find it easy to spot clusters, at least in two dimensions, it's incredibly difficult to give them a firm mathematical definition and so clustering algorithms invariably define them implicitly as the subsets identified by this algorithm.
The k means algorithm, for example, does so by first picking k different elements of the data as cluster representatives and then places every element in the cluster whose representative is nearest to it. The cluster representatives are then replaced by the means of the elements assign to it and the process is repeated iteratively until the clusters stop changing.
Now I'd like to introduce some more clustering algorithms but there are a few things that we'll need first.

Full text...

## A Heap Of Stuff

A little over a year ago we added a bunch of basic computer science algorithms to the ak library, taking inspiration from the C++ standard library. Now, JavaScript isn't just short of algorithms in its standard library, it's also lacking all but a few data structures. In fact, from the programmer's perspective, it only has hash maps which use a function to map strings to non-negative integers that are then used as indices into an array of sets of key-value pairs. Technically speaking, even arrays are hash maps of the string representation of their indices, although in practice the interpreter will use more efficient data structures whenever it can.
As clever as its implementation might be, it's unlikely that it will always figure out exactly what we're trying to do and pick the most appropriate data structure and so it will occasionally be worth explicitly implementing a data structure so that we can be certain of its performance.
The first such data structure that we're going to need is a min-heap which will allow us to efficiently add elements in any order and remove them in ascending order, according to some comparison function.

Full text...

## Archimedean Crew

We have recently seen how we can define dependencies between random variables with Archimedean copulas which calculate the probability that they each fall below given values by applying a generator function φ to the results of their cumulative distribution functions, or CDFs, for those values, and applying its inverse to their sum.
Like all copulas they are effectively the CDFs of vector valued random variables whose elements are uniformly distributed when considered independently. Whilst those Archimedean CDFs were relatively trivial to implement, we found that their probability density functions, or PDFs, were somewhat more difficult and that the random variables themselves required some not at all obvious mathematical manipulation to get right.
Having done all the hard work implementing the ak.archimedeanCopula, ak.archimedeanCopulaDensity and ak.archimedeanCopulaRnd functions we shall now use them to implement some specific families of Archimedean copulas.

Full text...

## Archimedean Review

In the last couple of posts we've been taking a look at Archimedean copulas which define the dependency between the elements of vector values of a multivariate random variable by applying a generator function φ to the values of the cumulative distribution functions, or CDFs, of their distributions when considered independently, known as their marginal distributions, and applying the inverse of the generator to the sum of the results to yield the value of the multivariate CDF.
We have seen that the densities of Archimedean copulas are rather trickier to calculate and that making random observations of them is trickier still. Last time we found an algorithm for the latter, albeit with an implementation that had troubling performance and numerical stability issues, and in this post we shall add an improved version to the ak library that addresses those issues.

Full text...

## Archimedean View

Last time we took a look at how we could define copulas to represent the dependency between random variables by summing the results of a generator function φ applied to the results of their cumulative distribution functions, or CDFs, and then applying the inverse of that function φ-1 to that sum.
These are known as Archimedean copulas and are valid whenever φ is strictly decreasing over the interval [0,1], equal to zero when its argument equals one and have nth derivatives that are non-negative over that interval when n is even and non-positive when it is odd, for n up to the number of random variables.
Whilst such copulas are relatively easy to implement we saw that their densities are a rather trickier job, in contrast to Gaussian copulas where the reverse is true. In this post we shall see how to draw random vectors from Archimedean copulas which is also much more difficult than doing so from Gaussian copulas.

Full text...

## Archimedean Skew

About a year and a half ago we saw how we could use Gaussian copulas to define dependencies between the elements of a vector valued multivariate random variable whose elements, when considered in isolation, were governed by arbitrary cumulative distribution functions, known as marginals. Whilst Gaussian copulas are quite flexible, they can't represent every possible dependency between those elements and in this post we shall take a look at some others defined by the Archimedean family of copulas.

Full text...

## New Directions Of Interpolation

We have spent a few months looking at how we might interpolate between sets of points (xi, yi), where the xi are known as nodes and the yi as values, to approximate values of y for values of x between the nodes, either by connecting them with straight lines or with cubic curves.
Last time, in preparation for interpolating between multidimensional vector nodes, we implemented the ak.grid type to store ticks on a set of axes and map their intersections to ak.vector objects to represent such nodes arranged at the corners of hyperdimensional rectangular cuboids.
With this in place we're ready to take a look at one of the simplest multidimensional interpolation schemes; multilinear interpolation.

Full text...

## Cuboid Space Division

Over the last few months we have been taking a look at algorithms for interpolating over a set of points (xi,yi) in order to approximate values of y between the nodes xi. We began with linear interpolation which connects the points with straight lines and is perhaps the simplest interpolation algorithm. Then we moved on to cubic spline interpolation which yields a smooth curve by specifying gradients at the nodes and fitting cubic polynomials between them that match both their values and their gradients. Next we saw how this could result in curves that change from increasing to decreasing, or vice versa, between the nodes and how we could fix this problem by adjusting those gradients.
I concluded by noting that, even with this improvement, the shape of a cubic spline interpolation is governed by choices that are not uniquely determined by the points themselves and that linear interpolation is consequently a more mathematically appropriate scheme, which is why I chose to generalise it to other arithmetic types for y, like complex numbers or matrices, but not to similarly generalise cubic spline interpolation.

The obvious next question is whether or not we can also generalise the nodes to other arithmetic types; in particular to vectors so that we can interpolate between nodes in more than one dimension.

Full text...

## We're Not For Turning

We have seen how it is possible to smoothly interpolate between a set of points (xi, yi), with the xi known as nodes and the yi as values, by specifying the gradients gi at the nodes and calculating values between adjacent pairs using the uniquely defined cubic polynomials that match the values and gradients at them.
We have also seen how extrapolating such polynomials beyond the first and last nodes can yield less than satisfactory results, which we fixed by specifying the first and last gradients and then adding new first and last nodes to ensure that the first and last polynomials would represent straight lines.
Now we shall see how cubic spline interpolation can break down rather more dramatically and how we might fix it.

Full text...

### Gallimaufry

 AKCalc ECMA Endarkenment Turning Sixteen

This site requires HTML5, CSS 2.1 and JavaScript 5 and has been tested with

 Chrome 26+ Firefox 20+ Internet Explorer 9+