The Middle Way

| 0 Comments

A few years ago we spent some time implementing a number of the sorting[1], searching[2] and set manipulation[3] algorithms from the standard C++ library in JavaScript. Since the latter doesn't support the former's abstraction of container access via iterators we were compelled to restrict ourselves to using native Array objects following the conventions of its methods, such as slice and sort.
In this post we shall take a look at an algorithm for finding the centrally ranked element, or median, of an array, which is strongly related to the ak.nthElement function, and then at a particular use for it.

The Central Issue

For example, the array
\[ \begin{pmatrix} 4 & 1 & 2 & 6 & 5 \end{pmatrix} \]
when sorted by increasing magnitude of its elements yields
\[ \begin{pmatrix} 1 & 2 & 4 & 5 & 6 \end{pmatrix} \]
The central element is consequently
\[ \begin{pmatrix} 1 & 2 & |4| & 5 & 6 \end{pmatrix} \]
meaning that the median is four.

Things are a little more complex if the array has an even number of elements. For example
\[ \begin{pmatrix} 4 & 1 & 2 & 6 & 5 & 3 \end{pmatrix} \]
when sorted yields
\[ \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \end{pmatrix} \]
having two central elements
\[ \begin{pmatrix} 1 & 2 & |3 & 4| & 5 & 6 \end{pmatrix} \]
In such cases we can define the median as the mean of the pair of elements. In this example giving a value of three and a half which has three fewer and three greater elements.

Unfortunately this only works if the elements of the array are numbers. To cope with non-numeric elements we shall instead define the median as a pair of elements, both being identical if there are an odd number in the array, as is done by the ak.median function defined in listing 1.

Listing 1: ak.median
ak.median = function(a, compare, start, end) {
 var i, k, mid, m0, m1;

 if(ak.nativeType(compare)===ak.UNDEFINED_T) {
  compare = ak.alphaCompare;
 }
 else if(ak.nativeType(compare)!==ak.FUNCTION_T) {
  throw new Error('invalid comparator in ak.median');
 }

 start = ak.arrayIndex(a, start, 'ak.median');
 end = ak.arrayIndex(a, end, 'ak.median');

 if(ak.nativeType(start)===ak.UNDEFINED_T) start = 0;
 if(ak.nativeType(end)===ak.UNDEFINED_T) end = a.length;

 k = end - start;
 if(!(k>0)) return [];

 if(k%2===1) {
  mid = start+(k-1)/2;
  m0 = ak._unsafeNthElement(a, mid, compare, start, end);
  return [m0, m0];
 }
 mid = start+k/2;
 m1 = ak._unsafeNthElement(a, mid, compare, start, end);
 m0 = a[start];
 for(i=start+1;i<mid;++i) if(compare(a[i], m0)>0) m0 = a[i];
 return [m0, m1];
};

Here, like JavaScript's array operators, we're defaulting to an alphanumeric comparison if one isn't specified and are treating negative values of the start and end bounds of the algorithm as offsets from the end of the array, using ak.arrayIndex.
If the number of elements is odd then the median is that element that ranks at half that number minus one, otherwise the second of the pair is at half that number and we must set the first to the greatest element that is smaller than it according to compare. In both cases we use the ak._unsafeNthElement function to find the appropriate \(n^\mathrm{th}\) element since we've already confirmed that the arguments are valid.
Note that we can be sure that the first of the pair in the case of an even number of elements must have a smaller index than the second since ak._unsafeNthElement partitions the elements into those less than or greater than the \(n^\mathrm{th}\).
Unfortunately, if there are multiple central elements that rank equal then the results will arbitrarily depend upon the implementation details of the sorting operation, but the same is true of the \(n^\mathrm{th}\) element algorithm and so we must simply accept the fact.

Program 1 demonstrates the use of ak.median which is defined in Median.js.

Program 1: Using ak.median

Neighbour In The Middle

We have also seen how we can approximate a function from a sample of evaluations by smoothing[4], a process that interpolates between them by taking a weighted average of those samples' values according to the distance from the point that's being approximated, the further away each is from it, the less weight they are given.

An alternative to using a weighted average over the entire sample is to use an average of the nearest points to that which we wish to approximate, and for the purposes of this post that means the median. Specifically we might approximate the value of a function at a point \(x\) with the median of its \(k\) nearest neighbours, as is done by the ak.medianSmooth function defined in listing 2.

Listing 2: Median Approximation
ak.medianSmooth = function() {
 var state = {nodes:[], k:undefined};
 var arg0 = arguments[0];
 var nodes, neighbours, f;

 constructors[ak.nativeType(arg0)](state, arg0, arguments);
 neighbours = new Array(state.k);

 if(ak.nativeType(state.nodes[0].x)===ak.NUMBER_T) {
  nodes = copyNodes(state.nodes);
  nodes.sort(function(x0, x1){return x0.x-x1.x;});
  f = function(x) {
   return ak._unsafeUniMedianSmooth(x, nodes, neighbours);
  };
 }
 else {
  f = function(x) {
   return ak._unsafeMultiMedianSmooth(x, state.nodes, neighbours);
  };
 }
 f.nodes = function() {return copyNodes(state.nodes);};
 f.k = function() {return state.k;};
 return Object.freeze(f);
};

var constructors = {};

The first thing to note about this function is how it is initialised by dispatching to a constructors object. Specifically we shall support both a set of nodes and values or a pair of sets of nodes and associated values together with the number of nearest neighbours.
Listing 3 shows how we distinguish between these two options.

Listing 3: Dispatching To The Appropriate Initialiser
constructors[ak.ARRAY_T] = function(state, arg0, args) {
 var arg1 = args[1];
 constructors[ak.ARRAY_T][ak.nativeType(arg1)](state, arg0, arg1, args);
};

constructors[ak.ARRAY_T][ak.ARRAY_T] = function(state, x, y, args) {
 var arg2 = args[2];
 constructors[ak.ARRAY_T][ak.ARRAY_T][ak.nativeType(arg2)](state, x, y, arg2);
};

Listing 4 shows how we shall do the latter, ensuring that the nodes and their values are of equal size and that the number of nearest neighbours, \(k\), is valid.

Listing 4: Initialising With Separate Nodes And Values
constructors[ak.ARRAY_T][ak.ARRAY_T][ak.NUMBER_T] = function(state, x, y, k) {
 var n = x.length;
 var i, xi, yi;

 if(y.length!==n) throw new Error('size mismatch in ak.medianSmooth');

 if(k<1 || k!==ak.floor(k) || k>n) {
  throw new Error('invalid window in ak.medianSmooth');
 }
 state.k = k;

 state.nodes.length = n;
 for(i=0;i<n;++i) {
  xi = x[i];
  yi = Number(y[i]);
  if(!isFinite(yi)) {
   throw new Error('invalid node value in ak.medianSmooth');
  }
  state.nodes[i] = {x:xi, y:yi};
 }
};

Similarly, listing 5 shows how me can initialise the function with arrays containing objects representing both the nodes \(x\) and their associated values \(y\).

Listing 5: Initialising With Combined Nodes And Values
constructors[ak.ARRAY_T][ak.NUMBER_T] = function(state, nodes, k) {
 var n = nodes.length;
 var i, x, y;

 if(k<1 || k!==ak.floor(k) || k>n) {
  throw new Error('invalid window in ak.medianSmooth');
 }
 state.k = k;

 state.nodes.length = n;

 for(i=0;i<n;++i) {
  x = nodes[i].x; if(ak.nativeType(x)===ak.FUNCTION_T) x = x();
  y = nodes[i].y; if(ak.nativeType(y)===ak.FUNCTION_T) y = y();
  y = Number(y);
  if(!isFinite(y)) {
   throw new Error('invalid node value in ak.medianSmooth');
  }
  state.nodes[i] = {x:x, y:y};
 }
};

Before we get on to the actual calculations we should note that requiring that the nodes and values cannot be changed via the nodes accessor means that we must make a copy of them, as is done by listing 6.

Listing 6: Copying The Nodes
function copyNodes(nodes) {
 var n = nodes.length;
 var copy = new Array(n);
 var i;

 for(i=0;i<n;++i) copy[i] = {x:nodes[i].x, y:nodes[i].y};
 return copy;
}

If the nodes are numbers then we copy them and sort them according to compare allowing us to quickly identify the point's \(k\) nearest neighbours by finding the position that it would be placed in using ak._unsafeLowerBound and then iterating above and below to populate the neighbours array, taking care to stay within their limits, as shown by listing 7.

Listing 7: Univariate Nearest Neighbours
ak._unsafeUniMedianSmooth = function(x, nodes, neighbours) {
 var n = nodes.length;
 var k = neighbours.length;
 var compare_x = function(x0, x){return x0.x-x;};
 var pos = ak._unsafeLowerBound(nodes, x, compare_x, 0, n);
 var i0 = pos-1;
 var i1 = pos;
 var j = 0;
 var median;

 while(j!==k) {
  if(i0>=0 && (i1===n || x-nodes[i0].x<nodes[i1].x-x)) {
   neighbours[j++] = nodes[i0--];
  }
  else {
   neighbours[j++] = nodes[i1++];
  }
 }
 median = ak.median(neighbours, function(x0, x1){return x0.y-x1.y;});
 return 0.5*(median[0].y+median[1].y);
};

Here i0 and i1 are the first and last of the \(k\) nearest neighbours and the condition inside the loop ensures that the former doesn't fall below the start of the nodes, that the latter doesn't fall above its end and, if neither of these are met, selects between the two according to which is the closest to the point in question.
Finally, we return the average of the values of the upper and lower values of the median of the neighbours.

If the nodes are not numbers then we can't use this trick and must instead search through all of them for the \(k\) nearest neighbours. To do so we first populate the neighbours array with the first \(k\) nodes, sort them and then iteratively replace those that are further from the point in each step inserting the nearer node at the index at which its distance ranks and removing the furthest, as shown by listing 8.

Listing 8: Multivariate Nearest Neighbours
ak._unsafeMultiMedianSmooth = function(x, nodes, neighbours) {
 var n = nodes.length;
 var k = neighbours.length;
 var compare_x = function(x0, x1){return ak.dist(x0.x, x)-ak.dist(x1.x, x);};
 var i = 0;
 var dk1, di, j, median;

 while(i<k) {neighbours[i] = nodes[i]; ++i;}
 neighbours.sort(compare_x);
 dk1 = ak.dist(neighbours[k-1].x, x);

 compare_x = function(xj, d){return ak.dist(xj.x, x)-d;};

 while(i<n) {
  di = ak.dist(nodes[i].x, x);
  if(di<dk1) {
   j = ak._unsafeLowerBound(neighbours, di, compare_x, 0, k);
   neighbours.splice(j, 0, nodes[i]);
   neighbours.pop();
   dk1 = ak.dist(neighbours[k-1].x, x);
  }
  ++i;
 }
 median = ak.median(neighbours, function(x0, x1){return x0.y-x1.y;});
 return 0.5*(median[0].y+median[1].y);
};

Program 2 demonstrates the use of ak.medianSmooth which can be found in MedianSmooth.js, plotting the function being approximated in green, the noisy samples in grey and the results in red.

Program 2: Median "Smoothing"

Whilst the results are not particularly smooth they are still reasonable approximations, the name of the function being a reflection of its algorithmic similarity to smoothers.

References

[1] We're All Sorted From A To Z, www.thusspakeak.com, 2017.

[2] I Still Haven't Found What I'm Looking For, www.thusspakeak.com, 2017.

[3] Let's Talk About Sets, www.thusspakeak.com, 2018.

[4] Smooth Operator, www.thusspakeak.com, 2021.

Leave a comment