We're All Sorted From A To Z

| 0 Comments

Something that I miss when programming in JavaScript is the wide variety of array manipulation functions available in my primary language, C++. We have, in fact, already implemented one of them with ak.shuffle which randomly rearranges the elements of an array[1]. We shall be needing another one of them in the not too distant future and so I have decided to take a short break from numerical computing to add those of them that I use the most frequently to the ak library, starting with a selection of sorting operations.

Nothing Compares 2 U

By default, JavaScript sorts the elements of arrays in increasing alphabetical order when represented as strings, with individual letters ranked according to their character codes such as are returned from String.charCodeAt. For the sake of consistency we shall do the same and so the first thing that we need is a function to compare elements in this way, satisfying the JavaScript's requirements for a comparator by returning a negative value if the first argument comes before the second, a positive one if it comes after it and zero if they are equal, as given in listing 1.

Listing 1: ak.alphaCompare
ak.alphaCompare = function(l, r) {
 l = String(l);
 r = String(r);
 return l<r ? -1 : l===r ? 0 : 1;
};

The reasons that this uses comparison operators on the string representations rather than explicitly comparing their character codes are because, firstly, it's simpler and, secondly, my expectation is that the inbuilt comparison operators will be significantly faster than looping through the strings in JavaScript, to the extent that we can afford to call the former twice when the first argument compares greater than the second.

To sort an array a numerically rather than alphabetically the typical advice, at least as far as I have seen, is to use

a.sort(function(l,r){return l-r;});

since this will be negative if l is less than r, positive is it is greater than it and zero if they are equal. Unfortunately this isn't quite true since both
\[ \infty - \infty \]
and
\[ (-\infty) - (-\infty) \]
result in NaN, or not a number, rather than zero. To correctly compare numbers we must take care to handle these corner cases, as is done by ak.numberCompare in listing 2.

Listing 2: ak.numberCompare
ak.numberCompare = function(l, r) {
 l = Number(l);
 r = Number(r);
 return l===r ? 0 : l-r;
};

Note that if either argument is NaN then this function will still return NaN, indicating that they are incomparable. Unfortunately this plays merry hell with sorting algorithms, which require that values are comparable, and so it's worth having a numerical comparison operator that treats NaNs as such. The ak.floatCompare function given in listing 3 does so by ranking NaNs greater than any number and equal to each other.

Listing 3: ak.floatCompare
function nanCompare(l, r) {
 return !isNaN(l) ? -1 : !isNaN(r) ? 1 : 0
}

ak.floatCompare = function(l, r) {
 var d;
 l = Number(l);
 r = Number(r);
 return l===r ? 0 : !isNaN(d=l-r) ? d : nanCompare(l, r);
};

Program 1 demonstrates how sorting with ak.floatCompare is more consistent than doing so with ak.numberCompare when faced with NaNs.

Program 1: Numeric Versus Float Comparison

If you run it several times you will find that, when sorting with ak.numberCompare, the elements before the NaN will be sorted, as will the elements after it, but that they won't always be sorted with respect to each other. In contrast, when sorting with ak.floatCompare, the NaN is always placed at the end of the array and all of the other elements are correctly sorted in increasing order.
Incidentally, all three of these comparison functions can be found in Compare.js.

Whilst we're on the subject of comparison functions we might as well add one to lexicographically compare arrays, or ranges of elements within them, by which we treat their elements as if they were characters in a string and rank them alphabetically. To specify ranges we shall pass a start and end position and, just as we did in ak.shuffle, we shall follow the Array.slice conventions of defining the end of a range or array as the position immediately following its last element, treating negative indices as reading back from the end of the array and truncating any positions falling before its start or after its end to zero and that end respectively.
Now, we're going to need to make such transformations to indices every time we want an algorithm to operate upon a range of elements and so it's worth creating a dedicated function to do so with ak.arrayIndex given in listing 4 and defined in ArrayIndex.js

Listing 4: ak.arrayIndex
ak.arrayIndex = function(a, i, caller) {
 var n;

 if(ak.nativeType(a)!==ak.ARRAY_T) {
  if(ak.nativeType(caller)!==ak.STRING_T) caller = 'ak.arrayIndex';
  throw new Error('invalid array in ' + caller);
 }

 if(ak.nativeType(i)===ak.UNDEFINED_T) return i;

 if(i!==ak.floor(i)) {
  if(ak.nativeType(caller)!==ak.STRING_T) caller = 'ak.arrayIndex';
  throw new Error('invalid index in ' + caller);
 }

 n = a.length;
 if(i<=-n)    i = 0;
 else if(i<0) i += n;
 else if(i>n) i = n;

 return i;
};

Note that the optional caller argument allows us to specify which algorithm is using this function should we need to throw an exception to indicate that the array is invalid or that the index is not an integer. Furthermore, if the index is undefined then we return it as is, deferring to the caller to decide what to do with it.
The ak.lexicographicalCompare function given in listing 5 uses it to transform the starts and ends of the ranges, setting them to the starts and ends of their respective arrays if they are undefined.

Listing 5: ak.lexicographicalCompare
ak.lexicographicalCompare = function(a1, a2, compare, start1, end1, start2, end2) {
 var c1, c2;

 if(ak.nativeType(compare)===ak.UNDEFINED_T) {
  compare = ak.alphaCompare;
 }
 else if(ak.nativeType(compare)!==ak.FUNCTION_T) {
  throw new Error('invalid comparator in ak.lexicographicalCompare');
 }

 start1 = ak.arrayIndex(a1, start1, 'ak.lexicographicalCompare');
 end1 = ak.arrayIndex(a1, end1, 'ak.lexicographicalCompare');

 start2 = ak.arrayIndex(a2, start2, 'ak.lexicographicalCompare');
 end2 = ak.arrayIndex(a2, end2, 'ak.lexicographicalCompare');

 if(ak.nativeType(start1)===ak.UNDEFINED_T) start1 = 0;
 if(ak.nativeType(end1)===ak.UNDEFINED_T) end1 = a1.length;

 if(ak.nativeType(start2)===ak.UNDEFINED_T) start2 = 0;
 if(ak.nativeType(end2)===ak.UNDEFINED_T) end2 = a2.length;

 if(start1>end1 || start2>end2) return ak.NaN;

 c1 = c2 = 0;
 while(c1===0 && start1<end1 && start2<end2) {
  c1 = c2 = compare(a1[start1++], a2[start2++]);
 }
 while(!c1 && start1<end1 && start2<end2) {
  c1 = compare(a1[start1++], a2[start2++]);
 }
 if(c1) return c1;

 if(start1<end1) c2 = 1;
 else if(start2<end2) c2 = -1;
 return c2;
};

Note that we're also ensuring that compare must be a function if supplied and defaulting it to ak.alphaCompare if not. Having ensured that the arguments are valid, provided appropriate defaults and/or transformed the starts and ends of the ranges being compared, we're finally ready to implement the lexicographical comparison itself.

The first step is to return NaN if either range's start position is greater than its end position. Now you might argue that throwing an exception would be a more appropriate response and, to be perfectly honest, I'd be inclined to agree. However, Array.slice doesn't throw one when it's given such ranges and I'm very keen to follow JavaScript's conventions for the ak library's array manipulation functions so, since we're already using NaN to mean incomparable, this seemed to me to be the most consistent behaviour.
The next step is to increment the start position of both array's ranges for as long as their elements compare equal and they are before their respective end positions, storing the result of the last comparison in both c1 and c2. You may be a little surprised that this is followed by another, almost identical, loop in which the test for the elements' equality is replaced by one for their not being unequal. The reason for this is to handle incomparable elements, which we do by exploiting the fact that NaN will be interpreted as false in the test and by leaving c2 as NaN should it have been set to it in the first loop.
Finally, if c1 is a non-zero number then it must show how the first unequal comparable elements compare to each other and consequently represents how the arrays compare lexicographically, and so we return it and are done. If not, then if the first array's range has not been fully traversed then has some elements after those of the second's and so comes after it which we indicate by setting c2 to one. Similarly, if the second range hasn't been then the first range must come before it and so we set c2 to minus one.
The upshot of all of this is that incomparable elements are ignored by ak.lexicographicalCompare unless the two ranges are the same length, every comparable pair of their elements compare equal and there is at least one pair that are incomparable, in which case it returns NaN rather than zero.

Program 2 demonstrates ak.lexicographicalCompare, which can be found in LexicographicalCompare.js, by applying it to two randomly shuffled copies of a short array.

Program 2: Using ak.lexicographicalCompare

Good Partition

A fundamental sorting operation is partitioning the elements of an array so that all of them that have some particular property will appear before all of them that don't. To do this we simply need to search for the first element that doesn't and then swap it with the first after it that does, incrementing the former's position to reflect the fact that it will then do so and repeating the process until there are no elements left to test.
The ak._unsafePartition helper function given in listing 6 implements this algorithm, returning the position of the first element that doesn't satisfy the boolean predicate upon which we're partitioning, the end of the range if no such value exists and minus one if the range is invalid.

Listing 6: ak._unsafePartition
ak._unsafePartition = function(a, pred, start, end) {
 var mid, t;
 while(start<end && pred(a[start])) ++start;
 mid = start;
 while(++mid<end) {
  if(pred(a[mid])) {
   t = a[start];
   a[start++] = a[mid];
   a[mid] = t;
  }
 }
 return start<=end ? start : -1;
};

The reason that this has been added to the ak library rather than left as a local function is because we shall need to use it again for other sorting functions. In C++, functions with leading underscores are pretty much reserved for the compiler vendor and they serve as a warning that they shouldn't be called directly by the programmer, so I decided to name this function with one for the same reason. I was subsequently pleasantly surprised to discover that they're conventionally used in JavaScript programs to indicate that functions and variables are private. The unsafe prefix is there to hammer home the warning that this function should not be called directly.
The ak.partition function given in listing 7 provides the public interface to the algorithm, checking the validity of the arguments, providing defaults for those that are not provided and mapping negative positions to ones within the array with the ak.arrayIndex function again.

Listing 7: ak.partition
ak.partition = function(a, pred, start, end) {
 if(ak.nativeType(pred)!==ak.FUNCTION_T) {
  throw new Error('invalid predicate in ak.partition');
 }

 start = ak.arrayIndex(a, start, 'ak.partition');
 end = ak.arrayIndex(a, end, 'ak.partition');

 if(ak.nativeType(start)===ak.UNDEFINED_T) start = 0;
 if(ak.nativeType(end)===ak.UNDEFINED_T) end = a.length;

 return ak._unsafePartition(a, pred, start, end);
};

Program 3 demonstrates ak.partition, as found in Partition.js, by using it to partition a shuffled array of the integers from one to sixteen on randomly generated values between zero and seventeen, using a vertical bar to highlight the position dividing those elements that are less than it and those that are greater than or equal to it.

Program 3: Using ak.partition

By recursively partitioning an array we can sort it with
function sort(a, compare, start, end) { var x, pred, i; if(start<end-1) { x = a[start]; pred = function(ai){return compare(x, ai)<0;}; i = ak.partition(a, pred, start, end); if(i===start) ++i; sort(a, compare, start, i); sort(a, compare, i, end); } }

as demonstrated by program 4.

Program 4: Sorting By Partitioning

The only really tricky part here is making sure that we increment the partition point i if it's equal to the start of the range since that means that the start is already in its sorted position and we would infinitely recurse if we didn't.
This is, an admittedly sub-optimal, implementation of the quicksort algorithm which is pretty much the first choice for sorting data. My expectation is that JavaScript's inbuilt sort will be much more efficient than this and so the ak library's sorting algorithm slices out the range to be sorted, sorts it and splices it back in, as shown by listing 8.

Listing 8: ak.sort
ak._unsafeSort = function(a, compare, start, end) {
 if(start===0 && end===a.length) {
  a.sort(compare);
 }
 else if(start<end-1) {
  Array.prototype.splice.apply(
   a, [start, end-start].concat(a.slice(start, end).sort(compare))
  );
 }
};

ak.sort = function(a, compare, start, end) {
 if(ak.nativeType(compare)===ak.UNDEFINED_T) {
  compare = ak.alphaCompare;
 }
 else if(ak.nativeType(compare)!==ak.FUNCTION_T) {
  throw new Error('invalid comparator in ak.sort');
 }

 start = ak.arrayIndex(a, start, 'ak.sort');
 end = ak.arrayIndex(a, end, 'ak.sort');

 if(ak.nativeType(start)===ak.UNDEFINED_T) start = 0;
 if(ak.nativeType(end)===ak.UNDEFINED_T) end = a.length;

 ak._unsafeSort(a, compare, start, end);
};

Of course this means that it uses more memory than would usually be the case, but I'm comfortable with that. The implementation can be found in Sort.js by the way.

The Nth Element

Another thing that we can use ak._unsafePartition for is to search for the \(n^\mathrm{th}\) ranking element of an array, placing it in the \(n^\mathrm{th}\) position and ensuring that no elements that comes before it compares greater and that no element that comes after it compares lesser. To do so we simply partition it on a value taken from the elements that we're searching, known as the pivot, and if the partition point is the \(n^\mathrm{th}\) position return its value, otherwise repeat the operation for either the elements after it or those before it, depending on whether it comes before or after the \(n^\mathrm{th}\), as is done by ak._unsafeNthElement in listing 9.

Listing 9: ak._unsafeNthElement
ak._unsafeNthElement = function(a, mid, compare, start, end) {
 var v, p;

 if(start>end || mid===end) return undefined;

 while(start<end-1) {
  p = pivot(a, compare, start, end);
  v = a[p]; a[p] = a[end-1]; a[end-1] = v;
  p = ak._unsafePartition(a, function(x){return compare(x, v)<0;}, start, end-1);
  v = a[p]; a[p] = a[end-1]; a[end-1] = v;

  if(p===mid)    start = end;
  else if(p<mid) start = p+1;
  else           end = p;
 }
 return a[mid];
};

Here, the mid argument is the \(n^\mathrm{th}\) position in the range rather than \(n\) itself and so we don't add it to the start.

Note that we ensure that the pivot value ends up at the partition point by first swapping it with the last element in the range, then partitioning it and finally swapping it back into the partition point. This means that we don't need to include the last element in the partition since it's guaranteed to be on the correct side. Furthermore, if the partition occurs before the \(n^\mathrm{th}\) position we can be continue the search from the first element after it on the next step.
In the crude implementation of quicksort we simply pivoted on the first value in the range, which will be rather inefficient if it's already mostly sorted since most steps will leave the elements in the same order as they were before. What we really want to do is partition the range into two roughly equal sized parts which will happen if the pivot is close the middle of the finally sorted values, or in other words its median. A not too unreasonable approximation of this is the median of the values at the start, middle and end of the range which we can find by sorting them and taking the middle value.
To sort just three values we can't really do much better than bubblesort which we might implement with
var t; if(a1<a0) {t=a0; a0=a1; a1=t;} if(a2<a1) {t=a1; a1=a2; a2=t;} if(a1<a0) {t=a0; a0=a1; a1=t;}

Now we don't actually need the values to end up sorted when searching for their median and so we can slightly reduce the number of operations by nesting the comparisons, as shown by listing 10.

Listing 10: The pivot Function
function pivot(a, compare, start, end) {
 var mid = ak.floor((start+end)/2);
 var a0 = a[start];
 var a1 = a[mid];
 var a2 = a[end-1];
 if(!(compare(a0,a1)>0)) {
  if(!(compare(a1,a2)>0)) return mid;
  if(!(compare(a0,a2)>0)) return end-1;
  return start;
 }
 if(!(compare(a0,a2)>0)) return start;
 if(!(compare(a2,a1)>0)) return mid;
 return end-1;
}

All that's left to do is provide a public function that checks the validity of the arguments provided by the user and sets default values for those that aren't, as done by ak.nthElement in listing 11.

Listing 11: ak.nthElement
ak.nthElement = function(a, mid, compare, start, end) {
 if(ak.nativeType(compare)===ak.UNDEFINED_T) {
  compare = ak.alphaCompare;
 }
 else if(ak.nativeType(compare)!==ak.FUNCTION_T) {
  throw new Error('invalid comparator in ak.nthElement');
 }

 mid = ak.arrayIndex(a, mid, 'ak.nthElement');

 if(ak.nativeType(mid)===ak.UNDEFINED_T) {
  throw new Error('invalid mid in ak.nthElement');
 }

 start = ak.arrayIndex(a, start, 'ak.nthElement');
 end = ak.arrayIndex(a, end, 'ak.nthElement');

 if(ak.nativeType(start)===ak.UNDEFINED_T) start = 0;
 if(ak.nativeType(end)===ak.UNDEFINED_T) end = a.length;

 if(mid<start)    mid = start;
 else if(mid>end) mid = end;

 return ak._unsafeNthElement(a, mid, compare, start, end);
};

Program 5 demonstrates ak.nthElement, which is implemented in NthElement.js, by selecting a randomly ranked element of a randomly generated array, showing its full contents both before and after the function is called.

Program 5: Using ak.nthElement

Another useful sorting operation is to find each of the \(n\) lowest ranked elements in an array, which we can do using both ak._unsafeNthElement and ak._unsafeSort. Specifically, we simply put the \((n-1)^\mathrm{th}\) element into its correctly sorted position with the former and sort all of the elements before it with the latter, as done by ak.partialSort in listing 12.

Listing 12: ak.partialSort
ak.partialSort = function(a, mid, compare, start, end) {
 if(ak.nativeType(compare)===ak.UNDEFINED_T) {
  compare = ak.alphaCompare;
 }
 else if(ak.nativeType(compare)!==ak.FUNCTION_T) {
  throw new Error('invalid comparator in ak.partialSort');
 }

 mid = ak.arrayIndex(a, mid, 'ak.partialSort');

 if(ak.nativeType(mid)===ak.UNDEFINED_T) {
  throw new Error('invalid mid in ak.partialSort');
 }

 start = ak.arrayIndex(a, start, 'ak.partialSort');
 end = ak.arrayIndex(a, end, 'ak.partialSort');

 if(ak.nativeType(start)===ak.UNDEFINED_T) start = 0;
 if(ak.nativeType(end)===ak.UNDEFINED_T) end = a.length;

 if(mid<start)     mid = start;
 else if (mid>end) mid = end;

 ak._unsafeNthElement(a, mid-1, compare, start, end);
 ak._unsafeSort(a, compare, start, mid-1);
};

This time mid is the end of the partially sorted range or, in other words, the first position after it. Program 6 demonstrates ak.partialSort, which can be found in PartialSort.js, by using it to partially sort some randomly generated arrays up to a randomly chosen end position, which is marked in the results by a vertical bar.

Program 6: Using ak.partialSort

That's all that I have to say regarding sorting operations, but next time we shall take a look at some query-like operations that we can perform on data once it's been sorted.

References

[1] Halton, Here's A Pseud, www.thusspakeak.com, 2016.

Leave a comment