A Heap Of Stuff


A little over a year ago we added a bunch of basic computer science algorithms[1][2][3] to the ak library, taking inspiration from the C++ standard library. Now, JavaScript isn't just short of algorithms in its standard library, it's also lacking all but a few data structures. In fact, from the programmer's perspective, it only has hash maps which use a function to map strings to non-negative integers that are then used as indices into an array of sets of key-value pairs. Technically speaking, even arrays are hash maps of the string representation of their indices, although in practice the interpreter will use more efficient data structures whenever it can.
As clever as its implementation might be, it's unlikely that it will always figure out exactly what we're trying to do and pick the most appropriate data structure and so it will occasionally be worth explicitly implementing a data structure so that we can be certain of its performance.
The first such data structure that we're going to need is a min-heap which will allow us to efficiently add elements in any order and remove them in ascending order, according to some comparison function.

Figure 1: A Balanced Binary Tree

        1
    6       3
  6   8   7
The basic idea is to arrange the data in a balanced binary tree, having as few layers as possible, in which each element's children, of which it has at most two, rank greater than or equal to it, as illustrated by figure 1. These can be efficiently represented by arrays in which the \(i^\mathrm{th}\) element's children are located in the \(\left(2i+1\right)^\mathrm{th}\) and \(\left(2i+2\right)^\mathrm{th}\) positions. For example, the tree in figure 1 can be represented by an array a with the elements

i     0  1  2  3  4  5
a[i]  1  6  3  6  8  7

which is clearly the sequence of the elements of the tree as read from top to bottom and left to right.
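To make the index arithmetic concrete, here's a tiny sketch of it in plain JavaScript; the leftChild, rightChild and parent helpers are just illustrative names of my own rather than anything in the ak library.

function leftChild(i)  {return 2*i+1;}
function rightChild(i) {return 2*i+2;}
function parent(i)     {return Math.floor((i-1)/2);}

var a = [1, 6, 3, 6, 8, 7];
console.log(a[leftChild(0)], a[rightChild(0)]); // 6 3, the root's children
console.log(a[parent(5)]);                      // 3, the parent of the last element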

The Removal Business

It's trivially the case that the smallest element in the min-heap is the first in its array. It's less obvious how we can remove it efficiently whilst preserving its tree's structure but fortunately there's a simple algorithm to do so. The trick is to replace it with the last element in the array, removing the latter after we've done so, and then bubble it down through the tree by repeatedly swapping it with the smaller of its children for as long as that child is smaller than it.
For example, to remove the first element from the heap above we first copy the contents of position five into position zero and then clear position five to yield

i     0  1  2  3  4
a[i]  7  6  3  6  8

Figure 2: First Element Removed

        3
    6       7
  6   8
Here the moved element sits at position zero and its first and second children are at positions one and two. The second is the smaller of them and is smaller than the parent and so we swap them to yield

i     0  1  2  3  4
a[i]  3  6  7  6  8

At this point the moved element doesn't have any children and so the algorithm terminates with the heap representing the tree in figure 2.
To remove the next smallest element we move the value in position four to position zero resulting in

i     0  1  2  3
a[i]  8  6  7  6

This time the element's first child is the smaller and, being smaller than it, we swap them to get

i     0  1  2  3
a[i]  6  8  7  6

Figure 3: Next Element Removed

        6
    6       7
  8
Now the moved element has a single child that is smaller than it and so we swap them too to yield

i     0  1  2  3
a[i]  6  6  7  8

and the algorithm terminates with the tree given in figure 3.

Finally, since a binary tree of \(n\) elements has \(\lfloor\log_2 n\rfloor + 1\) layers, where the odd shaped brackets stand for the largest integer that is less than or equal to the term that they enclose and \(\log_2\) is the base two logarithm satisfying
\[ \begin{align*} \log_2 2^n &= n\\ 2^{\log_2 x} &= x \end{align*} \]
and in the worst case we must bubble the moved element down through all of them, removing the smallest element has logarithmic complexity.
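Before we get to the library code, a minimal standalone sketch of this procedure, operating directly upon a plain array of numbers, might look something like the following; it's purely illustrative and assumes a non-empty heap.

function popMin(a) {
 // remove and return the smallest element of a non-empty min-heap held in the array a
 var x = a[0];
 var i = 0, c, l, r, t;

 a[0] = a[a.length-1];
 --a.length;

 for(;;) {
  l = 2*i+1; r = l+1; c = i;
  if(l<a.length && a[l]<a[c]) c = l;
  if(r<a.length && a[r]<a[c]) c = r;
  if(c===i) break;
  t = a[i]; a[i] = a[c]; a[c] = t;
  i = c;
 }
 return x;
}

var a = [1, 6, 3, 6, 8, 7];
console.log(popMin(a)); // 1
console.log(a);         // [3, 6, 7, 6, 8], as in the worked example above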

New Additions

To add new elements to a min-heap we work in the opposite direction, first pushing them onto the end of the array and then bubbling them up through the tree by swapping them with their parents if they are smaller than they are. Note that the parent of the element in the \(i^\mathrm{th}\) position in the array can be found at the \(\big\lfloor\frac{i-1}2\big\rfloor^\mathrm{th}\) position.
For example, adding a two to our diminished min-heap initially puts its array in the state

i     0  1  2  3  4
a[i]  6  6  7  8  2

where the new element sits at position four and its parent at position one. Since the former is smaller than the latter we swap them to yield

i     0  1  2  3  4
a[i]  6  2  7  8  6

Figure 4: Added An Element

        2
    6       7
  8   6
Once again the new element is smaller than its new parent and so we swap them to get

i     0  1  2  3  4
a[i]  2  6  7  8  6

after which there's nowhere left for the new element to go and so we stop with the min-heap representing the tree given in figure 4.
If we now add a five we have

i     0  1  2  3  4  5
a[i]  2  6  7  8  6  5

Figure 5: Added Another

        2
    6       5
  8   6   7
which we must swap with its larger parent to yield the array

i     0  1  2  3  4  5
a[i]  2  6  5  8  6  7

Now the new element is larger than its parent and so the algorithm terminates with the min-heap representing the tree given in figure 5.

As with the removal of the smallest element this has a worst case of traversing every layer of the tree which means that it too has logarithmic complexity. Together they allow us to efficiently queue up elements and draw off the smallest of them at any given time.
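Again, purely for illustration, a minimal standalone sketch of addition on a plain array of numbers might be

function pushMin(a, x) {
 // add the element x to the min-heap held in the array a
 var i = a.length, p, t;

 a.push(x);
 while(i>0) {
  p = i%2===0 ? i/2-1 : (i-1)/2;
  if(!(a[i]<a[p])) break;
  t = a[i]; a[i] = a[p]; a[p] = t;
  i = p;
 }
}

var a = [6, 6, 7, 8];
pushMin(a, 2);
console.log(a); // [2, 6, 7, 8, 6], as in the worked example above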

A Heap Of Code

To make the use of min-heaps as simple as possible, it's worth implementing them so that they conform to JavaScript's conventions for containers, in so far as that's possible or makes sense, which effectively means following those of its Array type, as is done by the ak.minHeap type given in listing 1.

Listing 1: ak.minHeap
ak.MIN_HEAP_T = 'ak.minHeap';

function MinHeap(){}
MinHeap.prototype = {TYPE: ak.MIN_HEAP_T, valueOf: function(){return ak.NaN;}};

ak.minHeap = function() {
 var h = new MinHeap();
 var state = {a:[], compare:ak.alphaCompare};
 var arg0 = arguments[0];
 
 constructors[ak.nativeType(arg0)](state, arg0, arguments);
 initialise(state.a, state.compare);

 h.size = function() {return state.a.length;};
 h.at = function(i) {return state.a[i];};
 h.min = function() {return state.a[0];};
 h.max = function() {return state.a[maxIndex(state.a, state.compare)];};
 h.indexOf = function(x) {return state.a.indexOf(x);};
 h.lastIndexOf = function(x) {return state.a.lastIndexOf(x);};
 h.add = function(x) {return addElement(x, state.a, state.compare);};
 h.push = h.add;
 h.pop = function() {
  return removeIndex(maxIndex(state.a, state.compare), state.a, state.compare);
 };
 h.unshift = h.add;
 h.shift = function() {return removeIndex(0, state.a, state.compare);};
 h.remove = function(i) {return removeIndex(i, state.a, state.compare);};
 h.replace = function(i, x) {return replaceIndex(i, x, state.a, state.compare);};
 h.merge = function(a) {merge(a, state);};
 h.toArray = function() {return state.a.slice(0);};
 h.toString = function() {return state.a.toString();};
 h.compare = function() {return state.compare;};

 return Object.freeze(h);
};

var constructors = {};

As usual we're using size and at methods in place of the length property and indexing of JavaScript's Array objects. We're also duplicating its indexOf, lastIndexOf, push, pop, unshift and shift methods and will be implementing much, but not all, of the functionality of its splice method with the remove, replace and merge methods.
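For example, assuming ak.numberCompare, the numerical comparison used in program 1 below, we'd expect usage along these lines:

var h = ak.minHeap([3, 1, 2], ak.numberCompare);

console.log(h.size());  // 3, rather than h.length
console.log(h.at(0));   // 1, rather than h[0]
console.log(h.min());   // 1, the same element as at(0)
console.log(h.shift()); // 1, removing the smallest
console.log(h.min());   // 2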
Before we get on to those though, we should look at how ak.minHeap is initialised.

Drop Them In A Heap

We're using our standard mechanism of dispatching to a constructors object according to the types of the arguments. Listing 2 implements the most typical forms of initialisation, allowing the user to replace the state object's default empty array with the contents of an array, to replace its default ak.alphaCompare comparison function or to leave them as their defaults.

Listing 2: Array And Function Initialisation
constructors[ak.ARRAY_T] = function(state, a, args) {
 var f = args[1];
 state.a = a.slice(0);
 constructors[ak.ARRAY_T][ak.nativeType(f)](state, f);
};

constructors[ak.ARRAY_T][ak.FUNCTION_T]  = function(state, f) {state.compare = f;};
constructors[ak.ARRAY_T][ak.UNDEFINED_T] = function() {};

constructors[ak.FUNCTION_T]  = constructors[ak.ARRAY_T][ak.FUNCTION_T];
constructors[ak.UNDEFINED_T] = constructors[ak.ARRAY_T][ak.UNDEFINED_T];
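So, for example, we should be able to construct min-heaps with any of the following; note that the default ak.alphaCompare presumably orders elements by their string representations, so the heaps of numbers are given ak.numberCompare, the numerical comparison used in program 1 below.

var h0 = ak.minHeap();                            // empty, default comparison
var h1 = ak.minHeap(['b', 'a', 'c']);             // an array, default comparison
var h2 = ak.minHeap(ak.numberCompare);            // empty, numerical comparison
var h3 = ak.minHeap([3, 1, 2], ak.numberCompare); // an array, numerical comparison

console.log(h1.min()); // a
console.log(h3.min()); // 1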

We shall also allow the user to initialise our min-heap with an object having a size property or method, a compare property or method and an at method to retrieve the element at a given index, as shown by listing 3.

Listing 3: Object Initialisation
constructors[ak.OBJECT_T] = function(state, obj) {
 var n = obj.size;
 var f = obj.compare;
 var i;

 if(ak.nativeType(n)===ak.FUNCTION_T) n = n();
 if(ak.nativeType(n)!==ak.NUMBER_T || n<0 || n!==ak.floor(n) || !isFinite(n)) {
  throw new Error('invalid size on ak.minHeap');
 }

 if(ak.nativeType(f)===ak.FUNCTION_T) {
  try{if(ak.nativeType(f())===ak.FUNCTION_T) f = f();} catch(e){}
 }

 switch(ak.nativeType(f)) {
  case ak.FUNCTION_T:  state.compare = f; break;
  case ak.UNDEFINED_T: break;
  default: throw new Error('invalid compare in ak.minHeap');
 }

 state.a.length = n;
 for(i=0;i<n;++i) state.a[i] = obj.at(i);
};

Note that we've got a slightly tricky job in identifying whether that object's compare member is a property representing a comparison function or a method returning one since in both cases it'll be a function. To find out which it is, the object constructor tries calling it with no arguments and assumes that it's the latter if it returns a function.
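In particular, since an ak.minHeap itself has size, at and compare methods, we should be able to initialise one heap from another, and a plain object with those members ought to serve just as well; a sketch, with the ad hoc at function being purely illustrative:

var h0 = ak.minHeap([3, 1, 2], ak.numberCompare);
var h1 = ak.minHeap(h0);      // copies h0's elements and comparison function
var h2 = ak.minHeap({         // an ad hoc object with size, at and compare members
 size: 3,
 at: function(i) {return [5, 4, 6][i];},
 compare: ak.numberCompare
});

console.log(h1.min()); // 1
console.log(h2.min()); // 4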

Once these constructors have finished the ak.minHeap function calls initialise with the state's array and comparator to put the former into the order required for a min-heap. We could simply add its elements to an empty array one at a time and bubble them up through the tree, preserving the min-heap structure at each step, and then replace its contents with those of that array. For \(n\) elements this would have a complexity of order \(n \log_2 n\), as shown by derivation 1.

Derivation 1: The Complexity Of Repeated Addition
Firstly, the \(k^\mathrm{th}\) layer of a min-heap's tree has \(2^{k-1}\) elements and filling it has a complexity of \((k-1) \times 2^{k-1}\).
Secondly, the layers before the \(k^\mathrm{th}\) contain \(2^{k-1}-1\) elements in total, adding each of which had a complexity less than \(k-1\).
This means that completely filling the first \(k\) layers of the tree has a complexity \(o_k\) satisfying
\[ \begin{align*} (k-1) \times 2^{k-1} < &o_k < 2 \times (k-1) \times 2^{k-1}\\ \tfrac12 (k-1) \times 2^k < &o_k < (k-1) \times 2^k \end{align*} \]
The complexity \(o^\prime_k\) of partially filling the \(k^\mathrm{th}\) layer of the tree must lie between the complexities of completely filling the \((k-1)^\mathrm{th}\) and the \(k^\mathrm{th}\), so that
\[ \begin{align*} \tfrac12(k-2) \times 2^{k-1} < &o^\prime_k < (k-1) \times 2^k\\ \tfrac14(k-2) \times 2^k < &o^\prime_k < (k-1) \times 2^k\\ \tfrac14 k \times 2^k - 2^{k-1} < &o^\prime_k < k \times 2^k - 2^k \end{align*} \]
and consequently \(o^\prime_k\) is of the order \(k \times 2^k\).
Finally, for \(n\) elements \(k\) is equal to \(\lfloor \log_2 n\rfloor+1\) and so adding them to the min-heap one at a time has a complexity of the order
\[ O_n = \left(\lfloor \log_2 n\rfloor+1\right) \times 2^{\lfloor \log_2 n\rfloor+1} \]
and, noting that
\[ \log_2 n - 1 < \lfloor \log_2 n\rfloor \leqslant \log_2 n \]
we have
\[ \begin{align*} \log_2 n \times 2^{\log_2 n} &< O_n \leqslant \left(\log_2 n + 1\right) \times 2^{\log_2 n + 1}\\ \log_2 n \times n &< O_n \leqslant \left(\log_2 n + 1\right) \times 2 n\\ n \times \log_2 n &< O_n \leqslant 2n \times \log_2 n + 2n \end{align*} \]
so that \(O_n\) is of the order \(n \times \log_2 n\) as claimed.

We can significantly improve the performance of initialising a min-heap by putting all of the elements in its array at once and then iterating backwards from the parent of the last element to the first element, bubbling them down through its tree[4], which, somewhat counter-intuitively, has a complexity of order \(n\) rather than \(n \log_2 n\), as proven by derivation 2.

Derivation 2: The Complexity Of Bulk Addition
Firstly, the number of elements in the \(k^\mathrm{th}\) layer of the tree is less than or equal to \(2^{k-1}\) and there are \(\lfloor\log_2 n\rfloor+1-k\) layers beneath it.
Secondly, if an element's children are the roots of a pair of min-heaps then making that element the root of a heap can be done with the same algorithm that we use to restore a min-heap's tree after we remove its first element, at a cost of order \(\lfloor\log_2 n\rfloor+1-k\).
The worst-case complexity of putting every element in the \(k^\mathrm{th}\) layer into a heap with respect to its children therefore has an order of
\[ 2^{k-1} \times \left(\lfloor\log_2 n\rfloor+1-k\right) \]
Noting that the elements in the lowest layer are always heaps with respect to their non-existent children, we need only perform this operation for the \(\lfloor\log_2 n\rfloor^\mathrm{th}\) layer to the first for a total cost in the order of
\[ \sum_{k=1}^{\lfloor \log_2 n\rfloor} 2^{k-1} \times \left(\lfloor\log_2 n\rfloor+1-k\right) \]
where \(\sum\) is the summation sign.
Defining \(h\) as \(\lfloor\log_2 n\rfloor + 1 - k\) yields
\[ \sum_{h=1}^{\lfloor\log_2 n\rfloor} 2^{\lfloor\log_2 n\rfloor-h} \times h = 2^{\lfloor\log_2 n\rfloor} \times \sum_{h=1}^{\lfloor\log_2 n\rfloor} \frac{h}{2^h} \leqslant n \times \sum_{h=1}^{\lfloor\log_2 n\rfloor} \frac{h}{2^h} < n \times \sum_{h=1}^\infty \frac{h}{2^h} \]
Next, we have
\[ \begin{align*} \sum_{h=1}^\infty \frac{h}{2^h} &= \frac12 + \frac24 + \frac38 + \frac{4}{16} + \dots = \frac12 + \frac14 + \frac14 + \frac18 + \frac18 + \frac18 + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \dots\\ &= \left(\frac12+\frac14+\frac18+\frac{1}{16}+\dots\right) + \left(\frac14+\frac18+\frac{1}{16}+\dots\right) + \left(\frac18+\frac{1}{16}+\dots\right) + \left(\frac{1}{16}+\dots\right)\\ &= \sum_{i=1}^\infty \frac{1}{2^i} +\frac12 \sum_{i=1}^\infty \frac{1}{2^i} +\frac14 \sum_{i=1}^\infty \frac{1}{2^i} +\frac18 \sum_{i=1}^\infty \frac{1}{2^i} +\dots\\ &= 1+\frac12+\frac14+\frac18+\dots = 1 + \sum_{j=1}^\infty \frac{1}{2^j} = 2 \end{align*} \]
since the identical infinite sums are geometric series that converge to one.
Finally, we must at least iterate over every element in the \(\left(\lfloor \log_2 n\rfloor + 1 - 2\right)^\mathrm{th}\) layer, which must be full and have
\[ 2^{\lfloor \log_2 n\rfloor - 2} > 2^{\log_2 n - 3} = \tfrac18 n \]
elements and so the order of complexity of bulk addition must be \(n\).

Tidy Up That Heap

The fundamental min-heap operations are bubbling elements up and down through its tree, so we'll need to implement them before we do anything else; listing 4 implements the former.

Listing 4: Bubbling Up
function bubbleUp(i, a, compare) {
 var p = i%2===0 ? i/2-1 : (i-1)/2;
 var x;

 while(i>0 && compare(a[i], a[p])<0) {
  x = a[i]; a[i] = a[p]; a[p] = x;
  i = p; p = i%2===0 ? i/2-1 : (i-1)/2;
 }
}

Note that we're calculating the index of the parent of the \(i^\mathrm{th}\) element by exploiting the fact that
\[ \Bigg\lfloor\frac{i-1}2\Bigg\rfloor = \begin{cases} \frac{i}2-1 & i \;\text{is even}\\ \frac{i-1}2 & i \;\text{is odd} \end{cases} \]
This detail aside, it simply swaps the \(i^\mathrm{th}\) element with its parent if it compares smaller than it, then sets the index to that of its parent and recalculates the parent's index, repeating until the element is no smaller than its parent or has reached the root.

Listing 5 bubbles an element down through the tree by determining the index of the smallest of it and its children and then swapping it with the element at that index if it's not its own, iteratively repeating the process until it's no greater than its children or doesn't have any.

Listing 5: Bubbling Down
function bubbleDown(i, a, compare) {
 var n = a.length;
 var l = i*2+1;
 var r = l+1;
 var c = i;
 var x;

 if(l<n && compare(a[c], a[l])>0) c = l;
 if(r<n && compare(a[c], a[r])>0) c = r;

 while(c!==i) {
  x = a[i]; a[i] = a[c]; a[c] = x;
  i = c; l = i*2+1; r = l+1;
  if(l<n && compare(a[c], a[l])>0) c = l;
  if(r<n && compare(a[c], a[r])>0) c = r;
 }
}

If we don't know which way to bubble an element then we can simply compare it to its parent and bubble up if it's smaller than it or down if it's greater or is at the root of the tree.

Listing 6: Bubbling Both Ways
function bubble(i, a, compare) {
 var p = i%2===0 ? i/2-1 : (i-1)/2;
 var c = p>=0 ? compare(a[i], a[p]) : 1;
 var x;

 if(c<0) {
  x = a[i]; a[i] = a[p]; a[p] = x;
  if(p>0) bubbleUp(p, a, compare);
 }
 else if(c!==0) {
  bubbleDown(i, a, compare);
 }
}

Note that if we need to bubble up then we make the first swap before continuing from the parent's index since we know that we'll need to and it saves us a step.

Listing 7 shows how we use the bubbleDown function to implement the efficient bulk initialisation of our min-heap.

Listing 7: Building The Min-Heap
function initialise(a, compare) {
 var n = a.length-1;
 var p;

 if(n>0) {
  p = n%2===0 ? n/2-1 : (n-1)/2;
  while(p>=0) bubbleDown(p--, a, compare);
 }
}

Similarly, listing 8 shows how we use the bubbleUp function to implement adding an element to it by pushing it onto the end of its array and then bubbling it up through the tree.

Listing 8: Adding An Element
function addElement(x, a, compare) {
 a.push(x);
 bubbleUp(a.length-1, a, compare);
 return a.length;
}

This is used by the add, push and unshift methods of the ak.minHeap type to add elements to it.
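Since addElement returns the new length of the array, mirroring the behaviour of Array's push, we'd expect, for example

var h = ak.minHeap(ak.numberCompare);

console.log(h.add(3));  // 1, the new size
console.log(h.push(1)); // 2
console.log(h.min());   // 1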

It's convenient to be able to remove any element from the min-heap rather than just its smallest and so the removeIndex function given in listing 9 does so by replacing the element at a given index with the last element and then using the bubble function to move it up or down the tree as appropriate.

Listing 9: Removing An Element
function removeIndex(i, a, compare) {
 var xi = a[i];
 if(i>=0 && i<a.length) {
  if(i===a.length-1) --a.length;
  else {
   a[i] = a.pop();
   bubble(i, a, compare);
  }
 }
 return xi;
}

Returning the value of the removed element means that it conforms to the convention of the pop and shift methods of JavaScript's Array objects.
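For example, continuing with numerically compared heaps

var h = ak.minHeap([1, 6, 3, 6, 8, 7], ak.numberCompare);

console.log(h.shift());   // 1, the smallest element
console.log(h.remove(1)); // 6, whatever happened to be at position one
console.log(h.min());     // 3, the smallest of those remaining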

We can replace an element at any given position in the heap in much the same way, as demonstrated by listing 10.

Listing 10: Replacing An Element
function replaceIndex(i, x, a, compare) {
 var xi = a[i];
 if(i>=0 && i<a.length) {
  a[i] = x;
  bubble(i, a, compare);
 }
 else addElement(x, a, compare);
 return xi;
}

Here we're assuming that replacing a non-existent element means adding the replacement.
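Note that replacing the first element both returns the old smallest element and installs the new one in its correct place, a trick that we shall exploit in program 2 below. For example

var h = ak.minHeap([1, 6, 3, 6, 8, 7], ak.numberCompare);

console.log(h.replace(0, 5)); // 1, the old smallest element
console.log(h.min());         // 3, the new smallest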

Now reading the smallest element of an ak.minHeap simply requires returning its first element and shifting means removing it, as is done by the min and shift methods in listing 1, but reading or popping the largest element requires that we first locate it. We can at least be sure that it cannot have any children and so we only need to search from the last element back to, but not including, the parent of the last element, as is done by the maxIndex function given in listing 11.

Listing 11: Finding The Greatest Element
function maxIndex(a, compare) {
 var n = a.length-1;
 var p, max;

 if(n<2) return n;

 p = n%2===0 ? n/2-1 : (n-1)/2;
 max = n;
 while(--n>p) if(compare(a[n],a[max])>0) max = n;
 return max;
}

Looking back to listing 1, you will see that this is being used to read the greatest element in the min-heap with the max method and to remove it with the pop method.
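For example

var h = ak.minHeap([1, 6, 3, 6, 8, 7], ak.numberCompare);

console.log(h.max()); // 8, the largest element
console.log(h.pop()); // 8, removing it
console.log(h.max()); // 7, the next largest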

Finally, the merge function given in listing 12 allows us to add elements to an ak.minHeap in bulk by pushing them onto the end of its array and then calling initialise.

Listing 12: Merging Elements
function merge(a, state) {
 var t = ak.nativeType(a);
 var n0 = state.a.length;
 var n1, at, i;

 if(t===ak.UNDEFINED_T) {
  n1 = 0;
 }
 else if(t===ak.ARRAY_T) {
  n1 = a.length;
  at = function(i){return a[i];};
 }
 else {
  n1 = a.size();
  at = a.at;
 }

 if(n1>0) {
  n0 = state.a.length;
  state.a.length = n0+n1;
  for(i=0;i<n1;++i) state.a[n0+i] = at(i);
  initialise(state.a, state.compare);
 }
}

Note that we're allowing the new elements to be contained in a JavaScript array or an object that supports the ak library's size and at methods which means that we can merge two ak.minHeap objects, albeit using the comparison function of the one being merged into.
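For example, we should be able to merge both arrays and other heaps into an existing ak.minHeap

var h = ak.minHeap([4, 2, 6], ak.numberCompare);

h.merge([5, 1]);                               // the contents of an array
h.merge(ak.minHeap([3, 0], ak.numberCompare)); // the contents of another heap

console.log(h.size()); // 7
console.log(h.min());  // 0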

Using ak.minHeap

Program 1 demonstrates the bulk addition of elements to our ak.minHeap type followed by shifting them out of it in increasing order, as determined by the ak.numberCompare function.

Program 1: Shifting The Elements
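The interactive program isn't reproduced here, but a minimal sketch of the same idea, printing to the console rather than to the page, might look like

var h = ak.minHeap([6, 3, 8, 1, 7, 6], ak.numberCompare);

while(h.size()>0) {
 console.log(h.toArray()); // the heap's array before each removal
 console.log(h.shift());   // the smallest of the remaining elements
}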

You can see for yourself that the min-heap structure is preserved throughout.
Program 2 shows how we can use it as a priority queue in which elements are added at random whilst being removed in order of importance, with one element being judged more important than another if it compares smaller than it.

Program 2: A Priority Queue

By replacing the first element with the new one at each step, rather than shifting the former and adding the latter, we can perform the read and addition operations in one shot, halving the amount of work required.
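Again, the interactive program isn't reproduced here, but a minimal console sketch of that approach, using made up random priorities, might be

var h = ak.minHeap(ak.numberCompare);
var i, x;

for(i=0;i<5;++i) h.add(Math.floor(100*Math.random())); // queue up some random priorities

for(i=0;i<10;++i) {
 x = Math.floor(100*Math.random());
 console.log(h.replace(0, x)); // serve the most important element and queue the new one in one shot
}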

With these examples showing that our ak.minHeap behaves as expected we're finished with its implementation, which you can find in MinHeap.js.

References

[1] We're All Sorted From A To Z, www.thusspakeak.com, 2017.

[2] I Still Haven't Found What I'm Looking For, www.thusspakeak.com, 2017.

[3] Let's Talk About Sets, www.thusspakeak.com, 2018.

[4] Hayward, R. & McDiarmid, C., Average Case Analysis of Heap Building by Repeated Insertion, Journal of Algorithms, Vol. 12, 1991.
