# Adventures in Data Structures Vol 8: Heaps of Fun

So a great update! Aced the data structures exam :)

Had a great week off and now it’s back to work.
Today, I wanted to go over **heaps**.
Heaps are complete binary trees that preserve **heap order**: in a min-heap, each node's value is less than or equal to the values of its child nodes.
Consequently, they are *partially ordered*, but not totally ordered; sibling nodes have no guaranteed relationship to each other.

Later on, we will see that this property of *heap order* is very useful for implementing the priority queue ADT, which is essentially a collection designed to make it easy to find the element with the highest priority.

We can implement a heap using our friend the *dynamic array*.
Recall that we represent trees in a dynamic array by the following:

- Children of node $i$ are found in positions $2i + 1$ and $2i + 2$
- The parent of node $i$ is found in position $(i - 1)/2$ (using integer division)

The two main operations we are concerned with are `addition` and `removal`.
In both operations, we want to **first** preserve completeness and **then** fix heap order.

- When adding a value to a heap, we first *add it to the end* of the array to preserve completeness, and then *percolate up* to restore heap order.
- When removing a value, we remove the root and replace it with the value in the last filled position, thereby preserving completeness of the tree. We then fix heap order by *percolating down* to the appropriate node.

For both of these operations, we will need helper functions that compare a node's value with its parent or children; if heap order is violated, we swap the two values.
In writing the API functions for the heap, we can make use of helpful dynamic array functions that we wrote earlier, including the `swap` function we wrote for the dynamic array!

# C Implementation of a Heap

Here are some useful functions!

The `getMin` and `removeMin` functions are pretty straightforward.

Adding a value to the heap can be done with a single function that appends and then percolates up.

Preserving heap order after removal is handled by a handy `adjustHeap` function.

In the first `if` condition, the node has two children.
We get the index of the smaller child using `indexSmallest` and then compare that child to the value at position `pos`.
If needed, we swap those values and call `adjustHeap` again recursively.

Similarly, in the second `if` condition, the node has only one child, so we simply compare that child to the parent.
We do the same type of check and, if necessary, swap and call `adjustHeap` recursively.

# Priority Queue Implementation

Recall that a priority queue has three operations:

- `void add(newValue)`
- `TYPE getMin()`
- `void removeMin()`

We have four implementations of the priority queue, including the heap (two of them are technically the same structure, just sorted in opposite order). Here are the execution times for these implementations:

| | Sorted Array | Reverse Sorted Array | Sorted List | Heap |
|---|---|---|---|---|
| Add | O(n) | O(n) | O(n) | O(log n) |
| getMin | O(1) | O(1) | O(1) | O(1) |
| removeMin | O(n) | O(1) | O(1) | O(log n) |

We can see that there are tradeoffs between the heap and the other implementations. How do we decide which implementation fits our application's needs?

Well, here’s an example. Consider a case where you have to insert and then remove $n$ elements.

For the sorted vector and list, we have $n$ insertions at $O(n)$ time apiece and $n$ removals at constant time, so we perform roughly $n^2 + n$ operations, or in other words, $O(n^2)$.

For the heap, we have $n$ insertions at $O(\log n)$ time apiece and $n$ removals that are also $O(\log n)$ apiece, so our total time is $2n \log n$, or in other words, $O(n \log n)$. So in that reasonable scenario, the heap does better than the sorted vector or list.

# BuildHeap and HeapSort

The last topics of heaps that I wanted to touch on were building a heap from an arbitrary vector and using a heap to sort values.

The pseudocode for the buildHeap process is:

- Put the values in a complete binary tree.
- Calculate the first non-leaf node, the deepest node *not* guaranteed to be a proper heap, at position $size/2 - 1$, and call this index $i$.
- Call `_adjustHeap(heap, max, i)`.
- Decrement the index $i$ and repeat until we reach the root node.

The end product of this will be a heap.

The pseudocode for the heapSort is:

- Call buildHeap on the data.
- Swap the first and last values.
- Adjust the heap from index 0 up to, but not including, the last position.
- Repeat steps 2 and 3, decrementing the last position each time, until you reach the root.

The end product of this process will be an array in reverse sorted order. Heap sort has some advantages! For starters, it sorts in place, so there is no need to allocate additional memory. Both buildHeap and heapSort are $O(n \log n)$ (a tighter analysis actually shows buildHeap is $O(n)$), so heapSort is comparable to quicksort, merge sort, etc.