Adventures in Data Structures Vol 10: Seeing the Forest Through the Trees
Time for Trees
Specifically binary search trees!
The basic principle of a BST is pretty simple. It is composed of nodes and edges. The top node is called the root node and the bottom nodes are called leaf nodes. Every tree has a height and each node in that tree has a depth.
The height of the tree is dictated by the height of the root node i.e. it is the greatest number of edges of the root node from any of the leaf nodes.
The depth of a node is the number of edges it has from the root node. So the root node itself has a depth of 0, and at least one of the leaf nodes has a depth equal to the height of the tree.
Adding to a tree is simple enough. Removal however is a different story, and requires a bit of careful thought. Whenever we are trying to remove a node, we have to replace it with something, otherwise the entire subtree is lost. We have two options:
- Replace that node with the righmost child of the left child node
- or replace that node with the leftmost child of the right child node
Either option is basically guaranteeing that the value we replace the node with will be either the closest value that’s less than it or greater than it.
Implementing the Binary Search Tree
We are going to write the following functions for the BST:
- Add
- Contains
- leftMostChild (useful helper function)
- removeLeftmostChild (bye bye little guy)
- remove
The structs we will be using are:
We’ll go in order! Starting with add
:
This recursive function is a little complicated; let’s break it down!
- The base case is that the current node is null so we make a new node, set the fields, and return a pointer to that node
- If current is an actual node, then we have to cehck if the new node is supposed to be on the right or left of it.
- If it’s supposed to be on the left, make recursive call to
current->left
- If it’s supposed to be on the right, make a recursive call to
current->right
- If it’s supposed to be on the left, make recursive call to
With this in mind, the contains
function is a little more straightforward, although it does still use recursion.
Now we get to the remove
function.
We are going to implement this function with a bunch of helper functions!
The actual remove function will have a simple API:
We already have the contains
function written, but naturally we have to write _nodeRemoveBST
.
What will that function look like?
Recall that removing a node means replacing it with either the leftmost child of the right child node or the rightmost child of the left child node. Let’s assume for now that we will replace a node with the leftmost child of the right child node. So now we need a function to get the leftmost child of any given node, a function to remove the leftmost child, and then finally the function to remove any given node!
The _leftMostChild
returns the value of the leftmost child of the current node.
The _removeLeftMostChild
function returns a pointer to a node.
The _nodeRemoveBST
function is also recursive and drills down to the node that has the value we want to remove.
If that node does not have a right child, then we return the left node and delete the current node. If the node does have a right child, then we set the value of the node equal to the value of the leftmost child of the right child node. We then set the right child node equal to leftmost child of the right child node.
These recursive calls are pretty confusing at first.
You have to remember that in adding or removing, we want to basically keep the tree as it is save for the one node we want to change.
The key to remembering how and why they work is that the add
and remove
functions both return nodes so they actually allow you to rebuild the tree as it is until you get to the spot where you need to modify it.
Average Execution Times
We now have four ways to implement a bag ADT! We can see that using a BST bag creates the overall most efficient approach.
Dynamic Array | Linked List | Ordered Array | BST Bag | |
---|---|---|---|---|
Add | O(1) | O(1) | O(n) | O(log n) |
Contains | O(n) | O(n) | O(log n) | O(log n) |
Remove | O(n) | O(1) | O(n) | O(log n) |
BST Iterators
Tree traversals are an important concept when it comes to BSTs. There are three kinds of traversals that we will discuss: pre-order (when you pass the node on the left), in-order (from below), and post-order (on the right).
These traversals can be implemented recursively. For example, in-order traversal can be implemented like:
The process stack for this function represents the path to the leftmost unprocessed node. We can use this helpful fact to build a nice iterator for the BST!