Time for Trees

Specifically binary search trees!

The basic principle of a BST is pretty simple. It is composed of nodes and edges. The top node is called the root node and the bottom nodes are called leaf nodes. Every tree has a height and each node in that tree has a depth.

The height of the tree is dictated by the height of the root node i.e. it is the greatest number of edges of the root node from any of the leaf nodes.

The depth of a node is the number of edges it has from the root node. So the root node itself has a depth of 0, and at least one of the leaf nodes has a depth equal to the height of the tree.

Adding to a tree is simple enough. Removal however is a different story, and requires a bit of careful thought. Whenever we are trying to remove a node, we have to replace it with something, otherwise the entire subtree is lost. We have two options:

  • Replace that node with the righmost child of the left child node
  • or replace that node with the leftmost child of the right child node

Either option is basically guaranteeing that the value we replace the node with will be either the closest value that’s less than it or greater than it.

Implementing the Binary Search Tree

We are going to write the following functions for the BST:

  • Add
  • Contains
  • leftMostChild (useful helper function)
  • removeLeftmostChild (bye bye little guy)
  • remove

The structs we will be using are:

struct Node {
	TYPE value;
	struct Node* left;
	struct Node* right;
};

struct BinarySearchTree {
	struct Node* root;
	int size;
};

We’ll go in order! Starting with add:

struct Node* _nodeAddBST(struct Node* current, TYPE newValue)
{
	assert(newValue != NULL);

	if (current == NULL)
	{
		struct node* newNode = (struct node*)malloc(sizeof(struct node));
		assert(newNode != NULL);
		newNode->left = newNode->right = NULL);
		newNode->value = newValue;
		return newNode;
	}

	if (newValue < current->value)
	{
		current->left = _nodeAddBST(current->left, newValue);
	}
	else
	{
		current->right = _nodeAddBST(current->right, newValue);
	}
	
	return current;
}

This recursive function is a little complicated; let’s break it down!

  • The base case is that the current node is null so we make a new node, set the fields, and return a pointer to that node
  • If current is an actual node, then we have to cehck if the new node is supposed to be on the right or left of it.
    • If it’s supposed to be on the left, make recursive call to current->left
    • If it’s supposed to be on the right, make a recursive call to current->right

With this in mind, the contains function is a little more straightforward, although it does still use recursion.

int containsBST(struct binarySearchTree *tree, TYPE d)
{
	assert(tree != NULL);
	assert(d != NULL);

	struct Node* searchNode = tree->root;
	while (searchNode != NULL)
	{
		if (searchNode->value == d)
		{
			return 1;
		}
		else if (d < searchNode->value)
		{
			searchNode = searchNode->left;
		}
		else
		{
			searchNode = searchNode->right;
		}
	}
	return 0;
}

Now we get to the remove function. We are going to implement this function with a bunch of helper functions! The actual remove function will have a simple API:

void removeBST(struct BinarySearchTree* tree, TYPE d)
{
	if (containsBST(tree, d))
	{
		tree->root = _nodeRemoveBST(tree->root, d):
		tree->size--;
	}
}

We already have the contains function written, but naturally we have to write _nodeRemoveBST. What will that function look like?

Recall that removing a node means replacing it with either the leftmost child of the right child node or the rightmost child of the left child node. Let’s assume for now that we will replace a node with the leftmost child of the right child node. So now we need a function to get the leftmost child of any given node, a function to remove the leftmost child, and then finally the function to remove any given node!

TYPE _leftMostChild(struct Node* current)
{
	assert(current != NULL);

	while (current->left != NULL)
	{
		current = current->left;
	}
	return current->value;
}

The _leftMostChild returns the value of the leftmost child of the current node.

The _removeLeftMostChild function returns a pointer to a node.

struct Node* _removeLeftMostChild(struct Node* current)
{
	assert(current != NULL);

	if (current->left == NULL)
	{
		// if there is no left child, then we are already at the leftmost node
		// we return the right child (which could be NULL)
		struct Node* temp = current->right;
		free(current);
		return temp;
	}
	else
	{
		// if there is a left child, make a recursive call on the left child
		// we then return the current node
		current->left = _removeLeftMostChild(current->left);
		return current;
	}
}

The _nodeRemoveBST function is also recursive and drills down to the node that has the value we want to remove.

If that node does not have a right child, then we return the left node and delete the current node. If the node does have a right child, then we set the value of the node equal to the value of the leftmost child of the right child node. We then set the right child node equal to leftmost child of the right child node.

struct Node* _nodeRemoveBST(struct Node* current, TYPE d)
{
	assert(current != NULL);
	assert(d != NULL);

	if (current->value == d)
	{
		if (current->right == NULL)
		{
			struct Node* tempNode = current->left;
			free(current);
			return tempNode;
		}
		else
		{
			current->value = _leftMostChild(current->right);
			current->right = _removeLeftMostChild(current->right);
			return current;
		}
	}
	else if (d < current->value)
	{
		current->left = _nodeRemoveBST(current->left, d);
	}
	else
	{
		current->right = _nodeRemoveBST(current->right, d);
	}
}

These recursive calls are pretty confusing at first. You have to remember that in adding or removing, we want to basically keep the tree as it is save for the one node we want to change. The key to remembering how and why they work is that the add and remove functions both return nodes so they actually allow you to rebuild the tree as it is until you get to the spot where you need to modify it.

Average Execution Times

We now have four ways to implement a bag ADT! We can see that using a BST bag creates the overall most efficient approach.

  Dynamic Array Linked List Ordered Array BST Bag
Add O(1) O(1) O(n) O(log n)
Contains O(n) O(n) O(log n) O(log n)
Remove O(n) O(1) O(n) O(log n)

BST Iterators

Tree traversals are an important concept when it comes to BSTs. There are three kinds of traversals that we will discuss: pre-order (when you pass the node on the left), in-order (from below), and post-order (on the right).

These traversals can be implemented recursively. For example, in-order traversal can be implemented like:

void inorder(struct Node* node)
{
	if (node != NULL)
	{
		inorder(node->left);
		process(node->val);
		inorder(node->right);
	}
}

The process stack for this function represents the path to the leftmost unprocessed node. We can use this helpful fact to build a nice iterator for the BST!

struct BSTIterator 
{
	struct DynArr* stk;
	struct BSTree* tree;
};

void BSTIteratorInit(struct BSTree* tree, struct BSTIterator* itr)
{
	assert(tree != NULL);
	assert(itr != NULL);
	itr->tree = tree;
	itr->stack = createDynArr(INIT_CAPACITY);
}

void _slideLeft(struct Node* cur, struct BSTIterator* itr)
{
	while (current != NULL)
	{
		dynArrayPush(itr->stk, current);
		current = current->left;
	}
}

int BSTIteratorHasNext(struct BSTIterator* itr)
{
	if (dynArrayIsEmpty(itr->stk))
	{
		_slideLeft(itr->tree->root, itr);
	}
	else
	{
		TYPE topOfStack = dynArrayTop(itr->stk);
		dynArrayPop(itr->stk);
		_slideLeft(topOfStack->right, itr);
	}
	return !(dynArrayIsEmpty(itr->stk));
}

TYPE BSTIteratorNext(struct BSTIterator* itr)
{
	assert(itr != NULL);
	return dynArrayTop(itr->stk);
}