More about trees…

Before we get into what a self-balancing tree is, let’s actually talk about why would it be good to to have one haha

Turns out there are some cons to a BST (or at least some things to watch out for).

It’s very important to the execution performance of the BST that it is roughly balanced. This keeps the operations ~O(log n) which is proportional to the number of edges from the root to a leaf node. An unbalanced BST could lead to slow performance, similar to a linked list. This can happen when values are inserted in sorted order.

A complete binary tree is a BST that is filled at all levels except for the last which is filled from left to right. If the tree is complete, then $log n$ bounds the longest path. However, maintaining completeness is very costly, so it’s really not worth it in terms of performance.

Instead, it’s easier to maintain height balanced trees. This means that for each node, the height difference between the left and right subtrees is at most 1. Some important things to remember:

  • Null nodes have a height of -1
  • Leaf nodes have a height of 0
  • Mathematically, the longest path of a height balanced tree is at worst 44% longer than log n so the operations in a height balanced tree is still O(log n)!

But how do we maintain balance?

The AVL tree maintains height balancing through a series of node rotations.

This is sometimes straightforward but most times not, and some operations require not only single rotations but double rotations as well.

Luckily this class was nice enough to give us not only some nice drawings, but also the full balancing pseudocode!

For a node that is unbalanced:
	If left child is tallest by more than one:
		If left child is heavy on the right side:
			Rotate the left child to the left
		Rotate top node to the right
	Else if right child is the tallest by more than one:
		If right child is heavy on the left side:
			Rotate the right child to the right
		Rorate top node to the left

Implementation of AVL Trees

The AVL node will be based on the following struct:

struct AVLnode {
	TYPE value;
	struct AVLnode* left;
	struct AVLnode* right;
	int height;
};

We will need some helper functions to calculate the heights of nodes for us:

int _h(struct AVLnode* current)
{
	if (current == NULL)
	{
		return -1;
	}
	return current->height;
}

void _setHeight(struct AVLnode* current)
{
	int lch = _h(current->left);
	int rch = _h(current->right);
	
	if (lch < rch)
	{
		current->height = 1 + rch;
	}
	else
	{
		current->height = 1 + lch;
	}
}

Adding the 1 in the _setHeight function is important since the heights technically start at -1.

There is a lot of code to implement the AVL Tree, but the important thing to remember is that these are basically just binary search trees that need to be rebalanced.

struct AVLnode* _AVLnodeAdd(struct AVLnode* current, TYPE newValue)
{
	struct AVLnode* newNode;

	if (current == NULL)
	{
		newNode = (struct AVLnode*)malloc(sizeof(struct AVLnode));
		assert(newNode != NULL);
		newNode->value = newValue;
		newNode->left = newNode->right = NULL;
		return newNode;
		// note that we don't need to call balance here since the tree is empty!
	}
	else if (LT(newValue, current->value))
	{
		current->left = _AVLnodeAdd(current->left, newValue);
	}
	else
	{
		current->right = _AVLnodeAdd(current->right, newValue);
	}

	return _balance(current);
}

Let’s define balance factor to be the height difference between the right and left child subtrees.

int _bf(struct AVLnode* current)
{
	return _h(current->right) - _h(current->left);
}

If _bf returns a positive value, then that means the current node is right heavy. if it returns a negative value, then the current node is left heavy.

struct AVLnode* _balance(struct AVLnode* current)
{
	int cbf = _bf(current);
	if (cbf < -1)
	{
		if (_bf(current->left) > 0)
		{
			//double rotation needed
			current->left = _rotateLeft(current->left);
		}
		return _rotateRight(current);
	}
	else if (cbf > 1)
	{
		if (_bf(current->right) < 0)
		{
			//double rotation needed
			current->right = _rotateRight(current->right);
		}
		return _rotateLeft(current);
	}
	_setHeight(current);
	return current;
}

Now we have to write the functions to perform the rotations on a given subtree.

struct AVLnode* _rotateLeft(struct AVLnode* current)
{
	struct AVLnode* newTop = current->right;
	current->right = newTop->left;
	newTop->left = current;
	_setHeight(current);
	_setHeight(newTop);
	return newTop;
}

struct AVLnode* _rotateRight(struct AVLnode* current)
{
	struct AVLnode* newTop = current->left;
	current->left = newTop->right;
	newTop->right = current;
	_setHeight(current);
	_setHeight(newTop);
	return newTop;
}

The last function we have to implement is our remove function, but much like the regular BST implementation, we need some helper functions for this one. Luckily, we have a lot of the functions already written from BST, we just need to modify them slightly to incorporate rebalancing.

The user facing remove function assumes that we have a struct constructed called AVLTree which has a root node and a count field.

void removeAVLTree(struct AVLTree* tree, TYPE val)
{
	if (containsAVLTree(tree, val))
	{
		tree->root = _removeNode(tree->root, val);
		tree->cnt--;
	}
}

The _leftMost and _removeLeftMost functions are pretty much the same as their prior BST counterparts. Here they are again, for convenience.

TYPE _leftMost(struct AVLnode* current)
{
	assert(current != NULL);
	while (current->left != NULL)
	{
		current = current->left;
	}
	return current->val;
}

struct AVLnode* _removeLeftMost(struct AVLnode* current)
{
	assert(current != NULL);
	struct AVLnode* temp;
	
	if (current->left == NULL)
	{
		temp = current->right;
		free(current);
		return temp;
	}
	else
	{
		current->left = _removeLeftMost(current->left);
		return _balance(current);
	}
}

The last function we need to write is the _removeNode helper function itself~

struct AVLnode* _removeNode(struct AVLnode* current, TYPE val)
{
	assert(current != NULL);
	assert(val != NULL);
	struct AVLnode* temp;

	if (current->val == val)
	{
		if (current->right == NULL)
		{
			temp = current->left;
			free(current);
			return temp;
		}
		else
		{
			current->val = _leftMost(current->right);
			current->right = _removeLeftMost(current->right);
		}
	}
	else if (val < cur->val)
	{
		current->left = _removeNode(current->left, val);
	}
	else
	{
		current->right = _removeNode(current->right, val);
	}
}

Tree Sort

We can sort an array $A$ using an AVL tree! The basic idea is that we:

  1. Copy each element from A into an AVL tree
  2. Copy each element from the AVL tree back into the array

Following this pseudocode, the treeSort function is pretty straightforward, but we do need to write a recursive helper function.

void treeSort(TYPE* data, int n)
{
	AVLTree tree;
	int i = 0;
	int count = 0;

	AVLTreeInit(&tree);

	for (i = 0; i < n; i++)
	{
		_AVLnodeAdd(&tree, dadta[i]);
	}

	_treeSortHelper(tree->root, data, &count);
}

void _treeSortHelper(AVLnode* current, TYPE* data, int* count)
{
	if (current != NULL)
	{
		_treeSortHelper(current->left, data, count);
		data[*count] = current->val;
		*count = *count + 1;
		_treeSortHelper(current->right, data, count);
	}
}