39  Tree Data Structure

39.1 Background

When we first begin to learn about data structures, we often encounter linear structures such as arrays, linked lists, stacks, and queues. These structures, as the term ‘linear’ suggests, store data in a sequential manner (Figure 39.1). This makes them great for situations where the data can be naturally expressed as a sequence, for instance, a to-do list or a queue at a coffee shop.

G 1 1 2 2 1->2 3 3 2->3 4 4 3->4 5 5 4->5
Figure 39.1: A representation of a linked list structure, highlighting its linear nature.

Let’s consider the linked list. Each node in the linked list has a reference to the next node, creating a chain of elements. This works well when the data in the structure has a natural sequential or linear relationship. However, as we delve deeper into more complex data problems, we begin to see that not all data fits neatly into a linear relationship.

Consider an organizational hierarchy within a company. Each person (except the CEO) reports to exactly one person. However, each manager can have multiple direct reports. In this case, each node (person) might need to reference more than one node (Figure 39.2). Using a linked list to represent this data would be cumbersome, as it would not capture the multiple relationships that exist between a manager and their direct reports.

G CEO CEO Manager1 Manager1 CEO->Manager1 Manager2 Manager2 CEO->Manager2 Employee1 Employee1 Manager1->Employee1 Employee2 Employee2 Manager1->Employee2 Employee3 Employee3 Manager2->Employee3 Employee4 Employee4 Manager2->Employee4
Figure 39.2: An example of an organizational hierarchy within a company, illustrating the non-linear relationships that exist.

Another example is a file system on your computer. Each folder can contain multiple files or other folders. But, each file or folder is contained within exactly one other folder. Once again, we see the need for a data structure that can capture these multiple relationships (Figure 39.3).

G Root Root Folder1 Folder1 Root->Folder1 File1 File1 Root->File1 Subfolder1 Subfolder1 Folder1->Subfolder1 File2 File2 Folder1->File2 File3 File3 Subfolder1->File3 File4 File4 Subfolder1->File4
Figure 39.3: A representation of a simple file system, showing how folders can contain multiple other folders or files.

These examples bring to light a common pattern: there are situations where our data is not linear, but hierarchical. Hierarchies exist in various forms in the real world, from biological classifications to the layout of web pages, and modeling these hierarchies accurately in our programs allows us to reflect the reality more truthfully.

Enter the concept of Trees.

A tree in computer science is a hierarchical data structure that can model these kinds of relationships (Figure 39.4). This structure gets its name from a real-world tree, but with a twist - the tree data structure is often visualized with the root at the top and the leaves at the bottom, resembling an upside-down real-world tree.

G Root Root Child1 Child1 Root->Child1 Child2 Child2 Root->Child2 Child3 Child3 Root->Child3 Grandchild1 Grandchild1 Child1->Grandchild1 Grandchild2 Grandchild2 Child1->Grandchild2 Grandchild3 Grandchild3 Child3->Grandchild3
Figure 39.4: An illustration of a simple tree data structure, with key components labeled.

The tree structure is a fundamental shift from our previously understood data structures. It can open new doors to solving problems and provide an efficient way to organize and model complex relationships in our data.

In the following sections, we’ll take a deeper dive into the world of trees, understanding their properties, variations, and how they can revolutionize the way we think about hierarchical data relationships.

39.2 Introduction to Trees

After establishing the need for a data structure that can handle more complex relationships, we can now formally introduce the concept of trees.

A tree in computer science is an abstract data type that simulates a hierarchical structure with a set of interconnected nodes. The key characteristic that separates trees from other data structures is that no node in the tree may be connected to more than one parent node, thereby preventing any possibility of a loop in the data structure.

Let’s get familiar with some fundamental tree terminologies (see Figure 39.5):

  • Node: A unit or location in the tree where data is stored. It’s similar to a link in a linked list or an element in an array.
  • Root: The topmost node in the tree that doesn’t have a parent.
  • Edge: The connection between two nodes.
  • Parent: A node which has one or more child nodes.
  • Child: A node which has a parent node.
  • Leaf: A node which does not have any child nodes.
  • Subtree: A tree consisting of a node and its descendants.
  • Degree: The number of children of a node.
  • Level: The distance from the root, measured in edges.
  • Height: The distance from the node to its furthest leaf.
  • Depth: The distance from the node to the root.
G A Root (A) Node B Child (B) Node Level: 1 Depth: 1 Parent of: F, G A->B C Child (C) Node Degree: 2 Parent of: D, E A->C F Leaf (F) Node Depth: 2 B->F G Leaf (G) Node Depth: 2 B->G D Leaf (D) Node Depth: 2 C->D E Leaf (E) Node Depth: 2 C->E
Figure 39.5: A simple tree illustrating various tree terminologies. Node A is the root, nodes B and C are children of A. Node C, having two children, demonstrates the degree of a node. Nodes D, E, F, G are leaf nodes. The subtree consists of Node C and its descendants (D, E). The level is demonstrated by the level of Node B (Level 1). The depth of Node D is 2 (from Root A). The height of the tree is 2 (distance from the Root A to furthest leaf).

The concept of trees becomes truly interesting when we delve into the various types of trees. Different types of trees can be used to solve different types of problems, and it’s crucial to choose the right type for a given problem.

The simplest form of a tree is a General Tree, where each node can have any number of child nodes. An example of where a general tree might be useful is in the representation of a file system on a computer. In this system, each folder can contain any number of files or folders. We can illustrate this with a diagram (Figure 39.6).

G Root Root Folder1 Folder1 Root->Folder1 File1 File1 Root->File1 Subfolder1 Subfolder1 Folder1->Subfolder1 File2 File2 Folder1->File2 File3 File3 Subfolder1->File3 File4 File4 Subfolder1->File4 File5 File5 Subfolder1->File5
Figure 39.6: An example of a general tree representing a file system, where each node can have any number of child nodes.

However, there are also more specialized types of trees, like Binary Trees, where each node can have at most two children, commonly referred to as left and right child. The binary tree can be visualized in the form of a family tree where each parent (node) can have at most two children (Figure 39.7).

G Parent Parent Child1 Child1 Parent->Child1 Child2 Child2 Parent->Child2
Figure 39.7: A binary tree representing a family tree, where each parent can have at most two children.

A further specialization is the Binary Search Tree (BST), where the nodes are arranged in a manner such that for every node, nodes in the left subtree have values less than the node, and nodes in the right subtree have values greater than the node. This property makes BSTs useful for search operations, such as finding a book in a library system, where books are organized based on their identifying information (Figure 39.8).

G 50 50 30 30 50->30 70 70 50->70 20 20 30->20 40 40 30->40 60 60 70->60 80 80 70->80
Figure 39.8: A binary search tree (BST) used in a library system for organizing books. The key property of BST is that nodes in the left subtree have values less than the parent node, and nodes in the right subtree have values greater.

There are also self-balancing trees like AVL Trees and Red-Black Trees that maintain their balance as new nodes are inserted or existing nodes are removed, which ensures that the tree remains efficient for operations like insertion, deletion, and search.

Another common type of tree is the B-tree, used in databases and filesystems. In a B-tree, each node can have more than two children, unlike in a binary tree. This structure allows for efficient access and modification of large blocks of data, which makes B-trees particularly suitable for use in disk-based storage systems, such as databases (Figure 39.9).

G Node1 Node1 Node2 Node2 Node1--Node2 Node3 Node3 Node1--Node3 Node4 Node4 Node1--Node4 Leaf1 Leaf1 Node2--Leaf1 Leaf2 Leaf2 Node2--Leaf2 Leaf3 Leaf3 Node3--Leaf3 Leaf4 Leaf4 Node3--Leaf4 Leaf5 Leaf5 Node3--Leaf5 Leaf6 Leaf6 Node4--Leaf6 Leaf7 Leaf7 Node4--Leaf7
Figure 39.9: A representation of a B-tree used in a database. Each node can have more than two children, providing efficient access and modification of large data blocks.

Understanding these types of trees and how they work is crucial to using them effectively. In the next sections, we will delve deeper into some of these tree types and explore their properties and uses.

39.3 Binary Tree

After understanding the basic concept of trees, let’s now focus on a special kind of tree, the Binary Tree. As we have mentioned earlier, in a binary tree, a node can have at most two children - commonly referred to as the left child and the right child. This property leads to some interesting and useful characteristics that we’ll explore in this section.

The structure of a binary tree is relatively straightforward. Each node in a binary tree contains some data, a reference to the left child node, and a reference to the right child node. If a node has no left or right child, the corresponding reference is set to null.

In a binary tree, each level has twice as many nodes as the previous level, creating a rapidly expanding structure. This property enables binary trees to store large amounts of data in a relatively small number of levels, which can lead to efficient search and insertion operations, given the tree is balanced.

Let’s take a look at some different types of binary trees:

  • Full Binary Tree: In this tree, every node has either 0 or 2 children. This type of tree is useful when you know that every internal node in your tree will have exactly two children. For example, in certain types of decision trees used in machine learning, each decision leads to two subsequent decisions, forming a full binary tree.
G A A B B A->B C C A->C D D B->D E E B->E
Figure 39.10: A Full Binary Tree. Every node has either 0 or 2 children.
  • Complete Binary Tree: Every level, except possibly the last, is fully filled, and all nodes are as far left as possible. Heaps are an example of complete binary trees. In a heap, all levels are fully filled except for the last level, which is filled from left to right.
G A A B B A->B C C A->C D D B->D E E B->E F F C->F
Figure 39.11: A Complete Binary Tree. Every level is fully filled except for the last, which is filled from left to right.
  • Perfect Binary Tree: A perfect binary tree is both full and complete. Every level is fully filled. This kind of tree appears in some specialized data structures or algorithms, but is not common in general use.
G A A B B A->B C C A->C D D B->D E E B->E F F C->F G G C->G
Figure 39.12: A Perfect Binary Tree. Every level is fully filled.
  • Balanced Binary Tree: The difference in height of left and right subtrees for every node is not more than k (mostly 1). Most of the tree algorithms require the tree to be balanced to maintain their optimal time complexity.
G A A B B A->B C C A->C D D B->D E E B->E F F C->F
Figure 39.13: A Balanced Binary Tree. The difference in height of left and right subtrees for every node is not more than 1.
  • Degenerate (or pathological) Tree: Each parent node has only one child. The tree behaves like a linked list, and we lose the advantages of binary trees.
G A A B B A->B C C B->C D D C->D E E D->E
Figure 39.14: A Degenerate (or pathological) Tree. Each parent node has only one child.

39.3.1 Implementing a Binary Tree

In this section, we will be implementing a simple Binary Tree structure using Java. We’ll start with the definition of the Node class.

/**
 * This class represents a node in the binary tree.
 * @param <T> This is the type parameter which represents the type of value the node will hold.
 */
public class Node<T> {
    T value;
    Node<T> left;
    Node<T> right;

    /**
     * Node class constructor.
     * @param value This is the value that the node will hold.
     */
    public Node(T value) {
        this.value = value;
        this.left = null;
        this.right = null;
    }
}

In the code above, we defined a generic Node class with a type parameter T. The Node class has three fields: value of type T representing the data the node holds, and left and right of type Node<T> representing the left and right children of the node.

Next, let’s define our BinaryTree class.

/**
 * This class represents a binary tree.
 * @param <T> This is the type parameter which represents the type of value the nodes in the tree will hold.
 */
public class BinaryTree<T> {
    Node<T> root;

    /**
     * BinaryTree class constructor.
     */
    public BinaryTree() {
        this.root = null;
    }
}

In the BinaryTree class, we defined a root of type Node<T> representing the root of the tree. Practical applications of binary trees usually include additional operations like adding a node, deleting a node, checking if a value exists in the tree, and various ways of traversing the tree (in-order, pre-order, post-order).

39.4 Common Operations on Binary Trees

39.4.1 Traversals and Orderings

Trees, being a nonlinear data structure, cannot be traversed in a single run like arrays, linked lists, or other linear data structures. Therefore, they require some specific methods to traverse their nodes, visit each node once, and perform some operations such as search, update, or accumulate.

Three common types of depth-first traversal are in-order, pre-order, and post-order.

39.4.1.1 In-Order Traversal

In in-order traversal, the process is as follows: 1. Traverse the left subtree 2. Visit the root node 3. Traverse the right subtree

This traversal technique is widely used, and in binary search trees, it retrieves data in ascending order.

G A 2 A B 1 B A->B C 3 C A->C
Figure 39.15: In-Order Traversal: Traverse left subtree, Visit root, Traverse right subtree. The numbers indicate the order of traversal.

Pseudocode for in-order traversal can look like this:

function inOrderTraversal(node) {
  if (node != null) {
    inOrderTraversal(node.left)
    visit(node)
    inOrderTraversal(node.right)
  }
}

39.4.1.2 Pre-Order Traversal

In pre-order traversal, the process is as follows: 1. Visit the root node 2. Traverse the left subtree 3. Traverse the right subtree

This method can be used to create a copy of the tree, or to get a prefix expression of an expression tree.

G A 1 A B 2 B A->B C 3 C A->C
Figure 39.16: Pre-Order Traversal: Visit root, Traverse left subtree, Traverse right subtree. The numbers indicate the order of traversal.

Pseudocode for pre-order traversal can look like this:

function preOrderTraversal(node) {
  if (node != null) {
    visit(node)
    preOrderTraversal(node.left)
    preOrderTraversal(node.right)
  }
}

39.4.1.3 Post-Order Traversal

In post-order traversal, the process is as follows: 1. Traverse the left subtree 2. Traverse the right subtree 3. Visit the root node

This method is used to delete or deallocate nodes of a tree, or to get a postfix expression of an expression tree.

G A 3 A B 1 B A->B C 2 C A->C
Figure 39.17: Post-Order Traversal: Traverse left subtree, Traverse right subtree, Visit root. The numbers indicate the order of traversal.

Pseudocode for post-order traversal can look like this:

function postOrderTraversal(node) {
  if (node != null) {
    postOrderTraversal(node.left)
    postOrderTraversal(node.right)
    visit(node)
  }
}

In each of the traversal methods above, visit(node) is an operation that performs some computation on node, such as printing its value.

It’s important to note that traversal operations have a time complexity of O(n), where n is the number of nodes in the tree. This is because every node must be visited once and only once during the traversal.

In addition to traversal methods, other common operations on binary trees include insertion, deletion, and searching. These are vital operations for maintaining and interacting with the binary tree structure.

39.5 Binary Search Trees (BST)

Before delving into Binary Search Trees (BSTs), let’s take a step back and contemplate the significance of search operations in computer science. A search operation is fundamental to many applications, from looking up a contact in your smartphone to searching for a specific book in a library database. Therefore, making search operations efficient has always been a central challenge in computer science.

Traditional search approaches like linear search, which checks each item one by one, can be time-consuming especially when dealing with large datasets, as it has a time complexity of O(n). A considerable improvement is the binary search algorithm, which works by repeatedly dividing the sorted list of elements in half until the target element is found, giving it a time complexity of O(log n). However, binary search requires the data to be sorted, and maintaining a sorted list of elements can be expensive for insert and delete operations.

This is where Binary Search Trees come in. A Binary Search Tree is a special type of binary tree that maintains a specific ordering property — for each node, all elements in the left subtree are less than the node, and all elements in the right subtree are greater than the node (see Figure 39.18). This property of BSTs enables us to perform search, insert, and delete operations efficiently, maintaining a time complexity of O(log n) in an ideal situation where the tree is balanced. Let’s explore these concepts in more detail.

39.5.1 Definition, Properties, and Structure of BSTs

A Binary Search Tree (BST) is a binary tree where for every node:

  • The values in its left subtree are less than the node’s value.
  • The values in its right subtree are greater than the node’s value.
G 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8
Figure 39.18: The Binary Search Tree property: for every node, all elements in the left subtree are less than the node, and all elements in the right subtree are greater than the node.

This property ensures that the “in-order” traversal of a BST results in a sorted sequence of values, which is the key to efficient search, insertion, and deletion operations.

39.5.2 Advantages of BSTs over other tree structures

The BST property provides some advantages over other tree structures:

  • Efficient search: If the tree is balanced, we can find, insert, or delete entries in O(log n) time where n is the total number of nodes. This is significantly faster than linear search, and it doesn’t require the maintenance of a sorted list like binary search.
  • Sorted traversal: In-order traversal of a BST yields the nodes in sorted order, which can be useful in various applications.
  • Flexibility: Unlike an array, BSTs do not have a fixed size. They can grow and shrink during the execution of the program, allowing efficient use of memory.

39.5.3 Detailed description of BST operations: insertion, deletion, searching

BSTs support three fundamental operations: search, insertion, and deletion, all of which leverage the BST property to operate efficiently.

Search: Starting from the root, if the target value is less than the current node, we move to the left child; if it’s greater, we move to the right child. We repeat this process until we either find the target value or reach a null where the target value should be if it were in the tree (see Figure 39.19).

Insertion: Similar to search, but when we reach a null location, we create a new node with the target value at that spot (see Figure 39.20).

G 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8 1 1 2->1
Figure 39.20: BST insertion operation: Similar to the search operation, the algorithm moves down the tree starting from the root. When a null location is encountered, a new node is created at that spot.

Deletion: A bit more complicated as we need to maintain the BST property after removing a node. The process varies based on the number of children the node to be deleted has

, similar to deletion in a binary tree (see Figure 39.21).

G 5 5 3 3 5->3 7 7 5->7 2 2 3->2 4 4 3->4 6 6 7->6 8 8 7->8 1 1 2->1
Figure 39.21: BST deletion operation: Deletion in a BST can be complex as it must maintain the BST property after removing a node. The process varies depending on whether the node to be deleted has no children, one child, or two children.

Absolutely. Let’s break down the implementation of BST operations into both recursive and iterative approaches using Java.

39.5.4 Recursive and Iterative Implementations in Java

While recursion offers a straightforward and elegant approach to implementing BST operations, the iterative method can often provide better performance in terms of space complexity. Let’s look at both methods for each of the main BST operations: search, insertion, and deletion.

39.5.5 Search Operation

The objective of the search operation in a binary search tree (BST) is to find a node that contains a specified value. If the value is present in the BST, the function will return that node. Otherwise, it will return null.

To find a node in a BST, we can use one of two main approaches: recursion or iteration. Both have their advantages and disadvantages, which will depend on the situation at hand.

39.5.5.1 Recursive Approach

Consider the following Java code that uses a recursive method to search a node in a BST.

public TreeNode search(TreeNode node, int value) {
    if (node == null || node.val == value)
        return node;

    if (node.val > value)
        return search(node.left, value);
    
    return search(node.right, value);
}

The method search receives a TreeNode node (that represents a current node in the BST) and an integer value (that is the value to be searched).

The search operation begins by checking whether the node is null (which indicates that we’ve traversed a path in the tree where the value does not exist) or the value of the current node is equal to the desired value. If either condition is true, the node is returned.

If the value to be found is less than the current node’s value, the function recursively calls itself, continuing the search on the left subtree (node.left).

Conversely, if the value is greater than the current node’s value, the function calls itself to continue the search on the right subtree (node.right).

39.5.5.2 Iterative Approach

Now, let’s look at an iterative approach to search a node in a BST. The iterative approach can be more space-efficient as it does not require stack space for recursive calls.

public TreeNode search(TreeNode root, int value) {
    while (root != null) {
        if (root.val > value)
            root = root.left;
        else if (root.val < value)
            root = root.right;
        else
            return root;
    }
    return null;
}

In the iterative approach, we begin with the root node and enter a while loop that continues as long as the current node is not null.

If the value of the current node (root.val) is greater than the desired value, we move to the left child of the current node (root = root.left).

If the value of the current node is less than the desired value, we move to the right child (root = root.right).

The loop breaks and the current node is returned if the value of the current node is equal to the desired value.

If the value is not found in the tree, the function will return null.

39.5.5.3 Visualizing the Search Operation

We’ll use a simple binary search tree with the values 2, 1, and 3, with 2 being the root node, 1 the left child, and 3 the right child. We’ll search for the value 3.

Here, the search operation starts at node 2 (the root node). In both recursive and iterative approaches, the search function compares the value to be found (3) with the current node’s value (2). As 3 is greater than 2, the function moves to the right child (node 3) for both recursive and iterative approaches. This search path is highlighted in blue for recursive and green for iterative.

The search function now finds that the value of the current node (3) is equal to the desired value (3). Therefore, the function returns this node.

Please note that in this simple example, the recursive and iterative paths are identical. This won’t always be the case, especially in larger trees, but the key takeaway is that both methods are following the same fundamental logic, albeit through different operational mechanics.

39.5.6 Insertion Operation

Insertion operation in a Binary Search Tree (BST) aims to add a new node with a specific value to the tree. The position of the new node is determined by the fundamental property of BST: for any given node, all nodes in its left subtree have values less than its value, and all nodes in its right subtree have values greater than its value.

39.5.6.1 Recursive Approach

Let’s start with the recursive approach to insert a node in a BST. Here is a snippet of Java code implementing this approach:

public TreeNode insert(TreeNode node, int value) {
    if (node == null) {
        return new TreeNode(value);
    }

    if (value < node.val) {
        node.left = insert(node.left, value);
    } else if (value > node.val) {
        node.right = insert(node.right, value);
    }

    return node;
}

This insert function takes a TreeNode node (representing the current node in the BST) and an integer value (the value to be inserted).

The function first checks if the node is null. If it is (indicating that we’ve reached an empty spot in the tree where a new node should be inserted), it creates a new TreeNode with the given value and returns it.

If the node is not null, it checks if the value to be inserted is less than the value of the current node. If so, it recursively calls the insert function to insert the value into the left subtree (node.left).

If the value to be inserted is greater than the value of the current node, it recursively calls insert to insert the value into the right subtree (node.right).

After the value is inserted, it returns the (potentially modified) current node, maintaining the structure of the tree.

39.5.6.2 Iterative Approach

Now let’s consider an iterative approach to insert a node in a BST, which can be more space-efficient as it doesn’t require stack space for recursive calls:

public TreeNode insert(TreeNode root, int value) {
    TreeNode node = new TreeNode(value);
    if (root == null) {
        return node;
    }

    TreeNode parent = null, current = root;
    while (current != null) {
        parent = current;
        if (current.val > value) {
            current = current.left;
        } else {
            current = current.right;
        }
    }

    if (parent.val > value) {
        parent.left = node;
    } else {
        parent.right = node;
    }

    return root;
}

In the iterative insert function, we start by creating a new TreeNode node with the specified value.

If the BST is empty (i.e., the root is null), we return this new node as the root of the BST.

If the BST is not empty, we start from the root and iterate down the tree to find the correct position for the new node.

We maintain two pointers during the traversal: parent (representing the parent of the current node) and current (the current node in the traversal). We update parent to be current before moving current in each iteration.

When the current node becomes null, we’ve found the correct spot for the new node. If the new value is less than the parent’s value, we insert the new node as the left child of the parent; otherwise, we insert it as the right child.

The function then returns the root of the (now modified) BST.

39.5.6.3 Visualizing the Insertion Operation

Consider a simple binary search tree with values 2, 1, and 3, where 2 is the root node, 1 is the left child, and 3 is the right child. Let’s try to insert a new node with the value 4 into this tree.

BST 2 2 1 1 2->1 3 3 2->3 4 4 3->4
Figure 39.23: Illustration of the node insertion process in a binary search tree, showcasing both recursive and iterative approaches. The path followed by the recursive approach is shown in blue, while the path followed by the iterative approach is shown in green.

In this diagram, we start at node 2 (the root node) and want to insert a node with the value 4 (shown in red). In both the recursive and iterative methods, we start by comparing the value to be inserted (4) with the value of the current node (2). Since 4 is greater than 2, we move to the right child (node 3).

We then compare 4 with 3. Since 4 is greater than 3, and node 3 doesn’t have a right child, we add the new node as the right child of node 3. The paths taken by the recursive and iterative approaches are shown in blue and green respectively.

Just like the search operation, in this simple example, the recursive and iterative paths for the insertion operation are identical. But in a larger tree, the paths may differ depending on the current state of the tree and the value to be inserted. However, both methods adhere to the same principle - the new node is added in such a way that it maintains the property of the binary search tree.

39.5.7 Deletion Operation

Deleting or removing a node from a binary tree is a slightly complex operation compared to insertion. This is because when we remove a node, we need to ensure that the remaining nodes still form a binary tree. The process differs depending on the number of children the node has:

  1. No child: If the node is a leaf node (i.e., it does not have any children), we can directly remove the node.
G A A B B B->A remove C C B->C
Figure 39.24: Removing a node with no children. The node ‘C’ is simply removed from the tree.
  1. One child: If the node has a single child, we can replace the node with its subtree.
G A A B B B->A remove C C B->C D D C->D
Figure 39.25: Removing a node with one child. The node ‘B’ is replaced by its subtree rooted at ‘C’.
  1. Two children: This is the most complex case. If the node has two children, we generally seek a node from the tree to replace the node to be deleted. The replacement node is typically the in-order predecessor or the in-order successor of the node. These are the node’s immediate previous and next nodes in in-order traversal (see Figure 39.15).

    • In-order predecessor: This is the node with the highest value that is smaller than the value of the node to be removed. It can be found by moving one step to the left (to the left child), and then as far right as possible.

    • In-order successor: This is the node with the smallest value that is larger than the value of the node to be removed. It can be found by moving one step to the right (to the right child), and then as far left as possible.

G A A B B B->A C C B->C
Figure 39.26: Removing a node with two children. The node ‘B’ is replaced by its in-order predecessor ‘A’ or in-order successor ‘C’. The replacement node is then recursively removed from its original position.

Once we find the in-order predecessor (or successor), we replace the node to be deleted with the predecessor (or successor) and then recursively delete the predecessor (or successor) from its original position.

The reason we choose the in-order predecessor or successor is to maintain the binary search tree property, i.e., for every node, nodes in the left subtree have values less than the node, and nodes in the right subtree have values greater than the node.

Here is a basic outline of the removal process in pseudocode:

function remove(node, key) {
  if (node is null) {
    return node
  }
  if (key < node.key) {
    node.left = remove(node.left, key)
  } else if (key > node.key) {
    node.right = remove(node.right, key)
  } else {
    if (node.left is null) {
      return node.right
    } else if (node.right is null) {
      return node.left
    }
    node.key = minValue(node.right)
    node.right = remove(node.right, node.key)
  }
  return node
}

function minValue(node) {
  current = node
  while (current.left is not null) {
    current = current.left
  }
  return current.key
}

In this pseudocode, remove() is a recursive function that deletes the node with the specified key from the tree and returns the new root of the subtree, and minValue() is a helper function that returns the minimum value in a non-empty binary search tree.

The time complexity for deletion in a binary tree can range from O(log n) to O(n), where n is the number of nodes. In a balanced tree, the time complexity is O(log n) because we would be traversing the height of the tree, which is logarithmic in a balanced tree. However, in the worst-case scenario, such as when the tree is a degenerate or skewed tree, the time complexity would be O(n) because each removal could potentially involve traversing all the nodes of the tree.

39.5.7.1 Recursive Approach

public TreeNode delete(TreeNode root, int value) {
    if (root == null) {
        return root;
    }

    if (value < root.val) {
        root.left = delete(root.left, value);
    } else if (value > root.val) {
        root.right = delete(root.right, value);
    } else {
        if (root.left == null)
            return root.right;
        else if (root.right == null)
            return root.left;

        root.val = findMinValue(root.right);
        root.right = delete(root.right, root.val);
    }

    return root;
}

private int findMinValue(TreeNode root) {
    int min = root.val;
    while (root.left != null) {
        min = root.left.val;
        root = root.left;
    }
    return min;
}

Please note that the iterative implementation of delete operation is relatively complex and can detract from the understanding of how the delete operation fundamentally works. Therefore, we generally prefer the recursive approach for teaching purposes. However, once you are comfortable with the recursive implementation, it’s worthwhile to try to implement the iterative version on your own for practice.

The presented examples illustrate how different BST operations can be performed in both recursive and iterative manner. Each has its own pros and cons - recursive methods are often easier to comprehend and write, whereas iterative methods might provide better performance.

39.5.8 Performance and Time Complexity

The time complexity of binary search tree operations such as search, insert, and delete depends on the height of the binary search tree. In the best-case scenario, the tree is perfectly balanced, and its height is log(n), where n is the number of nodes. In this case, search, insert, and delete operations can be performed in O(log n) time. However, in the worst-case scenario, the tree can become skewed, which means it resembles a linked list more than a tree. In this case, the height of the tree is n, and operations can take up to O(n) time.

Let’s see how this plays out for each operation:

Search Operation

  • Best-case performance: O(log n) – This is when the tree is balanced, and we can eliminate half of the nodes from our search at each step.
  • Worst-case performance: O(n) – This happens when the tree is skewed, and our search path includes most or all nodes in the tree.

Insert Operation

  • Best-case performance: O(log n) – When the tree is balanced, the location for a new node is found by traversing from the root to the appropriate leaf node.
  • Worst-case performance: O(n) – In a skewed tree, the new node could end up being a child of the deepest leaf node.

Delete Operation

  • Best-case performance: O(log n) – In a balanced tree, we find the node to delete quickly. If the node has two children, we also need time to find the in-order predecessor or successor.
  • Worst-case performance: O(n) – In a skewed tree, deletion can involve traversing most of the tree.

One thing to note is that the best-case scenario, a balanced tree, is not the average case. Unless measures are taken to balance the tree, binary search trees can become skewed from sequences of insertions and deletions.

The next topic that we’ll discuss, self-balancing binary search trees, will address this limitation of the basic binary search tree by ensuring that the tree remains approximately balanced at all times. As a result, self-balancing binary search trees can guarantee that the time complexity of major operations is always O(log n), which is a significant improvement in the worst-case scenario.

39.6 Balanced Binary Search Trees

Binary search trees (BSTs) are powerful data structures that support efficient search, insertion, and deletion operations. However, if the tree becomes unbalanced, these operations could degrade to linear time complexity. One effective way to mitigate this issue is by employing balanced binary search trees, such as AVL Trees and Red-Black Trees. These trees aim to maintain balance, ensuring efficient performance regardless of the input sequence.

The concept of a “balanced” tree might sound abstract, so let’s make it tangible with an example (Figure 39.27). A balanced binary tree is one where the difference between the heights of the left and right subtrees of every node is either -1, 0, or 1. This balance ensures that the tree doesn’t lean too heavily on one side, optimizing the path to every node and promoting efficiency.

G 6 6 4 4 6->4 8 8 6->8 2 2 4->2 5 5 4->5 7 7 8->7 9 9 8->9 1 1 2->1 3 3 2->3
Figure 39.27: A balanced binary search tree. The numbers show the height of each node.

Consider a balance scale, with nodes as the weights. When equally balanced, the scale maintains equilibrium. However, adding or subtracting weights (or nodes) disturbs this balance, causing the scale to lean towards the heavier side. AVL Trees and Red-Black Trees work similarly - they maintain equilibrium by redistributing the weights, which we call “rotations” in the context of trees.

AVL Trees, named after their inventors Adelson-Velsky and Landis, adjust their balance through a process called rotation. This operation preserves the in-order property of the tree. If the balance factor of a node in an AVL tree becomes 2 or -2, a rotation is performed to regain balance. Let’s illustrate this with a simple example (Figure 39.31).

Consider inserting 1, 2, 3 into an initially empty AVL tree.

  1. Inserting 1 gives us:
G 1 1
Figure 39.28: Inserting 1 into an empty AVL tree.
  1. Inserting 2 gives us:
G 1 1 2 2 1->2
Figure 39.29: Inserting 2 into the AVL tree.
  1. Inserting 3 unbalances the tree:
G 1 1 2 2 1->2 3 3 2->3
Figure 39.30: Inserting 3 into the AVL tree results in an unbalanced tree.

This structure is unbalanced and equivalent to a linked list. The balance factor for node 1 is -2, indicating a required rotation.

  1. We perform a left rotation around the root (node 1), resulting in a balanced tree:
G 2 2 1 1 2->1 3 3 2->3
Figure 39.31: A left rotation around the root balances the AVL tree.

The AVL Tree is now balanced, and all operations are guaranteed logarithmic time complexity.

Red-Black Trees offer another solution, painting each node one of two colors – red or black – and maintaining balance by adhering to red-black properties. Although the rules and operations are more complex, they also ensure the tree remains approximately balanced during additions and deletions. A Red-Black Tree would be great to visualize, but given its complexity, it’s better to explore it interactively or in more advanced courses.

These tree structures may seem complicated, but they highlight how well-understood tools like a binary tree can be further optimized for efficiency. While we’ve only scratched the surface here, you’ll delve deeper into these and other types of self-balancing trees in advanced data structures courses. The goal is not to memorize every detail but to appreciate the breadth of tools available for different use cases.

In your journey as a computer scientist, you’ll encounter various real-world scenarios where maintaining efficiency is vital. From databases to file systems, different applications require different tools. Understanding the principles behind these tools, such as the importance of balance in BSTs, will empower you to make informed decisions in your work.

With this, we conclude our journey into the world of binary trees and binary search trees. We hope that the concepts, examples, and code shared in this chapter help illuminate these fundamental data structures. Remember that learning is a process - with every step, you’re building a stronger foundation in computer science. Keep practicing, keep questioning, and keep exploring.