CS 35 — Data Structures & Algorithms

Binary Search Trees

BST = binary tree + ordering invariant
Invariant (the rule) → Search (how we use it) → Insert (how we maintain it)
A way to implement a dictionary (key-value store) where we can search and insert efficiently—when the tree stays shallow.

Remember the Guessing Game?

When we learned binary search, we played a game:
"Pick a number 1–100, I'll find it in ≤7 guesses."
That works because the data is sorted — we halve the range each time.
But what if data keeps arriving?
We have a sorted array. Now insert 42:
7 15 23 38 42? 51 67 84 93
Must shift 51, 67, 84, 93 to make room… O(n)!
Sorted arrays: fast search, slow insert.
What if the structure itself supported both?
What we already know
Binary search on a sorted array → O(log n) search
Each comparison halves the search space
The guessing game was exactly this!
Sorted Array: The Tradeoff
Search: O(log n)
Insert: O(n)
Today's answer: BST
Search: O(log n)
Insert: O(log n)
Same halving idea — but in a tree!

The Dictionary ADT

A dictionary (also called a map) stores key–value pairs and supports three core operations:

    insert(key, value)   # add or update
    search(key)          # find the value
    remove(key)          # delete entry
Concrete Example: Student Records
10234 → "Alice Chen"
10587 → "Bob Smith"
10891 → "Carol Lee"
Key = student ID, Value = name
Where you've seen dictionaries
• Python dict
• Phone contacts: name → number
• DNS: domain → IP address
• Compiler: variable name → memory address
Today we learn one implementation:
the Binary Search Tree
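The three dictionary operations map directly onto Python's built-in dict. A minimal sketch using the student-record example above (variable names are illustrative):

```python
# Dictionary ADT via Python's built-in dict: student ID -> name.
records = {}

records[10234] = "Alice Chen"    # insert(key, value)
records[10587] = "Bob Smith"
records[10891] = "Carol Lee"

print(records.get(10587))        # search(key): prints Bob Smith
del records[10891]               # remove(key)
print(10891 in records)          # prints False
```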

Today's Objectives

By the end of today's class, you will be able to:

  1. State the BST invariant precisely and decide whether a given tree is a valid BST
  2. Trace search for a key and justify each step using the invariant
  3. Perform insertion while maintaining the invariant
  4. Find min/max and produce sorted output via in-order traversal
  5. Describe the three cases of deletion and trace them on a tree
  6. Explain why runtime is O(h) and what that means for balanced vs degenerate trees
  7. Compare BSTs to other dictionary implementations (arrays, hash tables)

h = height of the tree (we'll define this precisely soon)

Quick Recap: Binary Tree Terms

[Diagram: tree with root 8, children 3 and 12, and leaves 1, 6, 14; h = 2. Labels mark the root, an edge, the leaves, left/right children, and the left subtree of 8.]
Node — stores a key (like 8, 3, 12…)
Edge — link from a parent to its child
Root — the topmost node (no parent)
Leaf — a node with no children
Left / Right child — at most 2 per node
Subtree — a node and ALL of its descendants (not just immediate children!)
Height (h) — longest path from root to a leaf

The BST Property (the invariant)

For a node with key K:

• Every key in the left subtree is < K
• Every key in the right subtree is ≥ K

This must hold for every node in the tree.

Invariant = a condition that must remain true before and after every operation.

[Diagram: root 8 with key K; left subtree {3, 1, 6} is ALL < 8, right subtree {12, 14} is ALL ≥ 8.]

Check: Is This a BST?

Think (10 sec) → Pair (30 sec) → Share
Thumbs up or thumbs down for each tree. If it's invalid, which node violates the property and why?

Tree A

[Tree A: root 15; left child 10 with children 7 and 12; right child 20.]
Valid BST ✓

Tree B

[Tree B: root 15; left child 10 with children 6 and 17; right child 20.]
NOT a BST ✗

17 is the right child of 10 — and 17 > 10, so locally it looks correct.
But 17 is in the left subtree of 15, so it must be < 15. It's not!

One More Check — Valid or Invalid?

Think (10 sec) → Pair (30 sec)
Is this tree a valid BST? Why or why not?
[Tree: root 20; left child 10 with children 5 and 15; right child 30 with children 25 and 35; 22 is the left child of 25.]
Common wrong reasoning:
"22 is left child of 25 → OK (22 < 25).
But 22 is left of 30... and 22 < 30... wait, is 22 in the right subtree of 20? 22 > 20... hmm..."
Students get confused and guess "invalid."
Correct reasoning — check ALL ancestors:
22 is in right subtree of 20 → must be > 20 ✓
22 is in left subtree of 30 → must be < 30 ✓
22 is in left subtree of 25 → must be < 25 ✓
Valid! Range = (20, 25) and 22 is inside it.
Checking by hand is error-prone. How do we do this systematically?
Mental model: each node lives in a legal interval.

• Going left → shrink the upper bound
• Going right → raise the lower bound

Tracks all ancestor constraints, not just the parent.
    def is_valid(node, lo=float("-inf"), hi=float("inf")):
        if node is None:
            return True
        if node.key <= lo or node.key >= hi:
            return False
        return (is_valid(node.left, lo, node.key) and
                is_valid(node.right, node.key, hi))
BST validity = range checking, not parent checking.
Node 22's range: go right from 20 (min = 20), go left from 30 (max = 30), go left from 25 (max = 25). Range = (20, 25), and 22 is in range!
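To see the range check catch Tree B's subtle violation, here is the validator run end to end. The small Node class is assumed for illustration; any node with key/left/right fields works:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def is_valid(node, lo=float("-inf"), hi=float("inf")):
    # Every node must sit strictly inside the (lo, hi) range
    # inherited from ALL of its ancestors, not just its parent.
    if node is None:
        return True
    if not (lo < node.key < hi):
        return False
    return (is_valid(node.left, lo, node.key) and
            is_valid(node.right, node.key, hi))

# Tree B from the earlier check: 17 looks fine next to 10,
# but it lives in the left subtree of 15, so it must be < 15.
tree_b = Node(15, Node(10, Node(6), Node(17)), Node(20))
print(is_valid(tree_b))   # prints False
```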

Search Algorithm

Key idea: at each node, the invariant tells us which side to explore. We never check both.

    def search(node, key):
        if node is None:
            return False              # fell off the tree
        if key == node.key:
            return True               # found it!
        elif key < node.key:          # must be in left subtree
            return search(node.left, key)
        else:                         # must be in right subtree
            return search(node.right, key)

Example: search(root, 7)

[Diagram: tree {8, 3, 12, 1, 6, 7}; path: 7 < 8 → go LEFT, 7 > 3 → go RIGHT, 7 > 6 → go RIGHT, Found!]
Step | At node | Compare | Go
1    | 8       | 7 < 8   | Left
2    | 3       | 7 > 3   | Right
3    | 6       | 7 > 6   | Right
4    | 7       | 7 == 7  | Found!

Search: Not Found

What happens when the key isn't in the tree? We walk until we reach None.

Example: search(root, 9)

Search for 9 on the standard tree (8, 3, 12, 1, 6, 14).

Step | At node | Compare | Go
1    | 8       | 9 > 8   | Right
2    | 12      | 9 < 12  | Left
3    | None    |         | Not found!
"Falling off the tree" means the key doesn't exist.
Node 12 has no left child, so we reach None → return False.
[Diagram: tree {8, 3, 12, 1, 6, 14}; path: 9 > 8 → RIGHT, 9 < 12 → LEFT, None → return False.]

Your Turn: Trace a Search

Class Activity
Search for key 13 on this tree.

I'll call on someone to walk us through it:
• Which node do we visit first?
• Compare 13 to that node — which way do we go?
• What invariant fact justifies that move?

Solution

Step | At node | Compare  | Go
1    | 8       | 13 > 8   | Right
2    | 12      | 13 > 12  | Right
3    | 15      | 13 < 15  | Left
4    | 13      | 13 == 13 | Found!
[Diagram: tree {8, 3, 12, 1, 6, 10, 15, 13, 20}; path: 13 > 8 → RIGHT, 13 > 12 → RIGHT, 13 < 15 → LEFT, Found!]

Insertion Algorithm

Insertion = search until you fall off, then attach a new leaf there.

    def insert(node, key):
        if node is None:
            return Node(key)          # new leaf!
        if key < node.key:
            node.left = insert(node.left, key)
        else:                         # key ≥ node.key
            node.right = insert(node.right, key)
        return node
Convention: duplicates go right (≥). The exact choice doesn't matter as long as you're consistent. This is a specification/design decision.
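A quick sketch of the duplicates-go-right convention in action, using a minimal node class (illustrative, not the lab's exact API):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return Node(key)               # fell off: attach a new leaf here
    if key < node.key:
        node.left = insert(node.left, key)
    else:                              # duplicates (key >= node.key) go right
        node.right = insert(node.right, key)
    return node

root = None
for k in [8, 3, 12, 8]:                # insert 8 twice
    root = insert(root, k)
print(root.right.left.key)             # prints 8: duplicate went right, landing left of 12
```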

Example: insert 5

[Diagram: tree {8, 3, 12, 1, 6}; path: 5 < 8 → LEFT, 5 > 3 → RIGHT, 5 < 6 → LEFT, null → attach 5 as left child of 6.]
Step | At node | Compare | Go
1    | 8       | 5 < 8   | Left
2    | 3       | 5 > 3   | Right
3    | 6       | 5 < 6   | Left
4    | null    |         | Create node(5)

Your Turn: Insert Two Keys

Pair Activity — 3 minutes
Starting with this tree, insert 9 then 2.

For each insertion:
• Write the comparison decisions
• Draw the result after each insert
• Where does each new node attach?

Solution

Insert 9
Step | At node | Compare | Go
1    | 8       | 9 > 8   | Right
2    | 12      | 9 < 12  | Left
3    | 10      | 9 < 10  | Left
4    | None    |         | Attach left of 10
Insert 2
Step | At node | Compare | Go
1    | 8       | 2 < 8   | Left
2    | 3       | 2 < 3   | Left
3    | 1       | 2 > 1   | Right
4    | None    |         | Attach right of 1
[Diagram: starting tree {8, 3, 12, 1, 6, 10, 15, 13, 20} (use this tree for both insertions); 9 > 8 → RIGHT, 9 < 12 → LEFT, 9 < 10 → LEFT attaches 9 left of 10; 2 < 8 → LEFT, 2 < 3 → LEFT, 2 > 1 → RIGHT attaches 2 right of 1.]

BST Node: From Pseudocode to Python

[Diagram: a node box holding key | value, with left (L) and right (R) child pointers.]
key — what we compare on
value — associated data
left / right — child pointers
    class BSTNode:
        def __init__(self, key, value=None):
            self.key = key
            self.value = value
            self.left = None
            self.right = None

Search in Real Python

    def search(node, key):
        if node is None:
            return None
        if key == node.key:
            return node.value
        elif key < node.key:
            return search(node.left, key)
        else:
            return search(node.right, key)
Pseudocode → Real Code
node.key  → self.key
node.left → self.left
Node(key) → BSTNode(key, value)
This is what you'll implement in lab.

Insertion Order Determines Shape

The same keys inserted in a different order produce different trees with different heights.

Think about it
Insert 1, 2, 3, 4, 5 starting from empty. What does the tree look like?
Insert 1, 2, 3, 4, 5
[Diagram: chain 1 → 2 → 3 → 4 → 5, each node the right child of the previous.]
Chain! h = 4 — basically a linked list
Now try
Insert 4, 2, 6, 1, 3, 5, 7 starting from empty. What does the tree look like?
Insert 4, 2, 6, 1, 3, 5, 7
[Diagram: balanced tree; root 4, left child 2 with children 1 and 3, right child 6 with children 5 and 7.]
Balanced! h = 2
Same keys, different order → different shape!
Insertion order determines the tree's height.
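You can verify the shape claim directly. A sketch that builds both trees and measures their heights (Node, insert, height, and build are minimal helpers, not lab code):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def height(node):
    # Height of an empty tree is -1, so a single node has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

print(height(build([1, 2, 3, 4, 5])))        # prints 4: a chain
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # prints 2: balanced
```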

Finding Min and Max

Smallest key: always go left until you can't.
Largest key: always go right until you can't.
    def find_min(node):
        if node.left is None:
            return node.key
        return find_min(node.left)

    def find_max(node):
        if node.right is None:
            return node.key
        return find_max(node.right)
Why does this work? The invariant guarantees everything smaller is to the left. So the leftmost node has no smaller keys anywhere in the tree.
[Diagram: tree {8, 3, 12, 1, 6, 14, 5}; MIN = 1 (leftmost node), MAX = 14 (rightmost node).]
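Since find_min and find_max each walk a single side of the tree, they can also be written iteratively. A sketch of that variant, assuming nodes with key/left/right fields and a non-empty tree:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(node):
    # Keep going left until there is no left child.
    while node.left is not None:
        node = node.left
    return node.key

def find_max(node):
    # Keep going right until there is no right child.
    while node.right is not None:
        node = node.right
    return node.key

root = Node(8, Node(3, Node(1), Node(6, Node(5))), Node(12, None, Node(14)))
print(find_min(root), find_max(root))   # prints 1 14
```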

In-Order Traversal: Sorted Output!

Visit left subtree, then current node, then right subtree.
Result: keys come out sorted!
    def in_order(node):
        if node is None:
            return
        in_order(node.left)
        print(node.key)
        in_order(node.right)

Trace on our tree (click through)

Output: 1, 3, 5, 6, 8, 12, 14
Sorted! This is O(n) — we visit every node once.
This is WHY BSTs maintain order —
hash tables CAN'T do this!
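The same traversal can collect keys into a list instead of printing, which is handy for testing. A sketch using a small node class and the tree from this slide:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def in_order(node, out):
    # Left subtree, then this node, then right subtree.
    if node is None:
        return
    in_order(node.left, out)
    out.append(node.key)
    in_order(node.right, out)

root = Node(8,
            Node(3, Node(1), Node(6, Node(5))),
            Node(12, None, Node(14)))
keys = []
in_order(root, keys)
print(keys)   # prints [1, 3, 5, 6, 8, 12, 14]
```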

Deletion: Three Cases

Case 1: Leaf

No children → just remove it.

[Diagram: root 8 with children 4 and 11; removing leaf 11 is done in one step.]
Delete 11

Case 2: One Child

Replace node with its only child.

[Diagram: root 15, left child 4 with single child 2, right child 22; 2 replaces 4.]
Delete 4

Case 3: Two Children

Find in-order successor, copy key up, delete successor.

[Diagram: root 8, left child 4, right child 11 with children 9 and 14; copy 9 up, remove old 9.]
Delete 8
In-order successor = smallest key in the right subtree = find_min(node.right).
It always has at most one child (no left child!), so deleting it is Case 1 or 2.
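Putting the three cases together, here is one common way to write delete. This is a sketch with minimal helpers; your lab API may differ:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min_node(node):
    while node.left is not None:
        node = node.left
    return node

def delete(node, key):
    if node is None:
        return None                      # key not in tree
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Cases 1 and 2: zero or one child -> splice in the child (or None).
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Case 3: two children -> copy the in-order successor's key up,
        # then delete the successor from the right subtree (Case 1 or 2).
        succ = find_min_node(node.right)
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node

root = Node(8, Node(4), Node(11, Node(9), Node(14)))
root = delete(root, 8)
print(root.key)   # prints 9: the successor replaced 8
```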

Your Turn: Delete a Key

Class Activity
Delete key 8 from this tree.

• Which case is this?
• What is the in-order successor?
• Draw the final tree.
[Diagram: root 15; left child 8 with children 4 and 11 (9 is the left child of 11); right child 22 with children 18 and 25.]

Solution

1. Two children → Case 3
2. Successor = 9
3. Copy 9 up, delete old 9

Step 1: Find the successor

Right subtree of 8, then keep going left.

[Diagram: to delete 8, go right to 11, then left to 9; 9 has no left child → successor = 9.]

Step 2: Delete 8

[Diagram: 9 is copied up into 8's position and the old 9 node is removed. Done!]

Runtime: O(h)

Every operation follows one root-to-leaf path. Work is proportional to height h.

Balanced Tree

[Diagram: balanced tree; root 8, left child 4 with children 2 and 6, right child 12 with children 10 and 14.]
O(log n)
n = 1,000,000 → h ≈ 20

Degenerate Chain

[Diagram: chain 1 → 2 → 3 → 4 → 5.] Inserting sorted data causes this!
O(n)
basically a linked list!

Operations Summary

Operation          | Runtime | Why?
search(key)        | O(h)    | Follow one path down
insert(key)        | O(h)    | Search + attach leaf
delete(key)        | O(h)    | Find + restructure
findMin / findMax  | O(h)    | Walk one side
in-order traversal | O(n)    | Visit every node
Coming up next: Balanced BSTs (AVL, red-black) guarantee h = O(log n)

Dictionary Implementations Compared

How does a BST stack up against other ways to implement a dictionary?

Structure             | Search   | Insert   | Delete   | Sorted Output
Unsorted Array / List | O(n)     | O(1)     | O(n)     | O(n log n)
Sorted Array          | O(log n) | O(n)     | O(n)     | O(n)
Hash Table            | O(1) avg | O(1) avg | O(1) avg | O(n log n)
BST (balanced)        | O(log n) | O(log n) | O(log n) | O(n)
BST Advantage
Sorted traversal in O(n)
• Range queries (all keys between A and B)
• Finding predecessor / successor
Hash Table Advantage
O(1) average for search/insert/delete
• Simpler to implement
• Better when order doesn't matter
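A range query shows why sorted structure pays off: the invariant lets us prune whole subtrees that cannot contain keys in [lo, hi]. A sketch with a minimal node class (illustrative helpers, not lab code):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def range_query(node, lo, hi, out):
    if node is None:
        return
    if node.key > lo:            # only then can the left subtree hold keys >= lo
        range_query(node.left, lo, hi, out)
    if lo <= node.key <= hi:
        out.append(node.key)     # in-order visit keeps the output sorted
    if node.key < hi:            # only then can the right subtree hold keys <= hi
        range_query(node.right, lo, hi, out)

root = Node(8, Node(3, Node(1), Node(6)), Node(12, Node(10), Node(15)))
found = []
range_query(root, 5, 12, found)
print(found)   # prints [6, 8, 10, 12]
```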

Exit Ticket (3 minutes, on your phone)

1 Search for 11 — which nodes are visited?
A. 8→3→6→11   B. 8→12→10→11   C. 8→12→15→11   D. 8→12→11
2 Insert 9 — where does it attach?
A. Left of 10   B. Right of 10   C. Left of 11   D. Right of 6
3 Why is runtime O(h)?
A. Visit every node   B. One node per level on a single path   C. Always log(n)   D. Compare each with every other
4 Deleting a node with two children — what do we do?
A. Remove & reconnect   B. Replace with left child   C. Find successor, copy up, delete it   D. Rebuild subtree
5 In one sentence: what is the BST invariant, and why does it make search efficient?
One-sentence summary:
A BST is a binary tree with an ordering invariant that makes dictionary operations efficient when height is small.

Next class: Balanced BSTs — structures that enforce O(log n) height.

Scan to answer
[QR code]
jinzhao3611.github.io/cosi230b/activities/bst-exit-ticket-worksheet.html
[Diagram: root 8; left child 3 with children 1 and 6; right child 12 with children 10 and 15; 11 is the right child of 10. Use this tree for Q1 & Q2.]