Draft
B-Tree
Keep sorted keys searchable with a shallow multiway tree whose nodes hold many keys at once.
Hook problem
Imagine a huge sorted catalog stored on pages. A page or block is a chunk of storage read together, and one page read is much more expensive than checking a few keys already in memory. The goal is not only “compare fewer keys”; it is “touch fewer pages.”
A page/block read brings a whole chunk into memory, so fewer tree levels often means fewer expensive reads.
First naive idea: sorted array
A sorted array stores values in increasing order in one long sequence. Binary search is excellent for lookup, but insertion is painful: to insert one new key, later records may shift across many pages.
Binary search is fast, but inserting 35 can push later records across page boundaries.
Second naive idea: binary search tree
A binary search tree stores one key per node, with smaller keys on the left and larger keys on the right. That makes local insertion easier than an array, but if each node is effectively a page, a tall search path creates many page reads.
- 10read 1
- 20read 2
- 30read 3
- 40read 4
- 50read 5
A binary search tree fixes local insertion, but page reads pile up when the path is tall.
The core invention
A B-tree is a multiway balanced search tree: one node stores many sorted keys and uses them as dividers between many child pointers. In this page’s B-tree convention, a key carries its record/value or a pointer to it even when the key is in an internal node. So finding internal key 10 is a successful search, not just a routing event.
The keys divide the space into four child ranges. In this B-tree, internal keys also carry records or pointers.
Search walk
Search reads one node, compares inside that node, and either stops on a matching key or descends into the one child range that could contain the key.
Try 17, internal hit 10, and missing key 13. Notice how the internal hit returns after one page read because internal keys are real B-tree records in this convention.
When a node gets full
The easy insertion case is ordinary sorted insertion inside a leaf that still has room.
If a leaf is not full, insertion is just a sorted insert inside that page.
The repair for a full node is split: divide the overfull page and promote the middle key upward.
Root split for inserting 6
The root leaf is full
With t = 2, a node can hold at most 3 keys. Inserting 6 into [5, 10, 20] would overflow the root page.
How the tree grows
If the root is full, splitting it creates a new root and the tree becomes one level taller. That is how a height-balanced tree grows upward while all leaves stay at the same depth.
Minimum degree
This page uses the CLRS-style minimum degree $t$. For $t = 2$, a non-root node has at least 1 key and at most 3 keys. In general, if a non-root node has $k$ keys:
An internal node with $k$ keys has $k + 1$ children, so the maximum number of children is $2t$. Some texts use “order” for maximum children; this page sticks to minimum degree.
Implementation sketch
function search(node: BTreeNode, key: number): SearchResult {
let i = firstIndexWhere(node.keys[i] >= key);
if (node.keys[i] === key) return { found: true, record: node.records[i] };
if (node.leaf) return { found: false };
return search(node.children[i], key);
}
function insert(root: BTreeNode, key: number) {
if (root.isFull()) root = splitRoot(root);
insertNonFull(root, key);
}
function insertNonFull(node: BTreeNode, key: number) {
if (node.leaf) return insertIntoSortedLeaf(node, key);
let i = childIndexFor(node, key);
if (node.children[i].isFull()) splitChild(node, i);
i = childIndexFor(node, key);
insertNonFull(node.children[i], key);
}
The common “split before descent” version keeps the child you enter non-full. For the fixed sequence [10, 20, 5, 6, 12, 30, 7, 17], inserting 17 first splits full child [12, 20, 30], promotes 20, then descends into [12].
Non-root split before inserting 17
Before descending for 17
After inserting [10, 20, 5, 6, 12, 30, 7], the right child [12, 20, 30] is full. Common insertion code splits a full child before descending.
Correctness intuition
Every divider key preserves a range promise: keys in child i lie between the neighboring divider keys. Splitting a full node keeps the left half below the promoted key and the right half above it, so search ranges remain valid. Because splitting does not create a deeper leaf on just one side, all leaves stay at the same depth.
- Keys sorted inside every node
- Children match divider ranges
- Non-root nodes keep 1..3 keys for t = 2
- All leaves sit at the same depth
Complexity
Let $n$ be the number of stored keys and $h$ the number of page levels on a root-to-leaf path. B-tree height is logarithmic with a large branching factor:
That does not mean CPU comparisons disappear; each node may compare several keys. The storage win is that far fewer page levels are read.
The exact capacity depends on pages, but wider nodes reduce the number of page levels.
Common confusions
Deletion is intentionally left out here because its borrow/merge cases deserve their own treatment. This node focuses on search, insertion, split repair, and the invariants those operations protect.
Connections in the graph
B+-trees keep the shallow page-based search shape, but change where values live: internal keys become guide-only separators, all records move to leaves, and leaves are linked for range scans.
B+-trees keep the shallow page shape, then move all records to linked leaves for range scans.
Exercises
Before inserting 17, which child is full and what key will be promoted?
- In root
[10, 20], which child range contains13? - With
t = 2, is[5, 6, 7, 8]a legal node before a split? - When
[5, 10, 20]splits before inserting6, which key is promoted? - Why can a B-tree search for internal key
10stop before reaching a leaf?
Graph connections : B-Tree