B-Tree | ReConcept Lab

Hook problem

Imagine a huge sorted catalog stored on pages. A page or block is a chunk of storage read together, and one page read is much more expensive than checking a few keys already in memory. The goal is not only “compare fewer keys”; it is “touch fewer pages.”

Read one page, get several keys

page 1

51020

page 2

303540

page 3

506070

A page/block read brings a whole chunk into memory, so fewer tree levels often means fewer expensive reads.

First naive idea: sorted array

A sorted array stores values in increasing order in one long sequence. Binary search is excellent for lookup, but insertion is painful: to insert one new key, later records may shift across many pages.

Sorted array search is neat; insertion shifts pages

before

102030

405060

after

102030

354050

Binary search is fast, but inserting 35 can push later records across page boundaries.

Second naive idea: binary search tree

A binary search tree stores one key per node, with smaller keys on the left and larger keys on the right. That makes local insertion easier than an array, but if each node is effectively a page, a tall search path creates many page reads.

One key per node can mean one page per comparison

10read 1
20read 2
30read 3
40read 4
50read 5

B-tree page

1020304050

A binary search tree fixes local insertion, but page reads pile up when the path is tall.

The core invention

A B-tree is a multiway balanced search tree: one node stores many sorted keys and uses them as dividers between many child pointers. In this page’s B-tree convention, a key carries its record/value or a pointer to it even when the key is in an internal node. So finding internal key 10 is a successful search, not just a routing event.

A B-tree node is a sorted page plus child ranges

root

204060

<20child range

20..40child range

40..60child range

>60child range

The keys divide the space into four child ranges. In this B-tree, internal keys also carry records or pointers.

Search walk

Search reads one node, compares inside that node, and either stops on a matching key or descends into the one child range that could contain the key.

Try 17, internal hit 10, and missing key 13. Notice how the internal hit returns after one page read because internal keys are real B-tree records in this convention.

When a node gets full

The easy insertion case is ordinary sorted insertion inside a leaf that still has room.

Easy insert: leaf still has room

before

1220

after

122030

If a leaf is not full, insertion is just a sorted insert inside that page.

The repair for a full node is split: divide the overfull page and promote the middle key upward.

leaf

51020

Root split for inserting 6

The root leaf is full

With t = 2, a node can hold at most 3 keys. Inserting 6 into [5, 10, 20] would overflow the root page.

Inserted key6

Promoted key-

How the tree grows

If the root is full, splitting it creates a new root and the tree becomes one level taller. That is how a height-balanced tree grows upward while all leaves stay at the same depth.

A root split grows upward

before

leaf

51020

key + record pointer

after

root

key + record pointer

leaf

key + record pointer

leaf

key + record pointer

Minimum degree

This page uses the CLRS-style minimum degree $t$ . For $t = 2$ , a non-root node has at least 1 key and at most 3 keys. In general, if a non-root node has $k$ keys:

t - 1 \le k \le 2t - 1

An internal node with $k$ keys has $k + 1$ children, so the maximum number of children is $2t$ . Some texts use “order” for maximum children; this page sticks to minimum degree.

Minimum degree t = 2

t - 1min keys in non-root: 1

2t - 1max keys: 3

2tmax children: 4

Implementation sketch

function search(node: BTreeNode, key: number): SearchResult {
  let i = firstIndexWhere(node.keys[i] >= key);
  if (node.keys[i] === key) return { found: true, record: node.records[i] };
  if (node.leaf) return { found: false };
  return search(node.children[i], key);
}

function insert(root: BTreeNode, key: number) {
  if (root.isFull()) root = splitRoot(root);
  insertNonFull(root, key);
}

function insertNonFull(node: BTreeNode, key: number) {
  if (node.leaf) return insertIntoSortedLeaf(node, key);
  let i = childIndexFor(node, key);
  if (node.children[i].isFull()) splitChild(node, i);
  i = childIndexFor(node, key);
  insertNonFull(node.children[i], key);
}

The common “split before descent” version keeps the child you enter non-full. For the fixed sequence [10, 20, 5, 6, 12, 30, 7, 17], inserting 17 first splits full child [12, 20, 30], promotes 20, then descends into [12].

root

leaf

567

leaf

122030

Non-root split before inserting 17

Before descending for 17

After inserting [10, 20, 5, 6, 12, 30, 7], the right child [12, 20, 30] is full. Common insertion code splits a full child before descending.

Inserted key17

Promoted key-

Correctness intuition

Every divider key preserves a range promise: keys in child i lie between the neighboring divider keys. Splitting a full node keeps the left half below the promoted key and the right half above it, so search ranges remain valid. Because splitting does not create a deeper leaf on just one side, all leaves stay at the same depth.

The promises that must stay true

root

1020

key + record pointer

leaf

567

key + record pointer

leaf

1217

key + record pointer

leaf

key + record pointer

Keys sorted inside every node
Children match divider ranges
Non-root nodes keep 1..3 keys for t = 2
All leaves sit at the same depth

Complexity

Let $n$ be the number of stored keys and $h$ the number of page levels on a root-to-leaf path. B-tree height is logarithmic with a large branching factor:

h = O(\log_t n)

That does not mean CPU comparisons disappear; each node may compare several keys. The storage win is that far fewer page levels are read.

High fanout keeps the tree shallow

Binary-ish1 -> 2 -> 4 -> 8 -> 16

B-tree1 -> 4 -> 16 -> 64

The exact capacity depends on pages, but wider nodes reduce the number of page levels.

Common confusions

Terms to keep separate

B-tree is not a binary tree.

This page uses minimum degree t, not order.

B-tree split moves the promoted key upward.

Page reads and CPU comparisons are different costs.

Deletion is intentionally left out here because its borrow/merge cases deserve their own treatment. This node focuses on search, insertion, split repair, and the invariants those operations protect.

Connections in the graph

B+-trees keep the shallow page-based search shape, but change where values live: internal keys become guide-only separators, all records move to leaves, and leaves are linked for range scans.

Next: B+-tree

B-tree->B+-tree

B+-trees keep the shallow page shape, then move all records to linked leaves for range scans.

Exercises

Prediction checks

root

key + record pointer

leaf

567

key + record pointer

leaf

122030

key + record pointer

Before inserting 17, which child is full and what key will be promoted?

In root [10, 20], which child range contains 13?
With t = 2, is [5, 6, 7, 8] a legal node before a split?
When [5, 10, 20] splits before inserting 6, which key is promoted?
Why can a B-tree search for internal key 10 stop before reaching a leaf?

Graph connections : B-Tree