Graph connections

Draft

B+-Tree

Speed up point lookups and range scans by keeping guide keys above and all records in linked leaves.

data-structure intermediate data-structurestreesdatabasesindexing

Hook problem

A database-style index often needs both id = 42 and 20 <= key <= 60. A B-tree already gives shallow point lookup, but range queries want one more property: after finding the first matching record, nearby records should be easy to read in order.

One index, two query shapes
Point lookupid = 42
Range scan20 <= key <= 60

A database-style index should find one record and also read neighboring records in order.

Reusing a B-tree is not enough

In the B-tree page, internal keys carry records or pointers. That convention is useful for point lookup, but it interrupts sequential range scanning: a scan may need to combine internal records, leaf records, and tree traversal.

B-tree reuse: shallow search, awkward scan
30:v60:v
10:v20:v
40:v50:v
70:v80:v

If records can sit in internal nodes, a range scan mixes tree traversal with leaf visits instead of one clean leaf walk.

The core invention

A B+-tree keeps internal nodes as guide-only pages. All records/values live in the leaves, and leaves are linked left-to-right. A guide key or separator key routes search; it is not itself the payload record.

B+-tree anatomy
guide only
3060
< 3030 <= key < 60>= 60
Leaf A · records
10:v20:v
leaf link ->
Leaf B · records
30:v40:v50:v
leaf link ->
Leaf C · records
60:v70:v80:v

This page uses the fixed separator convention shown in the traces: root [30, 60] has child intervals <30, 30 <= key < 60, and >=60. Equality goes right, so lookup 30 descends to the middle child and lookup 60 descends to the right child. Range notation such as [20, 70] is inclusive. Duplicate keys are out of scope.

Point lookup

Point lookup still descends one child per level. The difference is what happens on an internal match: finding guide key 30 in the root does not return a record. Equality follows the right-hand child and the actual record is found in a leaf.

guide only
3060
< 3030 <= key < 60>= 60
Leaf A
10:A20:B
Leaf B
30:C40:D50:E
Leaf C
60:F70:G80:H

B+-tree point lookup

Read root guide keys [30, 60]. They route search only; records are not stored here.

Page reads1
Chosen leaf-
Interval-
ConventionEquality to a separator goes right

Range scan

A range query first descends to the leaf that can contain the lower bound. Then it scans leaf records in order, following leaf links until the upper bound is passed.

guide only
3060
Leaf A
10:A20:B
link ->
Leaf B
30:C40:D50:E
link ->
Leaf C
60:F70:G80:H

B+-tree range scan

Descend once to leaf A, the leaf that can contain inclusive lower bound 20.

Active leafA
Page reads2
Collected records-
Inclusive bounds[20, 70]

The [20, 65] preset stops inside leaf C: it collects 60, sees 70, and stops before including it.

Internal key versus leaf record

The same number can appear as a guide above and as a real record below. Separator 50 can be copied into the parent while record 50 -> value remains in a leaf.

Guide key above, record below
guide only
3060
< 3030 <= key < 60>= 60
Leaf A · records
10:v20:v
leaf link ->
Leaf B · records
30:v40:v50:v
leaf link ->
Leaf C · records
60:v70:v80:v

Leaf split

When a leaf overflows, split the leaf and copy the first key of the new right leaf into the parent as a separator. Unlike the B-tree split on the previous page, this separator is copied for routing; the record stays in the leaf.

parent guide keys
3060
Leaf A
10:A20:B
link -> B
Leaf B
30:C40:D50:E
link -> C
Leaf C
60:F70:G80:H

Leaf split trace

Leaf B is full

Insert 55 into leaf B [30, 40, 50]. With max leaf capacity 3, the temporary leaf would be [30, 40, 50, 55].

Copied separator-
Records stay in leaves50, 55

Formal version

A B+-tree keeps these promises: internal nodes contain guide keys and child pointers; leaves contain all records/values; all leaves are at the same depth; leaves are linked in sorted order.

B+-tree invariants
guide only
3060
< 3030 <= key < 60>= 60
Leaf A · records
10:v20:v
leaf link ->
Leaf B · records
30:v40:v50:v
leaf link ->
Leaf C · records
60:v70:v80:v
  • Internal keys are guide-only.
  • All records live in leaves.
  • All leaves stay at one depth.
  • Leaf links preserve sorted order.

Implementation sketch

function lookup(root: InternalPage, key: number) {
  let page: Page = root;
  while (!page.isLeaf) {
    const i = firstSeparatorGreaterThan(key); // equality goes right
    page = page.children[i];
  }
  return page.records.find((record) => record.key === key);
}

function rangeQuery(root: InternalPage, lo: number, hi: number) {
  let leaf = descendToLeaf(root, lo);
  const out = [];
  while (leaf) {
    for (const record of leaf.records) {
      if (record.key < lo) continue;
      if (record.key > hi) return out;
      out.push(record);
    }
    leaf = leaf.next;
  }
  return out;
}

The range path descends once. After that, sibling pointers carry the scan forward.

Correctness intuition

Guide keys route every key into the unique leaf range that can contain it. Leaf records are sorted inside each leaf, and leaf links connect the leaves in sorted order. Therefore, once a range scan reaches the lower-bound leaf, walking right sees candidate records in nondecreasing key order until the upper bound is passed.

Complexity

Let $B$ be a simplified fanout/page-capacity term, $n$ the number of records, and $k$ the number of records returned.

point lookup=O(h)O(logBn)\text{point lookup} = O(h) \approx O(\log_B n) range query=O(logBn+k)\text{range query} = O(\log_B n + k)

The scan term represents matching records and the leaf pages that contain them, not just CPU comparisons.

One descent, then a leaf walk
O(log_B n)descend to the first leaf
+ kscan matching records/pages

Common confusions

Common confusions
Guide key is not the payload record.
Equality to 30 or 60 goes right.
Leaf sibling links are not child pointers.
Copying separator 50 does not remove record 50 from the leaf.

Deletion, duplicate-key record lists, locking, logging, and real page formats are intentionally out of scope here.

Connections in the graph

B+-trees are motivated by B-trees: keep the shallow page-based search shape, then move records to linked leaves to make range scans efficient.

Why B-tree comes first

B+-trees keep the page-based shallow tree, then specialize it for point lookup plus range access.

Exercises

Prediction checks
guide only
3060
< 3030 <= key < 60>= 60
Leaf A · records
10:v20:v
leaf link ->
Leaf B · records
30:v40:v50:v
leaf link ->
Leaf C · records
60:v70:v80:v

Which leaf receives lookup 60, and why does equality go there?

  1. With root [30, 60], which child receives lookup 30?
  2. What does range [20, 65] return, and why does it stop before 70?
  3. After inserting 55 into [30, 40, 50], which separator is copied upward?
  4. Is a value stored at internal separator 60, or only in the leaf record 60:value?

Graph connections : B+-Tree