Draft
Closest Pair by Divide and Conquer
Find the nearest two points faster than checking every pair by combining recursion with a local grid search across the split.
Hook problem
Suppose a map contains thousands of points, and you need the two closest ones. The problem is easy to state: among all point pairs, find the pair with the smallest Euclidean distance.
Goal: nearest pair hidden until the merge earns it
The task is simple to state: among all point pairs, find the nearest one.
First naive idea
Check all n(n - 1) / 2 pairs and remember the shortest distance seen so far.
This is simple and correct, but it costs O(n^2) distance checks. Even before the algorithm is formal, the count is already the pain.
| n | checks |
|---|---|
| 5 | 10 |
| 20 | 190 |
| 1000 | 499,500 |
n(n - 1) / 2 checks
For 1,000 points, all-pairs means 499,500 distance checks.
Where it breaks
A small best-so-far distance can reject a known far pair after you compare it, but a plain list still has no cheap way to ask: “which points are nearby enough to matter?”
Plain lists cannot query nearby points cheaply
The missing operation is: give me only the points near F.
The real merge question is: after solving the left half and the right half, how many cross-boundary pairs still deserve attention?
The core invention
Sort points by x, split them by a vertical line, recursively solve each side, and let r be the smaller of the two closest distances found. Same-side pairs are now solved by recursion.
Left: E-F; right: G-H
Same-side pairs are not rechecked during the merge.
Small base cases still use brute force: when a recursive subproblem has at most three points, compare those few pairs directly. The point of the recursion is to make the hard part become a merge problem.
| Category | Status |
|---|---|
| left-left | solved by the left recursive call |
| right-right | solved by the right recursive call |
| cross | checked only in the merge |
n <= 3: brute force
This is divide and conquer before it is a grid trick.
Any better answer must be a cross pair whose distance is less than r. That means both endpoints must lie within distance r of the split line.
Band cross pairs: E-G, E-H, F-G, F-H
The band filters impossible cross pairs, but it still needs a local lookup rule.
All cross pairs: 36. Band pairs: 4. Grid-window pairs: 4.
The band is only a filter. Inside it, we still need a way to find nearby cross pairs without trying all of them. The grid uses cells of side r / sqrt(2) and checks a fixed local window around each active left-side cell.
| Category | Status |
|---|---|
| Left half | no same-side pair < r |
| Right half | no same-side pair < r |
| Local window | constant relevant candidates |
No same-side pair is closer than r
This is why each local grid window has only constant relevant candidates.
cell side = 0.92
The grid is a lookup structure, not the distance test itself.
dx,dy in [-2,2]
The window is conservative; the Euclidean distance check still decides.
Interactive visual demo
Phase
- Naive
- Split
- Recurse
- Band
- Grid
- Check
- Done
The naive method compares every pair. With 12 points, that is 66 checks before we can be sure.
Formal version
Given a set P of points in the plane, split it into PL and PR by x-coordinate. Recursively compute closest distances rLeft and rRight, and set r = min(rLeft, rRight).
The only unresolved candidates are pairs (p, q) with p on one side, q on the other side, and dist(p, q) < r.
| Category | Status |
|---|---|
| left-left | solved by the left recursive call |
| right-right | solved by the right recursive call |
| cross | checked only in the merge |
same-side solved; cross unresolved
The merge is not ignoring same-side pairs; it is reusing their recursive answers.
For this page, dist(P, Q) = sqrt((Px - Qx)^2 + (Py - Qy)^2). The grid stores band points with side labels, but it emits only left-right candidate pairs. The implementation uses a stable left-side active order, canonical pair ordering, and deduping before counts are shown.
F-G, then F-H
Only left-side active points emit candidates, which prevents duplicate pair counts.
Active F pairs: F-G, F-H. F-G distance 0.71.
Implementation sketch
function closestPair(points: Point[]): Distance {
const unique = rejectDuplicateCoordinates(points);
const byX = sortByX(unique);
return solve(byX);
}
function solve(pointsByX: Point[]): Distance {
if (pointsByX.length <= 3) return bruteForce(pointsByX);
const { leftByX, rightByX, splitX } = splitSortedByX(pointsByX);
const rLeft = solve(leftByX);
const rRight = solve(rightByX);
const r = Math.min(rLeft, rRight);
return Math.min(r, closestCrossPair(pointsByX, splitX, r));
}
closestCrossPair builds a hash grid for the threshold band. It uses half-open grid cells, checks a conservative dx,dy in [-2,2] cell window, and still computes real Euclidean distance before accepting any candidate.
| Step | Trace state |
|---|---|
| sortByX(points) | split-created |
| n <= 3 bruteForce | recursion-scaffold |
| r = min(rLeft, rRight) | threshold-chosen |
| filter threshold band | threshold-band-only |
| check grid window | active-window-f |
sort once -> recurse -> merge
Carry sorted order, brute force small leaves, then merge with the band and grid.
Correctness intuition
Every pair is either same-side or cross-side. Same-side pairs are already handled by recursion. A cross pair outside the threshold band cannot beat r. A cross pair outside the local grid window cannot beat r either. The remaining checked pairs still need actual distance comparisons.
F-G is checked-wins
The categories must be mutually exclusive in the trace.
Complexity
Sort by x once up front and carry ordered lists through recursion, so each level does expected linear merge work. The hash grid gives expected O(1) non-empty cell lookups.
T(n) <= 2T(n / 2) + O(n)
So the total expected running time is O(n log n).
| Level | Work |
|---|---|
| 0 | n |
| 1 | n |
| 2 | n |
| log n | n |
T(n) = 2T(n/2) + O(n)
This gives expected O(n log n), not O(n log^2 n).
Common confusions
The grid does not solve the whole problem by itself. It only makes the merge step cheap after recursion has already produced the distance threshold r.
Points exactly on positive grid boundaries use the cell to the right or above. Ties at exactly distance r do not replace the current best pair. Duplicate coordinates return distance 0 before the grid is built.
| Input | Expected |
|---|---|
| Q(0.999, 0.5) | 0,0 |
| P(1, 0.5) | 1,0 |
| R(2, 1) | 2,1 |
floor((coord - origin) / cellSize)
Q is in 0,0; P is in 1,0; R is in 2,1.
| Input | Expected |
|---|---|
| dist(R,S) = r | keep current best |
strict < r
A cross pair at exactly r is checked-loses.
| Input | Expected |
|---|---|
| P(2,3) Q(2,3) R(5,4) | P-Q = 0 |
duplicate -> distance 0
P-Q returns immediately with distance 0.
Connections in the graph
Closest pair divide and conquer is a neighboring geometry algorithm to Graham Scan. Closest pair uses recursion plus a local cross-boundary merge; Graham Scan uses angular ordering and a stack invariant.
Exercises
- In the demo, why do same-side pairs not need to be checked again in the merge?
- Which pairs remain after the threshold band but before the grid-window rule?
- Why does the grid still compute Euclidean distance instead of accepting every pair in nearby cells?
Graph connections : Closest Pair D&C