Closest Pair by Divide and Conquer

Hook problem

Suppose a map contains thousands of points, and you need the two closest ones. The problem is easy to state: among all point pairs, find the pair with the smallest Euclidean distance.

Find the nearest two pointsThe hook shows the point cloud without revealing the final pair yet.

Goal: nearest pair hidden until the merge earns it

The task is simple to state: among all point pairs, find the nearest one.

First naive idea

Check all n(n - 1) / 2 pairs and remember the shortest distance seen so far.

This is simple and correct, but it costs O(n^2) distance checks. Even before the algorithm is formal, the count is already the pain.

All pairs get expensiveThe number of pair checks grows quadratically.

n	checks
5	10
20	190
1000	499,500

n(n - 1) / 2 checks

For 1,000 points, all-pairs means 499,500 distance checks.

Where it breaks

A small best-so-far distance can reject a known far pair after you compare it, but a plain list still has no cheap way to ask: “which points are nearby enough to matter?”

A threshold is not a search structureKnowing r helps reject a far pair after checking it, but not find the near candidates.

Plain lists cannot query nearby points cheaply

The missing operation is: give me only the points near F.

The real merge question is: after solving the left half and the right half, how many cross-boundary pairs still deserve attention?

The core invention

Sort points by x, split them by a vertical line, recursively solve each side, and let r be the smaller of the two closest distances found. Same-side pairs are now solved by recursion.

Split, then reuse solved halvesThe recursive calls already found E-F on the left and G-H on the right.

Left: E-F; right: G-H

Same-side pairs are not rechecked during the merge.

Small base cases still use brute force: when a recursive subproblem has at most three points, compare those few pairs directly. The point of the recursion is to make the hard part become a merge problem.

Base cases feed the mergeSmall leaves use brute force; the merge receives two solved halves.

Category	Status
left-left	solved by the left recursive call
right-right	solved by the right recursive call
cross	checked only in the merge

n <= 3: brute force

This is divide and conquer before it is a grid trick.

Any better answer must be a cross pair whose distance is less than r. That means both endpoints must lie within distance r of the split line.

Only the threshold band remainsA better cross pair must have both points within r of the split line.

Band cross pairs: E-G, E-H, F-G, F-H

The band filters impossible cross pairs, but it still needs a local lookup rule.

All cross pairs: 36. Band pairs: 4. Grid-window pairs: 4.

The band is only a filter. Inside it, we still need a way to find nearby cross pairs without trying all of them. The grid uses cells of side r / sqrt(2) and checks a fixed local window around each active left-side cell.

Packing invariantAfter recursion, a half cannot hide another pair closer than r.

Category	Status
Left half	no same-side pair < r
Right half	no same-side pair < r
Local window	constant relevant candidates

No same-side pair is closer than r

This is why each local grid window has only constant relevant candidates.

Grid cells organize the bandUse cells of side r / sqrt(2), store band points with side labels, and emit only left-right pairs.

cell side = 0.92

The grid is a lookup structure, not the distance test itself.

Why a 5 by 5 window is enoughOffsets with abs(dx) or abs(dy) at least 3 are already too far to beat r.

dx,dy in [-2,2]

The window is conservative; the Euclidean distance check still decides.

Interactive visual demo

Phase

Naive
Split
Recurse
Band
Grid
Check
Done

The naive method compares every pair. With 12 points, that is 66 checks before we can be sure.

r:1.30

cell side:0.92

Checked:1

Active pair:A-B

Best pair:none

Categories

none

Formal version

Given a set P of points in the plane, split it into PL and PR by x-coordinate. Recursively compute closest distances rLeft and rRight, and set r = min(rLeft, rRight).

The only unresolved candidates are pairs (p, q) with p on one side, q on the other side, and dist(p, q) < r.

Every pair has a categoryLeft-left and right-right are solved recursively; only cross pairs reach the merge.

Category	Status
left-left	solved by the left recursive call
right-right	solved by the right recursive call
cross	checked only in the merge

same-side solved; cross unresolved

The merge is not ignoring same-side pairs; it is reusing their recursive answers.

For this page, dist(P, Q) = sqrt((Px - Qx)^2 + (Py - Qy)^2). The grid stores band points with side labels, but it emits only left-right candidate pairs. The implementation uses a stable left-side active order, canonical pair ordering, and deduping before counts are shown.

Active F windowF emits F-G and F-H from the local window; F-G wins.

F-G, then F-H

Only left-side active points emit candidates, which prevents duplicate pair counts.

Active F pairs: F-G, F-H. F-G distance 0.71.

Implementation sketch

function closestPair(points: Point[]): Distance {
  const unique = rejectDuplicateCoordinates(points);
  const byX = sortByX(unique);
  return solve(byX);
}

function solve(pointsByX: Point[]): Distance {
  if (pointsByX.length <= 3) return bruteForce(pointsByX);

  const { leftByX, rightByX, splitX } = splitSortedByX(pointsByX);
  const rLeft = solve(leftByX);
  const rRight = solve(rightByX);
  const r = Math.min(rLeft, rRight);

  return Math.min(r, closestCrossPair(pointsByX, splitX, r));
}

closestCrossPair builds a hash grid for the threshold band. It uses half-open grid cells, checks a conservative dx,dy in [-2,2] cell window, and still computes real Euclidean distance before accepting any candidate.

Code maps to trace stateThe implementation sketch should point to visible trace states.

Step	Trace state
sortByX(points)	split-created
n <= 3 bruteForce	recursion-scaffold
r = min(rLeft, rRight)	threshold-chosen
filter threshold band	threshold-band-only
check grid window	active-window-f

sort once -> recurse -> merge

Carry sorted order, brute force small leaves, then merge with the band and grid.

Correctness intuition

Every pair is either same-side or cross-side. Same-side pairs are already handled by recursion. A cross pair outside the threshold band cannot beat r. A cross pair outside the local grid window cannot beat r either. The remaining checked pairs still need actual distance comparisons.

Skipped pairs have reasonsEach pair is solved, outside the band, outside the window, checked and losing, or checked and winning.

F-G is checked-wins

The categories must be mutually exclusive in the trace.

Complexity

Sort by x once up front and carry ordered lists through recursion, so each level does expected linear merge work. The hash grid gives expected O(1) non-empty cell lookups.

T(n) <= 2T(n / 2) + O(n)

So the total expected running time is O(n log n).

Linear merge at each levelSort once, carry order, and spend expected O(n) work per recursion level.

Level	Work
0	n
1	n
2	n
log n	n

T(n) = 2T(n/2) + O(n)

This gives expected O(n log n), not O(n log^2 n).

Common confusions

The grid does not solve the whole problem by itself. It only makes the merge step cheap after recursion has already produced the distance threshold r.

Points exactly on positive grid boundaries use the cell to the right or above. Ties at exactly distance r do not replace the current best pair. Duplicate coordinates return distance 0 before the grid is built.

Boundary cells are half-openExact positive grid boundaries go to the cell on the right or above.

Input	Expected
Q(0.999, 0.5)	0,0
P(1, 0.5)	1,0
R(2, 1)	2,1

floor((coord - origin) / cellSize)

Q is in 0,0; P is in 1,0; R is in 2,1.

Ties do not replace rThe merge looks for pairs strictly closer than r.

Input	Expected
dist(R,S) = r	keep current best

strict < r

A cross pair at exactly r is checked-loses.

Duplicates exit before the gridDuplicate coordinates give distance 0, so cellSize should never be built from r = 0.

Input	Expected
P(2,3) Q(2,3) R(5,4)	P-Q = 0

duplicate -> distance 0

P-Q returns immediately with distance 0.

Connections in the graph

Closest pair divide and conquer is a neighboring geometry algorithm to Graham Scan. Closest pair uses recursion plus a local cross-boundary merge; Graham Scan uses angular ordering and a stack invariant.

Exercises

In the demo, why do same-side pairs not need to be checked again in the merge?
Which pairs remain after the threshold band but before the grid-window rule?
Why does the grid still compute Euclidean distance instead of accepting every pair in nearby cells?

Graph connections : Closest Pair D&C