Adjusted Rand Index | ReConcept Lab

Hook problem: raw agreement is not enough

The Rand Index (RI) for the fixture is high:

RI=\frac{23}{28}\approx 0.821

But 20 of those 23 agreements are easy true-negative (TN) pairs: two items that have different reference labels and also sit in different predicted clusters.

Adjusted Rand Index asks: how much better is the observed pair agreement than what these cluster sizes and class sizes could produce by chance?

Chance adjustment

ARI compares observed same-same pairs with the amount expected from the margins.

S = 3: observed same-same pairs
A = 4: cluster margin pairs
B = 7: class margin pairs
E = AB/T = 1: expected by chance
ARI = (S - E) / (M - E) ≈ 0.444

First naive idea: trust RI directly

Raw RI is useful, but it does not know whether 23/28 is impressive for this margin shape.

Two random partitions with the same cluster sizes and class sizes can still agree on some same-cluster pairs.
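A minimal sketch of that claim follows. It assumes a hypothetical 8-item fixture: the two label arrays are one arrangement chosen to reproduce this page's numbers, not necessarily the lab's actual data, and the function names are illustrative. The code averages the same-same pair count over every reordering of the reference labels, which keeps the cluster sizes and class sizes fixed and changes only which item gets which label.

```typescript
// Hypothetical fixture (an assumption): cluster sizes 2,2,2,2 and class sizes 3,3,2.
const predicted: number[] = [0, 0, 1, 1, 2, 2, 3, 3];
const reference: string[] = ["a", "a", "b", "b", "c", "c", "a", "b"];

// Count unordered pairs that share a predicted cluster AND a reference label.
function sameSamePairs(pred: number[], ref: string[]): number {
  let count = 0;
  for (let i = 0; i < pred.length; i++) {
    for (let j = i + 1; j < pred.length; j++) {
      if (pred[i] === pred[j] && ref[i] === ref[j]) count++;
    }
  }
  return count;
}

// Average the same-same count over all n! orderings of the reference labels,
// enumerated with the iterative form of Heap's algorithm.
function meanSameSameOverPermutations(pred: number[], ref: string[]): number {
  const labels = ref.slice();
  const n = labels.length;
  const c: number[] = new Array(n).fill(0);
  let total = sameSamePairs(pred, labels); // the starting order counts too
  let permutations = 1;
  let i = 0;
  while (i < n) {
    if (c[i] < i) {
      const k = i % 2 === 0 ? 0 : c[i];
      [labels[k], labels[i]] = [labels[i], labels[k]];
      total += sameSamePairs(pred, labels);
      permutations++;
      c[i]++;
      i = 0;
    } else {
      c[i] = 0;
      i++;
    }
  }
  return total / permutations;
}

// For this fixture the average is 1, which equals A * B / T = 4 * 7 / 28.
const chanceLevel = meanSameSameOverPermutations(predicted, reference);
console.log(chanceLevel);
```

Exhaustive enumeration is only practical for toy sizes like n = 8 (40,320 orderings); it is used here because it gives the exact chance level rather than a sampled estimate.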

Core invention: subtract expected agreement

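Build the contingency table between predicted clusters and reference labels: cell n_ij counts the items in predicted cluster i with reference label j, a_i is the size of predicted cluster i (row total), b_j is the size of reference class j (column total), and n is the total number of items.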

Let:

S = sum_ij binom(n_ij, 2): observed pairs that are same cluster and same label.
A = sum_i binom(a_i, 2): same-cluster pairs from predicted cluster sizes.
B = sum_j binom(b_j, 2): same-label pairs from reference class sizes.
T = binom(n, 2): all unordered pairs.
E = AB / T: expected same-same pairs from the margins.
M = (A + B) / 2: the mean of the two margin pair counts, which S reaches only when the partitions match; it is the normalizing maximum.
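The six terms above can be sketched in TypeScript as follows. The names `pairTerms` and `PairTerms` are hypothetical, and the two label arrays are an assumed 8-item arrangement chosen to reproduce the fixture numbers on this page, not necessarily the lab's actual fixture.

```typescript
interface PairTerms {
  S: number; // observed same-cluster, same-label pairs
  A: number; // same-cluster pairs from predicted cluster sizes
  B: number; // same-label pairs from reference class sizes
  T: number; // all unordered pairs
  E: number; // expected same-same pairs from the margins
  M: number; // normalizing maximum
}

// Number of unordered pairs among x items: binom(x, 2).
function choose2(x: number): number {
  return x < 2 ? 0 : (x * (x - 1)) / 2;
}

function pairTerms(pred: number[], ref: string[]): PairTerms {
  if (pred.length !== ref.length) {
    throw new Error("label arrays must have equal length");
  }
  const cells = new Map<string, number>(); // n_ij, keyed by (cluster, label)
  const rows = new Map<number, number>();  // a_i
  const cols = new Map<string, number>();  // b_j
  for (let k = 0; k < pred.length; k++) {
    const key = JSON.stringify([pred[k], ref[k]]);
    cells.set(key, (cells.get(key) ?? 0) + 1);
    rows.set(pred[k], (rows.get(pred[k]) ?? 0) + 1);
    cols.set(ref[k], (cols.get(ref[k]) ?? 0) + 1);
  }
  let S = 0;
  cells.forEach((count) => { S += choose2(count); });
  let A = 0;
  rows.forEach((count) => { A += choose2(count); });
  let B = 0;
  cols.forEach((count) => { B += choose2(count); });
  const T = choose2(pred.length);
  const E = T === 0 ? 0 : (A * B) / T;
  const M = (A + B) / 2;
  return { S, A, B, T, E, M };
}

// Assumed fixture: three pure clusters of two items plus one mixed cluster.
const fixtureTerms = pairTerms(
  [0, 0, 1, 1, 2, 2, 3, 3],
  ["a", "a", "b", "b", "c", "c", "a", "b"],
);
console.log(fixtureTerms); // S: 3, A: 4, B: 7, T: 28, E: 1, M: 5.5
```

Only non-empty cells are stored in the map, so the sketch never allocates the full table when most cluster/label combinations are empty.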

Formal version

ARI=\frac{S-E}{M-E}

For the fixture:

S=3,\quad A=4,\quad B=7,\quad T=28,\quad E=1,\quad M=5.5

So:

ARI=\frac{3-1}{5.5-1}=\frac{2}{4.5}\approx 0.444

Interactive preset lab

Clustering metric preset lab

Explanation: One mixed cluster and two split true classes make Purity look high while pair metrics reveal damage.

Purity: 0.875 (7/8)
Rand Index: 0.821 (23/28)
Adjusted Rand Index: 0.444 (S=3, E=1)
Fowlkes-Mallows Index: 0.567 (pair P=0.75, pair R=0.429)

Pair counts: TP=3, FP=1, FN=4, TN=20
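The four pair counts follow directly from S, A, B, and T, and the Rand Index and Fowlkes-Mallows Index follow from the pair counts. The sketch below illustrates that relationship; the names `pairCounts`, `randIndex`, and `fowlkesMallows` are hypothetical, and returning null for an undefined value is an assumption that mirrors the "not available" convention used on this page.

```typescript
interface PairCounts {
  TP: number; // same cluster, same label
  FP: number; // same cluster, different label
  FN: number; // different cluster, same label
  TN: number; // different cluster, different label
}

function pairCounts(S: number, A: number, B: number, T: number): PairCounts {
  return { TP: S, FP: A - S, FN: B - S, TN: T - A - B + S };
}

function randIndex(c: PairCounts): number | null {
  const total = c.TP + c.FP + c.FN + c.TN;
  return total === 0 ? null : (c.TP + c.TN) / total;
}

function fowlkesMallows(c: PairCounts): number | null {
  const predictedPositive = c.TP + c.FP; // pairs placed in the same cluster
  const actualPositive = c.TP + c.FN;    // pairs sharing a reference label
  if (predictedPositive === 0 || actualPositive === 0) return null;
  const pairPrecision = c.TP / predictedPositive;
  const pairRecall = c.TP / actualPositive;
  return Math.sqrt(pairPrecision * pairRecall);
}

// Fixture values from this page: S = 3, A = 4, B = 7, T = 28.
const fixtureCounts = pairCounts(3, 4, 7, 28);
console.log(fixtureCounts);                 // TP 3, FP 1, FN 4, TN 20
console.log(randIndex(fixtureCounts));      // 23/28, about 0.821
console.log(fowlkesMallows(fixtureCounts)); // sqrt(0.75 * 3/7), about 0.567
```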

Interpretation

1 means the two partitions match perfectly, up to renamed clusters.
Around 0 means the same-cluster agreement is about what the margins would produce by chance.
Negative values mean less same-cluster agreement than expected by chance.
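
A small worked case, chosen here for illustration and not taken from the preset lab, shows a negative value. Take four items with predicted clusters {1,2}, {3,4} and reference classes {1,3}, {2,4}, so that no same-cluster pair is also a same-label pair:

S=0,\quad A=2,\quad B=2,\quad T=6,\quad E=\frac{2\cdot 2}{6}=\frac{2}{3},\quad M=2

ARI=\frac{0-\frac{2}{3}}{2-\frac{2}{3}}=\frac{-2/3}{4/3}=-0.5

The margins alone would produce 2/3 of a same-same pair on average, and this pairing achieves none, so it scores below chance.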

Preset contrast

The same four metrics react differently to over-splitting and merging.

Fixture: Purity 0.875, RI 0.821, ARI 0.444, FMI 0.567
Perfect match: Purity 1, RI 1, ARI 1, FMI 1
Singleton over-split: Purity 1, RI 0.75, ARI 0, FMI not available
All merged: Purity 0.375, RI 0.25, ARI 0, FMI 0.5
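
Both zero-ARI presets can be checked by hand from the formula. The check below assumes every preset shares the fixture's reference labels, so n = 8, T = 28, and B = 7 throughout; that assumption is consistent with the RI and Purity values listed.

Singleton over-split (every item in its own cluster, so no same-cluster pairs):

S=0,\quad A=0,\quad E=\frac{0\cdot 7}{28}=0,\quad M=3.5,\quad ARI=\frac{0-0}{3.5-0}=0

All merged (one cluster, so every pair is a same-cluster pair):

S=7,\quad A=28,\quad E=\frac{28\cdot 7}{28}=7,\quad M=17.5,\quad ARI=\frac{7-7}{17.5-7}=0

In both cases S equals E exactly, so the numerator vanishes even though Purity and RI differ widely.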

Degenerate branch

If M - E = 0, the normalized formula has no room to move. In implementation, return 1 when the partitions are perfectly identical under that degenerate shape; otherwise return null rather than rendering NaN.

Implementation sketch

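// Number of unordered pairs among x items: binom(x, 2).
function choose2(x: number): number {
  return x < 2 ? 0 : (x * (x - 1)) / 2;
}

// S, A, B are the pair counts defined earlier; n is the number of items.
function adjustedRand(S: number, A: number, B: number, n: number): number | null {
  const T = choose2(n);
  const E = T === 0 ? 0 : (A * B) / T; // expected same-same pairs
  const M = (A + B) / 2;               // normalizing maximum
  const denominator = M - E;
  if (denominator === 0) {
    // Degenerate branch: identical partitions (S = A = B) score 1; otherwise null, never NaN.
    return S === A && S === B ? 1 : null;
  }
  return (S - E) / denominator;
}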

Complexity

After building the contingency table, ARI is O(rc) over table cells and margins, where r is the number of predicted clusters and c is the number of reference classes. Building that table from n items is O(n).

Common confusions

ARI is not just RI with a different denominator; it subtracts chance-level same-same agreement.
ARI can be negative.
ARI still compares external reference labels, so it is not an internal clustering objective.

Clustering metric path

Start with cluster majorities, then move to pair counts and pair-positive balance.

1. Purity
2. Rand Index
3. Adjusted Rand Index
4. Fowlkes-Mallows

Exercises

Which term is the observed same-same count?
Why does the singleton over-split preset have Purity 1.0 but ARI 0?
What does a negative ARI mean in words?
