Draft
Adjusted Rand Index
Adjust Rand-style pair agreement by subtracting the agreement expected from the cluster and class margins.
Hook problem: raw agreement is not enough
Rand Index for the fixture is high: RI = 23/28 ≈ 0.821.
But many of those agreements are easy true-negative (TN) pairs: two items have different reference labels and are also in different predicted clusters.
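To see how much of the raw agreement comes from TN pairs, the pairs can be counted directly. The sketch below uses a hypothetical 8-item labeling chosen to reproduce the fixture's reported Purity 7/8 and RI 23/28; the actual fixture data is not shown in this lesson, and the names `reference`, `predicted`, and `countPairs` are assumptions.

```typescript
// Hypothetical labeling, reconstructed to match the fixture's reported numbers.
const reference = ["a", "a", "a", "b", "b", "b", "c", "c"];
const predicted = [0, 0, 1, 1, 2, 2, 3, 3];

interface PairCounts { tp: number; tn: number; fp: number; fn: number }

// Classify every unordered pair of items by whether the two items share
// a reference label and whether they share a predicted cluster.
function countPairs(ref: readonly string[], pred: readonly number[]): PairCounts {
  const counts: PairCounts = { tp: 0, tn: 0, fp: 0, fn: 0 };
  const items = ref.map((label, i) => ({ label, cluster: pred[i] }));
  items.forEach((x, i) => {
    for (const y of items.slice(i + 1)) {
      const sameLabel = x.label === y.label;
      const sameCluster = x.cluster === y.cluster;
      if (sameLabel && sameCluster) counts.tp += 1;
      else if (!sameLabel && !sameCluster) counts.tn += 1;
      else if (sameCluster) counts.fp += 1; // same cluster, different label
      else counts.fn += 1; // same label, different cluster
    }
  });
  return counts;
}

const pairCounts = countPairs(reference, predicted);
const totalPairs = pairCounts.tp + pairCounts.tn + pairCounts.fp + pairCounts.fn;
const randIndex = (pairCounts.tp + pairCounts.tn) / totalPairs;
console.log(pairCounts, randIndex);
```

With this labeling, 20 of the 23 agreeing pairs are TN and only 3 are same-same pairs.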
Adjusted Rand Index asks: how much better is the observed pair agreement than what these cluster sizes and class sizes could produce by chance?
Diagram: observed same-same pairs, cluster margin pairs, and class margin pairs; the margins give the agreement expected by chance, and the score is (S - E) / (M - E).
First naive idea: trust RI directly
Raw RI is useful, but it does not know whether 23/28 is impressive for this margin shape.
Two random partitions with the same cluster sizes and class sizes can still agree on some same-cluster pairs.
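One way to make this concrete is to hold the cluster sizes and class sizes fixed and average the same-same count over every reassignment of items to clusters. The sketch below does that by brute force over all 8! orderings of a hypothetical labeling (an assumption, chosen to match the fixture's reported metrics); `sameSamePairs` and `forEachPermutation` are illustrative names.

```typescript
// Hypothetical labeling, reconstructed to match the fixture's reported numbers.
const refLabels = ["a", "a", "a", "b", "b", "b", "c", "c"];
const clusterIds = [0, 0, 1, 1, 2, 2, 3, 3];

// Count unordered pairs that share both a reference label and a cluster.
function sameSamePairs(ref: readonly string[], pred: readonly number[]): number {
  let count = 0;
  for (let i = 0; i < ref.length; i++) {
    for (let j = i + 1; j < ref.length; j++) {
      if (ref[i] === ref[j] && pred[i] === pred[j]) count += 1;
    }
  }
  return count;
}

function swap(xs: number[], i: number, j: number): void {
  const tmp = xs[i] as number;
  xs[i] = xs[j] as number;
  xs[j] = tmp;
}

// Visit every ordering of `values` in place (simple recursive swapping).
function forEachPermutation(
  values: number[],
  visit: (ordering: readonly number[]) => void,
  k = 0,
): void {
  if (k === values.length) {
    visit(values);
    return;
  }
  for (let i = k; i < values.length; i++) {
    swap(values, k, i);
    forEachPermutation(values, visit, k + 1);
    swap(values, k, i); // undo the swap before trying the next choice
  }
}

let permutationCount = 0;
let sameSameTotal = 0;
forEachPermutation(clusterIds.slice(), (assignment) => {
  permutationCount += 1;
  sameSameTotal += sameSamePairs(refLabels, assignment);
});
const meanSameSame = sameSameTotal / permutationCount;
console.log(permutationCount, meanSameSame);
```

For this labeling the average is exactly 1 same-same pair, which matches the E=1 reported for the fixture: agreement that the margins alone would produce.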
Core invention: subtract expected agreement
Build the contingency table between predicted clusters and reference labels.
Let:
- S = sum_ij binom(n_ij, 2): observed pairs that are same cluster and same label.
- A = sum_i binom(a_i, 2): same-cluster pairs from predicted cluster sizes.
- B = sum_j binom(b_j, 2): same-label pairs from reference class sizes.
- T = binom(n, 2): all unordered pairs.
- E = AB / T: expected same-same pairs from the margins.
- M = (A + B) / 2: the maximum normalizing term.
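The definitions above can be sketched as a single pass that builds the contingency table in a `Map` and then sums the binomial terms. The labeling is a hypothetical one chosen to reproduce the fixture's reported S=3 and E=1; the function names `pairTerms`, `bump`, and `sumPairs` and the string key format are assumptions.

```typescript
// Number of unordered pairs that can be drawn from x items.
const pairsOf = (x: number): number => (x < 2 ? 0 : (x * (x - 1)) / 2);

// Add 1 to the count stored under `key`.
function bump(counts: Map<string, number>, key: string): void {
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

// Sum binom(size, 2) over every count in the map.
function sumPairs(counts: Map<string, number>): number {
  let sum = 0;
  counts.forEach((size) => {
    sum += pairsOf(size);
  });
  return sum;
}

function pairTerms(pred: readonly number[], ref: readonly string[]) {
  const cells = new Map<string, number>(); // n_ij: cluster i with label j
  const clusters = new Map<string, number>(); // a_i: predicted cluster sizes
  const classes = new Map<string, number>(); // b_j: reference class sizes
  pred.forEach((cluster, index) => {
    const label = String(ref[index]);
    bump(cells, JSON.stringify([cluster, label]));
    bump(clusters, String(cluster));
    bump(classes, label);
  });
  const S = sumPairs(cells);
  const A = sumPairs(clusters);
  const B = sumPairs(classes);
  const T = pairsOf(pred.length);
  const E = T === 0 ? 0 : (A * B) / T;
  const M = (A + B) / 2;
  return { S, A, B, T, E, M };
}

// Hypothetical labeling, reconstructed to match the fixture's reported numbers.
const terms = pairTerms(
  [0, 0, 1, 1, 2, 2, 3, 3],
  ["a", "a", "a", "b", "b", "b", "c", "c"],
);
console.log(terms);
```

For this labeling the terms come out as S=3, A=4, B=7, T=28, E=1, and M=5.5.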
Formal version
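ARI = (S - E) / (M - E), with E = AB / T and M = (A + B) / 2.
For the fixture: S = 3, A = 4, B = 7, and T = binom(8, 2) = 28, which gives E = (4 * 7) / 28 = 1 and M = (4 + 7) / 2 = 5.5.
So: ARI = (3 - 1) / (5.5 - 1) = 2 / 4.5 ≈ 0.444.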
Interactive preset lab
Explanation: One mixed cluster and two split true classes make Purity look high while pair metrics reveal damage.
- Purity: 7/8
- RI: 23/28
- ARI terms: S=3, E=1
- FMI terms: pair P=0.75, pair R=0.429
Interpretation
- 1 means the two partitions match perfectly, up to renamed clusters.
- Around 0 means the same-cluster agreement is about what the margins would produce by chance.
- Negative values mean less same-cluster agreement than expected by chance.
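A small case with a negative score can make the last point concrete. In the sketch below, a hypothetical 4-item example splits every reference class across clusters, so no pair is same-same; the `ari` helper takes the pair terms directly and is an illustrative name.

```typescript
// ARI from the pair terms: S observed same-same pairs, A same-cluster pairs,
// B same-label pairs, n items. Returns null when the denominator is zero.
function ari(S: number, A: number, B: number, n: number): number | null {
  const T = (n * (n - 1)) / 2;
  const E = T === 0 ? 0 : (A * B) / T;
  const M = (A + B) / 2;
  return M - E === 0 ? null : (S - E) / (M - E);
}

// Hypothetical example: labels [a, a, b, b], clusters [0, 1, 0, 1].
// Each cluster holds one "a" and one "b", so S = 0, while A = 2 and B = 2.
const crossed = ari(0, 2, 2, 4);

// Perfect agreement on the same labels: S = A = B = 2.
const matched = ari(2, 2, 2, 4);

console.log(crossed, matched);
```

Here the crossed case evaluates to -0.5: the margins alone would produce about 0.67 same-same pairs on average, and this clustering produces none.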
Preset results, one preset per line:
Purity: 0.875 | RI: 0.821 | ARI: 0.444 | FMI: 0.567
Purity: 1 | RI: 1 | ARI: 1 | FMI: 1
Purity: 1 | RI: 0.75 | ARI: 0 | FMI: not available
Purity: 0.375 | RI: 0.25 | ARI: 0 | FMI: 0.5
Degenerate branch
If M - E = 0, the normalized formula has no room to move. In implementation, return 1 when the partitions are perfectly identical under that degenerate shape; otherwise return null rather than rendering NaN.
Implementation sketch
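// Number of unordered pairs among x items.
function choose2(x: number): number {
  return x < 2 ? 0 : (x * (x - 1)) / 2;
}

// S: observed same-same pairs; A, B: same-cluster and same-label pair counts; n: item count.
function adjustedRand(S: number, A: number, B: number, n: number): number | null {
  const T = choose2(n);
  const E = T === 0 ? 0 : (A * B) / T; // expected same-same pairs from the margins
  const M = (A + B) / 2;
  const denominator = M - E;
  if (denominator === 0) {
    // Degenerate shape: 1 for identical partitions (S = A = B), otherwise no defined value.
    return S === A && S === B ? 1 : null;
  }
  return (S - E) / denominator;
}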
Complexity
After building the contingency table, ARI is O(rc) over table cells and margins, where r is the number of predicted clusters and c is the number of reference classes. Building that table from n items is O(n).
Common confusions
- ARI is not just RI with a different denominator; it subtracts chance-level same-same agreement.
- ARI can be negative.
- ARI still compares external reference labels, so it is not an internal clustering objective.
Related topics: purity, rand-index, adjusted-rand-index, fowlkes-mallows-index
Exercises
- Which term is the observed same-same count?
- Why does the singleton over-split preset have Purity 1.0 but ARI 0?
- What does a negative ARI mean in words?