Draft
Purity
Score a clustering by giving each predicted cluster credit for its largest reference-label majority.
Hook problem: clusters have no built-in names
A clustering algorithm groups eight learning cards. A teacher also has an answer key: each card is really about graph, tree, or hash.
The cluster names A, B, C, and D are arbitrary. Cluster A does not mean “graph” just because the algorithm printed A.
reference: graph
cluster: A
reference: graph
cluster: A
reference: graph
cluster: B
reference: tree
cluster: B
reference: tree
cluster: C
reference: tree
cluster: C
reference: hash
cluster: D
reference: hash
cluster: D
First naive idea: compare cluster names directly
If we treated A, B, C, and D as class labels, the score would depend on random names chosen by the clustering algorithm.
That is not a stable metric. Renaming cluster A to Z should not change the evaluation.
Core invention: each cluster gets its best label
Purity asks each predicted cluster one local question:
Which reference label is the majority inside this cluster?
Then it adds the majority counts and divides by the number of items.
majority label: graph
majority label: graph
majority label: tree
majority label: hash
Contingency table: 4 clusters by 3 reference labels.
Formal version
Let C_k be predicted cluster k, L_j be reference label j, and n be the number of items.
Plainly: for each cluster, keep only its largest answer-key pile.
For the fixture:
Interactive preset lab
Use the presets to compare the same clustering result under Purity, Rand Index, Adjusted Rand Index, and Fowlkes-Mallows Index.
Clustering metric preset lab
Explanation: One mixed cluster and two split true classes make Purity look high while pair metrics reveal damage.
7/8
23/28
S=3, E=1
pair P=0.75, pair R=0.429
Static no-JS fallback:
| Cluster | Majority label | Contribution |
|---|---|---|
| A | graph | 2/2 |
| B | graph | 1/2 |
| C | tree | 2/2 |
| D | hash | 2/2 |
Where Purity breaks
Purity rewards clean clusters, but it does not penalize splitting enough.
If every item becomes its own singleton cluster, every cluster is perfectly pure, so Purity becomes 1.0. That does not mean the clustering recovered the useful groups.
Purity: 0.875
RI: 0.821
ARI: 0.444
FMI: 0.567
Purity: 1
RI: 1
ARI: 1
FMI: 1
Purity: 1
RI: 0.75
ARI: 0
FMI: not available
Purity: 0.375
RI: 0.25
ARI: 0
FMI: 0.5
Implementation sketch
function purity(items: { cluster: string; label: string }[]) {
const counts = new Map<string, Map<string, number>>();
for (const item of items) {
const row = counts.get(item.cluster) ?? new Map<string, number>();
row.set(item.label, (row.get(item.label) ?? 0) + 1);
counts.set(item.cluster, row);
}
let kept = 0;
for (const row of counts.values()) {
kept += Math.max(...row.values());
}
return items.length === 0 ? null : kept / items.length;
}
Complexity
Building the cluster-by-label table takes O(n) time for n items. The extra space is O(rc) for r predicted clusters and c reference labels that actually appear.
Common confusions
- Purity is an external clustering metric because it needs reference labels.
- Purity is not accuracy: cluster names are arbitrary, so clusters are first matched locally to majority labels.
- High Purity does not prove a good clustering when the algorithm over-splits.
purity
rand-index
adjusted-rand-index
fowlkes-mallows-index
Exercises
- Which fixture cluster causes the
1/2contribution? - Why does singleton over-splitting get Purity
1.0? - What would happen to Purity if clusters
CandDwere merged?
Graph connections : Purity