Fowlkes-Mallows Index
Score clustering with the geometric mean of pair precision and pair recall.
Hook problem: focus on pairs placed together
Rand Index includes TN: pairs that are different in the reference labels and different in the clustering.
Sometimes you want a metric that focuses on the positive clustering decision: “these two items belong together.”
Fowlkes-Mallows Index uses only TP, FP, and FN.
Pair precision: TP / (TP + FP)
Pair recall: TP / (TP + FN)
FMI: TP / sqrt((TP + FP)(TP + FN))
TN: not used by FMI
First naive idea: use pair precision only
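Pair precision asks: of the pairs the clustering placed together, what fraction also share a reference label?
pair precision = TP / (TP + FP)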
That measures how trustworthy same-cluster pairs are. But it ignores same-label pairs that the clustering split apart.
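A small sketch of that blind spot, using hypothetical pair counts (not taken from the fixture) for a very cautious clustering that places only one correct pair together. The helper name `pairPrecisionOnly` is illustrative.

```typescript
// Pair precision alone: TP / (TP + FP). FN never enters the formula.
function pairPrecisionOnly(tp: number, fp: number): number | null {
  const predictedTogether = tp + fp;
  return predictedTogether === 0 ? null : tp / predictedTogether;
}

// Hypothetical counts: one correct co-clustered pair, six same-label pairs split apart.
const cautious = { tp: 1, fp: 0, fn: 6 };

// Precision is perfect even though 6 of the 7 same-label pairs were split.
console.log(pairPrecisionOnly(cautious.tp, cautious.fp)); // 1
```

The `fn` field is carried along only to show that it has no effect on the score.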
Core invention: balance pair precision and pair recall
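Pair recall asks: of the pairs that share a reference label, what fraction did the clustering keep together?
pair recall = TP / (TP + FN)
FMI combines pair precision and pair recall with a geometric mean:
FMI = sqrt(pair precision × pair recall)
The equivalent count form is:
FMI = TP / sqrt((TP + FP)(TP + FN))
For the fixture:
FMI = sqrt(0.75 × 0.429) ≈ 0.567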
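The two forms can be checked against each other numerically. The sketch below assumes pair counts TP = 3, FP = 1, FN = 4; these are not stated in the text and are chosen only because they reproduce the reported pair P = 0.75 and pair R ≈ 0.429. The function name `fmiBothForms` is illustrative.

```typescript
// Computes FMI two ways; assumes both denominators are nonzero.
function fmiBothForms(tp: number, fp: number, fn: number) {
  const pairPrecision = tp / (tp + fp);
  const pairRecall = tp / (tp + fn);
  return {
    // Geometric mean of pair precision and pair recall.
    fromRates: Math.sqrt(pairPrecision * pairRecall),
    // Equivalent count form.
    fromCounts: tp / Math.sqrt((tp + fp) * (tp + fn)),
  };
}

// Assumed counts consistent with pair P = 0.75 and pair R = 3/7.
const fixtureLike = fmiBothForms(3, 1, 4);
console.log(fixtureLike.fromRates.toFixed(3), fixtureLike.fromCounts.toFixed(3)); // 0.567 0.567
```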
Interactive preset lab
Clustering metric preset lab
Explanation: One mixed cluster and two split true classes make Purity look high while pair metrics reveal damage.
Purity: 7/8
RI: 23/28
ARI: S=3, E=1
FMI: pair P=0.75, pair R=0.429
Why TN is absent
TN means two items were apart in both the reference labels and the clustering. That is real agreement for RI, but it does not tell us whether predicted same-cluster groups are useful.
FMI therefore focuses on the co-clustered pair decision.
TP: together in both
FP: clustered together by mistake
FN: split apart by mistake
TN: apart in both
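One way to see the effect of TN is to hold TP, FP, and FN fixed and vary only TN. The tallies below are hypothetical, and the helper names are illustrative; RI is computed as (TP + TN) divided by the total number of pairs.

```typescript
type PairTally = { tp: number; fp: number; fn: number; tn: number };

// Rand Index counts both kinds of agreement: TP and TN.
function randIndexFromTally(c: PairTally): number {
  return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn);
}

// FMI never reads c.tn.
function fmiFromTally(c: PairTally): number | null {
  const denominator = Math.sqrt((c.tp + c.fp) * (c.tp + c.fn));
  return denominator === 0 ? null : c.tp / denominator;
}

// Hypothetical tallies: identical co-clustering decisions, very different TN.
const fewApartPairs: PairTally = { tp: 3, fp: 1, fn: 4, tn: 20 };
const manyApartPairs: PairTally = { tp: 3, fp: 1, fn: 4, tn: 2000 };

// RI rises with TN: 0.821 versus 0.998.
console.log(randIndexFromTally(fewApartPairs).toFixed(3), randIndexFromTally(manyApartPairs).toFixed(3));
// FMI is identical for both tallies.
console.log(fmiFromTally(fewApartPairs) === fmiFromTally(manyApartPairs)); // true
```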
Edge cases
If TP + FP = 0, there are no predicted same-cluster pairs, so pair precision is unavailable.
If TP + FN = 0, there are no reference same-label pairs, so pair recall is unavailable.
In either case, render FMI as not available rather than NaN.
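A minimal sketch of that rendering rule. The helper `formatFmi` and the three-decimal formatting are assumptions for illustration; only the "not available" behavior comes from the text above.

```typescript
// Hypothetical display helper: returns a string ready for rendering.
function formatFmi(tp: number, fp: number, fn: number): string {
  if (tp + fp === 0) return "not available"; // no predicted same-cluster pairs
  if (tp + fn === 0) return "not available"; // no reference same-label pairs
  return (tp / Math.sqrt((tp + fp) * (tp + fn))).toFixed(3);
}

// Hypothetical counts: every item in its own cluster, so TP = FP = 0.
console.log(formatFmi(0, 0, 7)); // not available
// Hypothetical counts with both denominators nonzero.
console.log(formatFmi(3, 1, 4)); // 0.567
```

Checking the two sums separately, rather than testing the result for NaN, keeps the reason for the missing value visible in the code.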
Implementation sketch
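// Returns null when TP + FP = 0 or TP + FN = 0, so callers can render "not available".
function fowlkesMallows(tp: number, fp: number, fn: number): number | null {
  const denominator = Math.sqrt((tp + fp) * (tp + fn));
  return denominator === 0 ? null : tp / denominator;
}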
Complexity
Once pair counts are known, FMI is O(1). Counting pairs directly is O(n^2); the same counts can also be derived from a contingency table without enumerating pairs.
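A sketch of the contingency-table route. It relies on the identity that the number of unordered pairs among k items is k(k - 1)/2: summing that over contingency cells gives TP, over cluster sizes gives TP + FP, and over class sizes gives TP + FN. The names `PairCounts`, `choose2`, and `pairCountsFromLabels` are illustrative, and the labels are a hypothetical 8-item example chosen so the counts reproduce the reported ratios (pair P = 0.75, pair R ≈ 0.429, RI = 23/28); the text does not list the fixture's actual labels.

```typescript
type PairCounts = { tp: number; fp: number; fn: number; tn: number };

// Number of unordered pairs among k items.
function choose2(k: number): number {
  return (k * (k - 1)) / 2;
}

function pairCountsFromLabels(reference: string[], predicted: string[]): PairCounts {
  if (reference.length !== predicted.length) {
    throw new Error("label arrays must have the same length");
  }
  const cell: Record<string, number> = {}; // (class, cluster) -> item count
  const classSize: Record<string, number> = {};
  const clusterSize: Record<string, number> = {};
  for (let i = 0; i < reference.length; i++) {
    const key = JSON.stringify([reference[i], predicted[i]]);
    cell[key] = (cell[key] || 0) + 1;
    classSize[reference[i]] = (classSize[reference[i]] || 0) + 1;
    clusterSize[predicted[i]] = (clusterSize[predicted[i]] || 0) + 1;
  }
  // Sum of k-choose-2 over all counts in a tally.
  const sumChoose2 = (counts: Record<string, number>): number =>
    Object.keys(counts).reduce((acc, k) => acc + choose2(counts[k]), 0);

  const tp = sumChoose2(cell);              // same class and same cluster
  const fp = sumChoose2(clusterSize) - tp;  // same cluster, different class
  const fn = sumChoose2(classSize) - tp;    // same class, different cluster
  const tn = choose2(reference.length) - tp - fp - fn;
  return { tp, fp, fn, tn };
}

// Hypothetical labels: one mixed cluster ("1") and two split classes (A and B).
const exampleReference = ["A", "A", "A", "B", "B", "B", "C", "C"];
const exampleClusters = ["0", "0", "1", "1", "2", "2", "3", "3"];
console.log(pairCountsFromLabels(exampleReference, exampleClusters));
// tp: 3, fp: 1, fn: 4, tn: 20
```

The single pass over the items is O(n), and the sums run over the nonempty cells, classes, and clusters rather than over all pairs.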
Common confusions
- FMI is not classifier F1, though both balance two positive-side quantities.
- FMI does not adjust for chance the way ARI does.
- A clustering that merges everything can have high pair recall but weak pair precision.
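The last point can be illustrated with a direct pair enumeration. The labels below are hypothetical (three classes of sizes 3, 3, and 2), and the helper name `pairRates` is illustrative.

```typescript
// Direct O(n^2) pair enumeration over two label arrays of equal length.
function pairRates(reference: string[], predicted: string[]) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (let i = 0; i < reference.length; i++) {
    for (let j = i + 1; j < reference.length; j++) {
      const sameLabel = reference[i] === reference[j];
      const sameCluster = predicted[i] === predicted[j];
      if (sameCluster && sameLabel) tp++;
      else if (sameCluster) fp++;
      else if (sameLabel) fn++;
    }
  }
  return {
    precision: tp + fp === 0 ? null : tp / (tp + fp),
    recall: tp + fn === 0 ? null : tp / (tp + fn),
  };
}

// Hypothetical labels, with every item merged into a single cluster.
const referenceLabels = ["A", "A", "A", "B", "B", "B", "C", "C"];
const mergedClusters = referenceLabels.map(() => "all");

// No same-label pair is split, so recall is 1; 21 of 28 pairs are wrong merges.
console.log(pairRates(referenceLabels, mergedClusters)); // precision 0.25, recall 1
```

With these rates the geometric mean is sqrt(0.25 × 1) = 0.5, so perfect recall does not rescue the score.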
Preset results:
- Purity: 0.875, RI: 0.821, ARI: 0.444, FMI: 0.567
- Purity: 1, RI: 1, ARI: 1, FMI: 1
- Purity: 1, RI: 0.75, ARI: 0, FMI: not available
- Purity: 0.375, RI: 0.25, ARI: 0, FMI: 0.5
Exercises
- Why does FMI ignore TN?
- What happens to FMI in the singleton over-split preset?
- Which mistake hurts pair precision: FP or FN?