Graph connections

Draft

Fowlkes-Mallows Index

Score clustering with the geometric mean of pair precision and pair recall.

concept intermediate machine-learningmetricsclustering

Hook problem: focus on pairs placed together

Rand Index includes TN: pairs that are different in the reference labels and different in the clustering.

Sometimes you want a metric that focuses on the positive clustering decision: “these two items belong together.”

Fowlkes-Mallows Index uses only TP, FP, and FN.

Pair-positive balanceFMI ignores true negatives and balances pair precision with pair recall.
Pair precision0.75

TP / (TP + FP)

Pair recall0.429

TP / (TP + FN)

FMI0.567

TP / sqrt((TP + FP)(TP + FN))

TN20

not used by FMI

First naive idea: use pair precision only

Pair precision asks:

TPTP+FP\frac{TP}{TP+FP}

That measures how trustworthy same-cluster pairs are. But it ignores same-label pairs that the clustering split apart.

Core invention: balance pair precision and pair recall

Pair recall asks:

TPTP+FN\frac{TP}{TP+FN}

FMI combines pair precision and pair recall with a geometric mean:

FMI=TPTP+FPTPTP+FNFMI=\sqrt{\frac{TP}{TP+FP}\cdot\frac{TP}{TP+FN}}

The equivalent count form is:

FMI=TP(TP+FP)(TP+FN)FMI=\frac{TP}{\sqrt{(TP+FP)(TP+FN)}}

For the fixture:

FMI=3(3+1)(3+4)=3280.567FMI=\frac{3}{\sqrt{(3+1)(3+4)}}=\frac{3}{\sqrt{28}}\approx 0.567

Interactive preset lab

Clustering metric preset lab

Explanation: One mixed cluster and two split true classes make Purity look high while pair metrics reveal damage.

Purity0.875

7/8

Rand Index0.821

23/28

Adjusted Rand Index0.444

S=3, E=1

Fowlkes-Mallows Index0.567

pair P=0.75, pair R=0.429

TP3
FP1
FN4
TN20

Why TN is absent

TN means two items were apart in both the reference labels and the clustering. That is real agreement for RI, but it does not tell us whether predicted same-cluster groups are useful.

FMI therefore focuses on the co-clustered pair decision.

Rand pair countsRI counts both same-same and different-different agreements.
TP3

together in both

FP1

clustered together by mistake

FN4

split apart by mistake

TN20

apart in both

RI23/28 = 0.821

Edge cases

If TP + FP = 0, there are no predicted same-cluster pairs, so pair precision is unavailable.

If TP + FN = 0, there are no reference same-label pairs, so pair recall is unavailable.

In either case, render FMI as not available rather than NaN.

Implementation sketch

function fowlkesMallows(tp: number, fp: number, fn: number) {
  const denominator = Math.sqrt((tp + fp) * (tp + fn));
  return denominator === 0 ? null : tp / denominator;
}

Complexity

Once pair counts are known, FMI is O(1). Building pair counts directly is O(n^2), or it can be derived from a contingency table.

Common confusions

  • FMI is not classifier F1, though both balance two positive-side quantities.
  • FMI does not adjust for chance the way ARI does.
  • A clustering that merges everything can have high pair recall but weak pair precision.
Preset contrastThe same four metrics react differently to over-splitting and merging.
Fixture

Purity: 0.875

RI: 0.821

ARI: 0.444

FMI: 0.567

Perfect match

Purity: 1

RI: 1

ARI: 1

FMI: 1

Singleton over-split

Purity: 1

RI: 0.75

ARI: 0

FMI: not available

All merged

Purity: 0.375

RI: 0.25

ARI: 0

FMI: 0.5

Exercises

  1. Why does FMI ignore TN?
  2. What happens to FMI in the singleton over-split preset?
  3. Which mistake hurts pair precision: FP or FN?

Graph connections : Fowlkes-Mallows Index