Graph connections

Draft

Dunn Index

Score clustering by dividing the weakest inter-cluster gap by the widest within-cluster diameter.

concept intermediate machine-learningmetricsclustering

Hook problem: the weakest gap can decide the story

Averages can hide the exact place where a clustering is fragile.

Dunn Index asks an extreme question: “How large is the closest gap between clusters, compared with the widest cluster?”

Weakest gap versus widest clusterDunn uses extremes: closest cross-cluster pair divided by the largest cluster diameter.
p1p2p3p4p5p6p7p8p9
Closest cross-cluster pairp2 - p6
Minimum gap4.418
Widest diameter0.825

C

Dunn5.358

higher is better

First naive idea: average the distances

A distance average can look good even when one bridge point almost touches another cluster.

For some applications, that one weak gap is the part you care about.

Core invention: minimum separation over maximum diameter

This page uses a common Dunn variant:

  • Inter-cluster separation is the smallest distance between any point in one cluster and any point in another cluster.
  • Intra-cluster diameter is the largest distance between two points in the same cluster.

Formal version

Dunn=minijδ(Ci,Cj)maxlΔ(Cl)Dunn=\frac{\min_{i\ne j}\delta(C_i, C_j)}{\max_l \Delta(C_l)}

Here, delta is the closest cross-cluster pair distance, and Delta is the largest same-cluster pair distance.

Higher Dunn is better. It becomes unavailable when the largest within-cluster diameter is zero.

Interactive preset lab

Internal clustering metric preset lab

Explanation: Three compact groups are far apart, so cohesion and separation agree.

Silhouette0.889

higher is better

Calinski-Harabasz251.312

higher is better

Davies-Bouldin0.126

lower is better

Dunn5.358

higher is better

B_k86.004
W_k1.027
Min gap4.418
Max diameter0.825

Static no-JS fallback:

Fixture Dunn extremes
closest cross-cluster pairp2 - p6
minimum gap4.418
widest clusterC
maximum diameter0.825
Dunn5.358

Implementation sketch

function dunn(minGap: number, maxDiameter: number) {
  return maxDiameter === 0 ? null : minGap / maxDiameter;
}

Complexity

This variant needs pairwise distances, so the direct implementation is O(n^2).

Common confusions

  • Dunn is higher-better.
  • Dunn is sensitive to outliers because it uses extremes.
  • Different texts may use different inter-cluster distance variants; this page uses closest cross-cluster pair distance.
Internal metric contrastHigher is better for Silhouette, CH, and Dunn; lower is better for DB.
Compact islands

Silhouette: 0.889

CH: 251.312

DB: 0.126

Dunn: 5.358

Stretched clusters

Silhouette: 0.551

CH: 17.953

DB: 0.474

Dunn: 0.903

Bridge point

Silhouette: 0.62

CH: 28.707

DB: 0.446

Dunn: 0.327

Bad split

Silhouette: -0.314

CH: 0.035

DB: 13.062

Dunn: 0.08

Exercises

  1. Which quantity is the numerator?
  2. Why does a bridge point reduce Dunn quickly?
  3. What kind of outlier can make the denominator too large?

Graph connections : Dunn Index