Draft
Davies-Bouldin Index
Score clustering by averaging each cluster's worst spread-versus-separation rival.
Hook problem: one bad neighbor can matter
A clustering can look acceptable on average while one cluster has a dangerously similar neighbor.
Davies-Bouldin Index asks each cluster: “Which other cluster is my worst rival?”
worst rival: B
worst rival: A
worst rival: A
lower is better
First naive idea: inspect only cluster spread
Average distance to the centroid tells you whether a cluster is compact.
But a spread of 0.8 is fine if the nearest centroid is 8 units away, and risky if the nearest centroid is 1 unit away.
Core invention: spread relative to centroid separation
For cluster i:
S_iis the average distance from its points to centroidi.M_ijis the distance between centroidsiandj.R_ij=(S_i+S_j)/M_ijis the similarity between two clusters.
DB keeps the worst R_ij for each cluster, then averages those worst cases.
Formal version
Lower DB is better. Small values mean every cluster’s worst rival is still separated compared with their combined spread.
Interactive preset lab
Internal clustering metric preset lab
Explanation: Three compact groups are far apart, so cohesion and separation agree.
higher is better
higher is better
lower is better
higher is better
Implementation sketch
function dbClusterScore(scatterI: number, scatterJ: number, centroidDistance: number) {
return centroidDistance === 0 ? null : (scatterI + scatterJ) / centroidDistance;
}
Complexity
Computing scatter is O(n) after centroids are known. Comparing all centroid pairs is O(k^2), where k is the number of clusters.
Common confusions
- Lower DB is better.
- DB uses centroids and average scatter, so it may miss non-convex cluster shapes.
- The
maxis deliberate: each cluster is judged by its worst neighbor.
Silhouette: 0.889
CH: 251.312
DB: 0.126
Dunn: 5.358
Silhouette: 0.551
CH: 17.953
DB: 0.474
Dunn: 0.903
Silhouette: 0.62
CH: 28.707
DB: 0.446
Dunn: 0.327
Silhouette: -0.314
CH: 0.035
DB: 13.062
Dunn: 0.08
Exercises
- Why does the formula add
S_i + S_j? - Why does a small centroid distance increase DB?
- What does the
maxoperation protect against hiding?
Graph connections : Davies-Bouldin Index