Draft
Silhouette Score
Evaluate a clustering without labels by comparing each point's own-cluster distance with its nearest other-cluster distance.
Hook problem: no answer key this time
The external clustering metrics compare predicted clusters with reference labels. Silhouette score is different: it is an internal clustering metric.
It only sees point positions, distances, and cluster assignments. It does not know whether a point is really graph, tree, or hash.
point coordinates, distances, and cluster ids A/B/C
reference labels such as graph/tree/hash
First naive idea: check only closeness inside each cluster
A cluster feels good when its points are close together. But that is only half of the question.
If a point is also close to a different cluster, it may be sitting on a boundary or assigned to the wrong group.
Core invention: compare inside distance with nearest outside distance
For one point i, define:
a(i): average distance fromito the other points in its own cluster.b(i): the smallest average distance fromito any other cluster.
mean distance to own cluster
nearest other cluster: B
(b - a) / max(a, b)
Formal version
The silhouette value for point i is:
The clustering’s silhouette score is the average of s(i) over all points.
Values near 1 mean a point is much closer to its own cluster. Values near 0 mean it sits between clusters. Negative values mean the nearest other cluster is closer than its assigned cluster.
For singleton clusters, this implementation assigns s(i)=0 so a one-point cluster is not rewarded as perfectly separated.
Interactive preset lab
Internal clustering metric preset lab
Explanation: Three compact groups are far apart, so cohesion and separation agree.
higher is better
higher is better
lower is better
higher is better
Static no-JS fallback:
| Point | Cluster | a(i) | b(i) | s(i) |
|---|---|---|---|---|
| p1 | A | 0.493 | 5.082 | 0.903 |
| p2 | A | 0.559 | 4.673 | 0.880 |
| p3 | A | 0.605 | 5.273 | 0.885 |
| p4 | B | 0.493 | 4.939 | 0.900 |
| p5 | B | 0.559 | 5.338 | 0.895 |
| p6 | B | 0.605 | 4.752 | 0.873 |
| p7 | C | 0.506 | 5.271 | 0.904 |
| p8 | C | 0.636 | 5.415 | 0.883 |
| p9 | C | 0.695 | 5.489 | 0.873 |
Implementation sketch
function silhouettePoint(a: number, b: number) {
const denominator = Math.max(a, b);
return denominator === 0 ? 0 : (b - a) / denominator;
}
Complexity
The direct implementation needs pairwise distances, so it is usually O(n^2) time for n points. It stores only summaries plus point scores unless you cache the full distance matrix.
Common confusions
- Silhouette is internal: it does not use reference labels.
- Higher is better, but it is still distance-dependent.
- A negative point score is a warning that another cluster is closer on average.
silhouette-score
calinski-harabasz-index
davies-bouldin-index
dunn-index
Exercises
- Why does silhouette need both
a(i)andb(i)? - What does a score near
0suggest about a point? - Why should singleton clusters not automatically get score
1?
Graph connections : Silhouette Score