Graph connections

Draft

Silhouette Score

Evaluate a clustering without labels by comparing each point's own-cluster distance with its nearest other-cluster distance.

concept intermediate machine-learning metrics clustering

Hook problem: no answer key this time

The external clustering metrics compare predicted clusters with reference labels. Silhouette score is different: it is an internal clustering metric.

It only sees point positions, distances, and cluster assignments. It does not know whether a point's reference label is graph, tree, or hash.

Figure: Cluster geometry without labels. Internal metrics inspect distances and assigned clusters, not answer-key labels. (Points p1 to p9.)
What the metric sees

point coordinates, distances, and cluster ids A/B/C

What it does not see

reference labels such as graph/tree/hash

First naive idea: check only closeness inside each cluster

A cluster feels good when its points are close together. But that is only half of the question.

If a point is also close to a different cluster, it may be sitting on a boundary or assigned to the wrong group.

Core invention: compare inside distance with nearest outside distance

For one point i, define:

  • a(i): average distance from i to the other points in its own cluster.
  • b(i): the smallest average distance from i to any other cluster.

Figure: One point asks two distance questions. a(i) is the average distance inside its cluster; b(i) is the nearest other-cluster average. (Points p1 to p9.)

Example values for one point:

  • a(i) = 0.559 (mean distance to own cluster)
  • b(i) = 4.673 (nearest other cluster: B)
  • s(i) = 0.88, from (b - a) / max(a, b)

Mean silhouette: 0.889
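
The two distance questions can be sketched in code. This is a minimal illustration, assuming 2D points and Euclidean distance; the `Point` type, the `insideOutside` helper, and the toy coordinates are hypothetical and are not the page's fixture.

```typescript
// Hypothetical 2D point with an assigned cluster id (not the page's fixture format).
type Point = { x: number; y: number; cluster: string };

function euclidean(p: Point, q: Point): number {
  const dx = p.x - q.x;
  const dy = p.y - q.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Returns a(i), b(i), and the id of the nearest other cluster for points[i].
function insideOutside(
  points: Point[],
  i: number
): { a: number; b: number; nearest: string | null } {
  const current = points[i];
  const totals: Record<string, number> = {};
  const counts: Record<string, number> = {};

  // Sum the distance from the current point to every other point, per cluster.
  points.forEach((q, j) => {
    if (j === i) return; // a point is never compared with itself
    totals[q.cluster] = (totals[q.cluster] || 0) + euclidean(current, q);
    counts[q.cluster] = (counts[q.cluster] || 0) + 1;
  });

  // a(i): mean distance to the other members of the point's own cluster.
  const a = counts[current.cluster]
    ? totals[current.cluster] / counts[current.cluster]
    : 0;

  // b(i): smallest mean distance to any other cluster.
  let b = Infinity;
  let nearest: string | null = null;
  for (const cluster of Object.keys(totals)) {
    if (cluster === current.cluster) continue;
    const mean = totals[cluster] / counts[cluster];
    if (mean < b) {
      b = mean;
      nearest = cluster;
    }
  }
  return { a, b, nearest };
}

// Toy data: two points in cluster A, two in cluster B.
const toyPoints: Point[] = [
  { x: 0, y: 0, cluster: "A" },
  { x: 0, y: 1, cluster: "A" },
  { x: 4, y: 0, cluster: "B" },
  { x: 4, y: 3, cluster: "B" },
];
const toyResult = insideOutside(toyPoints, 0);
console.log(toyResult); // a = 1, b = 4.5, nearest = "B"
```

For the first toy point, the only cluster mate is at distance 1, and the two points in B are at distances 4 and 5, so b is their mean.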

Formal version

The silhouette value for point i is:

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$

The clustering’s silhouette score is the average of s(i) over all points.

Each s(i) lies between -1 and 1. Values near 1 mean a point is much closer to its own cluster than to any other. Values near 0 mean it sits between clusters. Negative values mean the nearest other cluster is closer, on average, than its assigned cluster.
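
As a worked check, the first line below plugs in the example values a(i) = 0.559 and b(i) = 4.673 shown earlier. The second line uses hypothetical values (a = 3, b = 2) for a point whose nearest other cluster is closer than its own.

```latex
\begin{align*}
s(i) &= \frac{4.673 - 0.559}{\max(0.559,\ 4.673)} = \frac{4.114}{4.673} \approx 0.880 \\
s(j) &= \frac{2 - 3}{\max(3,\ 2)} = \frac{-1}{3} \approx -0.333
\end{align*}
```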

For singleton clusters, this implementation assigns s(i)=0 so a one-point cluster is not rewarded as perfectly separated.
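
One way to express the singleton convention is a guard on cluster size. The sketch below is an assumption about how that could look: `silhouetteWithSingletonGuard` and its `clusterSize` parameter are hypothetical names, not part of the page's own code.

```typescript
// Hypothetical helper: returns 0 for singleton clusters, otherwise (b - a) / max(a, b).
function silhouetteWithSingletonGuard(
  a: number,
  b: number,
  clusterSize: number
): number {
  if (clusterSize <= 1) return 0; // singleton convention: no reward for being alone
  const denominator = Math.max(a, b);
  return denominator === 0 ? 0 : (b - a) / denominator;
}

// Without the guard, a singleton has no cluster mates, so a = 0 and the
// formula collapses to b / b = 1 regardless of how far away b is.
const unguardedSingleton = (5 - 0) / Math.max(0, 5);
const guardedSingleton = silhouetteWithSingletonGuard(0, 5, 1);
console.log(unguardedSingleton, guardedSingleton); // 1 0
```

The unguarded value shows why the convention matters: splitting every point into its own cluster would otherwise look perfectly separated.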

Interactive preset lab

Internal clustering metric preset lab

Explanation: Three compact groups are far apart, so cohesion and separation agree.

Metric             Value    Direction
Silhouette         0.889    higher is better
Calinski-Harabasz  251.312  higher is better
Davies-Bouldin     0.126    lower is better
Dunn               5.358    higher is better

B_k: 86.004
W_k: 1.027
Min gap: 4.418
Max diameter: 0.825

Fixture silhouette values

Point  Cluster  a(i)   b(i)   s(i)
p1     A        0.493  5.082  0.903
p2     A        0.559  4.673  0.880
p3     A        0.605  5.273  0.885
p4     B        0.493  4.939  0.900
p5     B        0.559  5.338  0.895
p6     B        0.605  4.752  0.873
p7     C        0.506  5.271  0.904
p8     C        0.636  5.415  0.883
p9     C        0.695  5.489  0.873
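
The fixture values can be cross-checked by recomputing s(i) from the a(i) and b(i) columns. This is a small sketch: the rows are copied from the table above, and `fixtureRows`, `recomputed`, and `meanSilhouette` are illustrative names.

```typescript
function silhouettePoint(a: number, b: number): number {
  const denominator = Math.max(a, b);
  return denominator === 0 ? 0 : (b - a) / denominator;
}

// [point, cluster, a(i), b(i), s(i)] rows copied from the fixture table.
const fixtureRows: [string, string, number, number, number][] = [
  ["p1", "A", 0.493, 5.082, 0.903],
  ["p2", "A", 0.559, 4.673, 0.880],
  ["p3", "A", 0.605, 5.273, 0.885],
  ["p4", "B", 0.493, 4.939, 0.900],
  ["p5", "B", 0.559, 5.338, 0.895],
  ["p6", "B", 0.605, 4.752, 0.873],
  ["p7", "C", 0.506, 5.271, 0.904],
  ["p8", "C", 0.636, 5.415, 0.883],
  ["p9", "C", 0.695, 5.489, 0.873],
];

// Recompute each s(i) from the rounded a(i) and b(i) columns.
const recomputed = fixtureRows.map((row) => silhouettePoint(row[2], row[3]));

// The clustering's score is the plain average of the point scores.
const meanSilhouette =
  recomputed.reduce((sum, s) => sum + s, 0) / recomputed.length;

fixtureRows.forEach((row, k) => {
  console.log(row[0], recomputed[k].toFixed(3), "table:", row[4].toFixed(3));
});
console.log("mean silhouette:", meanSilhouette.toFixed(3));
```

Because the inputs are already rounded to three decimals, small differences from the table in the last digit would not be surprising in general. For these rows the recomputed values agree with the s(i) column, and the mean rounds to 0.889.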

Implementation sketch

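// Silhouette value for one point, given its mean own-cluster distance a
// and its smallest mean distance to another cluster b.
function silhouettePoint(a: number, b: number): number {
  const denominator = Math.max(a, b);
  // Guard against 0 / 0 when both distances are zero.
  return denominator === 0 ? 0 : (b - a) / denominator;
}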

Complexity

The direct implementation compares every pair of points, so it needs O(n^2) distance evaluations for n points. Extra memory stays at O(n) for per-cluster summaries and point scores unless you cache the full distance matrix, which takes O(n^2).
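
A direct version might look like the sketch below. It is one possible implementation under stated assumptions, not the page's actual code: labels are integer cluster ids in 0..k-1, distance is Euclidean, singleton clusters score 0, and a clustering with only one non-empty cluster also scores 0. The names `Vec`, `distance`, and `silhouetteScore` are hypothetical.

```typescript
type Vec = number[];

function distance(p: Vec, q: Vec): number {
  let sum = 0;
  for (let d = 0; d < p.length; d++) sum += (p[d] - q[d]) * (p[d] - q[d]);
  return Math.sqrt(sum);
}

// Assumes labels[i] is an integer cluster id in 0..k-1.
function silhouetteScore(points: Vec[], labels: number[], k: number): number {
  const n = points.length;
  const sizes: number[] = [];
  for (let c = 0; c < k; c++) sizes.push(0);
  for (let i = 0; i < n; i++) sizes[labels[i]] += 1;

  let total = 0;
  for (let i = 0; i < n; i++) {
    // O(k) scratch space: summed distance from point i to each cluster.
    const sums: number[] = [];
    for (let c = 0; c < k; c++) sums.push(0);
    for (let j = 0; j < n; j++) {
      if (j !== i) sums[labels[j]] += distance(points[i], points[j]);
    }

    const own = labels[i];
    if (sizes[own] <= 1) continue; // singleton convention: s(i) = 0

    const a = sums[own] / (sizes[own] - 1);
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && sizes[c] > 0) b = Math.min(b, sums[c] / sizes[c]);
    }
    if (b === Infinity) continue; // no other cluster: treated as 0 in this sketch

    const denominator = Math.max(a, b);
    if (denominator > 0) total += (b - a) / denominator;
  }
  return n === 0 ? 0 : total / n;
}

// Toy data: two tight pairs, five units apart.
const rectanglePoints: Vec[] = [[0, 0], [0, 1], [5, 0], [5, 1]];
const rectangleScore = silhouetteScore(rectanglePoints, [0, 0, 1, 1], 2);
console.log(rectangleScore.toFixed(3));
```

The nested loops over i and j are where the O(n^2) cost comes from. No distance matrix is stored; each point only keeps k running sums while it is being scored.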

Common confusions

  • Silhouette is internal: it does not use reference labels.
  • Higher is better, but the value depends on the chosen distance function and on feature scaling, so compare scores only when they use the same distance.
  • A negative point score is a warning that another cluster is closer on average.

Internal clustering metric path: move from point-level comparison to centroid scatter, worst rivals, and distance extremes.

  1. Silhouette (silhouette-score)
  2. Calinski-Harabasz (calinski-harabasz-index)
  3. Davies-Bouldin (davies-bouldin-index)
  4. Dunn (dunn-index)

Exercises

  1. Why does silhouette need both a(i) and b(i)?
  2. What does a score near 0 suggest about a point?
  3. Why should singleton clusters not automatically get score 1?
