Silhouette Score | ReConcept Lab

Hook problem: no answer key this time

The external clustering metrics compare predicted clusters with reference labels. Silhouette score is different: it is an internal clustering metric.

It only sees point positions, distances, and cluster assignments. It does not know whether a point is really graph, tree, or hash.

Cluster geometry without labelsInternal metrics inspect distances and assigned clusters, not answer-key labels.

What the metric sees

point coordinates, distances, and cluster ids A/B/C

What it does not see

reference labels such as graph/tree/hash

First naive idea: check only closeness inside each cluster

A cluster feels good when its points are close together. But that is only half of the question.

If a point is also close to a different cluster, it may be sitting on a boundary or assigned to the wrong group.

Core invention: compare inside distance with nearest outside distance

For one point i, define:

a(i): average distance from i to the other points in its own cluster.
b(i): the smallest average distance from i to any other cluster.

One point asks two distance questions`a(i)` is the average distance inside its cluster; `b(i)` is the nearest other-cluster average.

a(i)0.559

mean distance to own cluster

b(i)4.673

nearest other cluster: B

s(i)0.88

(b - a) / max(a, b)

Mean silhouette0.889

Formal version

The silhouette value for point i is:

s(i)=\frac{b(i)-a(i)}{\max(a(i), b(i))}

The clustering’s silhouette score is the average of s(i) over all points.

Values near 1 mean a point is much closer to its own cluster. Values near 0 mean it sits between clusters. Negative values mean the nearest other cluster is closer than its assigned cluster.

For singleton clusters, this implementation assigns s(i)=0 so a one-point cluster is not rewarded as perfectly separated.

Interactive preset lab

Internal clustering metric preset lab

Explanation: Three compact groups are far apart, so cohesion and separation agree.

Silhouette0.889

higher is better

Calinski-Harabasz251.312

higher is better

Davies-Bouldin0.126

lower is better

Dunn5.358

higher is better

B_k86.004

W_k1.027

Min gap4.418

Max diameter0.825

Static no-JS fallback:

Fixture silhouette values
Point	Cluster	a(i)	b(i)	s(i)
p1	A	0.493	5.082	0.903
p2	A	0.559	4.673	0.880
p3	A	0.605	5.273	0.885
p4	B	0.493	4.939	0.900
p5	B	0.559	5.338	0.895
p6	B	0.605	4.752	0.873
p7	C	0.506	5.271	0.904
p8	C	0.636	5.415	0.883
p9	C	0.695	5.489	0.873

Implementation sketch

function silhouettePoint(a: number, b: number) {
  const denominator = Math.max(a, b);
  return denominator === 0 ? 0 : (b - a) / denominator;
}

Complexity

The direct implementation needs pairwise distances, so it is usually O(n^2) time for n points. It stores only summaries plus point scores unless you cache the full distance matrix.

Common confusions

Silhouette is internal: it does not use reference labels.
Higher is better, but it is still distance-dependent.
A negative point score is a warning that another cluster is closer on average.

Internal clustering metric pathMove from point-level comparison to centroid scatter, worst rivals, and distance extremes.

1. Silhouette

silhouette-score

2. Calinski-Harabasz

calinski-harabasz-index

3. Davies-Bouldin

davies-bouldin-index

4. Dunn

dunn-index

Exercises

Why does silhouette need both a(i) and b(i)?
What does a score near 0 suggest about a point?
Why should singleton clusters not automatically get score 1?

Graph connections : Silhouette Score