Graph connections

Draft

OPTICS

Order points by density reachability so cluster structure can be read across multiple distance scales.

algorithm advanced machine-learningclusteringalgorithms

Hook problem: one radius is too brittle

DBSCAN can find non-round clusters, but it still asks for one epsilon. If one part of the data is dense and another part is looser, one radius can split one group or merge another.

OPTICS keeps the density-walk idea but records an ordering instead of committing to one cut.

A density walk, not one epsilonOPTICS records reachability distances so valleys in the ordering reveal clusters at several scales.
OPTICS reachability ordering
OrderPointCore dist.ReachabilityHint
1p10.54not definedA
2p20.580.54A
3p30.710.58A
4p4not defined0.72A
5p9not defined1.95gap/noise
6p50.642.26B
7p60.640.64B
8p70.780.64B
9p8not defined0.78B
10p10not defined3.86gap/noise

First naive idea: run DBSCAN for many epsilon values

Trying many radii can work, but it repeats the same neighborhood work and leaves you comparing many separate outputs.

OPTICS repairs that by producing one reachability plot.

Core invention: reachability distance

OPTICS visits points in an order that keeps nearby dense points close together. For each point it records:

  • core distance: the distance needed for that point to become core.
  • reachability distance: how far this point had to be reached from already processed dense structure.

Low reachability runs look like valleys. Large jumps mark sparse gaps.

Interactive trace lab

Clustering algorithm trace lab

Step 1/1: Low reachability runs become valleys; big jumps mark sparse gaps.

order by reachability

Order points by density reachability instead of choosing one fixed radius.

valleys

A then B

Implementation sketch

for each unprocessed point:
  compute core distance;
  update neighbor reachability priorities;
  visit the next point with smallest reachability distance;

Correctness intuition and cost

OPTICS preserves a density-neighborhood invariant: the next point is chosen by the cheapest current density reachability, so dense regions appear as contiguous low-reachability runs.

With indexing, runtime is often described near O(n log n) on favorable data; without efficient neighborhood search it can degrade toward O(n^2).

Common confusions

  • OPTICS does not directly output one final clustering unless you choose an extraction rule.
  • Reachability distance is not the same as Euclidean distance from the previous point.
  • It still needs density parameters, but it is less tied to one final epsilon cut.
Clustering algorithm pathStart with centroid partitions, repair center robustness, then move to density and probability.
1. K-Means

k-means

2. K-Medoids

k-medoids

3. DBSCAN

dbscan

4. OPTICS

optics

5. HDBSCAN

hdbscan

6. EM for GMM

em-for-gmm

Exercises

  1. What does a valley in a reachability plot suggest?
  2. Why is OPTICS useful when densities vary?
  3. What extra decision is needed after the ordering is built?

Graph connections : OPTICS