Draft
OPTICS
Order points by density reachability so cluster structure can be read across multiple distance scales.
Hook problem: one radius is too brittle
DBSCAN can find non-round clusters, but it still asks for one epsilon. If one part of the data is dense and another part is looser, one radius can split one group or merge another.
OPTICS keeps the density-walk idea but records an ordering instead of committing to one cut.
| Order | Point | Core dist. | Reachability | Hint |
|---|---|---|---|---|
| 1 | p1 | 0.54 | not defined | A |
| 2 | p2 | 0.58 | 0.54 | A |
| 3 | p3 | 0.71 | 0.58 | A |
| 4 | p4 | not defined | 0.72 | A |
| 5 | p9 | not defined | 1.95 | gap/noise |
| 6 | p5 | 0.64 | 2.26 | B |
| 7 | p6 | 0.64 | 0.64 | B |
| 8 | p7 | 0.78 | 0.64 | B |
| 9 | p8 | not defined | 0.78 | B |
| 10 | p10 | not defined | 3.86 | gap/noise |
First naive idea: run DBSCAN for many epsilon values
Trying many radii can work, but it repeats the same neighborhood work and leaves you comparing many separate outputs.
OPTICS repairs that by producing one reachability plot.
Core invention: reachability distance
OPTICS visits points in an order that keeps nearby dense points close together. For each point it records:
- core distance: the distance needed for that point to become core.
- reachability distance: how far this point had to be reached from already processed dense structure.
Low reachability runs look like valleys. Large jumps mark sparse gaps.
Interactive trace lab
Clustering algorithm trace lab
Step 1/1: Low reachability runs become valleys; big jumps mark sparse gaps.
Order points by density reachability instead of choosing one fixed radius.
A then B
Implementation sketch
for each unprocessed point:
compute core distance;
update neighbor reachability priorities;
visit the next point with smallest reachability distance;
Correctness intuition and cost
OPTICS preserves a density-neighborhood invariant: the next point is chosen by the cheapest current density reachability, so dense regions appear as contiguous low-reachability runs.
With indexing, runtime is often described near O(n log n) on favorable data; without efficient neighborhood search it can degrade toward O(n^2).
Common confusions
- OPTICS does not directly output one final clustering unless you choose an extraction rule.
- Reachability distance is not the same as Euclidean distance from the previous point.
- It still needs density parameters, but it is less tied to one final
epsiloncut.
k-means
k-medoids
dbscan
optics
hdbscan
em-for-gmm
Exercises
- What does a valley in a reachability plot suggest?
- Why is OPTICS useful when densities vary?
- What extra decision is needed after the ordering is built?
Graph connections : OPTICS