t-SNE
Repair SNE's crowded maps with a heavy-tailed low-dimensional similarity that separates non-neighbors more strongly.
Hook problem: too many neighbors crowd the center
SNE tries to preserve local probabilities, but two dimensions do not have enough room for all moderately distant points to stay moderately distant. Points can crowd into the middle.
t-SNE keeps the neighbor-probability idea and changes the low-dimensional similarity so non-neighbors can move farther away.
low-dimensional neighbor similarity
First naive idea: use the same Gaussian shape in both spaces
A Gaussian tail becomes tiny quickly. In a low-dimensional map, that makes it difficult to represent many medium-distance relationships without crowding.
Core invention: heavy-tailed low-dimensional similarity
t-SNE uses a Student-t style similarity in the map:
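Written out in its usual form, the similarity is a Student-t kernel with one degree of freedom, normalized over all pairs of map points. The notation is an assumption of this sketch rather than something fixed by the text: y_i are the map coordinates and q_ij is the map similarity of points i and j.

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad q_{ii} = 0
```

The numerator decays like 1/d^2 in the map distance d, instead of like exp(-d^2) for a Gaussian.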
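The heavier tail decays polynomially rather than exponentially, so a pair with a small but non-negligible similarity can sit much farther apart in the map than a Gaussian would allow. That gives the optimizer room to push non-neighbors apart instead of stacking them in the center.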
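To make the difference in tails concrete, here is a minimal sketch that evaluates both unnormalized kernels at a few map distances. The function names and the choice sigma = 1 are illustrative assumptions, not part of any library.

```python
import math

def gaussian_kernel(d, sigma=1.0):
    """Unnormalized Gaussian similarity at distance d (sigma is an assumed bandwidth)."""
    return math.exp(-d ** 2 / (2.0 * sigma ** 2))

def student_t_kernel(d):
    """Unnormalized Student-t similarity with one degree of freedom at distance d."""
    return 1.0 / (1.0 + d ** 2)

# Compare the two kernels as the map distance grows.
print(f"{'dist':>5} {'gaussian':>12} {'student-t':>12} {'ratio t/g':>12}")
for d in [0.5, 1.0, 2.0, 4.0, 8.0]:
    g = gaussian_kernel(d)
    t = student_t_kernel(d)
    print(f"{d:>5} {g:>12.3e} {t:>12.3e} {t / g:>12.3e}")
```

At short distances the two kernels are similar in size, while at larger distances the Student-t value is many orders of magnitude bigger, which is the room the map needs for moderately distant pairs.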
Trace lab
Many moderately distant high-dimensional neighbors cannot all fit at moderate distances in 2D, so points crowd near the center of the map.
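One way to see why 2D runs out of room is to compare how distances behave in high and low dimensions. The sketch below is an illustration under assumed settings (standard normal data, 200 points, NumPy available), and the helper name is hypothetical. It measures how close each point's nearest neighbor is relative to the average pairwise distance.

```python
import numpy as np

def nn_to_mean_ratio(dim, n=200, seed=0):
    """Mean nearest-neighbor distance divided by mean pairwise distance."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Drop the zero self-distances on the diagonal, keeping one row per point.
    off_diag = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return off_diag.min(axis=1).mean() / off_diag.mean()

ratio_high = nn_to_mean_ratio(dim=50)
ratio_low = nn_to_mean_ratio(dim=2)
print(f"50-D: nearest neighbor is {ratio_high:.2f} of the mean distance")
print(f" 2-D: nearest neighbor is {ratio_low:.2f} of the mean distance")
```

In 50 dimensions almost every point is at a similar, moderate distance from every other point. In 2 dimensions only a handful of points can be neighbors at any one scale, so a map that tries to keep all those moderate distances ends up compressing them.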
Implementation sketch
- Compute symmetric high-dimensional neighbor probabilities.
- Initialize a two-dimensional map.
- Compute Student-t low-dimensional similarities.
- Optimize the KL divergence between the two sets of probabilities; its gradient attracts neighbors and repels non-neighbors.
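The steps above can be sketched in NumPy as follows. This is a simplified illustration, not a reference implementation: it uses plain gradient descent with no momentum, early exaggeration, or Barnes-Hut approximation, and the function names, the default learning rate (picked for the 40-point toy data), and the toy data itself are all assumptions of the sketch.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def conditional_probs(D, perplexity, tol=1e-5, max_steps=50):
    """Gaussian p_{j|i} per row; bisection on the precision beta to match the perplexity."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)  # target entropy in nats
    for i in range(n):
        d = np.delete(D[i], i)
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_steps):
            w = np.exp(-(d - d.min()) * beta)  # shift by the minimum for numerical stability
            p = w / w.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                 # distribution too peaked: flatten it
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P

def tsne(X, perplexity=5.0, n_iter=500, learning_rate=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: symmetric high-dimensional neighbor probabilities.
    P_cond = conditional_probs(pairwise_sq_dists(X), perplexity)
    P = np.maximum((P_cond + P_cond.T) / (2.0 * n), 1e-12)
    # Step 2: small random two-dimensional map.
    Y = rng.normal(scale=1e-4, size=(n, 2))
    for _ in range(n_iter):
        # Step 3: Student-t similarities in the map.
        W = 1.0 / (1.0 + pairwise_sq_dists(Y))
        np.fill_diagonal(W, 0.0)
        Q = np.maximum(W / W.sum(), 1e-12)
        # Step 4: gradient of KL(P || Q); P pulls neighbors together, Q pushes pairs apart.
        G = (P - Q) * W
        grad = 4.0 * (np.diag(G.sum(axis=1)) - G) @ Y
        Y -= learning_rate * grad
        Y -= Y.mean(axis=0)  # keep the map centered
    return Y

# Toy data: two well-separated blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 10)), rng.normal(size=(20, 10)) + 10.0])
Y = tsne(X, perplexity=5.0, n_iter=1000)
print(Y[:3])
```

The gradient line is the matrix form of 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j), where w_ij is the unnormalized Student-t weight. Production implementations add momentum and early exaggeration because plain gradient descent converges slowly and is sensitive to the step size.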
Interpretation cautions
t-SNE is excellent for local exploration, but it is not a cluster validation metric. Cluster area, gap size, and axis direction can change with perplexity, initialization, learning rate, and random seed.
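A quick way to check this sensitivity is to embed the same data several times and compare the layouts. The sketch below assumes scikit-learn is installed and uses its `TSNE` class with random initialization. The toy data and the helper names `embed` and `extent` are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy data: two blobs in 10 dimensions (an assumed example, not real data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 10)), rng.normal(size=(30, 10)) + 6.0])

def embed(seed, perplexity):
    """Run t-SNE once with a given seed and perplexity."""
    model = TSNE(n_components=2, perplexity=perplexity,
                 init="random", random_state=seed)
    return model.fit_transform(X)

def extent(Y):
    """Width and height of the map's bounding box."""
    return Y.max(axis=0) - Y.min(axis=0)

runs = {(seed, perp): embed(seed, perp) for seed in (0, 1) for perp in (5, 20)}
for (seed, perp), Y in runs.items():
    print(f"seed={seed} perplexity={perp} extent={extent(Y)}")
```

The coordinates, and typically the overall extent and orientation, change from run to run even though the data is identical. A structure that survives several seeds and perplexities is more trustworthy than one that appears in a single plot.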
Common confusions
- Which points end up near each other is more trustworthy than how far apart distant points are.
- Bigger visual gaps do not automatically mean bigger original distances.
- Repeated runs can differ, because the objective is non-convex and the starting map is often random.
Related methods
- PCA: keep the directions where centered data varies most.
- MDS: place points so low-dimensional distances imitate the original distance table.
- Isomap: use neighbor-graph shortest paths before applying an MDS-style layout.
- LDA: use labels to find projections that separate class means while keeping classes tight.
- QDA: let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.
- SNE: match neighbor probabilities between high and low dimensions.
- t-SNE: repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.
- UMAP: build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.
Exercises
- What problem does the Student-t tail repair?
- Why should t-SNE plots not be used as automatic proof of clusters?
- Which visual relationships are safest to read from t-SNE?