Draft
Stochastic Neighbor Embedding
Preserve local neighborhoods by matching high-dimensional and low-dimensional neighbor probabilities.
Hook problem: global distances are too much to ask
In high dimensions, preserving every distance in a two-dimensional map may be unrealistic. For exploration, a more useful promise is local: keep each point’s likely neighbors nearby.
Stochastic Neighbor Embedding, or SNE, turns neighborhoods into probabilities and tries to match them.
low-dimensional neighbor similarity
low-dimensional neighbor similarity
low-dimensional neighbor similarity
First naive idea: preserve exact distances
Distance-preserving maps fight many constraints at once. If the learner mainly wants to see local groups, exact global distances may be the wrong target.
Core invention: match neighbor probabilities
For point i, SNE defines a probability that point j is chosen as its neighbor:
Then it searches for low-dimensional points with similar probabilities and minimizes a KL divergence between the high-dimensional and low-dimensional neighbor distributions.
Trace lab
SNE asks which points would choose each other as neighbors in high dimension.
near pairs get high probability
Implementation sketch
convert high-dimensional distances into neighbor probabilities;
initialize low-dimensional coordinates;
compute low-dimensional neighbor probabilities;
move points to reduce KL divergence;
Common confusions
- SNE preserves local neighborhoods better than global scale.
- The axes in the plot do not carry original feature meanings.
- Crowding in two dimensions motivates t-SNE’s heavy-tailed repair.
Keep the directions where centered data varies most.
Place points so low-dimensional distances imitate the original distance table.
Use neighbor-graph shortest paths before applying an MDS-style layout.
Use labels to find projections that separate class means while keeping classes tight.
Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.
Match neighbor probabilities between high and low dimensions.
Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.
Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.
Exercises
- Why does SNE use probabilities instead of raw distances?
- What does a high
p_{j|i}mean? - What failure does t-SNE repair?
Graph connections : Stochastic Neighbor Embedding