Stochastic Neighbor Embedding

Hook problem: global distances are too much to ask

In high dimensions, preserving every distance in a two-dimensional map may be unrealistic. For exploration, a more useful promise is local: keep each point’s likely neighbors nearby.

Stochastic Neighbor Embedding, or SNE, turns neighborhoods into probabilities and tries to match them.

Neighborhoods become probabilitiesSNE-family methods care most about which points are local neighbors.

A1 -> A20.70

low-dimensional neighbor similarity

A1 -> A30.49

low-dimensional neighbor similarity

A1 -> B10.08

low-dimensional neighbor similarity

First naive idea: preserve exact distances

Distance-preserving maps fight many constraints at once. If the learner mainly wants to see local groups, exact global distances may be the wrong target.

Core invention: match neighbor probabilities

For point i, SNE defines a probability that point j is chosen as its neighbor:

p_{j|i} \propto \exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma_i^2}\right)

Then it searches for low-dimensional points with similar probabilities and minimizes a KL divergence between the high-dimensional and low-dimensional neighbor distributions.

Trace lab

SNEMatch neighbor probabilities between high and low dimensions.

Step 1/2: Turn distances into neighbor probabilities

SNE asks which points would choose each other as neighbors in high dimension.

Working formulap_j|i proportional to exp(-d^2)

near pairs get high probability

Implementation sketch

convert high-dimensional distances into neighbor probabilities;
initialize low-dimensional coordinates;
compute low-dimensional neighbor probabilities;
move points to reduce KL divergence;

Common confusions

SNE preserves local neighborhoods better than global scale.
The axes in the plot do not carry original feature meanings.
Crowding in two dimensions motivates t-SNE’s heavy-tailed repair.

Dimensionality reduction pathLinear projections, distance-preserving maps, supervised discriminants, and neighbor embeddings solve different pains.

PCA

Keep the directions where centered data varies most.

MDS

Place points so low-dimensional distances imitate the original distance table.

Isomap

Use neighbor-graph shortest paths before applying an MDS-style layout.

LDA

Use labels to find projections that separate class means while keeping classes tight.

QDA

Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.

SNE

Match neighbor probabilities between high and low dimensions.

t-SNE

Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.

UMAP

Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.

Exercises

Why does SNE use probabilities instead of raw distances?
What does a high p_{j|i} mean?
What failure does t-SNE repair?

Graph connections : Stochastic Neighbor Embedding