Graph connections

Draft

t-SNE

Repair SNE's crowded maps with a heavy-tailed low-dimensional similarity that separates non-neighbors more strongly.

algorithm intermediate machine-learningdimensionality-reductionvisualization

Hook problem: too many neighbors crowd the center

SNE tries to preserve local probabilities, but two dimensions do not have enough room for all moderately distant points to stay moderately distant. Points can crowd into the middle.

t-SNE keeps the neighbor-probability idea and changes the low-dimensional similarity so non-neighbors can move farther away.

Neighborhoods become probabilitiesSNE-family methods care most about which points are local neighbors.
A1 -> A20.70

low-dimensional neighbor similarity

A1 -> A30.49

low-dimensional neighbor similarity

A1 -> B10.08

low-dimensional neighbor similarity

First naive idea: use the same Gaussian shape in both spaces

A Gaussian tail becomes tiny quickly. In a low-dimensional map, that makes it difficult to represent many medium-distance relationships without crowding.

Core invention: heavy-tailed low-dimensional similarity

t-SNE uses a Student-t style similarity in the map:

qij(1+yiyj2)1q_{ij} \propto (1+\lVert y_i-y_j\rVert^2)^{-1}

The heavier tail gives far low-dimensional points enough probability mass for the optimizer to push non-neighbors apart.

Trace lab

t-SNERepair SNE's crowding problem with a heavy-tailed low-dimensional similarity.
Step 1/2: Notice the crowding problem

Many moderately distant high-dimensional neighbors cannot all fit at moderate distances in 2D.

Working formulatoo many neighbors, too little area

points crowd near the center

Implementation sketch

compute symmetric high-dimensional neighbor probabilities;
initialize a two-dimensional map;
compute Student-t low-dimensional similarities;
optimize KL divergence with attraction and repulsion;

Interpretation cautions

t-SNE is excellent for local exploration, but it is not a cluster validation metric. Cluster area, gap size, and axis direction can change with perplexity, initialization, learning rate, and random seed.

Common confusions

  • Nearby points are more meaningful than far-apart distances.
  • Bigger visual gaps do not automatically mean bigger original distances.
  • Repeated runs can differ.
Dimensionality reduction pathLinear projections, distance-preserving maps, supervised discriminants, and neighbor embeddings solve different pains.
PCA

Keep the directions where centered data varies most.

MDS

Place points so low-dimensional distances imitate the original distance table.

Isomap

Use neighbor-graph shortest paths before applying an MDS-style layout.

LDA

Use labels to find projections that separate class means while keeping classes tight.

QDA

Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.

SNE

Match neighbor probabilities between high and low dimensions.

t-SNE

Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.

UMAP

Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.

Exercises

  1. What problem does the Student-t tail repair?
  2. Why should t-SNE plots not be used as automatic proof of clusters?
  3. Which visual relationships are safest to read from t-SNE?

Graph connections : t-SNE