Draft
UMAP
Build a fuzzy neighbor graph and optimize a low-dimensional graph with similar local membership strengths.
Hook problem: keep local structure, scale to practical data
t-SNE makes useful local maps, but learners still need a way to think about neighbor strength, attraction, repulsion, and interpretation limits.
Uniform Manifold Approximation and Projection, or UMAP, starts from a weighted neighbor graph and optimizes a low-dimensional graph that behaves similarly.
First naive idea: keep only hard nearest-neighbor edges
A yes-or-no neighbor graph throws away useful uncertainty. Some neighbors are very trustworthy; others are barely inside the local radius.
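To see what a hard graph discards, here is a minimal brute-force k-nearest-neighbor sketch in plain Python. The function name `knn_hard_graph` and the toy points are illustrative assumptions. Point 0 has a neighbor at distance 0.1 and another at distance 2.0, yet both edges receive the same weight, 1.

```python
import math

def knn_hard_graph(points, k):
    """Return a 0/1 adjacency dict: j is a neighbor of i iff j is among i's k nearest."""
    graph = {}
    for i, p in enumerate(points):
        # Brute-force distances to every other point (fine for a toy example).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # Keep the k closest and forget how close they actually were.
        graph[i] = {j: 1 for _, j in dists[:k]}
    return graph


# Point 0 has one very close neighbor (1) and one much farther neighbor (2).
points = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0), (5.0, 0.0)]
g = knn_hard_graph(points, k=2)
print(g[0])  # {1: 1, 2: 1}
```

The trustworthy neighbor and the borderline one are indistinguishable in `g[0]`, which is exactly the information a weighted graph keeps.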
Core invention: fuzzy graph matching
UMAP builds a fuzzy neighbor graph: each local edge has a membership strength. The layout then pulls strong-neighbor pairs together and pushes sampled non-neighbor pairs apart.
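A minimal plain-Python sketch of how the membership strengths can be computed, following the calibration described in the original UMAP paper: for point $i$ with nearest-neighbor distance $\rho_i$, a neighbor at distance $d$ gets strength $\exp(-\max(0, d - \rho_i)/\sigma_i)$, and the per-point bandwidth $\sigma_i$ is chosen so the strengths sum to $\log_2 k$. The two directed strengths are then combined with a fuzzy (probabilistic) union. The function names, the bisection bracket, and the iteration count are illustrative assumptions, not the internals of any particular library.

```python
import math

def smooth_knn_weights(dists, n_iter=64):
    """Turn one point's k nearest-neighbor distances into fuzzy edge strengths.

    rho = distance to the closest neighbor, so that neighbor always gets 1.0.
    sigma is found by bisection so the strengths sum to log2(k).
    """
    k = len(dists)
    rho = min(dists)
    target = math.log2(k)

    def strengths(sigma):
        return [math.exp(-max(0.0, d - rho) / sigma) for d in dists]

    lo, hi = 1e-8, 1e3  # assumed search bracket for sigma (toy-scale distances)
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if sum(strengths(mid)) > target:
            hi = mid  # total too large: narrow the bandwidth
        else:
            lo = mid  # total too small: widen the bandwidth
    return strengths((lo + hi) / 2)


def fuzzy_union(w_ij, w_ji):
    """Combine the two directed strengths with the probabilistic sum a + b - ab."""
    return w_ij + w_ji - w_ij * w_ji


# One point whose three nearest neighbors sit at these distances.
weights = smooth_knn_weights([0.1, 0.5, 2.0])
print([round(w, 3) for w in weights])  # nearest neighbor gets 1.0, farther ones less
print(round(fuzzy_union(0.8, 0.5), 6))  # 0.9
```

Subtracting $\rho_i$ means every point is connected to at least one neighbor with full strength, even in sparse regions, while $\sigma_i$ adapts the decay to the local density.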
The first version to remember: UMAP records a local neighbor strength for each edge instead of only yes-or-no edges, so edge strengths encode local trust.
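One way to make the pull and push concrete is the fuzzy-set cross-entropy used in the original UMAP paper. The symbols here are labels chosen for this page: $w_{ij}$ is the symmetric edge strength from the high-dimensional graph, $y_i$ is the low-dimensional position of point $i$, $q_{ij}$ is the low-dimensional similarity, and $a, b$ are curve-shape constants that implementations fit from min_dist.

```latex
q_{ij} = \frac{1}{1 + a \,\lVert y_i - y_j \rVert^{2b}}

\mathcal{L}(Y) = -\sum_{i<j} \Big[
    \underbrace{w_{ij} \log q_{ij}}_{\text{attraction}}
  + \underbrace{(1 - w_{ij}) \log (1 - q_{ij})}_{\text{repulsion}}
\Big] + \text{const.}
```

A strong edge ($w_{ij}$ near 1) is cheap only when $q_{ij}$ is large, so its endpoints are pulled together; a pair with $w_{ij}$ near 0 is cheap only when $q_{ij}$ is small, so it is pushed apart. Because almost all pairs are non-edges, the repulsion sum is approximated by sampling a few random pairs per edge.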
Implementation sketch
1. Find approximate nearest neighbors.
2. Convert local distances into weighted edges.
3. Initialize low-dimensional points.
4. Optimize the layout: attraction for edges, repulsion for sampled non-edges.
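Steps 3 and 4 can be sketched in plain Python, assuming the weighted edges from steps 1 and 2 are already available as `(i, j, w)` triples. Further assumptions, all simplifications: a 2D layout, random initialization (umap-learn defaults to a spectral one), the low-dimensional similarity $q = 1/(1 + d^2)$ (that is, $a = b = 1$), a fixed number of uniformly sampled negatives per edge, and per-component gradient clipping. The function name `optimize_layout` and all parameter values are hypothetical.

```python
import random

def optimize_layout(n_points, edges, n_epochs=200, lr=0.1, n_neg=5, seed=0):
    """Toy UMAP-style layout in 2D.

    edges: list of (i, j, w) with edge strength w in (0, 1].
    Assumes the low-dimensional similarity q = 1 / (1 + d^2) (a = b = 1).
    """
    rng = random.Random(seed)
    # Step 3: random initialization in the square [-1, 1] x [-1, 1].
    y = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_points)]

    def clip(g):
        # Bound each gradient component so near-coincident points cannot explode.
        return max(-4.0, min(4.0, g))

    for epoch in range(n_epochs):
        alpha = lr * (1.0 - epoch / n_epochs)  # linearly decaying step size
        for i, j, w in edges:
            # Attraction: gradient descent on -w * log q moves i and j toward each other.
            d2 = (y[i][0] - y[j][0]) ** 2 + (y[i][1] - y[j][1]) ** 2
            coef = 2.0 * w / (1.0 + d2)
            for c in range(2):
                step = alpha * clip(coef * (y[i][c] - y[j][c]))
                y[i][c] -= step
                y[j][c] += step
            # Repulsion: gradient descent on -log(1 - q) for a few random partners.
            # Partners are sampled uniformly, so one may occasionally be a true neighbor.
            for _ in range(n_neg):
                k = rng.randrange(n_points)
                if k == i:
                    continue
                d2 = (y[i][0] - y[k][0]) ** 2 + (y[i][1] - y[k][1]) ** 2
                coef = 2.0 / ((0.001 + d2) * (1.0 + d2))  # 0.001 avoids division by zero
                for c in range(2):
                    y[i][c] += alpha * clip(coef * (y[i][c] - y[k][c]))
    return y


# Two triangles: edges inside each group (both directions), none between groups.
edges = [(i, j, 1.0)
         for group in ((0, 1, 2), (3, 4, 5))
         for i in group for j in group if i != j]
layout = optimize_layout(6, edges)
print(layout)
```

Attraction only ever acts along graph edges, while repulsion costs a handful of random pairs per edge, so each epoch scales with the number of edges rather than with all $n^2$ pairs.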
Interpretation cautions
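UMAP axes usually have no direct feature meaning. Local neighborhoods are more trustworthy than global area, orientation, or empty space between groups. Parameters such as n_neighbors and min_dist change what the map emphasizes: larger n_neighbors widens each point's neighborhood and favors broader structure, while smaller values focus on fine local detail; min_dist sets how tightly points may pack together in the layout, with small values producing denser clumps.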
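One way to see what min_dist does: umap-learn fits its low-dimensional curve $1/(1 + a\,d^{2b})$ to a target that stays at 1 up to min_dist and then decays exponentially on a scale set by spread (its defaults are min_dist=0.1 and spread=1.0). The sketch below reproduces only that target shape; the helper name `min_dist_target` is hypothetical, and the fitting of $a$ and $b$ is omitted.

```python
import math

def min_dist_target(d, min_dist=0.1, spread=1.0):
    """Target similarity at embedding distance d (hypothetical helper).

    Distances up to min_dist count as fully similar; beyond that the target
    decays exponentially on a scale set by spread.
    """
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


# Compare the target at a few distances for three min_dist settings.
for md in (0.0, 0.1, 0.5):
    row = [round(min_dist_target(d, min_dist=md), 3) for d in (0.05, 0.3, 1.0)]
    print(f"min_dist={md}: {row}")
```

With a larger min_dist, nearby pairs already sit at full similarity, so the optimizer has little reason to squeeze them closer; that is why larger values give looser, more evenly spread clusters and smaller values give tight clumps.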
Common confusions
- UMAP is not “t-SNE but always better”; it makes different modeling choices.
- Strong visual islands need domain checks or metrics before becoming claims.
- A fuzzy graph stores degrees of local membership, not only binary edges.
Exercises
- Why does UMAP keep weighted neighbor strengths?
- What do attraction and repulsion do in the layout?
- Which parts of a UMAP plot are risky to over-interpret?