UMAP | ReConcept Lab

Hook problem: keep local structure, scale to practical data

t-SNE makes useful local maps, but learners still need a way to think about neighbor strength, attraction, repulsion, and interpretation limits.

Uniform Manifold Approximation and Projection, or UMAP, starts from a weighted neighbor graph and optimizes a low-dimensional graph that behaves similarly.

Dimensionality reduction pathLinear projections, distance-preserving maps, supervised discriminants, and neighbor embeddings solve different pains.

PCA

Keep the directions where centered data varies most.

MDS

Place points so low-dimensional distances imitate the original distance table.

Isomap

Use neighbor-graph shortest paths before applying an MDS-style layout.

LDA

Use labels to find projections that separate class means while keeping classes tight.

QDA

Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.

SNE

Match neighbor probabilities between high and low dimensions.

t-SNE

Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.

UMAP

Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.

First naive idea: keep only hard nearest-neighbor edges

A yes-or-no neighbor graph throws away useful uncertainty. Some neighbors are very trustworthy; others are barely inside the local radius.

Core invention: fuzzy graph matching

UMAP builds a fuzzy neighbor graph: each local edge has a membership strength. The layout then pulls strong-neighbor pairs together and pushes sampled non-neighbor pairs apart.

The first version to remember is:

\text{low-dimensional graph} \approx \text{high-dimensional fuzzy neighbor graph}

Trace lab

UMAPBuild a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.

Step 1/2: Build a fuzzy neighbor graph

UMAP records local neighbor strength instead of only yes-or-no edges.

Working formulaweighted k-neighbor graph

edge strengths encode local trust

Implementation sketch

find approximate nearest neighbors;
convert local distances into weighted edges;
initialize low-dimensional points;
optimize attraction for edges and repulsion for sampled non-edges;

Interpretation cautions

UMAP axes usually have no direct feature meaning. Local neighborhoods are more trustworthy than global area, orientation, or empty space. Parameters such as n_neighbors and min_dist change what the map emphasizes.

Common confusions

UMAP is not “t-SNE but always better”; it makes different modeling choices.
Strong visual islands need domain checks or metrics before becoming claims.
A fuzzy graph stores degrees of local membership, not only binary edges.

Exercises

Why does UMAP keep weighted neighbor strengths?
What do attraction and repulsion do in the layout?
Which parts of a UMAP plot are risky to over-interpret?

Graph connections : UMAP