Linear Discriminant Analysis

Hook problem: biggest variance is not always best separation

PCA ignores labels. If the largest spread runs inside both classes, PCA may keep a direction that looks energetic but does not separate the labels.

Linear Discriminant Analysis, or LDA, changes the question: use known labels to find a projection that separates classes.

Labels change the question

LDA searches for a separating projection; QDA keeps separate class shapes and gets a curved boundary.

LDA: one projection, shared covariance assumption.

QDA: curved boundary, separate covariance per class.

First naive idea: use PCA before classification

PCA can help compression, but it may discard a low-variance direction that perfectly distinguishes classes. Supervised projection needs to know what the classes are.
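A small synthetic check makes this concrete. The sketch below is illustrative only: the data, the seed, and the helper name overlap_fraction are assumptions, not part of the lesson. Both classes spread widely along x and differ only along a low-variance y axis, so the top PCA direction follows x and misses the separation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Both classes: wide along x (std 5), narrow along y (std 0.3).
# They differ only in their y mean (-1 versus +1).
class0 = rng.normal(loc=[0.0, -1.0], scale=[5.0, 0.3], size=(n, 2))
class1 = rng.normal(loc=[0.0, 1.0], scale=[5.0, 0.3], size=(n, 2))
X = np.vstack([class0, class1])
y = np.repeat([0, 1], n)

# PCA ignores y: take eigenvectors of the covariance of the centered data.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pca_dir = eigvecs[:, -1]      # eigh sorts eigenvalues in ascending order
low_var_dir = eigvecs[:, 0]   # the direction PCA would discard first


def overlap_fraction(direction):
    """Error rate when thresholding at the midpoint of the projected class means."""
    z = X @ direction
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    threshold = (m0 + m1) / 2
    pred = (z > threshold).astype(int) if m1 > m0 else (z < threshold).astype(int)
    return np.mean(pred != y)


print("top PCA direction:", pca_dir)
print("error along top PCA direction:", overlap_fraction(pca_dir))
print("error along discarded direction:", overlap_fraction(low_var_dir))
```

In this setup the discarded low-variance direction separates the classes almost perfectly, while the high-variance direction is close to a coin flip.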

Core invention: between-class versus within-class scatter

For a projection direction w, LDA wants class means to move apart while points in the same class stay close.

\max_w \frac{w^T S_B w}{w^T S_W w}

Here S_B measures between-class scatter and S_W measures within-class scatter.
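
The lesson does not spell out the two matrices. Under the usual textbook convention (an assumption here), with \mu_c the mean of class c, n_c its size, and \mu the global mean, they can be written as:

S_W = \sum_{c} \sum_{i : y_i = c} (x_i - \mu_c)(x_i - \mu_c)^T

S_B = \sum_{c} n_c (\mu_c - \mu)(\mu_c - \mu)^T

Maximizing the ratio then leads to the generalized eigenproblem S_B w = \lambda S_W w. For two classes with invertible S_W, the solution has the closed form:

w \propto S_W^{-1} (\mu_1 - \mu_0)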

Trace lab

LDA: Use labels to find projections that separate class means while keeping classes tight.

Step 1/2: Use labels deliberately

LDA is supervised: it looks for directions that make known classes easier to separate.

Working formula: class labels y_i. The classes are part of the input.

Implementation sketch

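compute class means and the global mean;
build between-class scatter S_B from the class means;
build within-class scatter S_W from the deviations of points around their own class mean;
solve the generalized eigenproblem S_B w = λ S_W w and keep the eigenvectors with the largest eigenvalues, which maximize the scatter ratio.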
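
The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name lda_directions, the unnormalized scatter definitions, and the use of a pseudo-inverse to tolerate a singular S_W are all choices made for illustration.

```python
import numpy as np


def lda_directions(X, y, n_components=None):
    """Return (directions as columns, scatter ratios), largest ratio first."""
    classes = np.unique(y)
    n_features = X.shape[1]
    global_mean = X.mean(axis=0)

    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_W += centered.T @ centered                  # spread inside class c
        diff = mean_c - global_mean
        S_B += len(X_c) * np.outer(diff, diff)        # class mean vs global mean

    # Solve S_B w = lambda S_W w through the eigenvectors of pinv(S_W) @ S_B.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    eigvals, eigvecs = eigvals.real, eigvecs.real     # drop numerical imaginary parts
    order = np.argsort(eigvals)[::-1]
    if n_components is None:
        n_components = len(classes) - 1
    keep = order[:n_components]
    return eigvecs[:, keep], eigvals[keep]


# Hypothetical data: three Gaussian classes in 4-D whose means differ
# only in the first two coordinates.
rng = np.random.default_rng(1)
means = np.array([[0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(100, 4)) for m in means])
y = np.repeat([0, 1, 2], 100)

W, ratios = lda_directions(X, y)
Z = X @ W                                             # projected data
print(W.shape, Z.shape, ratios)
```

With three classes the function keeps two directions by default, so the 4-D points are projected to 2-D.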

Correctness intuition and limits

LDA works best when classes are roughly Gaussian with similar covariance. It yields at most numberOfClasses - 1 discriminant dimensions (and never more than the number of features), because the class means, measured from the global mean, span at most that many independent directions.
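
The dimension limit can be checked numerically through the rank of S_B. The sketch below uses hypothetical random data with three classes in five dimensions; the explicit tolerance passed to matrix_rank is an assumption chosen to ignore floating-point noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_classes, n_features, n_per_class = 3, 5, 50

# Random class means in 5-D, then unit-variance points around each mean.
class_means = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([rng.normal(loc=m, size=(n_per_class, n_features)) for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)

global_mean = X.mean(axis=0)
S_B = np.zeros((n_features, n_features))
for c in range(n_classes):
    diff = X[y == c].mean(axis=0) - global_mean
    S_B += n_per_class * np.outer(diff, diff)

# The size-weighted mean offsets sum to zero, so only n_classes - 1 are independent.
rank = np.linalg.matrix_rank(S_B, tol=1e-8)
print("rank of S_B:", rank, "with", n_classes, "classes")
```

Because S_B has rank at most numberOfClasses - 1, the eigenproblem has at most that many directions with a nonzero scatter ratio.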

Common confusions

LDA here means Linear Discriminant Analysis, not Latent Dirichlet Allocation.
LDA is supervised; labels are part of the input.
LDA is linear. Curved class boundaries motivate QDA.
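
The linear versus curved distinction can be seen in the Gaussian-classifier view of both methods, which is assumed here for illustration. In the sketch below, the means, covariances, equal priors, and helper names are all hypothetical. An affine function f satisfies f(a) + f(b) = f(a + b) + f(0), so a nonzero gap signals quadratic terms in the log-odds.

```python
import numpy as np


def gaussian_log_score(x, mean, cov, prior):
    """Log prior plus Gaussian log density, without the constant shared by all classes."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)


mean0, mean1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cov0 = np.array([[1.0, 0.3], [0.3, 1.0]])
cov1 = np.array([[3.0, -0.5], [-0.5, 0.5]])
shared = (cov0 + cov1) / 2   # stand-in for a pooled covariance estimate


def lda_log_odds(x):
    # Shared covariance: the quadratic terms cancel between the two classes.
    return gaussian_log_score(x, mean1, shared, 0.5) - gaussian_log_score(x, mean0, shared, 0.5)


def qda_log_odds(x):
    # Separate covariances: the quadratic terms no longer cancel.
    return gaussian_log_score(x, mean1, cov1, 0.5) - gaussian_log_score(x, mean0, cov0, 0.5)


def affine_gap(f, a, b):
    """Zero for any affine f; generally nonzero when f has quadratic terms."""
    return f(a) + f(b) - f(a + b) - f(np.zeros_like(a))


a, b = np.array([1.0, -2.0]), np.array([0.5, 3.0])
print("LDA gap:", affine_gap(lda_log_odds, a, b))
print("QDA gap:", affine_gap(qda_log_odds, a, b))
```

The LDA gap vanishes up to rounding, so its decision boundary (log-odds equal to zero) is a hyperplane; the QDA gap does not, so its boundary is a quadratic curve.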

Dimensionality reduction path

Linear projections, distance-preserving maps, supervised discriminants, and neighbor embeddings solve different pains.

PCA

Keep the directions where centered data varies most.

MDS

Place points so low-dimensional distances imitate the original distance table.

Isomap

Use neighbor-graph shortest paths before applying an MDS-style layout.

LDA

Use labels to find projections that separate class means while keeping classes tight.

QDA

Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.

SNE

Match neighbor probabilities between high and low dimensions.

t-SNE

Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.

UMAP

Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.

Exercises

Why can PCA keep the wrong direction for classification?
What does the denominator w^T S_W w penalize?
Why can two classes produce only one LDA direction?
