Linear Discriminant Analysis
Use labels to project data onto directions that separate class means while keeping each class compact.
Hook problem: biggest variance is not always best separation
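PCA ignores labels. If the largest spread runs inside both classes, for example when each class is a long, thin cloud and the two clouds sit side by side, PCA may keep that high-variance direction even though projecting onto it mixes the classes together.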
Linear Discriminant Analysis, or LDA, changes the question: use known labels to find a projection that separates classes.
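A minimal sketch of the hook problem, using hypothetical toy data invented for illustration: each class is stretched along x, and the two classes differ only in y. The helper names `mean`, `variance`, and `axis_summary` are illustrative, not from any library.

```python
# Hypothetical toy data: both classes are elongated along x and offset only in y.
class0 = [(-4.0, -1.0), (-2.0, -1.0), (0.0, -1.5), (0.0, -0.5), (2.0, -1.0), (4.0, -1.0)]
class1 = [(-4.0, 1.0), (-2.0, 1.0), (0.0, 1.5), (0.0, 0.5), (2.0, 1.0), (4.0, 1.0)]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance of a list of numbers."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def axis_summary(axis):
    """Return (pooled variance, class-mean gap) along one coordinate axis.

    The pooled variance ignores labels, as PCA does; the gap uses them.
    """
    pooled = [p[axis] for p in class0 + class1]
    gap = abs(mean([p[axis] for p in class1]) - mean([p[axis] for p in class0]))
    return variance(pooled), gap


for axis, name in ((0, "x"), (1, "y")):
    var, gap = axis_summary(axis)
    print(f"{name}: pooled variance = {var:.3f}, class-mean gap = {gap:.3f}")
```

On this data x carries about six times the variance of y (80/12 versus 13/12) but no class-mean gap at all, while y carries the entire gap of 2.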
First naive idea: use PCA before classification
PCA can help with compression, but it may discard a low-variance direction that perfectly distinguishes the classes. A supervised projection needs to know what the classes are.
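To make the naive idea concrete, the sketch below runs a label-free PCA step on the same hypothetical toy data and reports which direction it keeps, how much variance the discarded direction carried, and how far apart the class means end up on the kept direction. It solves the 2x2 eigenproblem in closed form to stay dependency-free; `covariance_2d`, `top_eigen_2x2`, and `project_mean` are hypothetical helper names.

```python
import math

# Same hypothetical toy data: classes differ only in y.
class0 = [(-4.0, -1.0), (-2.0, -1.0), (0.0, -1.5), (0.0, -0.5), (2.0, -1.0), (4.0, -1.0)]
class1 = [(-4.0, 1.0), (-2.0, 1.0), (0.0, 1.5), (0.0, 0.5), (2.0, 1.0), (4.0, 1.0)]


def covariance_2d(pts):
    """Return (sxx, sxy, syy), the population covariance entries of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    return sxx, sxy, syy


def top_eigen_2x2(a, b, c):
    """Eigen-decompose the symmetric matrix [[a, b], [b, c]].

    Returns (largest eigenvalue, its unit eigenvector, smallest eigenvalue).
    """
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam_max, lam_min = half_trace + radius, half_trace - radius
    if abs(b) > 1e-12:
        vx, vy = lam_max - c, b
    else:
        # Already diagonal: the eigenvector is the axis with the larger entry.
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam_max, (vx / norm, vy / norm), lam_min


def project_mean(pts, w):
    """Mean of the points after projecting them onto direction w."""
    return sum(p[0] * w[0] + p[1] * w[1] for p in pts) / len(pts)


# PCA never sees the labels: it only uses the pooled points.
lam_max, pc1, lam_min = top_eigen_2x2(*covariance_2d(class0 + class1))
discarded_share = lam_min / (lam_max + lam_min)
gap_on_pc1 = abs(project_mean(class1, pc1) - project_mean(class0, pc1))

print("kept direction:", pc1)
print("share of variance discarded:", round(discarded_share, 3))
print("class-mean gap on kept direction:", gap_on_pc1)
```

On this data PCA keeps the x axis, the discarded y direction holds only 13/93 (about 14%) of the variance, and the projected class means coincide, so a one-dimensional PCA projection cannot separate the classes.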
Core invention: between-class versus within-class scatter
For a projection direction w, LDA wants the projected class means to move apart while points in the same class stay close. It therefore maximizes the scatter ratio
J(w) = (w^T S_B w) / (w^T S_W w).
Here S_B measures between-class scatter and S_W measures within-class scatter.
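The section names S_B and S_W without writing them out. One common convention is shown below, assuming C classes where class k has n_k points with mean m_k, n points in total, and global mean m; other texts drop the n_k weights or divide by n, so treat this as a standard choice rather than necessarily the author's.

```latex
m_k = \frac{1}{n_k} \sum_{x_i \in \text{class } k} x_i,
\qquad
m = \frac{1}{n} \sum_{i=1}^{n} x_i

S_B = \sum_{k=1}^{C} n_k \,(m_k - m)(m_k - m)^{\top}

S_W = \sum_{k=1}^{C} \sum_{x_i \in \text{class } k} (x_i - m_k)(x_i - m_k)^{\top}
```

With these definitions, w^T S_B w is the size-weighted spread of the projected class means around the projected global mean, and w^T S_W w is the spread of the projected points around their own projected class means.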
LDA is supervised: it looks for directions that make known classes easier to separate.
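A small sketch of why the labels are part of the input: the scatter ratio can only be evaluated once every point is assigned to a class. It assumes the class-size-weighted S_B convention and uses the identity that w^T S_B w and w^T S_W w equal the between-class and within-class scatter of the one-dimensional projections. The data and the `fisher_ratio` name are hypothetical.

```python
# Hypothetical toy data (same as the hook example): classes differ only in y.
class0 = [(-4.0, -1.0), (-2.0, -1.0), (0.0, -1.5), (0.0, -0.5), (2.0, -1.0), (4.0, -1.0)]
class1 = [(-4.0, 1.0), (-2.0, 1.0), (0.0, 1.5), (0.0, 0.5), (2.0, 1.0), (4.0, 1.0)]


def project(p, w):
    """Dot product of a 2-D point with a direction w."""
    return p[0] * w[0] + p[1] * w[1]


def fisher_ratio(w, classes):
    """J(w) = (w^T S_B w) / (w^T S_W w), computed from the 1-D projections.

    `classes` is a list of point lists, so the labels are part of the input.
    Assumes some class has spread along w (otherwise the denominator is 0).
    """
    all_proj = [project(p, w) for cls in classes for p in cls]
    global_mean = sum(all_proj) / len(all_proj)
    between = within = 0.0
    for cls in classes:
        proj = [project(p, w) for p in cls]
        class_mean = sum(proj) / len(proj)
        between += len(proj) * (class_mean - global_mean) ** 2  # contributes to w^T S_B w
        within += sum((z - class_mean) ** 2 for z in proj)  # contributes to w^T S_W w
    return between / within


for w in [(1.0, 0.0), (0.0, 1.0), (0.5 ** 0.5, 0.5 ** 0.5)]:
    print(w, round(fisher_ratio(w, [class0, class1]), 3))
```

The x axis scores 0, the y axis scores 12, and the diagonal lands in between; none of these numbers can be computed without knowing which points belong to which class.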
Implementation sketch
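- compute the class means and the global mean;
- build the between-class scatter S_B;
- build the within-class scatter S_W;
- solve the generalized eigenproblem S_B w = λ S_W w and keep the eigenvectors with the largest eigenvalues, since these maximize the scatter ratio.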
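The steps can be sketched for the simplest case, two classes in two dimensions, in plain Python. This is a sketch under that assumption, not a general implementation: with two classes S_B = (n0 * n1 / n) d d^T where d = m1 - m0, so the eigenproblem has one useful solution, w proportional to S_W^{-1} d, and the last step reduces to a 2x2 solve. A multi-class version would need a generalized eigensolver instead (for example `scipy.linalg.eigh(S_B, S_W)`). The function name and data are hypothetical.

```python
import math


def mean_point(pts):
    """Component-wise mean of 2-D points."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def lda_direction_two_class(class0, class1):
    """Unit LDA (Fisher) direction for two classes of 2-D points (hypothetical helper)."""
    # Step 1: class means (the global mean is not needed in this two-class shortcut).
    m0, m1 = mean_point(class0), mean_point(class1)
    # Step 2: S_B = (n0 * n1 / n) d d^T, so the mean difference d is all we need from it.
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]

    # Step 3: within-class scatter S_W = [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for pts, (mx, my) in ((class0, m0), (class1, m1)):
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2

    # Step 4: w is proportional to S_W^{-1} d, via the closed-form 2x2 inverse.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("S_W is singular; regularize it (e.g. add a small ridge) first")
    wx = (syy * dx - sxy * dy) / det
    wy = (sxx * dy - sxy * dx) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


# Hypothetical toy data: classes differ only in y.
class0 = [(-4.0, -1.0), (-2.0, -1.0), (0.0, -1.5), (0.0, -0.5), (2.0, -1.0), (4.0, -1.0)]
class1 = [(-4.0, 1.0), (-2.0, 1.0), (0.0, 1.5), (0.0, 0.5), (2.0, 1.0), (4.0, 1.0)]

w = lda_direction_two_class(class0, class1)
scores0 = [x * w[0] + y * w[1] for x, y in class0]
scores1 = [x * w[0] + y * w[1] for x, y in class1]
print("LDA direction:", w)
print("max class-0 score:", max(scores0), "min class-1 score:", min(scores1))
```

On the toy data the direction comes out as the y axis, the one PCA discarded, and any threshold between the two printed scores separates the classes. The `ValueError` branch matters in practice: with fewer points than features S_W is singular, and adding a small multiple of the identity before inverting is a common remedy.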
Correctness intuition and limits
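LDA works best when the classes are roughly Gaussian and share a similar covariance. It can produce at most numberOfClasses - 1 discriminant dimensions (and never more than the number of input features): S_B is built from numberOfClasses class-mean offsets from the global mean, and those offsets, weighted by class size, sum to zero, so they span at most numberOfClasses - 1 independent directions.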
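A quick numerical check of the numberOfClasses - 1 limit, assuming the class-size-weighted S_B convention: in two dimensions, two classes give a singular S_B (rank 1) even when their means are not axis-aligned, while three classes with non-collinear means give rank 2. The class data and helper names are hypothetical.

```python
def between_class_scatter(classes):
    """S_B = sum_k n_k (m_k - m)(m_k - m)^T for 2-D points, as a nested list."""
    all_pts = [p for cls in classes for p in cls]
    n = len(all_pts)
    gx = sum(p[0] for p in all_pts) / n
    gy = sum(p[1] for p in all_pts) / n
    s = [[0.0, 0.0], [0.0, 0.0]]
    for cls in classes:
        k = len(cls)
        dx = sum(p[0] for p in cls) / k - gx  # class-mean offset from the global mean
        dy = sum(p[1] for p in cls) / k - gy
        s[0][0] += k * dx * dx
        s[0][1] += k * dx * dy
        s[1][0] += k * dy * dx
        s[1][1] += k * dy * dy
    return s


def det2(m):
    """Determinant of a 2x2 nested list; zero means rank below 2."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical classes with means (0.5, 0.5), (3.5, 0.5) and (1.5, 4.5).
a = [(0.0, 0.0), (1.0, 1.0)]
b = [(3.0, 0.0), (4.0, 1.0)]
c = [(1.0, 4.0), (2.0, 5.0)]

print("two classes, det(S_B):", det2(between_class_scatter([a, c])))
print("three classes, det(S_B):", det2(between_class_scatter([a, b, c])))
```

The two-class determinant is zero because both offsets lie on the line through the two class means, which is also why two classes yield only one direction with a nonzero ratio.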
Common confusions
- LDA here means Linear Discriminant Analysis, not Latent Dirichlet Allocation.
- LDA is supervised; labels are part of the input.
- LDA is linear. Curved class boundaries motivate QDA (Quadratic Discriminant Analysis), which lets each class keep its own covariance.
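To see why one shared covariance rules out curved boundaries, here is a hypothetical one-dimensional case with equal class means and different variances, scored with Gaussian log-densities. Giving each class its own variance, the QDA idea, produces a boundary at two points |x| = 1.5 * sqrt(log 3), about 1.57, which is a quadratic rule; with a single pooled variance (assuming equal class sizes) the two scores are identical for every x, so the shared-covariance model cannot tell these classes apart. The class parameters and function names are made up for illustration.

```python
import math


def log_score(x, mean, var, prior):
    """log(prior * Normal(x; mean, var)), dropping the shared -0.5 * log(2 * pi) term."""
    return math.log(prior) - 0.5 * math.log(var) - (x - mean) ** 2 / (2 * var)


# Hypothetical 1-D classes: same mean, very different spreads, equal priors.
NARROW = (0.0, 1.0, 0.5)  # (mean, variance, prior)
WIDE = (0.0, 9.0, 0.5)


def qda_label(x):
    """Each class keeps its own variance, so the score difference is quadratic in x."""
    return "narrow" if log_score(x, *NARROW) >= log_score(x, *WIDE) else "wide"


def lda_label(x):
    """Both classes share the pooled variance, so only the (equal) means and priors differ."""
    pooled_var = 0.5 * NARROW[1] + 0.5 * WIDE[1]
    narrow = log_score(x, NARROW[0], pooled_var, NARROW[2])
    wide = log_score(x, WIDE[0], pooled_var, WIDE[2])
    return "narrow" if narrow >= wide else "wide"


# The QDA boundary solves x^2 * (1/2 - 1/18) = 0.5 * log(9), i.e. |x| = 1.5 * sqrt(log 3).
boundary = 1.5 * math.sqrt(math.log(3))
print("QDA boundary at |x| =", round(boundary, 3))
print([qda_label(x) for x in (-3.0, 0.0, 3.0)])
print([lda_label(x) for x in (-3.0, 0.0, 3.0)])
```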
Exercises
- Why can PCA keep the wrong direction for classification?
- What does the denominator w^T S_W w penalize?
- Why can two classes produce only one LDA direction?