Draft
EM for Gaussian Mixture Models
Fit overlapping Gaussian clusters by alternating soft membership estimates with parameter updates.
Hook problem: some points are between clusters
K-Means makes a hard decision: each point belongs to exactly one cluster. But overlapping clouds often have boundary points that are partly plausible under more than one group.
Gaussian mixture models keep that uncertainty visible.
| Point | x | Gaussian 1 | Gaussian 2 |
|---|---|---|---|
| x1 | 0.80 | 1.00 | 0.00 |
| x2 | 1.10 | 1.00 | 0.00 |
| x3 | 1.40 | 1.00 | 0.00 |
| x4 | 4.70 | 0.00 | 1.00 |
| x5 | 5.10 | 0.00 | 1 |
| x6 | 5.50 | 0.00 | 1 |
First naive idea: assign each point to its most likely Gaussian
Hard assignment throws away useful uncertainty. A boundary point with probabilities 0.52 and 0.48 should not influence the model the same way as a point with 0.99 and 0.01.
EM repairs that by using fractional membership.
Core invention: expectation then maximization
For a Gaussian mixture model, EM alternates:
- E-step: compute responsibilities, the probability that component
jexplains pointx_i. - M-step: update means, variances or covariances, and mixture weights using those responsibilities.
For component j, responsibility has the shape:
The numerator says how much component j explains the point; the denominator normalizes across all components.
Interactive trace lab
Clustering algorithm trace lab
Step 1/4: E-step: compute how much each Gaussian explains every point.
Alternate soft membership estimates with Gaussian parameter updates.
1.60, 4.40
0.50, 0.50
Implementation sketch
repeat {
compute responsibilities for each point and component;
update each component from weighted points;
update mixture weights from total responsibility;
}
Correctness intuition and cost
EM repeatedly improves the data likelihood for the chosen mixture family, but it can still stop at a local optimum.
For n points, k components, d dimensions, and t iterations, diagonal-covariance implementations are often about O(t n k d). Full covariance updates are more expensive.
Common confusions
- EM for GMM is soft clustering, not hard partitioning.
- Gaussian components can overlap.
- The number of components is chosen before fitting.
- Bad initialization can still matter.
k-means
k-medoids
dbscan
optics
hdbscan
em-for-gmm
Exercises
- What does responsibility
r_ijmean in words? - Why should a boundary point contribute fractionally?
- How is EM for GMM similar to K-Means, and how is it different?
Graph connections : EM for GMM