EM for Gaussian Mixture Models

Hook problem: some points are between clusters

K-Means makes a hard decision: each point belongs to exactly one cluster. But overlapping clouds often have boundary points that are partly plausible under more than one group.

Gaussian mixture models keep that uncertainty visible.

Membership can be fractionalEM for GMM treats each point as partly explained by each Gaussian, then refits the Gaussians.

Mean 11.10

Mean 25.10

Weight 10.50

Weight 20.50

Final soft memberships
Point	x	Gaussian 1	Gaussian 2
x1	0.80	1.00	0.00
x2	1.10	1.00	0.00
x3	1.40	1.00	0.00
x4	4.70	0.00	1.00
x5	5.10	0.00	1
x6	5.50	0.00	1

First naive idea: assign each point to its most likely Gaussian

Hard assignment throws away useful uncertainty. A boundary point with probabilities 0.52 and 0.48 should not influence the model the same way as a point with 0.99 and 0.01.

EM repairs that by using fractional membership.

Core invention: expectation then maximization

For a Gaussian mixture model, EM alternates:

E-step: compute responsibilities, the probability that component j explains point x_i.
M-step: update means, variances or covariances, and mixture weights using those responsibilities.

For component j, responsibility has the shape:

r_{ij}=\frac{\pi_j\,\mathcal{N}(x_i\mid\mu_j,\Sigma_j)}{\sum_{\ell}\pi_{\ell}\,\mathcal{N}(x_i\mid\mu_{\ell},\Sigma_{\ell})}

The numerator says how much component j explains the point; the denominator normalizes across all components.

Interactive trace lab

Clustering algorithm trace lab

Step 1/4: E-step: compute how much each Gaussian explains every point.

expect-1

Alternate soft membership estimates with Gaussian parameter updates.

means

1.60, 4.40

weights

0.50, 0.50

Implementation sketch

repeat {
  compute responsibilities for each point and component;
  update each component from weighted points;
  update mixture weights from total responsibility;
}

Correctness intuition and cost

EM repeatedly improves the data likelihood for the chosen mixture family, but it can still stop at a local optimum.

For n points, k components, d dimensions, and t iterations, diagonal-covariance implementations are often about O(t n k d). Full covariance updates are more expensive.

Common confusions

EM for GMM is soft clustering, not hard partitioning.
Gaussian components can overlap.
The number of components is chosen before fitting.
Bad initialization can still matter.

Clustering algorithm pathStart with centroid partitions, repair center robustness, then move to density and probability.

1. K-Means

k-means

2. K-Medoids

k-medoids

3. DBSCAN

dbscan

4. OPTICS

optics

5. HDBSCAN

hdbscan

6. EM for GMM

em-for-gmm

Exercises

What does responsibility r_ij mean in words?
Why should a boundary point contribute fractionally?
How is EM for GMM similar to K-Means, and how is it different?

Graph connections : EM for GMM