Draft
Multidimensional Scaling
Build a low-dimensional map whose distances imitate an original pairwise-distance table.
Hook problem: a table is not a map
Imagine you are comparing five campus study spots: Library, Lab, Cafe, Dorm, and Gym. You do not have useful original coordinates. You only have a pairwise dissimilarity table: each number says how different two places feel for studying.
A spreadsheet can answer “how far is Library from Gym?” but it cannot give the quick visual judgment a map gives. Multidimensional Scaling, or MDS, asks for low-dimensional coordinates whose visible distances imitate that table.
| From / to | Library | Lab | Cafe | Dorm | Gym |
|---|---|---|---|---|---|
| Library | 0 | 1.4 | 2 | 3.2 | 3.6 |
| Lab | 1.4 | 0 | 1.6 | 2.7 | 2.4 |
| Cafe | 2 | 1.6 | 0 | 1.5 | 2.7 |
| Dorm | 3.2 | 2.7 | 1.5 | 0 | 2.2 |
| Gym | 3.6 | 2.4 | 2.7 | 2.2 | 0 |
Conventions: entries are nonnegative; D is symmetric; only one unordered pair with i < j contributes to stress.
Read the table as fixed input . Its entries are nonnegative. The diagonal is zero or ignored because an item is distance zero from itself. The table is symmetric, so Library -> Lab and Lab -> Library are the same promise. When MDS computes stress, each unordered pair is counted once, using the convention .
The word “dissimilarity” is intentionally broad: the numbers may come from ratings, edit distances, survey responses, graph distances, or true Euclidean measurements. MDS can still try to draw a map, but some tables cannot fit perfectly in two dimensions.
First naive idea: place the nearest pairs
The closest pair is easy. Put Library and Lab about 1.4 units apart. Then add Cafe near both of them. The pain starts when Dorm and Gym arrive: each new point must satisfy several distances at once.
Every unordered pair has one target distance. The diagonal is ignored, and the same table will score every candidate map.
Every candidate layout is judged against the same fixed table D.
| Pair | Target | Map | Residual | Squared contribution |
|---|---|---|---|---|
| Library - Lab | 1.4 | 1.4 | 0 nearly matched | 0 |
| Library - Cafe | 2 | 1.22 | -0.78 too close / compressed | 0.61 |
A drawing made by eye has no single way to decide which promise should win. Pull Gym away from Library, and it may become too close to Dorm. Nudge Dorm, and several other distances change.
nearly matched (0)
too close / compressed (-1.45)
too close / compressed (-1.2)
Notice the axis labels: layout x and layout y. These are coordinates invented for the map. They are not original measured features like “noise level” or “walking time” unless the input table was built that way and the solution happens to align with them.
Core invention: make mismatch measurable
MDS repairs the by-eye problem by giving every candidate map the same score. Place each item at a low-dimensional point . Then compare the map’s distance for every pair with the original target distance.
residual = map distance - target distance. Positive means too far / overstretched; negative means too close / compressed.
| Pair | Target | Map | Residual | Squared |
|---|---|---|---|---|
| Library - Gym | 3.6 | 2.15 | -1.45 too close / compressed | 2.09 |
| Lab - Dorm | 2.7 | 1.41 | -1.29 too close / compressed | 1.65 |
| Cafe - Dorm | 1.5 | 0.64 | -0.86 too close / compressed | 0.74 |
| Dorm - Gym | 2.2 | 1 | -1.2 too close / compressed | 1.44 |
| Pair | Target | Map | Residual | Squared |
|---|---|---|---|---|
| Library - Gym | 3.6 | 3.73 | +0.13 too far / overstretched | 0.02 |
| Lab - Dorm | 2.7 | 2.67 | -0.03 nearly matched | 0.0 |
| Cafe - Dorm | 1.5 | 1.58 | +0.08 too far / overstretched | 0.01 |
| Dorm - Gym | 2.2 | 1.77 | -0.43 too close / compressed | 0.19 |
The signed residual convention on this page is:
Positive residual means the map puts the pair too far apart, or overstretches it. Negative residual means the map puts the pair too close together, or compresses it.
MDS accumulates those mismatches as stress:
Squaring makes positive and negative residuals both count as error. It also makes large distortions hurt more than small ones.
Trace lab: from table to lower stress
The full trace uses the same five places, the same distance table, and the same residual convention. It does not simulate random optimization; it simply compares a rough layout with a deterministic lower-stress layout so the scoring idea stays visible.
Every unordered pair has one target distance. The diagonal is ignored, and the same table will score every candidate map.
Every candidate layout is judged against the same fixed table D.
| Pair | Target | Map | Residual | Squared contribution |
|---|---|---|---|---|
| Library - Lab | 1.4 | 1.4 | 0 nearly matched | 0 |
| Library - Cafe | 2 | 1.22 | -0.78 too close / compressed | 0.61 |
The lower-stress layout is not declared “true.” It is only better under the chosen objective and fixed table . A different stress formula, output dimension, or dissimilarity table could prefer a different map.
Formal version
Let be the original target dissimilarity between items and . Let be the reconstructed coordinate table, and let be the map point for item .
- Dfixed pairwise distance table
- Ycandidate map coordinates
- delta_ijdistance shown by the map
- r_ijsigned residual
- stress(Y)sum of squared mismatches
The map distance for the same pair is:
In words, is the distance the reconstructed map shows for pair .
The residual and stress are:
MDS searches for coordinates where this total mismatch is small. Classical MDS has a closed-form route when the distances fit the Euclidean assumptions. Metric and nonmetric MDS optimize related stress ideas. This node focuses on the shared distance-preserving question, not those solver derivations.
Implementation state
An implementation carries only a few ideas, but most of them are pairwise:
| State | What it means | Why it matters |
|---|---|---|
| Distance matrix | The fixed input table of values | Every candidate map is judged against the same promises |
| Coordinates | Current low-dimensional points | These are the variables MDS changes |
| Map distances | Distances measured inside the current map | They are compared with the target table |
| Residuals | Signed differences | They show which pairs are compressed or overstretched |
| Total stress | Sum of squared residuals | It gives one score for the whole layout |
| Stopping or inspection | Iteration limit, small improvement, or human review | Low stress still needs residual inspection |
read or compute the pairwise distance matrix D;
choose output dimension k and initialize coordinates Y;
repeat: evaluate pair residuals and reduce stress;
inspect the largest remaining distortions;
Correctness intuition: useful maps still have scars
The invariant is simple: do not let the target move. The same fixed scores every candidate layout. If one layout has lower stress than another, it fits this objective better.
That statement is intentionally limited. It does not prove the displayed layout is globally optimal. It does not recover the original axes. It does not promise zero distortion when the table is noisy, non-Euclidean, or too rich for the chosen dimension.
too close / compressed
Every candidate layout is scored against the same fixed D. Lower stress means better fit to this objective, not proof of globally optimal coordinates or true original axes.
This is why residuals are part of the lesson, not a debugging afterthought. A map can be helpful exactly because it summarizes many pairwise promises, but the remaining residuals say where the summary bends the truth.
Cost: every point talks to every other point
With items, a full distance table has about useful entries. Storing the table is . If the output has dimensions, a full stress evaluation computes every map distance, so it costs .
A closed-form classical route can require eigendecomposition, which is more expensive than one stress scan.
Normalized stress of the improved map: 0.005
If an optimizer performs evaluations or updates, expect the pairwise part to appear repeatedly. Classical MDS can also require eigendecomposition, which is a different and often heavier cost than one stress scan.
Common confusions
| Confusion | Repair |
|---|---|
| ”The axes must mean original features.” | MDS axes are layout coordinates. They are useful for distance reconstruction, not guaranteed feature interpretation. |
| ”Low stress means zero stress.” | Low stress means smaller accumulated mismatch. Some pair residuals may remain visible. |
| ”MDS is just PCA.” | PCA starts from coordinates and preserves high-variance directions. MDS starts from pairwise distances and preserves those distances. |
| ”All MDS variants are the same solver.” | Classical, metric, and nonmetric MDS differ in solver path or distance interpretation, but share the map-from-distances motivation. |
Connections in the graph
PCA is the useful contrast: it has original coordinates and asks which directions keep variance. MDS can begin after coordinates are gone or unhelpful, as long as a meaningful pairwise table remains.
Isomap reuses the MDS layout step. Its repair is upstream: before drawing the map, it replaces raw straight-line distances with shortest-path distances on a neighbor graph.
Keep the directions where centered data varies most.
Place points so low-dimensional distances imitate the original distance table.
Use neighbor-graph shortest paths before applying an MDS-style layout.
Use labels to find projections that separate class means while keeping classes tight.
Let each class keep its own covariance, creating quadratic boundaries rather than one shared projection.
Match neighbor probabilities between high and low dimensions.
Repair SNE's crowding problem with a heavy-tailed low-dimensional similarity.
Build a fuzzy neighbor graph, then optimize a low-dimensional graph with similar membership strengths.
Exercises
- In the campus table, why is it enough to count
Library - Labonce instead of also countingLab - Library? - Pick one negative residual in the inspector. What physical action on the map would make that pair less compressed, and which other pair might get worse?
- Why does a lower-stress MDS layout not prove that the two displayed axes are meaningful original features?
Graph connections : Multidimensional Scaling