Linear Kernel
Use the original dot product as the simplest kernel baseline when the raw feature geometry is already useful.
Hook problem: start with the no-lift baseline
Before adding curved boundaries or local neighborhoods, ask whether the original coordinates already work.
The linear kernel is the kernel that refuses to invent new features.
| Point z | Dot product with A | Squared distance to A | K(A, z) |
|---|---|---|---|
| A | 2 | 0 | 2 |
| B | 3 | 1 | 3 |
| C | 1 | 5 | 1 |
| D | -2 | 10 | -2 |
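The table can be reproduced with a few lines of code. This is a minimal sketch: the source does not list the coordinates of A through D, so the 2-D points below are hypothetical values chosen only because they yield the same numbers. The helper names `dot` and `squaredDistance` are also illustrative.

```typescript
// Hypothetical coordinates (not from the source), picked so that the
// dot products and squared distances match the table above.
const labeledPoints = [
  { label: "A", coords: [1, 1] },
  { label: "B", coords: [2, 1] },
  { label: "C", coords: [-1, 2] },
  { label: "D", coords: [-2, 0] },
];

function dot(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}

function squaredDistance(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => {
    const diff = value - z[index];
    return sum + diff * diff;
  }, 0);
}

// Compare every point with the anchor A.
const anchorCoords = labeledPoints[0].coords;
const tableRows = labeledPoints.map((p) => ({
  label: p.label,
  dot: dot(anchorCoords, p.coords),
  distanceSquared: squaredDistance(anchorCoords, p.coords),
  // For the linear kernel, K(A, z) is the dot product itself.
  kernel: dot(anchorCoords, p.coords),
}));

console.log(tableRows);
```

The last column repeats the dot-product column because the linear kernel adds nothing on top of the raw dot product.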
First naive idea: dot products are enough
If two vectors point in similar directions and have useful magnitudes, the dot product is a reasonable similarity score.
The pain appears when raw alignment is not the pattern: XOR-like interactions, rings, and local islands cannot be captured by a single linear boundary in the original coordinates.
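To make the XOR case concrete, the sketch below searches a coarse grid of linear scores w1*x1 + w2*x2 + b and counts how many classify every example correctly. The grid range, the step size, and the names `Labeled` and `countLinearSeparators` are assumptions made for illustration. An empty search result is not a proof, but it agrees with the known argument: in XOR the two positive points and the two negative points have the same coordinate sum, so no linear score can be positive on one pair and negative on the other.

```typescript
type Labeled = { x: number[]; y: 1 | -1 };

// Two binary inputs in 0/1 encoding.
const xorData: Labeled[] = [
  { x: [0, 0], y: -1 },
  { x: [0, 1], y: 1 },
  { x: [1, 0], y: 1 },
  { x: [1, 1], y: -1 },
];

// AND is included as a contrast: it is linearly separable.
const andData: Labeled[] = [
  { x: [0, 0], y: -1 },
  { x: [0, 1], y: -1 },
  { x: [1, 0], y: -1 },
  { x: [1, 1], y: 1 },
];

// Counts grid points (w1, w2, b), each in [-3, 3] with step 0.25, whose
// linear score has the correct sign on every example.
function countLinearSeparators(data: Labeled[]): number {
  let count = 0;
  for (let i = -12; i <= 12; i++) {
    for (let j = -12; j <= 12; j++) {
      for (let k = -12; k <= 12; k++) {
        const w1 = i / 4;
        const w2 = j / 4;
        const b = k / 4;
        const allCorrect = data.every(
          ({ x, y }) => y * (w1 * x[0] + w2 * x[1] + b) > 0
        );
        if (allCorrect) count++;
      }
    }
  }
  return count;
}

const xorSeparators = countLinearSeparators(xorData);
const andSeparators = countLinearSeparators(andData);
console.log({ xorSeparators, andSeparators });
```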
Formal version
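The linear kernel is K(x, z) = x * z = x_1 z_1 + ... + x_d z_d, the ordinary dot product of the two input vectors. For d input features, computing this value costs O(d) time. The feature map is the identity map, so phi(x) = x.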
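Kernel methods usually need the kernel value for every pair of training points, collected in a Gram matrix. The following is a minimal sketch of that computation for the linear kernel. The function name `gramMatrix` and the sample data are illustrative assumptions. With n points and d features, filling the matrix this way costs O(n^2 d), since each of the n^2 entries is one O(d) dot product.

```typescript
function linearKernel(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}

// Entry [i][j] of the result is K(rows[i], rows[j]).
function gramMatrix(rows: number[][]): number[][] {
  return rows.map((xi) => rows.map((xj) => linearKernel(xi, xj)));
}

// Illustrative data: n = 3 points with d = 2 features each.
const samples = [
  [1, 0],
  [2, 1],
  [0, 3],
];

const gram = gramMatrix(samples);
console.log(gram);
```

The matrix is symmetric because the dot product is, and its diagonal holds the squared norms of the points.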
Interactive comparison
Kernel similarity lab
x * z: Keeps the original coordinates and measures ordinary alignment.
Compare every point with the chosen anchor. Notice how each kernel encodes a different notion of closeness.
Implementation sketch
function linearKernel(x: number[], z: number[]): number {
  // Sum of elementwise products: one multiply-add per feature, so O(d).
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}
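The sketch above assumes both vectors have the same length. If z is shorter than x, `z[index]` is `undefined` and the result becomes `NaN`. If z is longer, the extra entries are silently ignored. A guarded variant is shown below. The name `checkedLinearKernel` and the choice to throw a `RangeError` are assumptions for illustration, not part of the original sketch.

```typescript
function checkedLinearKernel(x: number[], z: number[]): number {
  // Refuse to compare vectors from different feature spaces.
  if (x.length !== z.length) {
    throw new RangeError(
      `dimension mismatch: ${x.length} vs ${z.length}`
    );
  }
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    sum += x[i] * z[i];
  }
  return sum;
}

console.log(checkedLinearKernel([1, 1], [2, 1]));
```

Failing loudly here is a design choice: a `NaN` similarity tends to propagate through later computations and is harder to trace back than an exception at the call site.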
Common confusions
- Linear kernel does not mean the data is easy; it means the model compares original features linearly.
- A linear kernel can still be strong with good engineered features.
- Adding a polynomial or RBF kernel changes the geometry, not just the formula name.
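The second point can be illustrated with XOR. In the sketch below, a hand-engineered product feature x1 * x2 is appended to the input, after which a plain linear score separates the classes. The feature map `withProduct` and the weights `[1, 1, -2]` with bias `-0.5` are hypothetical choices picked by hand, not learned and not taken from the source.

```typescript
function linearKernel(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}

// Hypothetical engineered feature map: append the product x1 * x2.
function withProduct(x: number[]): number[] {
  return [x[0], x[1], x[0] * x[1]];
}

// Hand-picked linear model in the engineered space.
const weights = [1, 1, -2];
const bias = -0.5;

function xorScore(x: number[]): number {
  return linearKernel(weights, withProduct(x)) + bias;
}

// XOR in 0/1 encoding: positive score should mean "inputs differ".
const xorInputs = [
  [0, 0],
  [0, 1],
  [1, 0],
  [1, 1],
];
const xorScores = xorInputs.map(xorScore);
console.log(xorScores);
```

The model is still linear; only the features changed. That is the sense in which a linear kernel can be strong when the feature engineering has already done the geometric work.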
Connections
The polynomial kernel starts from this same dot product and raises an affine version of it to a degree. The RBF kernel drops dot-product alignment altogether and instead scores similarity by how fast it decays with distance.
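The relationship between the three kernels can be sketched side by side. The forms used here are the common textbook ones, (x * z + offset)^degree for the polynomial kernel and exp(-gamma * ||x - z||^2) for RBF. The parameter names `degree`, `offset`, and `gamma` are assumptions, since the source does not name them.

```typescript
function linearKernel(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}

// Polynomial kernel: shift the dot product by an offset, then raise
// it to a power.
function polynomialKernel(
  x: number[],
  z: number[],
  degree: number,
  offset: number
): number {
  return Math.pow(linearKernel(x, z) + offset, degree);
}

// RBF kernel: similarity decays exponentially with squared distance.
function rbfKernel(x: number[], z: number[], gamma: number): number {
  const squaredDistance = x.reduce((sum, value, index) => {
    const diff = value - z[index];
    return sum + diff * diff;
  }, 0);
  return Math.exp(-gamma * squaredDistance);
}

const u = [1, 1];
const v = [2, 1];
console.log(linearKernel(u, v));
console.log(polynomialKernel(u, v, 2, 1));
console.log(rbfKernel(u, v, 0.5));
```

Note that `polynomialKernel` calls `linearKernel` directly, while `rbfKernel` never uses the dot product and depends only on the distance between the two points.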
Exercises
- What is the feature map for the linear kernel?
- Why is O(d) the direct computation cost?
- Name one pattern that a linear kernel might struggle with.