RBF Kernel
Turn distance into local similarity so nearby points matter much more than faraway points.
Hook problem: the local neighbor should matter
Imagine predicting the price behavior of a new house after every feature has already been normalized onto comparable scales. Use two features for the story: normalized size and normalized neighborhood quality.
The new house is A=(1,1). A nearby previous example is B=(2,1). A far previous example is E=(4,4). If the task is local prediction, B should influence A much more than E, because B is the nearby house in feature space.
Distances only make sense because both axes have already been scaled onto comparable normalized units.
| Pair | Dot | Distance^2 | RBF, gamma = 0.5 |
|---|---|---|---|
| A -> B | 3 | 1 | 0.607 |
| A -> E | 8 | 18 | 0.000123 |
Naive idea: use the dot product anyway
A kernel often feels like an inner-product shortcut, so the first attempt is to reuse the dot product $x \cdot z$.
That fails for this local-neighborhood question. From A=(1,1), the dot product gives A -> B a score of 3, but gives A -> E a score of 8. The far same-direction house wins because dot product rewards magnitude and origin-based alignment. It is not asking, “Which house is close to A?”
| Pair from A | Dot product | Squared distance | RBF at $\gamma = 0.5$ |
|---|---|---|---|
| A -> B | 3 | 1 | 0.607 |
| A -> E | 8 | 18 | 0.000123 |
The table shows the pain: dot product ranks the far point higher, while distance says the local example is much closer.
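The ranking flip can be reproduced in a few lines. This is a minimal sketch; the helper names `dot` and `squaredDistance` are illustrative choices, not part of any library.

```typescript
// Illustrative helpers (hypothetical names) for the two scores compared in the table.
function dot(x: number[], z: number[]): number {
  return x.reduce((sum, xi, i) => sum + xi * z[i], 0);
}

function squaredDistance(x: number[], z: number[]): number {
  return x.reduce((sum, xi, i) => sum + (xi - z[i]) ** 2, 0);
}

const A = [1, 1];
const B = [2, 1];
const E = [4, 4];

// Dot product prefers the far point E; squared distance prefers the near point B.
console.log("A -> B", dot(A, B), squaredDistance(A, B)); // A -> B 3 1
console.log("A -> E", dot(A, E), squaredDistance(A, E)); // A -> E 8 18
```

Both helpers assume equal-length vectors and skip validation to keep the comparison short.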
Core invention: exponential distance decay
The smallest useful repair is to stop comparing direction from the origin and compare squared Euclidean distance instead. Distance 0 should mean maximum similarity. Larger distances should fade smoothly toward 0.
The RBF kernel does exactly that: square the distance, multiply by a negative decay rate, then pass it through exponential decay.
| Squared distance | Exponent at gamma = 0.5 | RBF value |
|---|---|---|
| 0 | 0 | 1 |
| 1 | -0.5 | 0.607 |
| 5 | -2.5 | 0.082 |
| 10 | -5 | 0.007 |
| 18 | -9 | 0.000123 |
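The three steps (square the distance, multiply by a negative decay rate, exponentiate) can be sketched directly on squared distances. The function name `rbfFromSquaredDistance` is a hypothetical helper, and the decay rate 0.5 is the value used in the tables of this section.

```typescript
// Hypothetical helper: takes an already-computed squared distance.
function rbfFromSquaredDistance(squaredDistance: number, gamma: number): number {
  const exponent = -gamma * squaredDistance; // negative decay rate times squared distance
  return Math.exp(exponent); // exponential decay into (0, 1] when gamma > 0
}

const decayRate = 0.5; // same decay rate as the tables in this section
for (const d2 of [0, 1, 5, 10, 18]) {
  const value = rbfFromSquaredDistance(d2, decayRate);
  console.log(`distance^2 = ${d2}, RBF = ${value.toPrecision(3)}`);
}
```

The printed values should agree with the table up to rounding.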
Formal definition
$$K(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right)$$
Here $x$ and $z$ are two input vectors. The term $\lVert x - z \rVert^2$ is their squared Euclidean distance. The parameter $\gamma$ is a positive decay rate. The function $\exp$ maps exponent 0 to 1; negative exponents produce values between 0 and 1.
For $\gamma > 0$ and finite real inputs, RBF values are in $(0, 1]$. A self-match has distance 0, so $K(x, x) = 1$. Any nonzero distance creates a negative exponent, so the value is less than 1 but still positive.
A common bandwidth convention writes $\gamma = 1/(2\sigma^2)$. Larger $\gamma$ means a smaller effective neighborhood. Larger $\sigma$ means smaller $\gamma$, so the neighborhood gets wider.
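A small conversion sketch, assuming the bandwidth convention is $\gamma = 1/(2\sigma^2)$. The function names `gammaFromSigma` and `sigmaFromGamma` are hypothetical helpers introduced only for illustration.

```typescript
// Assumes the convention gamma = 1 / (2 * sigma^2); helper names are illustrative.
function gammaFromSigma(sigma: number): number {
  if (!(sigma > 0)) {
    throw new Error("sigma must be positive.");
  }
  return 1 / (2 * sigma * sigma);
}

function sigmaFromGamma(gamma: number): number {
  if (!(gamma > 0)) {
    throw new Error("gamma must be positive.");
  }
  return Math.sqrt(1 / (2 * gamma));
}

console.log(gammaFromSigma(1)); // 0.5
console.log(gammaFromSigma(2)); // 0.125
```

Doubling the bandwidth from 1 to 2 cuts the decay rate by a factor of four, which is why the neighborhood widens.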
| Point | Dot | Distance^2 | K(A, z) |
|---|---|---|---|
| A | 2 | 0 | 1 |
| B | 3 | 1 | 0.607 |
| C | 1 | 5 | 0.082 |
| D | -2 | 10 | 0.007 |
Interactive comparison
Now compare RBF against linear, polynomial, and sigmoid kernels on the same shared points. Change the anchor, then change only the RBF gamma selector. The non-RBF scores keep their fixed parameters so the bandwidth effect stays isolated.
Kernel similarity lab (interactive widget): compare every point with the chosen anchor. The RBF score is exp(-gamma ||x - z||^2), shown at gamma = 0.5; it turns nearness into similarity, so far points fade toward zero. Notice how each kernel means a different kind of close.
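The comparison can also be sketched offline from the dot products and squared distances listed for anchor A. The lab's fixed non-RBF parameters are not stated in the text, so the polynomial degree, offset, and sigmoid scale below are assumptions chosen only for illustration, as is the helper name `kernelScores`.

```typescript
interface KernelScores {
  linear: number;
  polynomial: number;
  sigmoid: number;
  rbf: number;
}

// Assumed parameters (not taken from the lab): degree 2, offset 1, sigmoid scale 0.5.
const DEGREE = 2;
const COEF0 = 1;
const ALPHA = 0.5;
const GAMMA = 0.5; // RBF decay rate used in the tables

// Hypothetical helper: scores one pair from its dot product and squared distance.
function kernelScores(dot: number, squaredDistance: number): KernelScores {
  return {
    linear: dot,
    polynomial: (dot + COEF0) ** DEGREE,
    sigmoid: Math.tanh(ALPHA * dot),
    rbf: Math.exp(-GAMMA * squaredDistance),
  };
}

// Dot products and squared distances from anchor A, as listed in the table.
const pairsFromA = [
  { point: "A", dot: 2, squaredDistance: 0 },
  { point: "B", dot: 3, squaredDistance: 1 },
  { point: "C", dot: 1, squaredDistance: 5 },
  { point: "D", dot: -2, squaredDistance: 10 },
];

for (const pair of pairsFromA) {
  console.log(pair.point, kernelScores(pair.dot, pair.squaredDistance));
}
```

Under these assumed settings the linear score ranks B (3) above A's own self-match (2), while RBF always gives the self-match the top score of 1.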
For anchor A, the RBF pattern is:
| Gamma | A -> B, distance squared 1 | A -> C, distance squared 5 | A -> D, distance squared 10 |
|---|---|---|---|
| 0.1 | 0.905 | 0.607 | 0.368 |
| 0.5 | 0.607 | 0.082 | 0.007 |
| 1.0 | 0.368 | 0.007 | 0.000045 |
Increasing gamma does not move the points. It only makes similarity decay faster with the same distances.
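A gamma sweep over fixed squared distances shows the same effect in code. This is a sketch; `rbfRow` is a hypothetical helper name, and the squared distances 1, 5, and 10 are the ones listed for B, C, and D from anchor A.

```typescript
// Hypothetical helper: one table row of RBF values for a fixed gamma.
function rbfRow(gamma: number, squaredDistances: number[]): number[] {
  return squaredDistances.map((d2) => Math.exp(-gamma * d2));
}

const distancesFromA = [1, 5, 10]; // A -> B, A -> C, A -> D

for (const gamma of [0.1, 0.5, 1.0]) {
  const row = rbfRow(gamma, distancesFromA).map((value) => value.toPrecision(3));
  console.log(`gamma = ${gamma}: ${row.join(", ")}`);
}
```

The distances array never changes inside the loop; only the decay rate does.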
Implementation sketch
The inputs x and z must be equal-length numeric vectors. Let d be that shared number of features. One direct TypeScript implementation is:
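```typescript
function rbfKernel(x: number[], z: number[], gamma: number): number {
  if (x.length !== z.length) {
    throw new Error("RBF kernel expects equal-length vectors.");
  }
  const d = x.length; // shared number of features
  let squaredDistance = 0;
  // Accumulate the squared Euclidean distance one coordinate at a time.
  for (let i = 0; i < d; i += 1) {
    squaredDistance += (x[i] - z[i]) ** 2;
  }
  // Apply exponential decay to the squared distance.
  return Math.exp(-gamma * squaredDistance);
}
```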
For A=(1,1) and B=(2,1), the computation is small enough to see by hand:
| Feature index | x_i from A | z_i from B | (x_i-z_i)^2 | Running sum |
|---|---|---|---|---|
| 0 | 1 | 2 | 1 | 1 |
| 1 | 1 | 1 | 0 | 1 |
So $\lVert A - B \rVert^2 = 1$, and with $\gamma = 0.5$, $K(A, B) = e^{-0.5} \approx 0.607$.
Behavior, invariant, and complexity
The key invariant is locality: with positive gamma, increasing squared distance never increases RBF similarity.
| Case | What happens | Why it matters |
|---|---|---|
| Self-match | $K(x, x) = 1$ | A point is maximally similar to itself. |
| Near point | Value stays close to 1 | Local examples keep influence. |
| Far point | Value fades toward 0 | Distant examples become almost irrelevant. |
| Gamma equals 0 | Every pair maps to 1 | Not local: distance stops mattering. |
| Gamma below 0 | Any nonzero distance scores above 1, and farther points score higher | Not the RBF setting for this node. |
| Small positive gamma | Same distance decays slowly | Neighborhood is wide. |
| Large positive gamma | Same distance decays quickly | Neighborhood is tiny. |
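One derivative covers the locality invariant and the gamma rows of the table. As a sketch, write $r = \lVert x - z \rVert^2 \ge 0$ for the squared distance (the symbol $r$ is introduced here only for this derivation):

$$\frac{d}{dr}\, e^{-\gamma r} = -\gamma\, e^{-\gamma r}, \qquad e^{-\gamma r} > 0 \text{ for all real } \gamma \text{ and } r.$$

The slope therefore has the sign of $-\gamma$: negative for $\gamma > 0$ (similarity falls as distance grows), zero for $\gamma = 0$ (every pair maps to 1), and positive for $\gamma < 0$ (values rise above 1 as distance grows).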
If each vector has d numeric features, one RBF evaluation costs $O(d)$: the loop reads each coordinate once. Comparing one query against n stored points costs $O(nd)$.
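Scoring one query against n stored points is n kernel evaluations of d coordinates each, so roughly n times d coordinate reads in total. A minimal sketch follows; `rbfScores` is a hypothetical wrapper, and `rbfKernel` is repeated here only so the block is self-contained.

```typescript
function rbfKernel(x: number[], z: number[], gamma: number): number {
  if (x.length !== z.length) {
    throw new Error("RBF kernel expects equal-length vectors.");
  }
  let squaredDistance = 0;
  for (let i = 0; i < x.length; i += 1) {
    squaredDistance += (x[i] - z[i]) ** 2; // d coordinate reads per evaluation
  }
  return Math.exp(-gamma * squaredDistance);
}

// Hypothetical wrapper: one kernel evaluation per stored point, n in total.
function rbfScores(query: number[], points: number[][], gamma: number): number[] {
  return points.map((point) => rbfKernel(query, point, gamma));
}

// Query A against stored points B and E from the opening example.
const scores = rbfScores([1, 1], [[2, 1], [4, 4]], 0.5);
console.log(scores.map((score) => score.toPrecision(3)));
```

The output array has one score per stored point, in the same order as the input.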
Common confusions
| Mistake | Repair |
|---|---|
| “RBF is angle similarity.” | RBF is local distance similarity. Same direction is not enough. |
| “Larger gamma is always better.” | Larger gamma can make neighborhoods too tiny and over-local. |
| “Smaller gamma is always smoother and safer.” | Smaller gamma can make too many points look alike. |
| “Bandwidth sigma moves in the same direction as gamma.” | Under $\gamma = 1/(2\sigma^2)$, larger $\gamma$ means smaller $\sigma$ and a smaller neighborhood. |
| “Feature scaling is optional.” | One large-scale feature can dominate squared distance. Normalize or scale meaningful features first. |
| “I need the infinite feature map proof now.” | RBF can be described that way, but this node only needs the distance-decay behavior. |
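The feature-scaling repair can be made concrete with a small sketch. All numbers below are hypothetical: two houses described by raw size in square feet and a quality score from 1 to 10, with assumed feature ranges for min-max scaling. The helper names are illustrative.

```typescript
// Hypothetical raw features: [size in square feet, quality score from 1 to 10].
const houseP = [2000, 9];
const houseQ = [2050, 2];

// Assumed feature ranges for min-max scaling (illustrative values only).
const featureMins = [500, 1];
const featureMaxs = [5000, 10];

function minMaxScale(values: number[]): number[] {
  return values.map((value, i) => (value - featureMins[i]) / (featureMaxs[i] - featureMins[i]));
}

// Per-feature contribution to the squared Euclidean distance.
function squaredContributions(x: number[], z: number[]): number[] {
  return x.map((xi, i) => (xi - z[i]) ** 2);
}

const rawParts = squaredContributions(houseP, houseQ);
const scaledParts = squaredContributions(minMaxScale(houseP), minMaxScale(houseQ));

console.log("raw contributions (size, quality):", rawParts); // 2500 from size, 49 from quality
console.log("scaled contributions (size, quality):", scaledParts);
```

On raw units the 50 square foot gap contributes far more than the 7 point quality gap. After scaling, the quality gap dominates, which better matches how different the two houses are.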
Graph connections and practice
RBF comes after the general kernel idea because it is a named kernel function. It contrasts with the linear kernel because linear similarity rewards origin-based alignment, while RBF rewards local nearness. It also contrasts with the polynomial kernel: polynomial kernels add finite-degree interactions, while RBF behaves like a much richer local neighborhood rule. The next nearby named kernel is the sigmoid kernel, which returns to a squashed dot product and has more delicate validity conditions.
Prediction questions:
- From anchor A, which shared point has the highest RBF score besides A itself?
- What happens to A -> C as gamma increases from 0.1 to 1.0?
- Why can the dot product rank E=(4,4) above B=(2,1) even though B is the local neighbor?