Graph connections

Draft

RBF Kernel

Turn distance into local similarity so nearby points matter much more than faraway points.

concept intermediate machine-learningkernelssimilarity

Hook problem: the local neighbor should matter

Imagine predicting the price behavior of a new house after every feature has already been normalized onto comparable scales. Use two features for the story: normalized size and normalized neighborhood quality.

The new house is A=(1,1). A nearby previous example is B=(2,1). A far previous example is E=(4,4). If the task is local prediction, B should influence A much more than E, because B is the nearby house in feature space.

A far same-direction house can fool the dot productAfter size and neighborhood features are normalized, local prediction should prefer the nearby house B over the far house E.
near: distance^2 = 1far: distance^2 = 18A (1, 1)B (2, 1)E (4, 4)normalized size featurenormalized neighborhood feature

Distances only make sense because both axes have already been scaled onto comparable normalized units.

From A: dot product rewards E, but RBF rewards the local neighbor B
PairDotDistance^2RBF, gamma = 0.5
A -> B310.607
A -> E8180.000123

Naive idea: use the dot product anyway

A kernel often feels like an inner-product shortcut, so the first attempt is to reuse the dot product xTzx^Tz.

That fails for this local-neighborhood question. From A=(1,1), the dot product gives A -> B a score of 3, but gives A -> E a score of 8. The far same-direction house wins because dot product rewards magnitude and origin-based alignment. It is not asking, “Which house is close to A?”

Pair from ADot product xTzx^TzSquared distance xz2\lVert x-z\rVert^2RBF at γ=0.5\gamma=0.5
A -> B310.607
A -> E8180.000123

The table shows the pain: dot product ranks the far point higher, while distance says the local example is much closer.

Core invention: exponential distance decay

The smallest useful repair is to stop comparing direction from the origin and compare squared Euclidean distance instead. Distance 0 should mean maximum similarity. Larger distances should fade smoothly toward 0.

The RBF kernel does exactly that: square the distance, multiply by a negative decay rate, then pass it through exponential decay.

Exponential decay turns distance into localityWith gamma = 0.5, each larger squared distance pushes the RBF score closer to zero.
||x - z||^2 = 01

exponent -0

||x - z||^2 = 10.607

exponent -0.500

||x - z||^2 = 50.082

exponent -2.500

||x - z||^2 = 100.007

exponent -5

||x - z||^2 = 180.000123

exponent -9

Decay strip for gamma = 0.5
Squared distanceRBF value
01
10.607
50.082
100.007
180.000123

Formal definition

K(x,z)=exp(γxz2)K(x,z)=\exp(-\gamma \lVert x-z\rVert^2)

Here xx and zz are two input vectors. The term xz2\lVert x-z\rVert^2 is their squared Euclidean distance. The parameter γ\gamma is a positive decay rate. The function exp\exp maps exponent 0 to 1; negative exponents produce values between 0 and 1.

For γ>0\gamma>0 and finite real inputs, RBF values are in (0,1](0,1]. A self-match has distance 0, so K(x,x)=exp(0)=1K(x,x)=\exp(0)=1. Any nonzero distance creates a negative exponent, so the value is less than 1 but still positive.

A common bandwidth convention writes γ=1/(2σ2)\gamma = 1 / (2\sigma^2). Larger γ\gamma means a smaller effective neighborhood. Larger σ\sigma means smaller γ\gamma, so the neighborhood gets wider.

Similarity fades with distanceRBF makes nearby points strongly similar and far points almost irrelevant.
RBF kernel from anchor A
PointDotDistance^2K(A, z)
A201
B310.607
C150.082
D-2100.007

Interactive comparison

Now compare RBF against linear, polynomial, and sigmoid kernels on the same shared points. Change the anchor, then change only the RBF gamma selector. The non-RBF scores keep their fixed parameters so the bandwidth effect stays isolated.

Kernel similarity lab

exp(-gamma ||x - z||^2), gamma = 0.500: Turns nearness into similarity; far points fade toward zero.

Compare every point with the chosen anchor. Notice how each kernel means a different kind of close.

A -> A1

similarity; dot 2, distance^2 0

A -> B0.607

similarity; dot 3, distance^2 1

A -> C0.082

similarity; dot 1, distance^2 5

A -> D0.007

similarity; dot -2, distance^2 10

For anchor A, the RBF pattern is:

GammaA -> B, distance squared 1A -> C, distance squared 5A -> D, distance squared 10
0.10.9050.6070.368
0.50.6070.0820.007
1.00.3680.0070.000045

Increasing gamma does not move the points. It only makes similarity decay faster with the same distances.

Implementation sketch

The inputs x and z must be equal-length numeric vectors. Let d be that shared number of features. One direct TypeScript implementation is:

function rbfKernel(x: number[], z: number[], gamma: number) {
  if (x.length !== z.length) {
    throw new Error("RBF kernel expects equal-length vectors.");
  }

  const d = x.length;
  let squaredDistance = 0;

  for (let i = 0; i < d; i += 1) {
    squaredDistance += (x[i] - z[i]) ** 2;
  }

  return Math.exp(-gamma * squaredDistance);
}

For A=(1,1) and B=(2,1), the computation is small enough to see by hand:

Feature indexx_i from Az_i from B(x_i-z_i)^2Running sum
01211
11101

So AB2=1\lVert A-B\rVert^2=1, and with γ=0.5\gamma=0.5, K(A,B)=exp(0.5)0.607K(A,B)=\exp(-0.5)\approx0.607.

Behavior, invariant, and complexity

The key invariant is locality: with positive gamma, increasing squared distance never increases RBF similarity.

CaseWhat happensWhy it matters
Self-matchK(x,x)=1K(x,x)=1A point is maximally similar to itself.
Near pointValue stays close to 1Local examples keep influence.
Far pointValue fades toward 0Distant examples become almost irrelevant.
Gamma equals 0Every pair maps to 1Not local: distance stops mattering.
Gamma below 0Farther points can exceed 1Not the RBF setting for this node.
Small positive gammaSame distance decays slowlyNeighborhood is wide.
Large positive gammaSame distance decays quicklyNeighborhood is tiny.

If each vector has d numeric features, one RBF evaluation costs O(d)O(d): the loop reads each coordinate once. Comparing one query against n stored points costs O(nd)O(nd).

Common confusions

MistakeRepair
”RBF is angle similarity.”RBF is local distance similarity. Same direction is not enough.
”Larger gamma is always better.”Larger gamma can make neighborhoods too tiny and over-local.
”Smaller gamma is always smoother and safer.”Smaller gamma can make too many points look alike.
”Bandwidth sigma moves in the same direction as gamma.”Under γ=1/(2σ2)\gamma = 1 / (2\sigma^2), larger γ\gamma means smaller σ\sigma and a smaller neighborhood.
”Feature scaling is optional.”One large-scale feature can dominate squared distance. Normalize or scale meaningful features first.
”I need the infinite feature map proof now.”RBF can be described that way, but this node only needs the distance-decay behavior.

Graph connections and practice

RBF comes after the general kernel idea because it is a named kernel function. It contrasts with linear kernel because linear similarity rewards origin-based alignment, while RBF rewards local nearness. It also contrasts with polynomial kernel: polynomial kernels add finite-degree interactions, while RBF behaves like a much richer local neighborhood rule. The next nearby named kernel is sigmoid kernel, which returns to a squashed dot product and has more delicate validity conditions.

Kernel function pathFeature maps motivate kernels; named kernels choose different notions of similarity.
Feature map

idea layer

Kernel

idea layer

Linear

named choice

Polynomial

named choice

RBF

named choice

Sigmoid

named choice

Prediction questions:

  1. From anchor A, which shared point has the highest RBF score besides A itself?
  2. What happens to A -> C as gamma increases from 0.1 to 1.0?
  3. Why can the dot product rank E=(4,4) above B=(2,1) even though B is the local neighbor?

Graph connections : RBF Kernel