Graph connections

Draft

Kernel Function

Compare two inputs as if they were mapped into a feature space, often without constructing that space explicitly.

concept intermediate machine-learningkernelssimilarity

Hook problem: mapping first can be expensive

A feature map says, “rewrite every input, then compare the rewritten vectors.” That is clear, but it can be wasteful when the rewritten vector is huge.

A kernel function asks for the comparison directly.

The kernel shortcutA kernel returns the inner product after mapping, often without constructing the mapped vector.
Explicit route9

phi(A) * phi(B)

Kernel route9

K(A, B) = (A * B)^2

Same answer

The shortcut is useful when phi is huge or infinite.

First naive idea: explicitly build every feature

The direct route is:

  1. compute phi(x);
  2. compute phi(z);
  3. take their inner product.

This is fine for a small quadratic map. It is painful when the feature space has thousands, millions, or infinitely many coordinates.

Core invention: compare through a shortcut

A kernel function has the form:

K(x,z)=ϕ(x),ϕ(z)K(x,z)=\langle \phi(x), \phi(z)\rangle

The inputs x and z stay in the original space. The function returns the inner product they would have after the feature map phi.

Interactive similarity lab

Kernel similarity lab

exp(-gamma ||x - z||^2), gamma = 0.500: Turns nearness into similarity; far points fade toward zero.

Compare every point with the chosen anchor. Notice how each kernel means a different kind of close.

A -> A1

similarity; dot 2, distance^2 0

A -> B0.607

similarity; dot 3, distance^2 1

A -> C0.082

similarity; dot 1, distance^2 5

A -> D0.007

similarity; dot -2, distance^2 10

Static no-JS fallback:

Similarity fades with distanceRBF makes nearby points strongly similar and far points almost irrelevant.
RBF kernel from anchor A
PointDotDistance^2K(A, z)
A201
B310.607
C150.082
D-2100.007

Valid-kernel boundary

Not every similarity score is a valid kernel for kernel methods. A valid kernel must behave like an inner product in some feature space, which means its Gram matrix should be positive semidefinite for any finite sample.

For this node, keep the practical rule: named kernels are useful because they come with known geometry and known conditions. The proof machinery belongs in a later node.

Implementation sketch

function rbfKernel(x: Point, z: Point, gamma: number) {
  const squaredDistance = (x.a - z.a) ** 2 + (x.b - z.b) ** 2;
  return Math.exp(-gamma * squaredDistance);
}

Common confusions

  • A kernel is a function of two inputs, not a new coordinate vector.
  • Kernel value is similarity in a chosen geometry, not universal semantic similarity.
  • Some kernels have parameters; changing them changes the geometry.
Kernel function pathFeature maps motivate kernels; named kernels choose different notions of similarity.
Feature map

idea layer

Kernel

idea layer

Linear

named choice

Polynomial

named choice

RBF

named choice

Sigmoid

named choice

Exercises

  1. What does K(x,z) return?
  2. Why might we avoid constructing phi(x) explicitly?
  3. Why is the sigmoid kernel treated more cautiously than RBF?

Graph connections : Kernel Function