Draft

Linear Kernel

Use the original dot product as the simplest kernel baseline when the raw feature geometry is already useful.

concept | intermediate | machine-learning, kernels, similarity

Hook problem: start with the no-lift baseline

Before adding curved boundaries or local neighborhoods, ask whether the original coordinates already work.

The linear kernel is the kernel that refuses to invent new features.

No lift: original-space similarity

The linear kernel is just the dot product in the coordinates you already have.

Linear kernel from anchor A

Point   Dot   Distance^2   K(A, z)
A         2            0         2
B         3            1         3
C         1            5         1
D        -2           10        -2
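
The table's values can be reproduced with a short sketch. The source does not give the points' coordinates, so the ones below are an assumption: one 2D choice that happens to match every dot product and squared distance listed. The names `points`, `dot`, and `squaredDistance` are illustrative only.

```typescript
// Hypothetical coordinates: not given in the source, chosen so that the
// dot products and squared distances agree with the table.
const points: { name: string; coords: number[] }[] = [
  { name: "A", coords: [1, 1] },
  { name: "B", coords: [2, 1] },
  { name: "C", coords: [2, -1] },
  { name: "D", coords: [-2, 0] },
];

// Ordinary dot product of two equal-length vectors.
function dot(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(x: number[], z: number[]): number {
  return x.reduce((sum, value, index) => sum + (value - z[index]) ** 2, 0);
}

const anchor = points[0].coords;

// One row per point, compared against anchor A.
const rows = points.map(({ name, coords }) => ({
  point: name,
  dot: dot(anchor, coords),
  distanceSquared: squaredDistance(anchor, coords),
  kernel: dot(anchor, coords), // linear kernel: K(A, z) is the dot product itself
}));

for (const row of rows) {
  console.log(row.point, row.dot, row.distanceSquared, row.kernel);
}
```

The Dot and K(A, z) columns are identical by construction, which is the point of the table: the linear kernel adds nothing on top of the raw dot product. Squared distance is shown only for contrast, since D is the farthest point and also the least aligned, while A has zero distance to itself but not the largest kernel value.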

First naive idea: dot products are enough

If two vectors point in similar directions and have useful magnitudes, the dot product is a reasonable similarity score.

The pain appears when raw alignment is not the pattern: XOR-like interactions, rings, and local islands cannot be captured by linear geometry in the original coordinates.
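
To make the XOR case concrete, here is a minimal sketch that searches a coarse grid of linear decision rules w1*x1 + w2*x2 + b. The grid range, step size, and all names (`Labeled`, `separates`, `findLinearSeparator`) are assumptions for illustration. A grid search is a demonstration, not a proof, although for XOR the failure is guaranteed: the four sign constraints contradict each other for any choice of w1, w2, and b.

```typescript
type Labeled = { x: number[]; y: 1 | -1 };

// XOR on {0,1}^2: positive when exactly one input is 1.
const xorData: Labeled[] = [
  { x: [0, 0], y: -1 },
  { x: [0, 1], y: 1 },
  { x: [1, 0], y: 1 },
  { x: [1, 1], y: -1 },
];

// AND for contrast: positive only when both inputs are 1.
const andData: Labeled[] = [
  { x: [0, 0], y: -1 },
  { x: [0, 1], y: -1 },
  { x: [1, 0], y: -1 },
  { x: [1, 1], y: 1 },
];

// True if the sign of w.x + b matches every label with a strictly positive margin.
function separates(w: number[], b: number, data: Labeled[]): boolean {
  return data.every(({ x, y }) => y * (w[0] * x[0] + w[1] * x[1] + b) > 0);
}

// Coarse grid search over w1, w2, b in [-2, 2] with step 0.5.
function findLinearSeparator(data: Labeled[]): { w: number[]; b: number } | null {
  for (let w1 = -2; w1 <= 2; w1 += 0.5) {
    for (let w2 = -2; w2 <= 2; w2 += 0.5) {
      for (let b = -2; b <= 2; b += 0.5) {
        if (separates([w1, w2], b, data)) return { w: [w1, w2], b };
      }
    }
  }
  return null;
}

const andSeparator = findLinearSeparator(andData); // some separating line exists
const xorSeparator = findLinearSeparator(xorData); // null: no line separates XOR

console.log("AND:", andSeparator);
console.log("XOR:", xorSeparator);
```

AND is separable with a single line, so the search finds one. XOR is not, so any model whose score is linear in the original coordinates, including one built on the linear kernel, cannot classify all four points correctly.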

Formal version

K(x, z) = x^T z

For d input features, computing this value costs O(d) time. The feature map is the identity map, so phi(x)=x.
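
Written out coordinate by coordinate, the definition makes both statements visible. This is a worked expansion under assumed notation: x_i and z_i denote the i-th coordinates of x and z, which the source does not name explicitly.

```latex
K(x, z) = x^T z = \sum_{i=1}^{d} x_i z_i = \langle \phi(x), \phi(z) \rangle,
\qquad \phi(x) = x
```

The sum contains d products and d - 1 additions, so the work grows linearly with the number of features.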

Interactive comparison

Kernel similarity lab

x * z: Keeps the original coordinates and measures ordinary alignment. The decay-rate control applies only to the RBF kernel; the other kernels keep fixed parameters.

Compare every point with the chosen anchor. Notice how each kernel means a different kind of close.

A -> A: similarity 2; dot 2, distance^2 0
A -> B: similarity 3; dot 3, distance^2 1
A -> C: similarity 1; dot 1, distance^2 5
A -> D: similarity -2; dot -2, distance^2 10

Implementation sketch

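// Linear kernel: the plain dot product of two equal-length vectors.
function linearKernel(x: number[], z: number[]): number {
  // Without this check, a shorter z would silently produce NaN.
  if (x.length !== z.length) throw new Error("x and z must have the same length");
  return x.reduce((sum, value, index) => sum + value * z[index], 0);
}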

Common confusions

  • Choosing a linear kernel does not mean the data is easy; it means the model compares the original features linearly.
  • A linear kernel can still be strong with good engineered features.
  • Adding a polynomial or RBF kernel changes the geometry, not just the formula name.

Connections

Kernel function path

Feature maps motivate kernels; named kernels choose different notions of similarity.

  • Feature map (idea layer)
  • Kernel (idea layer)
  • Linear (named choice)
  • Polynomial (named choice)
  • RBF (named choice)
  • Sigmoid (named choice)

The polynomial kernel starts from this same dot product and raises an affine function of it (a scaled dot product plus a constant) to a chosen degree. The RBF kernel drops dot-product alignment altogether and instead decays with the distance between the two points.
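
The contrast can be sketched side by side. The forms below are the common textbook definitions; the parameter names and default values (`degree = 2`, `c = 1`, `gamma = 0.5`) are assumptions chosen for illustration, not settings taken from the source or its interactive lab.

```typescript
type Vec = number[];

// Ordinary dot product of two equal-length vectors.
const dotProduct = (x: Vec, z: Vec): number =>
  x.reduce((sum, value, index) => sum + value * z[index], 0);

// Squared Euclidean distance between two equal-length vectors.
const squaredDist = (x: Vec, z: Vec): number =>
  x.reduce((sum, value, index) => sum + (value - z[index]) ** 2, 0);

// Linear: the dot product, unchanged.
const linear = (x: Vec, z: Vec): number => dotProduct(x, z);

// Polynomial: an affine function of the dot product raised to a degree.
// degree and c are illustrative defaults.
const polynomial = (x: Vec, z: Vec, degree = 2, c = 1): number =>
  Math.pow(dotProduct(x, z) + c, degree);

// RBF: depends only on squared distance and decays as points move apart.
// gamma is an illustrative decay rate.
const rbf = (x: Vec, z: Vec, gamma = 0.5): number =>
  Math.exp(-gamma * squaredDist(x, z));

// Example vectors (hypothetical): dot product 3, squared distance 1.
const u: Vec = [1, 1];
const v: Vec = [2, 1];

console.log("linear", linear(u, v));
console.log("polynomial", polynomial(u, v));
console.log("rbf", rbf(u, v));
```

Scaling both vectors up increases the linear and polynomial values, because they depend on alignment and magnitude. The RBF value depends only on how far apart the two points are, so a point compared with itself always scores 1.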

Exercises

  1. What is the feature map for the linear kernel?
  2. Why is O(d) the direct computation cost?
  3. Name one pattern that a linear kernel might struggle with.
