Draft

Sigmoid Kernel

Squash an affine dot product with tanh, while remembering that this kernel is parameter-sensitive and not always valid.

concept | intermediate | machine-learning, kernels, similarity

Hook problem: dot products can grow without bound

The linear and polynomial kernels can produce large positive or negative values. One tempting repair is to squash the dot-product score through a nonlinear function.

The sigmoid kernel uses tanh, the S-shaped function long used as a neural-network activation.

Squashed dot-product similarity

The sigmoid kernel can saturate near -1 or 1, so parameter choices matter.

Sigmoid kernel from anchor A

Point  Dot  Distance^2  K(A, z)
A      2    0           0.964
B      3    1           0.987
C      1    5           0.905
D      -2   10          0

First naive idea: squash and call it a kernel

Squashing feels attractive because the output is bounded. But a bounded similarity is not automatically a valid kernel. Kernel methods need the pairwise (Gram) matrix to be positive semidefinite, so that its entries behave like inner products in some feature space.

Formal version

K(x, z) = \tanh(\gamma x^T z + c)

The parameter gamma scales the dot product, so it controls how steeply the kernel responds to changes in x^T z. The parameter c shifts the score before tanh is applied. The output always lies strictly between -1 and 1.
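As a rough illustration of the formula, the sketch below evaluates the kernel directly on vectors rather than on a precomputed dot product. The coordinates are hypothetical: they were chosen only because they are consistent with the dot products and squared distances listed for anchor A, and the source does not give the actual points. The names `dotProduct` and `sigmoidKernelVec` are likewise illustrative, and the defaults gamma = 0.5 and c = 1 are taken from the implementation sketch.

```typescript
// Hypothetical coordinates, picked to match the tabulated dot products and
// squared distances from anchor A. The source does not list the real points.
const samplePoints: { label: string; coords: number[] }[] = [
  { label: "A", coords: [1, 1] },
  { label: "B", coords: [1, 2] },
  { label: "C", coords: [2, -1] },
  { label: "D", coords: [-2, 0] },
];

// Plain dot product x^T z for two equal-length vectors.
function dotProduct(x: number[], z: number[]): number {
  if (x.length !== z.length) {
    throw new Error("vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    sum += x[i] * z[i];
  }
  return sum;
}

// K(x, z) = tanh(gamma * x^T z + c)
function sigmoidKernelVec(x: number[], z: number[], gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dotProduct(x, z) + c);
}

// Compare every point with anchor A.
const anchorA = samplePoints[0].coords;
for (const { label, coords } of samplePoints) {
  const k = sigmoidKernelVec(anchorA, coords);
  console.log(`K(A, ${label}) = ${k.toFixed(3)} (dot ${dotProduct(anchorA, coords)})`);
}
```

With these assumed coordinates the printed similarities should agree with the K(A, z) values for anchor A. Changing gamma or c in the call shows how quickly the values move toward -1 or 1.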

Interactive comparison

Kernel similarity lab

Sigmoid kernel, tanh(gamma * (x . z) + c): squashes an affine dot product, but is not valid for every parameter choice. The decay-rate control applies to the RBF kernel only; the other kernels keep fixed parameters.

Compare every point with the chosen anchor. Notice how each kernel means a different kind of "close".

A -> A: similarity 0.964 (dot 2, distance^2 0)
A -> B: similarity 0.987 (dot 3, distance^2 1)
A -> C: similarity 0.905 (dot 1, distance^2 5)
A -> D: similarity 0 (dot -2, distance^2 10)

Implementation sketch

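// Sigmoid kernel on a precomputed dot product x^T z.
// gamma scales the dot product; c shifts it before tanh squashes the result.
function sigmoidKernel(dot: number, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dot + c);
}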

Validity caveat

Unlike the linear kernel, the polynomial kernel with standard parameters, and the RBF kernel, the sigmoid kernel is not positive semidefinite for every parameter choice and data set. Treat it as a specialized option: verify that your library accepts the parameters and that validation performance justifies it.
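One way to see the caveat concretely is to build a tiny Gram matrix and look at its smallest eigenvalue. The sketch below is a minimal check under stated assumptions: the two one-dimensional inputs 1 and 2 are hypothetical, chosen only because they expose the problem, and gamma = 0.5 and c = 1 are the defaults from the implementation sketch. The helper names are illustrative and do not come from any library.

```typescript
// Sigmoid kernel on scalar inputs: K(x, z) = tanh(gamma * x * z + c).
function sigmoidKernel1d(x: number, z: number, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * x * z + c);
}

// Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, d]],
// using the closed form (trace - sqrt(trace^2 - 4 * det)) / 2.
function minEigenvalue2x2(a: number, b: number, d: number): number {
  const trace = a + d;
  const det = a * d - b * b;
  return (trace - Math.sqrt(trace * trace - 4 * det)) / 2;
}

// Hypothetical inputs, chosen only to expose the problem.
const x1 = 1;
const x2 = 2;

// Entries of the 2x2 Gram matrix.
const k11 = sigmoidKernel1d(x1, x1); // tanh(1.5)
const k12 = sigmoidKernel1d(x1, x2); // tanh(2)
const k22 = sigmoidKernel1d(x2, x2); // tanh(3)

const smallestEigenvalue = minEigenvalue2x2(k11, k12, k22);

console.log(`Gram matrix: [[${k11.toFixed(4)}, ${k12.toFixed(4)}], [${k12.toFixed(4)}, ${k22.toFixed(4)}]]`);
console.log(`smallest eigenvalue: ${smallestEigenvalue.toFixed(4)}`);
console.log(
  smallestEigenvalue < 0
    ? "not positive semidefinite on these points"
    : "positive semidefinite on these points"
);
```

Here the off-diagonal entry tanh(2) is large enough that k12^2 exceeds k11 * k22, so the determinant is negative and the matrix has a negative eigenvalue. A check on two points can only disprove validity. Passing it on a few samples does not prove that the kernel is valid for a whole data set.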

Common confusions

  • The sigmoid kernel is not the same thing as logistic regression, even though both involve an S-shaped function.
  • Bounded output does not guarantee a valid kernel.
  • Saturation can make several large dot products look almost identical.
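The saturation point can be made concrete with a few numbers. The sketch below is only an illustration: the dot products 6, 10, and 20 are hypothetical values, the function name `squash` is made up for this example, and gamma = 0.5 and c = 1 follow the implementation sketch.

```typescript
// tanh applied to an affine function of the dot product.
function squash(dot: number, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dot + c);
}

// Hypothetical dot products whose raw values differ by more than a factor of three.
const largeDots = [6, 10, 20];
const similarities = largeDots.map((d) => squash(d));

largeDots.forEach((d, i) => {
  console.log(`dot ${d} -> K = ${similarities[i].toFixed(6)}`);
});

// Gap between the largest and smallest kernel value across these dot products.
const spread = Math.max(...similarities) - Math.min(...similarities);
console.log(`spread of kernel values: ${spread.toExponential(2)}`);
```

All three similarities land within about a thousandth of 1, so a model that sees only the kernel values can barely tell these pairs apart. Lowering gamma or rescaling the features pushes the scores back into the region where tanh still responds to differences.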

Exercises

  1. What role does tanh play?
  2. Why is the sigmoid kernel more delicate than RBF?
  3. What does saturation hide?
