Sigmoid Kernel
Squash an affine dot product with tanh, while remembering that this kernel is parameter-sensitive and not always valid.
Hook problem: dot products can grow without bound
The linear and polynomial kernels can produce large positive or negative values. One tempting repair is to squash the dot-product score through a nonlinear function.
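The sigmoid kernel uses tanh, the S-shaped function familiar as a classic neural-network activation. The table compares each point z with the anchor A, using gamma = 0.5 and c = 1.
| Point z | Dot product with A | Squared distance to A | K(A, z) |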
|---|---|---|---|
| A | 2 | 0 | 0.964 |
| B | 3 | 1 | 0.987 |
| C | 1 | 5 | 0.905 |
| D | -2 | 10 | 0 |
First naive idea: squash and call it a kernel
Squashing feels attractive because the output is bounded. But a bounded similarity is not automatically a valid kernel. Kernel methods need the pairwise matrix to behave like feature-space inner products.
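To make the point concrete, here is a minimal sketch that builds a 2x2 pairwise matrix from a tanh-squashed dot product and checks its smallest eigenvalue. The scalar points (1 and 2) and the parameters (gamma = 1, c = -1) are illustrative assumptions chosen for this example, not values from the lesson, and the helper names are hypothetical.

```typescript
// Sigmoid kernel for scalar inputs: tanh(gamma * x * z + c).
function sigmoid1d(x: number, z: number, gamma: number, c: number): number {
  return Math.tanh(gamma * x * z + c);
}

// Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]].
function minEigenvalue2x2(a: number, b: number, d: number): number {
  const diff = a - d;
  return (a + d - Math.sqrt(diff * diff + 4 * b * b)) / 2;
}

// Illustrative values (assumptions, not from the lesson).
const x1 = 1;
const x2 = 2;
const gammaDemo = 1;
const cDemo = -1;

const k11 = sigmoid1d(x1, x1, gammaDemo, cDemo); // tanh(0) = 0
const k12 = sigmoid1d(x1, x2, gammaDemo, cDemo); // tanh(1)
const k22 = sigmoid1d(x2, x2, gammaDemo, cDemo); // tanh(3)

const smallest = minEigenvalue2x2(k11, k12, k22);
console.log(smallest < 0); // true: this pairwise matrix is not positive semidefinite
```

Every entry of this matrix lies between -1 and 1, yet the matrix has a negative eigenvalue, so no feature-space inner product could have produced it.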
Formal version
K(x, z) = tanh(gamma * (x · z) + c)
The parameter gamma scales the dot product and so controls how quickly tanh saturates. The parameter c shifts the score before tanh is applied. The output lies strictly between -1 and 1.
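As a rough sketch, the definition tanh(gamma * (x · z) + c) can be evaluated directly on vectors. The lesson does not give coordinates for the points, so the 2D coordinates below are hypothetical, chosen only because they reproduce the table's dot products and squared distances relative to A. The function names are also assumptions made for this example.

```typescript
type Vec = number[];

// Plain dot product of two equal-length vectors.
function dotProduct(x: Vec, z: Vec): number {
  if (x.length !== z.length) {
    throw new Error("vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    sum += x[i] * z[i];
  }
  return sum;
}

// Sigmoid kernel: tanh(gamma * (x . z) + c).
function sigmoidKernelVec(x: Vec, z: Vec, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dotProduct(x, z) + c);
}

// Hypothetical coordinates consistent with the table (not given in the lesson).
const points: { name: string; coords: Vec }[] = [
  { name: "A", coords: [1, 1] },
  { name: "B", coords: [2, 1] },
  { name: "C", coords: [2, -1] },
  { name: "D", coords: [-2, 0] },
];

const anchor = points[0].coords;
const similarities = points.map((p) =>
  Number(sigmoidKernelVec(anchor, p.coords).toFixed(3))
);
console.log(similarities); // [0.964, 0.987, 0.905, 0]
```

With gamma = 0.5 and c = 1, the rounded values match the K(A, z) column of the table.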
Interactive comparison
Kernel similarity lab
The lab compares every point with the chosen anchor under each kernel, showing how each kernel encodes a different notion of closeness. For the sigmoid kernel, tanh(gamma * (x · z) + c) squashes an affine dot product but is not valid for every parameter choice. The lab's slider adjusts only the RBF decay rate; the other kernels keep fixed parameters.
Implementation sketch
// Sigmoid kernel on a precomputed dot product: tanh(gamma * dot + c).
function sigmoidKernel(dot: number, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dot + c);
}
Validity caveat
Unlike the linear kernel, the polynomial kernel with standard parameters, and the RBF kernel, the sigmoid kernel is not positive semidefinite for every parameter choice and data set. Treat it as a specialized option: verify that your library accepts the parameters and that validation performance justifies it.
Common confusions
- Sigmoid kernel is not the same thing as logistic regression.
- Bounded output does not guarantee a valid kernel.
- Saturation can make several large dot products look almost identical.
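The saturation point can be illustrated with a small sketch. It assumes the same form and defaults as the implementation sketch (gamma = 0.5, c = 1); the function name and the specific dot products are chosen only for this example.

```typescript
// Same form as the lesson's implementation sketch, with its default parameters.
function sigmoidFromDot(dot: number, gamma = 0.5, c = 1): number {
  return Math.tanh(gamma * dot + c);
}

// Two large dot products that differ by 10 land in the flat part of tanh.
const moderate = sigmoidFromDot(10); // tanh(6)
const large = sigmoidFromDot(20); // tanh(11)
const saturatedGap = large - moderate;

// Two dot products that differ by only 2 land in the steep part of tanh.
const steepGap = sigmoidFromDot(-2) - sigmoidFromDot(-4); // tanh(0) - tanh(-1)

console.log(saturatedGap < 1e-4); // true: the large scores are nearly indistinguishable
console.log(steepGap > 0.7); // true: the small scores are clearly separated
```

The kernel separates scores near the center of tanh far more sharply than scores in its tails, which is why a tenfold larger gap in raw dot products can produce a much smaller gap in similarity.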
Exercises
- What role does tanh play?
- Why is the sigmoid kernel more delicate than RBF?
- What does saturation hide?