

🚀 Kernel Methods & Hilbert Spaces

Kernel methods are a class of algorithms for pattern analysis that map data from a finite-dimensional input space into a (possibly infinite-dimensional) feature space, where patterns that are non-linear in the original space become linear and can be handled by linear algorithms.


🟢 1. Reproducing Kernel Hilbert Space (RKHS)

Hilbert Spaces ($\mathcal{H}$)

A Hilbert space is an abstract vector space possessing the structure of an inner product that allows length and angle to be measured. It is also complete, meaning that every Cauchy sequence in the space converges to an element in the space.

The Representer Theorem

A fundamental result which states that the solution to a wide range of optimization problems (like SVMs) can be expressed as a finite linear combination of kernel evaluations at the training data points:

$$f^*(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x)$$
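As a concrete instance, the representer theorem is what makes kernel ridge regression tractable: instead of searching an infinite-dimensional function space, we solve for $n$ coefficients $\alpha$. A minimal numpy sketch (the RBF kernel choice, `gamma`, regularization strength `lam`, and the toy sine data are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of X and Y.
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    # Representer theorem: the minimizer has the form
    # f*(x) = sum_i alpha_i K(x_i, x), so we only solve for alpha.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy data: learn sin(x) from a handful of noisy-free samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0])
alpha = fit_kernel_ridge(X, y)
y_hat = predict(X, alpha, X)
```

Note that the model never represents $f^*$ explicitly; it is carried entirely by the $n$ coefficients `alpha` and the training points.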


🟡 2. The Kernel Trick

The “trick” allows us to operate in a high-dimensional feature space without ever explicitly computing the coordinates of the data in that space.

The Kernel Function ($K$)

A kernel is a function that returns the inner product between the images of two points in the feature space:

$$K(\mathbf{x}, \mathbf{y}) = \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle_{\mathcal{H}}$$

where $\phi$ is the mapping from the input space to the Hilbert space.
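The trick can be verified directly for a small case. For a degree-2 polynomial kernel with $c = 0$ on 2-D inputs, the explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and $\langle \phi(x), \phi(y) \rangle = (x^\top y)^2$; a sketch (the helper `phi` and the sample points are illustrative):

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so <phi(x), phi(y)> = (x . y)^2.
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = phi(x) @ phi(y)   # inner product computed in the feature space
trick = (x @ y) ** 2         # same value computed directly in the input space

assert np.isclose(explicit, trick)  # both equal (1*3 + 2*(-1))^2 = 1
```

For the RBF kernel the feature space is infinite-dimensional, so this explicit comparison is impossible; the kernel evaluation is the only practical route.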

Common Kernels

  • Linear: $K(x, y) = x^T y$
  • Polynomial: $K(x, y) = (x^T y + c)^d$
  • Radial Basis Function (RBF/Gaussian): $K(x, y) = \exp(-\gamma \|x - y\|^2)$
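The three kernels above can each be computed as a full Gram matrix in a few lines of numpy; a sketch (default values for `c`, `d`, `gamma`, and the sample points are illustrative):

```python
import numpy as np

def linear_kernel(X, Y):
    return X @ Y.T

def polynomial_kernel(X, Y, c=1.0, d=3):
    return (X @ Y.T + c) ** d

def rbf_kernel(X, Y, gamma=0.5):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once.
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X)
K_rbf = rbf_kernel(X, X)
# An RBF Gram matrix always has ones on the diagonal, since ||x - x|| = 0.
```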

🔴 3. Advanced Kernel Theory

Mercer’s Theorem

A function $K(x, y)$ is a valid kernel if and only if it is symmetric and positive semi-definite (PSD). This ensures that a corresponding feature mapping $\phi$ exists and that the optimization problem remains convex.
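A necessary consequence of the PSD condition is testable numerically: every Gram matrix built from a valid kernel must have non-negative eigenvalues. A sketch (the helper `min_gram_eigenvalue`, the `shifted` counterexample, and the sample points are illustrative assumptions):

```python
import numpy as np

def min_gram_eigenvalue(kernel, X):
    # Mercer check on a sample: a valid kernel must yield a symmetric PSD
    # Gram matrix for ANY finite set of points (a necessary condition).
    K = np.array([[kernel(x, y) for y in X] for x in X])
    assert np.allclose(K, K.T), "kernel must be symmetric"
    return np.linalg.eigvalsh(K).min()

X = [np.array([0.0]), np.array([1.0]), np.array([2.0])]

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))
shifted = lambda x, y: float(x @ y) - 1.0  # symmetric but NOT PSD

print(min_gram_eigenvalue(rbf, X))      # non-negative up to rounding: valid kernel
print(min_gram_eigenvalue(shifted, X))  # negative: fails the PSD condition
```

A negative eigenvalue on even one finite sample proves a function is not a valid kernel; passing on a sample is evidence but not proof of validity.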

Gaussian Processes (GP)

A GP is a stochastic process such that every finite collection of its random variables has a joint multivariate normal distribution. It is defined entirely by its Mean Function and Covariance (Kernel) Function.

  • GPs are used for Bayesian Optimization and non-parametric regression.
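Because the kernel fully determines the covariance, GP regression reduces to Gaussian conditioning on the Gram matrices. A minimal sketch, assuming a zero-mean prior with an RBF kernel (the `noise`, `gamma`, and toy sine data are illustrative; real implementations use a Cholesky factorization rather than an explicit inverse):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def gp_posterior(X_train, y_train, X_test, noise=1e-4, gamma=1.0):
    # Zero-mean GP prior: the kernel alone defines the covariance structure.
    K = rbf(X_train, X_train, gamma) + noise * np.eye(len(X_train))
    K_s = rbf(X_test, X_train, gamma)
    K_ss = rbf(X_test, X_test, gamma)
    K_inv = np.linalg.inv(K)             # Cholesky solve preferred in practice
    mean = K_s @ K_inv @ y_train         # posterior mean at the test points
    cov = K_ss - K_s @ K_inv @ K_s.T     # posterior covariance at the test points
    return mean, cov

X = np.array([[-2.0], [0.0], [2.0]])
y = np.sin(X[:, 0])
X_star = np.array([[0.0], [3.0]])       # one observed point, one far from data
mean, cov = gp_posterior(X, y, X_star)
# Near an observed point the posterior variance collapses toward zero;
# far from the data it reverts toward the prior variance (1 for this RBF).
```

The growing posterior variance away from the data is exactly what Bayesian Optimization exploits to trade off exploration against exploitation.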