Understanding of support vector machine classifiers and linear separability of data in infinite dimensions

I am unclear on some of the details of how support vector machine classifiers actually work. I understand the general idea when the given data is linearly separable: we can separate the two classes by a hyperplane, and so we can find the maximal margin hyperplane. If, however, the input data is not linearly separable, then my understanding is that we map the data to a higher-dimensional space in which the mapped data is linearly separable, and we find the maximal margin classifier in this new space. In particular, we can choose a kernel $K$ and replace the dot product $\langle x_j, x_k \rangle$ in the dual problem with $K(x_j, x_k)$ (here the $x_i$ are our input data). Then, by the Moore-Aronszajn theorem, $K$ is the reproducing kernel of some reproducing kernel Hilbert space (RKHS) $H$, and we have $K(x_j, x_k) = \langle \phi(x_j), \phi(x_k) \rangle$, where $\phi$ is the canonical feature map. So we implicitly map our data into $H$, where it is linearly separable, and use $K$ in place of the inner product on $H$. Is this understanding sound?

Moreover, what exactly happens when we use the Gaussian kernel? Since the corresponding RKHS is infinite-dimensional, what exactly does it mean for the data to be linearly separable in this space? I don't see how to speak of a hyperplane in infinite dimensions, yet it can be shown that the data will always be linearly separable (whatever this means) in this space, up to the adjustment of some parameters.
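(For concreteness, here is how I picture the identity $K(x_j, x_k) = \langle \phi(x_j), \phi(x_k) \rangle$: a toy Python check for a kernel whose feature map can be written down explicitly; the points are made up and nothing here depends on a particular library.)

```python
import numpy as np

# Degree-2 polynomial kernel K(x, y) = (x . y)^2 on R^2, whose feature map
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) into R^3 is known in closed form.
def K(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)

# The kernel evaluates the feature-space inner product without ever forming
# phi explicitly -- this is the substitution made in the dual problem.
assert np.isclose(K(x, y), phi(x) @ phi(y))
```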
Answer:
I think you understand the basic ideas behind kernel SVMs correctly. Do note, however, that the RKHS $H$ of a given kernel $K$ is not just any Hilbert space, but a space of functions satisfying the reproducing property $$\langle h, K(\cdot, x) \rangle_H = h(x)$$ for all $h \in H$ and $x \in X$. Here $X$ denotes the sample space, often $\mathbb{R}^d$, and the canonical feature map is $\phi(x) = K(\cdot, x)$. In this high-dimensional space $H$ (it may even be infinite-dimensional, as in the case of the Gaussian kernel), we may now look for a separating hyperplane. And yes, we may speak of hyperplanes even in these spaces. Admittedly, they do not have a nice geometric visualisation as in two or three dimensions, but from a mathematical point of view they are simply the sets $$\{h \in H : \langle w, h \rangle_H = b\}$$ for some $w \in H$ and $b \in \mathbb{R}$.
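To spell out where the kernel identity comes from: for a function $h = \sum_i a_i K(\cdot, x_i)$ in the span of the kernel sections, the reproducing property and bilinearity give $$\langle h, K(\cdot, x) \rangle_H = \sum_i a_i \langle K(\cdot, x_i), K(\cdot, x) \rangle_H = \sum_i a_i K(x_i, x) = h(x),$$ and taking $h = K(\cdot, x_j) = \phi(x_j)$ recovers $\langle \phi(x_j), \phi(x_k) \rangle_H = K(x_j, x_k)$.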
Of course, it is not clear a priori that finding the correct $w \in H$ is a feasible task. Here the representer theorem comes in: it guarantees that the solution of the kernel SVM actually lies in the finite-dimensional subspace $$H_X = \operatorname{span}\{\phi(x_1), \dots, \phi(x_n)\}.$$ Thus, we never need to work in $H$ directly: everything we need from $H_X$ is implicitly encoded in the kernel matrix $(K(x_j, x_k))_{j,k}$.
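You can observe this directly in a fitted model. Here is a minimal sketch using scikit-learn (assuming it is available; `dual_coef_`, `support_vectors_` and `intercept_` are scikit-learn's attribute names): the learned decision function is exactly a kernel expansion over the training points, with nonzero coefficients only on the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # not linearly separable in R^2

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# By the representer theorem, w lies in span{phi(x_1), ..., phi(x_n)}, so the
# decision function is f(x) = sum_i alpha_i K(x_i, x) + b, where the alpha_i
# vanish off the support vectors.
def rbf(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

x_new = rng.normal(size=(5, 2))
f_manual = rbf(x_new, clf.support_vectors_) @ clf.dual_coef_.ravel() + clf.intercept_
assert np.allclose(f_manual, clf.decision_function(x_new))
```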
Finally, a word on the Gaussian RKHS $H$. It has the nice property of being universal. This means, loosely speaking, that any decision boundary (between the two classes you are trying to classify with the SVM) can be approximated arbitrarily well by a function from $H$. In particular, for every finite data set $X \subset \mathbb{R}^d$ consisting of distinct points, there exists some $w \in H_X$ that separates the two classes, i.e. achieves zero training error on $X$, regardless of how the labels are assigned.
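A quick experiment makes this tangible (again a sketch with scikit-learn; the particular $\gamma$ and $C$ values are only illustrative): even with completely random labels, a Gaussian-kernel SVM with a large enough $\gamma$ and a near-hard margin (large $C$) typically fits the training set perfectly, because the Gaussian kernel matrix on distinct points is strictly positive definite.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)  # random labels: no structure at all

# Large gamma pushes the Gaussian kernel matrix toward the identity, and large C
# approximates a hard margin, so the points become separable in the
# (infinite-dimensional) Gaussian RKHS.
clf = SVC(kernel="rbf", gamma=100.0, C=1e6).fit(X, y)
print((clf.predict(X) == y).mean())  # expected: 1.0, i.e. zero training error
```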
You may want to read the introductory chapter of the book "Support Vector Machines" by Steinwart and Christmann (Springer 2008).