Understanding of support vector machine classifiers and linear separability of data in infinite dimensions

I am unclear on some of the details of how support vector machine classifiers actually work. I understand the general idea when the given data is linearly separable: we can separate the two classes by a hyperplane, and so we can find the maximal margin hyperplane. If, however, the input data is not linearly separable, then my understanding is that we map the data to a higher-dimensional space in which the mapped data is linearly separable, and we find the maximal margin classifier in this new space. In particular, we can choose a kernel $K$ and replace the dot product $\langle x_j, x_k \rangle$ in the dual problem with $K(x_j, x_k)$ (here the $x_i$ are our input data). Then, by the Moore-Aronszajn theorem, $K$ is the reproducing kernel of some reproducing kernel Hilbert space (RKHS) $H$, and we have $K(x_j, x_k) = \langle \phi(x_j), \phi(x_k) \rangle$, where $\phi$ is the canonical feature map. So we implicitly map our data into $H$, where it is linearly separable, and use $K$ in place of the inner product on $H$. Is this understanding sound?

Moreover, what exactly happens when we use the Gaussian kernel? Since the corresponding RKHS is infinite-dimensional, what exactly does it mean for the data to be linearly separable in this space? I don't see how to speak of a hyperplane in infinite dimensions, yet it can be shown that the data will always be linearly separable (whatever this means) in this space, up to the adjustment of some parameters.
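(For concreteness, here is how I picture the identity $K(x_j, x_k) = \langle \phi(x_j), \phi(x_k) \rangle$: a toy Python check for a kernel whose feature map can be written down explicitly; the points are made up and nothing here depends on a particular library.)

```python
import numpy as np

# Degree-2 polynomial kernel K(x, y) = (x . y)^2 on R^2, whose feature map
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) into R^3 is known in closed form.
def K(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)

# The kernel evaluates the feature-space inner product without ever forming
# phi explicitly -- this is the substitution made in the dual problem.
assert np.isclose(K(x, y), phi(x) @ phi(y))
```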
Answer:
I think you understand the basic ideas behind kernel SVMs correctly. Do note, however, that the RKHS $H$ of a given kernel $K$ is not just any Hilbert space, but a space of functions satisfying the reproducing property $$\langle h, K(\cdot, x) \rangle_H = h(x)$$ for all $h \in H$ and $x \in X$. Here $X$ denotes the sample space, often $\mathbb{R}^d$, and the canonical feature map is $\phi(x) = K(\cdot, x)$. In this high-dimensional space $H$ (it may even be infinite-dimensional, as in the case of the Gaussian kernel), we may now look for a separating hyperplane. And yes, we may speak of hyperplanes even in these spaces. Admittedly, they do not have a nice geometric visualisation as in two or three dimensions, but from a mathematical point of view they are simply the sets $$\{h \in H : \langle w, h \rangle_H = b\}$$ for some $w \in H$ and $b \in \mathbb{R}$.
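To spell out where the kernel identity comes from: for a function $h = \sum_i a_i K(\cdot, x_i)$ in the span of the kernel sections, the reproducing property and bilinearity give $$\langle h, K(\cdot, x) \rangle_H = \sum_i a_i \langle K(\cdot, x_i), K(\cdot, x) \rangle_H = \sum_i a_i K(x_i, x) = h(x),$$ and taking $h = K(\cdot, x_j) = \phi(x_j)$ recovers $\langle \phi(x_j), \phi(x_k) \rangle_H = K(x_j, x_k)$.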
Of course, it is not clear a priori that finding the correct $w \in H$ is a feasible task. Here the representer theorem comes in: it guarantees that the solution of the kernel SVM actually lies in the finite-dimensional subspace $$H_X = \operatorname{span}\{\phi(x_1), \dots, \phi(x_n)\}.$$ Thus, we never need to work in $H$ directly: everything we need from $H_X$ is implicitly encoded in the kernel matrix $(K(x_j, x_k))_{j,k}$.
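You can observe this directly in a fitted model. Here is a minimal sketch using scikit-learn (assuming it is available; `dual_coef_`, `support_vectors_` and `intercept_` are scikit-learn's attribute names): the learned decision function is exactly a kernel expansion over the training points, with nonzero coefficients only on the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # not linearly separable in R^2

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# By the representer theorem, w lies in span{phi(x_1), ..., phi(x_n)}, so the
# decision function is f(x) = sum_i alpha_i K(x_i, x) + b, where the alpha_i
# vanish off the support vectors.
def rbf(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

x_new = rng.normal(size=(5, 2))
f_manual = rbf(x_new, clf.support_vectors_) @ clf.dual_coef_.ravel() + clf.intercept_
assert np.allclose(f_manual, clf.decision_function(x_new))
```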
Finally, a word on the Gaussian RKHS $H$. It has the nice property of being universal. This means, loosely speaking, that any decision boundary (between the two classes you are trying to classify with the SVM) can be approximated arbitrarily well by a function from $H$. In particular, for every finite data set $X \subset \mathbb{R}^d$ consisting of distinct points, there exists some $w \in H_X$ that separates the two classes, i.e. achieves zero training error on $X$, regardless of how the labels are assigned.
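A quick experiment makes this tangible (again a sketch with scikit-learn; the particular $\gamma$ and $C$ values are only illustrative): even with completely random labels, a Gaussian-kernel SVM with a large enough $\gamma$ and a near-hard margin (large $C$) typically fits the training set perfectly, because the Gaussian kernel matrix on distinct points is strictly positive definite.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)  # random labels: no structure at all

# Large gamma pushes the Gaussian kernel matrix toward the identity, and large C
# approximates a hard margin, so the points become separable in the
# (infinite-dimensional) Gaussian RKHS.
clf = SVC(kernel="rbf", gamma=100.0, C=1e6).fit(X, y)
print((clf.predict(X) == y).mean())  # expected: 1.0, i.e. zero training error
```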
You may want to read the introductory chapter of the book "Support Vector Machines" by Steinwart and Christmann (Springer 2008).