I am encountering a problem concerning Reproducing Kernel Hilbert Spaces (RKHS) in the context of machine learning using Support Vector Machines (SVMs).
With reference to Section 3 of this paper [Olivier Chapelle, 2006], I will try to be brief and focused on my problem, so I may omit a rigorous description of some of the notions I use below.
Consider the following optimization problem: $$ \displaystyle \min_{\mathbf{w},b}\: \lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}L(y_i, \mathbf{w}\cdot\mathbf{x}_i+b), $$ where $L(y,t)=\max(0,1-yt)$ is a loss function, the so-called "hinge loss". In order to introduce kernels and thereby handle non-linear SVMs, the author reformulates the above optimization problem as the search for a function in an RKHS, $\mathcal{H}$, that minimizes the following functional: $$ F[f]=\lambda\lVert f \rVert^2_\mathcal{H} + \sum_{i=1}^{n}L(y_i, f(\mathbf{x}_i)+b). $$ I understand this part of his work; my question is the following: what if I had some other loss function (not the hinge loss above) whose argument is not expressed solely through the inner product $\mathbf{w}\cdot\mathbf{x}_i$ (which, if I understand correctly, is "replaced" by $f(\mathbf{x}_i)$), but instead takes the form $$ \mathbf{w}\cdot\mathbf{x}_i+b+\sqrt{\mathbf{w}^TA\mathbf{w}}, $$ where $A$ is a symmetric positive-definite matrix? That is, is there any way of expressing the quadratic form ($\sqrt{\mathbf{w}^TA\mathbf{w}}$) in terms of the function $f$, so that I can state my optimization problem in the RKHS setting?
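To keep myself honest about the linear primal objective above, here is a minimal numpy sketch (the data, weights, and $C$ are made-up illustrative values, not from the paper) that just evaluates $\lVert\mathbf{w}\rVert^2 + C\sum_i L(y_i, \mathbf{w}\cdot\mathbf{x}_i+b)$ with the hinge loss:

```python
import numpy as np

def hinge(y, t):
    """Hinge loss L(y, t) = max(0, 1 - y t)."""
    return np.maximum(0.0, 1.0 - y * t)

def primal_objective(w, b, X, y, C):
    """||w||^2 + C * sum_i L(y_i, w . x_i + b)."""
    margins = X @ w + b
    return w @ w + C * np.sum(hinge(y, margins))

# Toy data: one point per class (illustrative values only).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0
# Both points sit exactly on the margin, so the hinge terms vanish
# and the objective reduces to ||w||^2 = 1.
print(primal_objective(w, b, X, y, C=1.0))  # 1.0
```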
On the other hand, the theory (the representer theorem) suggests that, whatever the loss function $L$ is, the solution of the above reformulated problem has the form $$ f(\mathbf{x})=\sum_{i=1}^{n}\alpha_ik(\mathbf{x}_i, \mathbf{x}), $$ where $k$ is the kernel associated with the adopted RKHS. Have I understood that correctly? Would the solution still take this form even if my loss function included terms like $\sqrt{\mathbf{w}^TA\mathbf{w}}$?
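For concreteness, a small numpy sketch of this solution form (the RBF kernel, the training inputs, and the coefficients $\alpha_i$ are arbitrary illustrative choices of mine): for any $f(\mathbf{x})=\sum_i\alpha_ik(\mathbf{x}_i,\mathbf{x})$, the vector of values at the training points is $K\boldsymbol{\alpha}$, and $\lVert f\rVert^2_\mathcal{H}=\boldsymbol{\alpha}^TK\boldsymbol{\alpha}$:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel k(x, x') = exp(-gamma ||x - x'||^2)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * d2)

# Toy training inputs and arbitrary coefficients alpha (illustrative only).
X = np.array([[0.0], [1.0], [2.0]])
alpha = np.array([0.5, -1.0, 0.25])
K = rbf_kernel(X, X)  # Gram matrix K_ij = k(x_i, x_j)

def f(x_new):
    """Representer-theorem form: f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(np.atleast_2d(x_new), X) @ alpha

# At the training points, f reduces to a matrix-vector product:
print(np.allclose(f(X).ravel(), K @ alpha))  # True
# And the squared RKHS norm of f is the quadratic form alpha^T K alpha:
print(alpha @ K @ alpha)
```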
I would like to clarify my final question, as discussed with @Joel above (see the comments).
Let $\mathbf{w}=(w_1,\ldots,w_n)^T$, $\mathbf{x}_i=(x_{i1},\ldots,x_{in})^T\in\mathbb{R}^n$, $i=1,\ldots,m$, and $A=\big(a_{ij}\big)_{i,j=1}^{n}$ an $n\times n$ symmetric positive definite real matrix.
Suppose that we would like to minimize the following quantity with respect to $\mathbf{w}$:
$$ J =\mathbf{w}\cdot\mathbf{w} + \sum_{i=1}^{m}\mathbf{w}\cdot\mathbf{x}_i + \mathbf{w}^TA\mathbf{w}. $$
Instead of the above optimization problem, we choose to look for a function $f$ that minimizes a functional, in such a way that the problem remains equivalent to the first one. Let this function belong to a Reproducing Kernel Hilbert Space $\mathcal{H}$. The appropriate functional should be of the form $$ \Phi[f]= \big\lVert f \big\rVert^2_{\mathcal{H}} + \sum_{i=1}^{m}f(\mathbf{x}_i) + \cdots, $$ but I do not know how to express the quadratic form $\mathbf{w}^TA\mathbf{w}$ in terms of $f$. Could you help?
What I have thought so far is as follows. We have "replaced" the quantity $\mathbf{w}\cdot\mathbf{w}$ by the norm $\big\lVert f \big\rVert^2_{\mathcal{H}}$, so, using the $LDL^T$ factorization of the symmetric positive-definite matrix $A$, we could perhaps write $$ \mathbf{w}^TA\mathbf{w} = \mathbf{w}^T\big(LDL^T\big)\mathbf{w} = \mathbf{w}^T \big(LD^{1/2}\big)\big(LD^{1/2}\big)^T \mathbf{w} = \mathbf{w'}\cdot\mathbf{w'}, $$ where $\mathbf{w'}=\Big(LD^{1/2}\Big)^T\mathbf{w}$. Could we then deduce which norm in $\mathcal{H}$ we should use in place of $\mathbf{w'}\cdot\mathbf{w'}$?
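The factorization step itself is easy to check numerically. A minimal numpy sketch (using a random SPD matrix of my own; the Cholesky factor $C$ with $A=CC^T$ plays the role of $LD^{1/2}$ for an SPD matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive-definite matrix A (illustrative only).
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

# Cholesky gives A = C C^T; C corresponds to L D^{1/2} in the LDL^T
# factorization, since for an SPD matrix D has positive diagonal entries.
C = np.linalg.cholesky(A)

w = rng.standard_normal(4)
w_prime = C.T @ w  # w' = (L D^{1/2})^T w

# The quadratic form w^T A w equals the plain squared Euclidean norm of w':
print(np.allclose(w @ A @ w, w_prime @ w_prime))  # True
```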