I am encountering a problem concerning Reproducing Kernel Hilbert Spaces (RKHS) in the context of machine learning using Support Vector Machines (SVMs).
With reference to Section 3 of this paper [Olivier Chapelle, 2006], I will try to be brief and focused on my problem, so I may omit a rigorous description of some of the notions I use below.
Consider the following optimization problem: $$ \displaystyle \min_{\mathbf{w},b}\: \lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}L(y_i, \mathbf{w}\cdot\mathbf{x}_i+b), $$ where $L(y,t)=\max(0,1-yt)$ is a loss function, the so-called "hinge loss". In order to introduce kernels and thereby handle non-linear SVMs, the author reformulates the above optimization problem as the search for a function in an RKHS, $\mathcal{H}$, that minimizes the following functional: $$ F[f]=\lambda\lVert f \rVert^2_\mathcal{H} + \sum_{i=1}^{n}L(y_i, f(\mathbf{x}_i)+b). $$ I understand this part of his work; my question is the following: what if I had some other loss function (not the hinge loss above) whose argument is not expressed solely through the inner product $\mathbf{w}\cdot\mathbf{x}_i$ (which, if I understand correctly, is "replaced" by $f(\mathbf{x}_i)$), but instead takes the form $$ \mathbf{w}\cdot\mathbf{x}_i+b+\sqrt{\mathbf{w}^TA\mathbf{w}}, $$ where $A$ is a symmetric positive-definite matrix? That is, is there any way of expressing the quadratic form ($\sqrt{\mathbf{w}^TA\mathbf{w}}$) in terms of the function $f$, so that I can state my optimization problem in the RKHS setting?
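To keep myself honest about the linear primal objective above, here is a minimal numpy sketch (the data, weights, and $C$ are made-up illustrative values, not from the paper) that just evaluates $\lVert\mathbf{w}\rVert^2 + C\sum_i L(y_i, \mathbf{w}\cdot\mathbf{x}_i+b)$ with the hinge loss:

```python
import numpy as np

def hinge(y, t):
    """Hinge loss L(y, t) = max(0, 1 - y t)."""
    return np.maximum(0.0, 1.0 - y * t)

def primal_objective(w, b, X, y, C):
    """||w||^2 + C * sum_i L(y_i, w . x_i + b)."""
    margins = X @ w + b
    return w @ w + C * np.sum(hinge(y, margins))

# Toy data: one point per class (illustrative values only).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0
# Both points sit exactly on the margin, so the hinge terms vanish
# and the objective reduces to ||w||^2 = 1.
print(primal_objective(w, b, X, y, C=1.0))  # 1.0
```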
On the other hand, the theory (the representer theorem) suggests that, whatever the loss function $L$ is, the solution of the above reformulated problem has the form $$ f(\mathbf{x})=\sum_{i=1}^{n}\alpha_ik(\mathbf{x}_i, \mathbf{x}), $$ where $k$ is the kernel associated with the adopted RKHS. Have I understood that correctly? Would the solution still take this form even if my loss function included terms like $\sqrt{\mathbf{w}^TA\mathbf{w}}$?
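For concreteness, a small numpy sketch of this solution form (the RBF kernel, the training inputs, and the coefficients $\alpha_i$ are arbitrary illustrative choices of mine): for any $f(\mathbf{x})=\sum_i\alpha_ik(\mathbf{x}_i,\mathbf{x})$, the vector of values at the training points is $K\boldsymbol{\alpha}$, and $\lVert f\rVert^2_\mathcal{H}=\boldsymbol{\alpha}^TK\boldsymbol{\alpha}$:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel k(x, x') = exp(-gamma ||x - x'||^2)."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * d2)

# Toy training inputs and arbitrary coefficients alpha (illustrative only).
X = np.array([[0.0], [1.0], [2.0]])
alpha = np.array([0.5, -1.0, 0.25])
K = rbf_kernel(X, X)  # Gram matrix K_ij = k(x_i, x_j)

def f(x_new):
    """Representer-theorem form: f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(np.atleast_2d(x_new), X) @ alpha

# At the training points, f reduces to a matrix-vector product:
print(np.allclose(f(X).ravel(), K @ alpha))  # True
# And the squared RKHS norm of f is the quadratic form alpha^T K alpha:
print(alpha @ K @ alpha)
```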
I would like to clarify my final question, as discussed with @Joel above (see the comments).
Let $\mathbf{w}=(w_1,\ldots,w_n)^T$, $\mathbf{x}_i=(x_{i1},\ldots,x_{in})^T\in\mathbb{R}^n$, $i=1,\ldots,m$, and $A=\big(a_{ij}\big)_{i,j=1}^{n}$ an $n\times n$ symmetric positive definite real matrix.
Suppose that we would like to minimize the following quantity with respect to $\mathbf{w}$:
$$ J =\mathbf{w}\cdot\mathbf{w} + \sum_{i=1}^{m}\mathbf{w}\cdot\mathbf{x}_i + \mathbf{w}^TA\mathbf{w}. $$
Instead of the above optimization problem, we choose to look for a function $f$ that minimizes a functional, in such a way that the problem remains equivalent to the first one. Let this function belong to a Reproducing Kernel Hilbert Space $\mathcal{H}$. The appropriate functional should be of the form $$ \Phi[f]= \big\lVert f \big\rVert^2_{\mathcal{H}} + \sum_{i=1}^{m}f(\mathbf{x}_i) + \cdots, $$ but I do not know how to express the quadratic form $\mathbf{w}^TA\mathbf{w}$ in terms of $f$. Could you help?
What I have thought so far is as follows. We have "replaced" the quantity $\mathbf{w}\cdot\mathbf{w}$ by the norm $\big\lVert f \big\rVert^2_{\mathcal{H}}$, so, using the $LDL^T$ factorization of the symmetric positive-definite matrix $A$, we could perhaps write $$ \mathbf{w}^TA\mathbf{w} = \mathbf{w}^T\big(LDL^T\big)\mathbf{w} = \mathbf{w}^T \big(LD^{1/2}\big)\big(LD^{1/2}\big)^T \mathbf{w} = \mathbf{w'}\cdot\mathbf{w'}, $$ where $\mathbf{w'}=\Big(LD^{1/2}\Big)^T\mathbf{w}$. Could we then deduce which norm in $\mathcal{H}$ we should use in place of $\mathbf{w'}\cdot\mathbf{w'}$?
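The factorization step itself is easy to check numerically. A minimal numpy sketch (using a random SPD matrix of my own; the Cholesky factor $C$ with $A=CC^T$ plays the role of $LD^{1/2}$ for an SPD matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive-definite matrix A (illustrative only).
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

# Cholesky gives A = C C^T; C corresponds to L D^{1/2} in the LDL^T
# factorization, since for an SPD matrix D has positive diagonal entries.
C = np.linalg.cholesky(A)

w = rng.standard_normal(4)
w_prime = C.T @ w  # w' = (L D^{1/2})^T w

# The quadratic form w^T A w equals the plain squared Euclidean norm of w':
print(np.allclose(w @ A @ w, w_prime @ w_prime))  # True
```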