Does the optimal function in kernel method have a sparse representation?


The kernel least squares method aims to solve the following problem:

$$f^*=\arg \min_{f\in \mathcal{H}} E_{x,y}[(f(x)-y)^2]+\frac{\lambda}{2}\Vert f\Vert_{\mathcal{H}}^2,$$

where $\mathcal{H}$ is a pre-specified RKHS with kernel $k(x,x')$, $x\in R^d$ is a feature vector, and $y\in R$ is the target value.

In practice, the finite-sample estimator $f_n$ is computed from $n$ i.i.d. samples $(x_i,y_i)_{i=1,\dots,n}$: $$f_n=\arg \min_{f\in \mathcal{H}}\frac{1}{n}\sum_{i=1}^n (f(x_i)-y_i)^2 +\frac{\lambda}{2}\Vert f\Vert_{\mathcal{H}}^2$$
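For concreteness, here is a minimal numerical sketch of computing $f_n$, assuming the squared-norm penalty $\frac{\lambda}{2}\Vert f\Vert_{\mathcal{H}}^2$ (kernel ridge regression) and an RBF kernel; in that case the representer theorem gives $f_n=\sum_i w_i k(\cdot,x_i)$ with weights solving $(K+\frac{n\lambda}{2}I)w=y$, where $K_{ij}=k(x_i,x_j)$:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # k(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, d, lam = 50, 2, 0.1
X = rng.normal(size=(n, d))                      # features x_i in R^d
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # noisy targets y_i

K = rbf_kernel(X, X)
# First-order condition for the empirical objective with the
# squared-norm penalty: (K + (n*lam/2) I) w = y
w = np.linalg.solve(K + (n * lam / 2) * np.eye(n), y)

def f_n(X_new):
    # f_n(x) = sum_i w_i k(x, x_i): determined by n weights and n points
    return rbf_kernel(X_new, X) @ w
```

So $f_n$ lives in the $n$-dimensional span of $\{k(\cdot,x_i)\}_{i=1}^n$ regardless of the dimension of $\mathcal{H}$, which is what motivates question 3 below.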

I have three questions about the estimator $f_n$.

  1. Is $f_n$ an unbiased estimator, i.e., does $E[f_n]=f^*$ hold?
  2. Is the estimator $f_n$ asymptotically consistent, i.e., does $\lim_{n\rightarrow \infty} \Vert f_n -f^*\Vert_{\mathcal{H}}=0$ hold?
  3. By the representer theorem, $f_n=\sum_{i=1}^n w_ik(\cdot, x_i)$ for some weights $w_1,w_2,\dots,w_n$. This means $f_n$ can be sparsely represented by a finite number of kernel functions, even though $\mathcal{H}$ itself may be infinite-dimensional. Does $f^*$ also admit a sparse representation, i.e., do there exist $x_1,x_2,\dots,x_M\in R^d$ and $w_1,w_2,\dots,w_M\in R$ such that $f^*=\sum_{i=1}^M w_i k(\cdot, x_i)$? Under what conditions does such a sparse representation exist or fail to exist?

Thanks in advance.