Closure of balls in Reproducing Kernel Hilbert Space (RKHS)


Let $X \subset \mathbb{R}^m$ be compact, and $k: X\times X \rightarrow \mathbb{R}$ be a universal kernel function, in the sense that the corresponding RKHS $\mathcal{H}_k$ is dense in $C(X)$ under the uniform metric $\| \cdot \|_{\infty}$. Denote by $\| \cdot \|_{\mathcal{H}_k}$ the RKHS-norm.

The literature in statistical learning tends to argue for the use of kernel methods based on the denseness of $\mathcal{H}_k$ in $(C(X),\|\cdot\|_{\infty})$. However, most of the time the rigorous analysis in statistical learning theory is restricted to a subset of $\mathcal{H}_k$, say $$ B_M := \{ h\in \mathcal{H}_k : \|h\|_{\mathcal{H}_k} \leq M \} $$ for some constant $M>0$.

I am wondering how well $B_M$ can approximate $C(X)$. For that, I would like to know more about the closure (with respect to $\|\cdot\|_{\infty}$) of $B_M$. Is there a clean form of the closure?

Here is what I tried: By Mercer's theorem, we have $$ k(x,x') = \sum_{j=1}^{\infty} \lambda_j \phi_j(x) \phi_j(x'). $$ Let $\varphi_j(x) = \sqrt{\lambda_j} \phi_j(x)$, and then $(\varphi_j)$ is an orthonormal basis of $\mathcal{H}_k$. We rewrite $$ B_M = \{ x \mapsto \langle\beta, \varphi(x) \rangle_{\ell_2} : \|\beta\|_{\ell_2} \leq M \}. $$ But I don't know how to proceed further.
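To get a numerical feel for the question, here is a minimal sketch (an illustration, not a proof). It assumes $X = [-1,1]$, a Gaussian kernel with a hypothetical lengthscale of $0.3$, and the target $t(x) = |x|$; none of these choices come from the problem statement. For points $x_1,\dots,x_n$ with Gram matrix $K$ and values $y_i = t(x_i)$, the smallest RKHS norm among all $h \in \mathcal{H}_k$ interpolating the data is $\sqrt{y^\top K^{-1} y}$. So any $h \in B_M$ with $M$ below that value cannot match $t$ exactly at those points.

```python
import numpy as np


def gaussian_kernel(x, y, lengthscale=0.3):
    """Gaussian kernel matrix between two 1-D arrays of points.

    The lengthscale is an illustrative assumption, not part of the question.
    """
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * lengthscale**2))


def min_norm_interpolant_norm(points, values, lengthscale=0.3):
    """RKHS norm of the minimum-norm interpolant of (points, values).

    The interpolant is h = sum_i alpha_i k(x_i, .) with K alpha = y,
    and its squared RKHS norm is alpha^T K alpha = y^T K^{-1} y.
    """
    K = gaussian_kernel(points, points, lengthscale)
    alpha = np.linalg.solve(K, values)
    return float(np.sqrt(values @ alpha))


target = np.abs          # hypothetical test function t(x) = |x|
sizes = [3, 5, 9]        # nested grids on [-1, 1]
norms = []
for n in sizes:
    pts = np.linspace(-1.0, 1.0, n)
    norms.append(min_norm_interpolant_norm(pts, target(pts)))

for n, r in zip(sizes, norms):
    print(f"n = {n}: minimum RKHS norm of an interpolant = {r:.4f}")
```

Because the grids are nested, each refinement only adds interpolation constraints, so the printed norms cannot decrease (up to rounding). How fast they grow for a given target is one rough way to see how large $M$ has to be before $B_M$ can get close to it.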


There is 1 answer below


It looks like $B_M$ is closed in $C(X)$. My knowledge of RKHS is restricted to Gaussian processes, so it would be better if someone checks this answer.

Denote by $M(X)$ the space of finite signed measures on $X$. Define the covariance operator $\mathcal K:M(X)\to \mathcal H_k$ corresponding to $k$ by $\mathcal K \mu (x) = \int_{X} k(x,y) \mu(dy)$.

Let $\mathscr{H}_k$ be the completion of $M(X)$ with respect to the scalar product $$(\nu, \mu)_{\mathscr{H}_k} = \int_{X^2} k(x,y) \mu(dx)\nu(dy) = \langle \mathcal K\mu,\nu\rangle. \tag{1}$$ The covariance operator $\mathcal K$ can be extended to $\mathscr H_k$ by continuity; it actually defines a Hilbert space isomorphism between $\mathscr H_k$ and $\mathcal H_k$.

Now consider a sequence $h_n = \mathcal K f_n \in B_M$ and assume that $h_n\to h$ in $C(X)$. Since $\|f_n\|_{\mathscr H_k} = \|h_n\|_{\mathcal H_k}\le M$, there is a subsequence $f_{n_m}$ which converges weakly to some $f\in \mathscr H_k$, i.e. for all $g\in \mathscr H_k$, $$ (f_{n_m},g)_{\mathscr H_k} \to (f,g)_{\mathscr H_k}, \quad m\to\infty. $$ Now take $g = \delta_x$, $x\in X$: $$ (f_{n_m},\delta_x)_{\mathscr H_k} = \langle \mathcal K f_{n_m},\delta_x\rangle = \mathcal Kf_{n_m}(x) = h_{n_m}(x) \to (f,\delta_x)_{\mathscr H_k} = \mathcal K f(x), \quad m\to\infty. $$ On the other hand, uniform convergence gives $h_{n_m}(x)\to h(x)$ for every $x\in X$. It follows that $h = \mathcal K f$, so $h\in \mathcal H_k$, and $\|h\|_{\mathcal H_k}\le M$ because the norm is weakly lower semicontinuous.
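For completeness, the norm bound can be spelled out. Take $g = f$ in the weak convergence and apply the Cauchy–Schwarz inequality:

$$ \|f\|_{\mathscr H_k}^2 = \lim_{m\to\infty} (f_{n_m}, f)_{\mathscr H_k} \le \liminf_{m\to\infty} \|f_{n_m}\|_{\mathscr H_k}\, \|f\|_{\mathscr H_k} \le M\, \|f\|_{\mathscr H_k}. $$

Dividing by $\|f\|_{\mathscr H_k}$ (the case $f = 0$ is trivial) gives $\|f\|_{\mathscr H_k}\le M$, and since $\mathcal K$ is an isometry, $\|h\|_{\mathcal H_k} = \|f\|_{\mathscr H_k} \le M$.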


Remarks:

  1. The density of $\mathcal H_k$ in $C(X)$ is important: it is equivalent to $\mathcal K$ being injective on $M(X)$ (and on $\mathscr H_k$).

  2. It seems moreover that $B_M$ is closed not only in $C(X)$, but also in $\mathscr H_k$. In this case we can't take $g = \delta_x$, since the elements of $\mathscr H_k$ are distributions. But we can observe from (1) that $\langle \mathcal K \cdot, \cdot\rangle$ is continuous on $\mathscr H_k\times \mathscr H_k$ and take arbitrary $g = \mathcal K b \in \mathcal H_k$ to get $$ \langle \mathcal Kf, g\rangle = (f,g)_{\mathscr H_k} = \lim_{m\to\infty} (f_{n_m},g)_{\mathscr H_k} = \lim_{m\to\infty} \langle h_{n_m},\mathcal K b \rangle = \langle h,\mathcal K b \rangle = \langle h,g\rangle, $$ which implies that $h = \mathcal K f$.