Connection between Representer Theorem and Mercer's Theorem?

477 Views Asked by At

In my studies about Reproducing Kernel Hilbert Spaces (RKHS) I came across a bit of confusion as to what kind of functions are defined in this RKHS $\mathcal{H}$. Specifically, the Representer Theorem and Mercer's Theorem both seem to provide conflicting definitions for the kind of functions in this space. Assume in the following $x \in \mathcal{X} \in \mathbb{R}^d$, $a \in \mathbb{R}$, $\lambda \in \mathbb{R}$, $n\in \mathbb{N}$, and that $k(x,x')$ is positive definite.


1) In the representer theorem, the RKHS $\mathcal{H}$ seems to contain functions $f$ of the form:

$$f(\cdot)=\sum^{n}_{i=1}a_ik(\cdot,x_i)$$ $$f(x)=\sum^{n}_{i=1}a_ik(x,x_i)$$ Equation 1

which apparently permits use of the reproducing property:

$$f(x)=\langle k(\cdot,x),f(\cdot) \rangle$$

So far so good.


2) If we use Mercer's Theorem, we can start with the spectral decomposition of the kernel:.

$$k(x,x')=\sum_{j=1}^{\infty}\lambda_je_j(x)e_j(x')$$

where $\lambda_j$ are the eigenvalues and $e_j$ the eigenfunctions of the decomposition. If we then consider the following vectors... $$k(x,\cdot)=[\sqrt{\lambda_1}e_1(x),...,\sqrt{\lambda_{\infty}}e_{\infty}(x)]^T$$ $$k(\cdot,x')=[\sqrt{\lambda_1}e_1(x'),...,\sqrt{\lambda_{\infty}}e_{\infty}(x')]^T$$ $$f(\cdot)=[\frac{a_1}{\sqrt{\lambda_1}},...,\frac{a_{\infty}}{\sqrt{\lambda_{\infty}}}]^T$$

...we can also define functions, but this time:

$$f(x)=\sum^{\infty}_{i=1}a_ie_i(x)$$ Equation 2

for which the reproducing property also holds:

$$f(x)=\langle k(\cdot,x),f(\cdot) \rangle$$


Obviously Equation 1 and Equation 2 seem different, but both describe functions defined in a RKHS, only that each RKHS seems to be finite in the case of the Representer Theorem (or infinite, if the cardinality of the set $\mathcal{X}$ is infinite) and infinite in the case of Mercer's Theorem (with each basis vector corresponding to an eigenfunction of the kernel).

Is there a fundamental, philosophical difference between the two approaches? Or are they really the same or at least related?

1

There are 1 best solutions below

1
On

The representer theorem doesn't state that any function in $H$ has the form $f(\cdot) = \sum_{i=1}^n \alpha_i k(x_i, \cdot).$ It just guarantees this form for minimizers of an empirical risk. In general it holds that the space $$ H_{\text{pre}} = \left\{ \sum_{i=1}^n \alpha_i k(x_i, \cdot) \, | \, n \in \mathbb{N}, \alpha_i \in \mathbb{R}, x_i \in X \right\} $$ is dense in the RKHS $H$ of the kernel $k$.