What is the RKHS associated with the polynomial kernel?

239 Views Asked by At

Let $K:\mathbb{R}^2\times\mathbb{R}^2\mapsto\mathbb{R}$ be a positive-type symmetric function such that $K(x,y)=(x\cdot y + c)^2$ where $c\in\mathbb{R}$. We can write

$$K(x,y)=(x_1y_1+x_2y_2+c)^2= \begin{bmatrix}x_1^2\\x_2^2\\\sqrt{2}x_1x_2\\\sqrt{2c}x_1\\\sqrt{2c}x_2\\c\end{bmatrix} \cdot \begin{bmatrix}y_1^2\\y_2^2\\\sqrt{2}y_1y_2\\\sqrt{2c}y_1\\\sqrt{2c}y_2\\c\end{bmatrix} $$

Hence, I reason that $K(x,.)$ can be regarded as a function from $\lbrace1,2,3,4,5,6\rbrace$ to $\lbrace x_1^2,x_2^2, \sqrt{2}x_1x_2,\sqrt{2c}x_1,\sqrt{2c}x_2,c\rbrace$, and we can generate (and complete) an inner product space from $x\in\mathbb{R}^2$.

On the other hand, one can also write

$$K(x,y)=(x_1y_1+x_2y_2+c)^2= \begin{bmatrix}x_1^2\\x_2^2\\x_1x_2\\x_1x_2\\\sqrt{2c}x_1\\\sqrt{2c}x_2\\c\end{bmatrix} \cdot \begin{bmatrix}y_1^2\\y_2^2\\y_1y_2\\y_1y_2\\\sqrt{2c}y_1\\\sqrt{2c}y_2\\c\end{bmatrix} $$

Although this is a slightly more boring mapping (and there might be a clue there), similar reasoning to the above permits one to regard $K(x,.)$ as a function outside the aforementioned space, since a sequence of length $7$ is not in the span of $K(x,.)$ as previously defined.

By the Moore-Aronszajn Theorem, $K(x,.)$ is a reproducing kernel in exactly one Hilbert space. However, since a sequence cannot be evaluated at $x\in\mathbb{R}^2$, it cannot be a reproducing kernel in either of the Hilbert spaces implied above. Which Hilbert space is it actually a reproducing kernel on?

2

There are 2 best solutions below

5
On

Hence, I reason that $K(x,\cdot)$ can be regarded as a function from $\lbrace1,2,3,4,5,6\rbrace$ to $\lbrace x_1^2,x_2^2, \sqrt{2}x_1x_2,\sqrt{2c}x_1,\sqrt{2c}x_2,c\rbrace$

This sentence doesn't make much sense, I assume you meant that $K(x,\cdot)$ can be regarded as a function from $\mathbb R^6$ to $\mathbb R^6$, right ? If that's what you meant, then it is wrong.

In both cases, what you've shown is that there exist two distinct feature maps $\phi : \mathbb R^2 \to \mathbb R^6$ and $\varphi : \mathbb R^2 \to \mathbb R^7$ such that for all $x,y\in\mathbb R^2$ : $$K(x,y) = \langle\phi(x),\phi(y)\rangle = \langle\varphi(x),\varphi(y)\rangle $$ As you've just shown, feature maps need not be unique (in fact you could split the terms in the expression of $K(x,y)$ further and come up with feature mappings to arbitrarily large dimension), but the point is that regardless, $K$ is still defined on $\mathbb R^2$, and therefore the RKHS associated with $K$ is given by the completion of $\text{span}\lbrace K(x,\cdot)\mid x\in \mathbb R^2\rbrace $ which is a Hilbert space of functions defined on $\mathbb R^2$, and there is no contradiction with Moore–Aronszajn theorem.

The existence of feature maps is what make kernel methods so popular in practical applications : they allow us to implicitly map our low dimensional data to a much higher dimensional space where the problem is easier to solve, with no extra computational expense.

5
On

Here is the answer I've come up with to my own question. I felt it necessary to try and show this, because most texts simply show there is a mapping from an original vector space $\mathbb{R}^d$ to one or more Hilbert spaces $\mathbb{R}^{d'}$ where $d<d'$, while failing to explain how $\mathbb{R}^{d'}$ is a function space, and waving hands while talking about reproducing kernels. While $\mathcal{H}$ is unique, the way it is defined can be generalised beyond $\mathbb{R}^M$, as StratosFair helpfully noted - see the comments below.

Proposition. The function $K:\mathbb{R}^d\times\mathbb{R}^d\mapsto\mathbb{R}$ such that $K(x,y)=(x\cdot y+c)^2, c\in\mathbb{R}$ is a reproducing kernel on the Hilbert space $\mathcal{H}$ comprising the set $$\big\{f:\mathbb{R}^d\mapsto\mathbb{R}\hspace{0.3cm} s.t. \hspace{0.2cm} f(.)=\langle x_f,\phi(.)\rangle_{\mathbb{R}^M}|\hspace{0.1cm} x_f\in\mathbb{R}^M\big\}$$ and the inner product $$\langle f,g\rangle_\mathcal{H}=\langle x_f,x_g \rangle_{\mathbb{R}^M}$$ where $M$ = $^{d+2}C_2$ and $\phi:\mathbb{R}^d\mapsto\mathbb{R}^M$ is the mapping $$\phi([x_1, x_2, ..., x_d]^T)= \begin{bmatrix}x_1^2\\\sqrt{2}x_1x_2\\x_2^2\\\sqrt{2}x_1x_3\\\sqrt{2}x_2x_3\\x_3^2\\\vdots\\\sqrt{2}x_1x_d\\\sqrt{2}x_2x_d\\\vdots\\\sqrt{2}x_{d-1}x_d\\x_d^2\\\sqrt{2c}x_1\\\sqrt{2c}x_2\\\vdots\\\sqrt{2c}x_d\\c\end{bmatrix}$$ $K$ is not a reproducing kernel on any other Hilbert space, and $\mathcal{H}$ admits no other reproducing kernel.

Proof. Define a mapping $I:\mathcal{H}\mapsto\mathbb{R}^M, I(f(.))=I(\langle x_f,.\rangle_{\mathbb{R}^M})=x_f$. Then $I$ is clearly a bijection. Furthermore, $$\lVert f(.) \rVert_{\mathcal{H}}^2=\langle f(.),f(.) \rangle_{\mathcal{H}}=\langle x_f,x_f \rangle_{\mathbb{R}^M}=\langle I(f(.)),I(f(.)) \rangle_{\mathbb{R}^M}=\lVert I(f(.)) \rVert_{\mathbb{R}^M}^2$$

Therefore, $I$ is an isometric isomorphism. Since $\mathbb{R}^M$ is a finite metric space, it is complete. Consequently, $\mathcal{H}$ is a Hilbert space.

Next, one can easily verify that for $x,y\in\mathbb{R^d}$, $K(x,y)=\langle\phi(x),\phi(y)\rangle_{\mathbb{R}^M}$. Therefore, the set of functions $\big\{K(x,.)|\hspace{0.1cm} x\in\mathbb{R}^d\big\}=\big\{\langle\phi(x),\phi(.)\rangle_{\mathbb{R}^M}|\hspace{0.1cm}x\in\mathbb{R}^d\big\}$ is in $\mathcal{H}$.

Now, for $f\in\mathcal{H}$ and any $x\in\mathbb{R}^d$, $$\langle f,K(x,.)\rangle_{\mathcal{H}}=\bigg\langle\langle x_f,\phi(.)\rangle_{\mathbb{R}^M},\langle\phi(x),\phi(.)\rangle_{\mathbb{R}^M}\bigg\rangle_{\mathcal{H}}=\langle x_f,\phi(x) \rangle_{\mathcal{H}}=\langle x_f, \phi(x)\rangle_{\mathbb{R}^M}=f(x)$$

which shows that $K$ is a reproducing kernel on $\mathcal{H}$. The uniqueness of the reproducing kernel Hilbert space associated with $K$ follows from the Moore-Aronszajn Theorem. To show that there is no other reproducing kernel on $\mathcal{H}$, consider for a moment the possibility that $R$ is another reproducing kernel. Then $$\langle R(x,.),K(y,.) \rangle_{\mathcal{H}} = R(x,y) \hspace{0.4cm} \text{and} \hspace{0.4cm} \langle K(y,.),R(x,.) \rangle_{\mathcal{H}} = K(y,x)$$

The symmetry of the inner product and of $K$ in their respective arguments leads to $R=K$.