Question about the definition of feature map arising in machine learning

338 Views Asked by At

I'm working through the following paper of learning a non-negative function in a reproducing kernel hilbert space setting (RKHS). In particular, section 2.2 on page 3 is a bit confusing to me in terms of the feature map.

We are given a kernel $K:X\times X\to\mathbb{R}$ and we denote with $\mathcal{H}$ the corresponding RKHS of $K$. Usually, one denotes the feature map by

$$k_x:=K(\cdot, x):X\to \mathcal{H}$$

with the property then that $K(x,y) = \langle k_y, k_x\rangle$. A function $\phi$ satisfying the property

$$ K(x,y) = \langle \phi(y), \phi(x)\rangle $$

is often called the feature map and the function $k_x$ the canonical feature map. It is well known that such a feature map is not unique.

The idea of the paper is to look at sum of square functions. This is strongly motivated from polynomials. In the polynomial case one says the polynomial $p(x)$ has a sum of square (SoS) representation if there are other polynomials $q_i$ such that

$$ p(x) = \sum_{i=1}^d q_i(x)^2$$

If we denote the vector of monomials up to degree $d$ by $\phi(x)=(1,x, \dots,x^d)^T$ one can write $q(x)=(q_1(x),\dots,q_d(x))^T$ as

$$q(x)=V\phi(x)$$

where the rows of $V$ contain the coefficient of the polynomials $q_i$. For univariate polynimoials its well known that being positive and having such a SoS representation is equivalent. In the above context one can then write

$$p(x)=\sum_i^dq_i(x)^2=q(x)^Tq(x) = \phi(x)^TV^TV\phi(x)$$

with $Q:=V^TV$ being pos semidefinite. Even for generic feature map $\phi$ (not necessary monomials) the condition on the pos. semidefiniteness of $Q$ leads to a non-negative function $p(x)$.

In the paper they try to generalize the above idea. For a kenerl $K:X\times X\to\mathbb{R}$ they define an associated feature map via $\phi:X\to\mathcal{H}$, where $\phi(x):=(\phi_i(x))_{i\in I}$ for $\phi_i:X\to\mathbb{R}$ and a index set $I$, e.g $\{1, 2, 3,\dots\}$. They define the space of sum of squares as

$$\mathcal{S}:=\{x\mapsto \phi^T(x)Q\phi(x): Q\ge 0\}\tag{1}$$

where $Q\ge 0$ means a positive semidefinite matrix $Q$. Couple of question to these definitions

  1. Why is such a $\phi$ a feature map? It seems to me that $\phi$ is a tuple of real valued functions while $K$ is scalar valued. That doesn't make much sense to me. In the polynomial motivation $\phi_i$ where basically a basis to represent $q$. But I don't see how this translate to the general one.
  2. They allow for countably infinite index set. How does this make sense in $(1)$?
  3. They claim that $\mathcal{S}$ is a subspace of the Hilbert Space $\mathcal{H}\otimes\mathcal{H}$. Why can $\mathcal{H}\otimes\mathcal{H}$ be represented as

$$\{x\mapsto \sum_{i,j} Q_{i,j}\phi_i(x)\phi_j(x):Q\in\mathbb{R}^{d\times d}\} $$