This question is motivated by a problem I'm facing in vector-valued kernel methods (also known as Gaussian processes and co-kriging).
Suppose I have $N$ data points $X := \{x_n\}_{n=1}^N$, where each $x_n \in \mathbb{R}^D$. My question: under what conditions, or for which choices of kernel function, does the following hold?
$$k(x, X) \; k(X, X)^{-1} \; k(X, x) = k(x, x)$$
For example, I think the following is true: if we choose the kernel $k(\cdot, \cdot)$ to be the vanilla inner (i.e. dot) product, and if I slightly abuse notation by also writing $X$ for the matrix in $\mathbb{R}^{N \times D}$ whose rows are the $x_n$, then we have:
$$k(x, X) \; k(X, X)^{-1} \; k(X, x) = x^T X^T (X X^T)^{-1} X x$$
and, assuming $X X^T$ is invertible (i.e. the rows of $X$ are linearly independent), $X^T (X X^T)^{-1} X$ is the orthogonal projection onto the row space of $X$, so this simplifies to $x^T x = k(x, x)$ iff $x$ lies in the row space of $X$.
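As a quick numerical sanity check of this claim, here is a sketch with random data (the function and variable names are mine, purely illustrative):

```python
import numpy as np

# Sanity check of the linear-kernel claim with random data.
rng = np.random.default_rng(0)
N, D = 3, 5
X = rng.standard_normal((N, D))         # rows are the x_n; almost surely linearly independent

def quad(x, X):
    """k(x, X) k(X, X)^{-1} k(X, x) for the linear kernel k(x, x') = x^T x'."""
    K = X @ X.T                         # k(X, X), shape (N, N)
    kxX = X @ x                         # k(X, x), shape (N,)
    return kxX @ np.linalg.solve(K, kxX)

x_in = X.T @ rng.standard_normal(N)     # a point in the row space of X
x_out = rng.standard_normal(D)          # a generic point, almost surely outside it

print(np.isclose(quad(x_in, X), x_in @ x_in))    # True: equality holds
print(np.isclose(quad(x_out, X), x_out @ x_out)) # False: strict inequality
```

For the out-of-subspace point the quadratic form is $\|Px\|^2 < \|x\|^2$, where $P$ is the projection, so the two sides differ.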
Is this correct?
Are more general versions of this result possible?
I'd be happy to take clarifying questions!
In general, you can think of kernels in terms of feature maps. Using the canonical feature map
$$
\begin{split}
\phi: \mathbb{R}^D &\to \mathcal{H}\\
x &\mapsto k(\cdot, x)\,,
\end{split}
$$
we can write both sides of your equation in terms of feature maps as
$$
\begin{align}
k(x,X)k(X,X)^{-1}k(X,x) &= \phi(x)^T\Phi(\Phi^T\Phi)^{-1}\Phi^T\phi(x)\\
k(x,x) &= \phi(x)^T\phi(x)
\end{align}
$$
where $\phi(x)^T\phi(x')$ denotes the inner product $\langle \phi(x), \phi(x')\rangle$ and $\Phi := [\phi(x_1), \dots, \phi(x_N)]$.

So what happens in the linear-kernel case $k(x,x') = x^Tx'$ also happens in the general case $k(x,x') = \phi(x)^T\phi(x')$: for $x_i \in X$, we have
$$\phi(x_i)^T\Phi(\Phi^T\Phi)^{-1}\Phi^T\phi(x_i) = \phi(x_i)^T\phi(x_i),$$
since $\phi(x_i)$ is one of the columns of $\Phi$.
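You can check this numerically. The sketch below assumes an RBF kernel (my choice; any kernel with invertible $k(X,X)$ behaves the same way) and verifies the identity at each training point:

```python
import numpy as np

# Sketch: with an RBF kernel, the identity holds exactly at every training point,
# since k(x_i, X) is the i-th row of k(X, X).
rng = np.random.default_rng(1)
N, D = 6, 2
X = rng.standard_normal((N, D))

def rbf(A, B, ell=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 ell^2)) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ell**2)

K = rbf(X, X)                           # k(X, X) = Phi^T Phi
for i in range(N):
    kxX = K[i]                          # k(x_i, X) is just the i-th row of K
    quad = kxX @ np.linalg.solve(K, kxX)
    print(np.isclose(quad, 1.0))        # True: k(x_i, x_i) = 1 for the RBF kernel
```

The key step is that $K K^{-1} K = K$, so the quadratic form at $x_i$ reduces to the diagonal entry $k(x_i, x_i)$.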
For a general $x \in \mathbb{R}^D$, we can follow some intuition based on Gaussian processes. The posterior variance of a noise-free Gaussian process is given by
$$ \sigma^2(x) = k(x,x) - k(x,X)k(X,X)^{-1}k(X,x)\,. $$
So the posted equation is satisfied exactly for those $x\in \mathbb{R}^D$ with $\sigma^2(x) = 0$. If $k$ corresponds to a stationary covariance function, then $\sigma^2(x) = 0$ at every point in the dataset $X$. Beyond that, it depends on the type of kernel: periodic kernels, for example, have $\sigma^2(x) = 0$ at the points in $X$ and at their translates repeating periodically throughout the domain.
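Here is a small 1-D illustration of this (my own sketch, using a noise-free GP with an RBF kernel and three hand-picked training inputs):

```python
import numpy as np

# Posterior variance of a noise-free GP with an RBF kernel (illustrative sketch).
X = np.array([[-1.0], [0.0], [2.0]])    # training inputs

def rbf(A, B):
    """1-D RBF kernel with unit lengthscale, evaluated on all pairs."""
    return np.exp(-0.5 * (A - B.T) ** 2)

K = rbf(X, X)                           # k(X, X)

def posterior_var(x):
    x = np.atleast_2d(x)
    kxX = rbf(x, X)                     # k(x, X), shape (1, N)
    return (rbf(x, x) - kxX @ np.linalg.solve(K, kxX.T)).item()

print(np.isclose(posterior_var(0.0), 0.0))  # True: zero variance at a training input
print(posterior_var(1.0) > 0)               # True: positive variance away from the data
```

So the set where your equation holds is exactly the zero set of $\sigma^2(\cdot)$, which always contains $X$ and, depending on the kernel, possibly more.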