If we consider the Gaussian kernel $k(x, y) = \exp(-\frac{\Vert x - y \Vert^2}{2\sigma^2})$, then is it true that...
$$ \int k(x_i, x)k(x_j, x)dx = k(x_i, x_j) $$
It seems I have seen this before, but I cannot find a proper derivation of it.
The Gaussian kernel gets its name from the fact that it is annihilated by the diffusion (heat) operator, with time $t = \sigma^2$:
$$k(t,x) = \frac{1}{\sqrt{2\pi t}} \, e^{-\frac{x^2}{2 t}}$$
$$\partial_t k(t,x) - \frac{1}{2}\partial_{x,x} k(t,x)=0, \quad k(0,x) = \delta (x),$$ which follows from $$\partial_t k(t,x) = \left(\frac{x^2}{2 t^2} - \frac{1}{2 t}\right) k(t,x), \quad \partial_x k(t,x) = -\frac{x}{t}\, k(t,x).$$
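This annihilation identity can be verified symbolically; here is a minimal sketch using sympy (symbol names are my own):

```python
# Symbolic check that the heat kernel k(t, x) = exp(-x^2/(2t)) / sqrt(2*pi*t)
# satisfies the diffusion equation  d_t k - (1/2) d_xx k = 0.
import sympy as sp

t, x = sp.symbols("t x", positive=True)
k = sp.exp(-x**2 / (2 * t)) / sp.sqrt(2 * sp.pi * t)

residual = sp.diff(k, t) - sp.Rational(1, 2) * sp.diff(k, x, 2)
assert sp.simplify(residual) == 0
```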
The Gaussian kernel, as the solution to the diffusion equation, is the probability density at time $t$ of a random walk starting at $x=0$ at $t=0$, in the limit of continuous Brownian motion: the Wiener process.
Your integral is a convolution integral:
$$ (f\star g)(x-z) = \int_{-\infty}^\infty f(x-y)\, g(y-z)\, dy \;\xrightarrow{\;y-z=u\;}\; \int_{-\infty}^\infty f(x-z-u)\, g(u)\, du. $$
Convolution integrals transform into products of the Fourier transforms (up to factors of $\sqrt{2\pi}$):
$$\int_{-\infty}^\infty f(x-u)\, g(u)\, du = \iint \int_{-\infty}^\infty \hat f(k)\, \hat g(m)\, e^{i (k(x-u)+m u)}\, du\, dk\, dm = \int_{-\infty}^\infty \hat f(k)\, \hat g(k)\, e^{i k x}\, dk. $$
The Fourier transform of a Gaussian is a Gaussian with inverse variance, so the convolution of two Gaussians is again a Gaussian whose variance is the sum of the two variances.
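The variance-addition rule is easy to confirm numerically; here is a sketch convolving two Gaussian densities (variances 1 and 2 are my own choice) on a grid and comparing against the density with variance 3:

```python
# Numerical sketch: the convolution of two Gaussian densities is a Gaussian
# density whose variance is the sum of the variances (1.0 + 2.0 = 3.0 here).
import numpy as np

def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-30, 30, 6001)      # grid symmetric about 0, odd point count
dx = x[1] - x[0]
conv = np.convolve(gauss(x, 1.0), gauss(x, 2.0), mode="same") * dx

assert np.max(np.abs(conv - gauss(x, 3.0))) < 1e-6
```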
Semantics: a Gaussian with variance $t$ is the distribution of a Wiener process $$W_t = \int_0^t dW_s$$ up to time t.
A Wiener process $W_s$, $s>0$, started from the distribution of $W_t$, $t>0$, has a Gaussian distribution with variance $t+s$, by the fundamental solution of the diffusion equation $$\partial_t f(t,x) -\frac{1}{2}\,\partial_{x,x} f(t,x)=0, \quad t>0 , \quad f(0,x)=\delta(x) \quad \longrightarrow \quad f(t,x)= \theta(t)\ \frac{1}{\sqrt{2\pi t}} \ e^{-\frac{x ^2}{2 t}}.$$
More generally, $$\partial_t f(t,x) -\frac{1}{2}\ \partial_{x,x} f(t,x)=0, \quad t>0 , \quad f(0,x)=g(x) \quad \longrightarrow \quad f(t,x) =\theta(t)\ \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^\infty \ e^{-\frac{(x-\xi)^2}{2 t}}\ g(\xi)\ d\xi. $$
I think the formula does not hold as is; a couple of factors are missing: a constant $\left(\pi\sigma^2\right)^{\frac{d}{2}}$ appears on the right-hand side, and the $k$ on the right-hand side has a different $\sigma$.
More precisely, letting $k_\sigma(x, y):=\exp\left( -\frac{\lVert x-y\rVert^2}{2\sigma^2}\right)$, the correct formula (obtained by completing the square in the exponent) is $$ \int_{\mathbb R^d} k_\sigma(x_1, y)\, k_\sigma(x_2, y)\, dy= \left(\pi\sigma^2\right)^\frac{d}{2} k_{\sigma\sqrt2}(x_1, x_2).$$
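A quick numerical check of this identity in $d=1$ (the constant $(\pi\sigma^2)^{d/2}$ comes from completing the square; the particular values of $\sigma$, $x_1$, $x_2$ below are my own):

```python
# Numerical sketch (d = 1): verify
#   ∫ k_sigma(x1, y) k_sigma(x2, y) dy = (pi * sigma^2)^(1/2) * k_{sigma*sqrt(2)}(x1, x2)
import numpy as np

def k(x, y, sigma):
    return np.exp(-(x - y)**2 / (2 * sigma**2))

sigma, x1, x2 = 0.7, -0.3, 1.1
y = np.linspace(-40, 40, 400001)            # wide grid; tails are negligible
lhs = np.sum(k(x1, y, sigma) * k(x2, y, sigma)) * (y[1] - y[0])
rhs = np.sqrt(np.pi * sigma**2) * k(x1, x2, sigma * np.sqrt(2))
assert abs(lhs - rhs) < 1e-10
```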
UPDATE: The claim in the OP is not true, and the previous version of this answer, based on the Moore-Aronszajn theorem, was wrong (unfortunately I could not delete it, as it had been accepted by the OP). Thanks to Giuseppe for the catch!
As highlighted in Giuseppe's answer, the correct formula is indeed $$\int_{\mathbb R^d} k_\sigma(x_1, y)\, k_\sigma(x_2, y)\, dy= \left(\pi\sigma^2\right)^\frac{d}{2} k_{\sigma\sqrt2}(x_1, x_2). $$ A derivation can be found in Section 4.2, page 84, of the book Gaussian Processes for Machine Learning by Rasmussen and Williams.
Although it's not as nice as before, we can still connect this result to the theory of Reproducing Kernel Hilbert Spaces: if we define the map $\Phi:\mathbb R^d\to L^2(\mathbb R^d)$ by $$\Phi(x) := \left[t\mapsto\left(\frac{2}{\pi\sigma^2}\right)^{\frac{d}{4}} e^{- \frac{\lVert x - t \rVert^2}{\sigma^2}} \right],$$ then $\Phi$ is a feature map for the Gaussian kernel $k_\sigma$, in the sense that $$k_\sigma(x,y) =\int_{\mathbb R^d} [\Phi(x)](t) \; [\Phi(y)](t) \, d t = \left\langle\Phi(x),\Phi(y)\right\rangle_{L^2(\mathbb R^d)} $$
for all $x,y\in\mathbb R^d$.
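The feature-map identity can also be checked numerically in $d=1$. This sketch assumes the normalization $\left(\frac{2}{\pi\sigma^2}\right)^{d/4} e^{-\lVert x-t\rVert^2/\sigma^2}$, which is the one consistent with $k_\sigma(x,y)=\exp\left(-\frac{\lVert x-y\rVert^2}{2\sigma^2}\right)$ as defined here; the test points are my own:

```python
# Numerical sketch (d = 1): the L2 inner product of the feature maps
#   Phi(x)(t) = (2/(pi*sigma^2))^(1/4) * exp(-(x-t)^2 / sigma^2)
# reproduces k_sigma(x, y) = exp(-(x-y)^2 / (2*sigma^2)).
import numpy as np

sigma, x, ypt = 0.9, 0.4, -1.2

def phi(z, t):
    return (2 / (np.pi * sigma**2))**0.25 * np.exp(-(z - t)**2 / sigma**2)

t = np.linspace(-40, 40, 400001)
inner = np.sum(phi(x, t) * phi(ypt, t)) * (t[1] - t[0])   # Riemann sum for <Phi(x), Phi(y)>
k_val = np.exp(-(x - ypt)**2 / (2 * sigma**2))
assert abs(inner - k_val) < 1e-10
```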
For a proof and more on the representations of the RKHS associated with the Gaussian kernel, one can check An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels (2006) by Steinwart, Hush and Scovel.