I have a basic conceptual question about how Bochner's theorem relates to the recent work on random Fourier approximations (e.g. https://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf)
It seems that Bochner's theorem only ensures that the inverse Fourier transform of the shift-invariant kernel is a probability measure, and conversely that the Fourier transform of a probability measure is a shift-invariant kernel.
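As a 1-D sanity check of that correspondence (my own illustration, not from the paper), take the Gaussian kernel $k(\delta) = e^{-\delta^2/2}$: its Fourier-transform pair is the standard normal density, which is indeed a probability measure, and integrating that density against $e^{jw\delta}$ recovers the kernel:

```python
import numpy as np

# 1-D check of the Bochner correspondence for the Gaussian kernel
# k(delta) = exp(-delta^2 / 2).  Its transform pair is the standard
# normal density p(w) = exp(-w^2 / 2) / sqrt(2*pi).
w = np.linspace(-10.0, 10.0, 20001)
dw = w[1] - w[0]
p = np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)

# p is a valid probability density: nonnegative and integrating to 1
mass = p.sum() * dw

# and its (inverse) Fourier transform recovers the kernel:
# k(delta) = integral of p(w) * e^{j w delta} dw
delta = 1.3
k_via_p = (p * np.exp(1j * w * delta)).sum().real * dw
k_exact = np.exp(-delta**2 / 2)
print(mass, k_via_p, k_exact)
```

Both directions of the correspondence check out numerically: `mass` is 1 and `k_via_p` matches `k_exact`.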
The work on random Fourier approximations, however, takes a step further by drawing random frequency vectors from that probability measure itself.
Why is this justified? What exactly is the connection between the probability measure and the positive semidefinite kernel, in terms of approximations?
Edit: Some progress I made on the issue:
1) We have to assume that a shift-invariant kernel $K(x,y) = K(x-y)$ admits the representation
$K(x-y) = \mathbb E_w\left[e^{jw^Tx} \left(e^{jw^Ty}\right)^*\right] = \mathbb E_w\left[e^{jw^T(x-y)}\right]$ [1]
(conjugation rather than a transpose, since these are scalars), which is plausible since Fourier bases are dense in function spaces.
To be totally explicit, $x$ and $y$ are vectors in $\mathbb R^d$ and $w$ is a random vector, also in $\mathbb R^d$.
2) Then, by the definition of expectation, $K(x-y) = \int_{\mathbb R^d} P(w)\, e^{jw^T(x-y)}\,dw = \mathcal F^{-1}[P](x-y)$, where $P(w)$ is the pdf of $w$. Again, a nontrivial detail is that $w \in \mathbb R^d$ rather than a scalar, so this is a $d$-dimensional transform and $P(w)$ is in general not something you can just read off.
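To see this expectation in action, here is a Monte Carlo check in $\mathbb R^d$ (the Gaussian pair is my own choice for illustration): if $w \sim \mathcal N(0, I_d)$, the expectation above is the characteristic function of a standard Gaussian evaluated at $x-y$, i.e. the RBF kernel $e^{-\|x-y\|^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = np.full(d, 0.2)
y = np.zeros(d)

# If w ~ N(0, I_d), then E_w[e^{j w^T (x - y)}] is the Gaussian
# characteristic function at (x - y), i.e. exp(-||x - y||^2 / 2).
W = rng.normal(size=(200000, d))        # rows are i.i.d. draws of w
mc = np.exp(1j * W @ (x - y)).mean()    # Monte Carlo estimate of the expectation
exact = np.exp(-np.dot(x - y, x - y) / 2)
print(mc.real, exact)
```

The imaginary part averages out to (approximately) zero, as it must, since the kernel is real and the Gaussian measure is symmetric.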
3) Then, drawing $w \sim P(w)$, the scalar $e^{jw^Tx}$ is a random Fourier feature of the high-dimensional vector $x$.
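In practice, the paper works with a real-valued version of this feature map: with $b \sim \mathrm{Unif}[0, 2\pi]$, the map $z(x) = \sqrt{2/D}\,\cos(Wx + b)$ satisfies $\mathbb E[z(x)^T z(y)] = K(x-y)$. A minimal sketch, again assuming the Gaussian kernel so that $P(w)$ is $\mathcal N(0, I_d)$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 5, 20000                        # input dim, number of random features

# Random Fourier features for the Gaussian kernel exp(-||x-y||^2 / 2):
# draw w ~ N(0, I_d) (the measure from step 2) and b ~ Uniform[0, 2*pi];
# then z(x) = sqrt(2/D) * cos(W x + b) gives z(x)^T z(y) ~= K(x - y).
W = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = np.full(d, 0.3)
y = np.zeros(d)
approx = z(x) @ z(y)
exact = np.exp(-np.sum((x - y) ** 2) / 2)
print(approx, exact)
```

The averaging over $b$ is what kills the cross term $\cos(w^T(x+y) + 2b)$ in the product of two cosines, leaving only $\cos(w^T(x-y))$ in expectation.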
So the key assumption needed for the representability of a random Fourier approximation seems to be [1]. What exactly justifies it?