Help understanding a portion of research paper (Matern kernel, graph Laplacian)


I am trying to read this paper, and I seem to be missing some of the assumed mathematical background. I am trying to understand a certain section (the beginning of 3.1), which I will reproduce here.

If anybody can just explain in greater detail what's going on, I would appreciate it.

The basic idea of the paper is to create a graph-theoretic version of the Matern kernel for a Gaussian process on a (weighted) graph. (Normally the Matern kernel is used for a Gaussian process on Euclidean space.)

Note that $\Delta$ is the graph Laplacian.

Graph Fourier feature methods approximate the kernel matrix using a truncated eigenvalue expansion. Here, we first obtain the $\ell$ smallest eigenvalues and eigenvectors of $\Delta$ using a routine designed for efficiently computing the eigenvalues of sparse matrices, such as the Lanczos algorithm. Since the mappings

\begin{align} \lambda \mapsto \left( \frac{2\nu}{\kappa^2} + \lambda\right)^{-\nu} \qquad \qquad \lambda \mapsto e^{-\frac{\kappa^2}{4}\lambda} \end{align}

are decreasing functions on $\mathbb R^+$, we thus obtain the $\ell$ largest eigenvalues of the Matern or diffusion kernels. Once this is obtained, scalable training proceeds by relating the approximate GP with a Bayesian linear regression model, mirroring Euclidean Fourier features (Rahimi and Recht, 2008).
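To make my question concrete, here is my attempt to sketch in code what I *think* the quoted passage is describing. This is entirely my own toy example (the graph, the parameter values, and all variable names are mine, not the paper's), so please correct me if I have the procedure wrong:

```python
# My guess at the procedure: approximate the graph Matern kernel by a
# truncated eigendecomposition of the graph Laplacian. All names and
# parameter values here are my own, not from the paper.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Toy weighted graph: a cycle on n nodes with unit edge weights
n = 50
A = sp.diags([1.0, 1.0], [1, -1], shape=(n, n)).tolil()
A[0, n - 1] = A[n - 1, 0] = 1.0
A = A.tocsr()
D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
L = D - A  # graph Laplacian

# The l smallest eigenpairs of the Laplacian, via ARPACK (a Lanczos-type
# method); shift-invert around a small negative sigma targets the
# eigenvalues closest to zero.
l = 10
eigvals, eigvecs = eigsh(L, k=l, sigma=-0.01, which="LM")
order = np.argsort(eigvals)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Apply the Matern spectral transform lambda -> (2*nu/kappa^2 + lambda)^(-nu).
# Since this map is decreasing, the l SMALLEST Laplacian eigenvalues become
# the l LARGEST kernel eigenvalues.
nu, kappa = 3.0 / 2.0, 1.0
matern_eigvals = (2 * nu / kappa**2 + eigvals) ** (-nu)

# Rank-l approximation of the kernel matrix: K ~ U diag(f(lambda)) U^T
K_approx = eigvecs @ np.diag(matern_eigvals) @ eigvecs.T
```

If I understand correctly, `K_approx` is then used as the (low-rank) covariance matrix of the GP, which is what makes the Bayesian-linear-regression reformulation tractable.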

I know what eigenvalues and eigenvectors are. I have some idea of what a Gaussian process is. I am just looking for a more detailed explanation of what's going on in the quoted section. I will try to present my limited understanding of what's going on, if that helps focus my question.

  1. The "kernel" is the covariance function of the Gaussian process, which is critical to many of the properties of the process, e.g. differentiability.
  2. Is it reasonable to think of the kernel as a matrix?
  3. If the kernel is too large to work with, we try to approximate it.
  4. So we use some off-the-shelf method, like the Lanczos method, to get the $\ell$ smallest eigenvalues and eigenvectors of $\Delta$.
  5. Why the $\ell$ smallest?
  6. How does extracting eigenvectors and eigenvalues help us approximate a matrix? Is it because we can form (part of) the eigendecomposition?
  7. Why do those two functions being decreasing mean that we get the $\ell$ largest eigenvalues of the Matern kernel?
  8. Why are the largest eigenvalues of the Matern kernel the most important?
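For points 6-8, my current guess is that keeping the $\ell$ largest eigenvalues of the kernel gives the best rank-$\ell$ approximation (Eckart-Young), which I tried to check numerically on a random PSD matrix of my own (again, this is just my toy example, not anything from the paper):

```python
# Numerical check of my guess for points 6-8: truncating the
# eigendecomposition of a symmetric PSD matrix at its l largest
# eigenvalues gives a rank-l approximation whose spectral-norm error
# equals the (l+1)-th largest eigenvalue (Eckart-Young theorem).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((30, 30))
K = M @ M.T  # a symmetric PSD stand-in for a kernel matrix

eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
l = 5
U = eigvecs[:, -l:]       # eigenvectors of the l largest eigenvalues
S = eigvals[-l:]
K_l = U @ np.diag(S) @ U.T  # rank-l truncation

err = np.linalg.norm(K - K_l, 2)  # spectral norm of the error
next_eig = eigvals[-l - 1]        # (l+1)-th largest eigenvalue
```

If that is right, it would explain why the method goes after the largest kernel eigenvalues, and (via the decreasing spectral transforms) why those correspond to the smallest Laplacian eigenvalues.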

Thank you.