Let a Gaussian kernel be defined as $K(x_i, x_j) \equiv \exp(-\alpha |x_i-x_j|^2)+\beta \delta_{ij}$, and define the kernel matrix of some set of datapoints $\{x_i\}_{i=1}^n$ as the $n\times n$ matrix $K$ with $K_{ij} = K(x_i, x_j)$.
This is a common construction in various fields, e.g. Gaussian Processes. Is there a fast way of calculating the inverse of the kernel matrix?
Thoughts: if we could break down the matrix $K$ into the form $\beta I + u u^T$ for some column vector $u$, we could use the Sherman–Morrison formula to quickly calculate the inverse. However, the Gaussian kernel corresponds to an infinite-dimensional feature map, so its Gram matrix is generally full rank and no such rank-one $u$ exists. Is there another trick one could use?
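For reference, here is what the Sherman–Morrison shortcut would look like if $K$ really did have the form $\beta I + u u^T$: the inverse is $\frac{1}{\beta}\left(I - \frac{u u^T}{\beta + u^T u}\right)$, computable in $O(n^2)$ rather than $O(n^3)$. A minimal sketch (toy sizes, arbitrary random $u$):

```python
import numpy as np

# Sherman-Morrison for (beta*I + u u^T)^{-1}:
#   inv = (1/beta) * (I - u u^T / (beta + u^T u))
rng = np.random.default_rng(0)
n, beta = 50, 0.1
u = rng.standard_normal(n)

A = beta * np.eye(n) + np.outer(u, u)
A_inv = (np.eye(n) - np.outer(u, u) / (beta + u @ u)) / beta

print(np.allclose(A @ A_inv, np.eye(n)))  # True
```

The catch, as noted above, is that the Gaussian Gram matrix is not rank one, so this identity does not apply directly.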
Much work has been done on the scalability of Gaussian Processes. The main idea is to approximate your kernel matrix $K$ with another matrix that admits faster operations. This gives fast matrix-vector multiplications when optimizing hyperparameters, and lets you use iterative algorithms such as Conjugate Gradients or GMRES to solve linear systems with $K$ instead of forming its inverse explicitly.
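The key point is that iterative solvers like CG only ever touch $K$ through matrix-vector products, so $K$ can be replaced by any structure that supports a fast matvec. A minimal sketch using SciPy's `LinearOperator` (here the matvec is still a dense product, purely for illustration; the constants `alpha` and `beta` are arbitrary choices):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, alpha, beta = 200, 0.5, 1e-2
x = rng.standard_normal((n, 1))

# Dense Gaussian kernel matrix: K_ij = exp(-alpha |x_i - x_j|^2) + beta * delta_ij.
# The beta * I term ("jitter") makes K symmetric positive definite, as CG requires.
K = np.exp(-alpha * (x - x.T) ** 2) + beta * np.eye(n)

# CG only needs K @ v, so any fast/approximate matvec could be swapped in here.
op = LinearOperator((n, n), matvec=lambda v: K @ v)
b = rng.standard_normal(n)
v, info = cg(op, b)

print(info)  # 0 on successful convergence
print(np.linalg.norm(K @ v - b))  # residual is small
```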
The Structured Kernel Interpolation (SKI) paper gives a method for interpolating your data onto a grid of inducing points, which lets you combine inducing-point methods with Toeplitz and Kronecker algebra to get the fast computations. See https://arxiv.org/abs/1503.01057
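To illustrate the Toeplitz part: for a stationary kernel on a regular 1-D grid, $K$ is Toeplitz, and a Toeplitz matvec can be done in $O(n \log n)$ via circulant embedding and the FFT, without ever forming $K$ densely. A sketch (grid spacing `h` and the kernel constants are arbitrary choices):

```python
import numpy as np

def toeplitz_matvec(first_col, v):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by v,
    by embedding it in a circulant matrix of size 2n and using the FFT."""
    n = len(v)
    # First column of the circulant embedding: [t_0..t_{n-1}, 0, t_{n-1}..t_1]
    c = np.concatenate([first_col, [0.0], first_col[:0:-1]])
    # Circulant matvec: ifft(fft(c) * fft(v_padded)); keep the first n entries
    w = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(np.concatenate([v, np.zeros(n)])), len(c))
    return w[:n]

n, alpha, beta, h = 128, 0.3, 1e-3, 0.05
grid = np.arange(n) * h

# On a regular grid the Gaussian kernel matrix is Toeplitz: it is fully
# described by its first column.
first_col = np.exp(-alpha * (grid - grid[0]) ** 2)
first_col[0] += beta  # the beta * delta_ij term only touches the diagonal

v = np.random.default_rng(1).standard_normal(n)
fast = toeplitz_matvec(first_col, v)

# Compare against the dense O(n^2) product
dense = (np.exp(-alpha * (grid[:, None] - grid[None, :]) ** 2) + beta * np.eye(n)) @ v
print(np.allclose(fast, dense))  # True
```

Plugging a matvec like this into the CG solver above is the essence of how SKI-style methods scale GPs to large $n$.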