Determining How Adding a New Data Point Affects the Hyperparameters of a Gaussian Process with a Squared Exponential Kernel


I want to determine how the inclusion of new data affects the hyperparameters of the Gaussian process kernel. For reference, assume the squared exponential kernel: $$K(x,x') = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right)$$ The derivative with respect to the lengthscale describes how the kernel changes as the lengthscale changes: $$\frac{\partial K}{\partial l} = \sigma^2\exp\left(\frac{-(x-x')^T(x-x')}{2l^2}\right) \frac{(x-x')^T(x-x')}{l^3}$$
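As a sanity check on the derivative above, here is a minimal NumPy sketch (function names and test data are my own) that computes the SE kernel matrix and its analytic lengthscale derivative, and compares the latter against a central finite difference:

```python
import numpy as np

def se_kernel(X1, X2, sigma=1.0, lengthscale=1.0):
    """Squared exponential kernel matrix between row-wise inputs X1, X2."""
    # Squared Euclidean distances (x - x')^T (x - x')
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return sigma**2 * np.exp(-sq_dists / (2.0 * lengthscale**2))

def se_kernel_dl(X1, X2, sigma=1.0, lengthscale=1.0):
    """Analytic derivative of the SE kernel w.r.t. the lengthscale l."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    K = sigma**2 * np.exp(-sq_dists / (2.0 * lengthscale**2))
    return K * sq_dists / lengthscale**3

# Compare against a central finite difference in l
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
l, eps = 1.3, 1e-6
fd = (se_kernel(X, X, lengthscale=l + eps)
      - se_kernel(X, X, lengthscale=l - eps)) / (2 * eps)
print(np.max(np.abs(fd - se_kernel_dl(X, X, lengthscale=l))))  # very small
```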

However, I would like to determine the effect of a single new data point on the lengthscale. What symbolic expression should I evaluate the derivative of?

Is it $$\frac{\partial l}{\partial \mu}$$ of the GP, where $\mu$ is the predictive mean of the GP, given by:

$$\mu(x^*)=K(x^*,X)^\top[K(X,X)+\sigma_n^2\mathbf{I}]^{-1} \mathbf{y_n}$$ If so, how can the derivative expression be formulated? (At least the initial expression; I should be able to work out the derivative from there myself.)
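For concreteness, here is a small sketch of the predictive mean formula above (the function names and toy data are my own; `noise` plays the role of $\sigma_n$):

```python
import numpy as np

def predictive_mean(X, y, X_star, sigma=1.0, lengthscale=1.0, noise=0.1):
    """GP predictive mean mu(x*) = K(x*, X) [K(X,X) + sigma_n^2 I]^{-1} y."""
    def k(A, B):
        sq = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return sigma**2 * np.exp(-sq / (2.0 * lengthscale**2))
    Kxx = k(X, X) + noise**2 * np.eye(len(X))
    # Solve the linear system rather than forming the inverse explicitly
    alpha = np.linalg.solve(Kxx, y)
    return k(X_star, X) @ alpha

# Toy regression: noisy samples of sin(x)
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=20)
mu = predictive_mean(X, y, np.array([[0.0]]))
print(mu)  # should be close to sin(0) = 0
```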

Best answer:

Interesting question. First, the lengthscale does not change with new data by itself; it only changes when you re-optimize the hyperparameters. So I assume you care about how the optimum of the negative log marginal likelihood (NLML) surface, parameterized by the hyperparameters and the data, changes with respect to a new observation. That is to say: you see a new point and re-optimize the kernel hyperparameters; the lengthscale changes, and you ask whether we can quantify this change.

Unfortunately, as far as I am aware, there is no completely general answer, because the hyperparameter optimization surface is non-analytic (unless you want to sample the entire surface and interpolate to fill in the gaps).

But hope is not entirely lost. What I suspect you care about is the gradient of the hyperparameter surface at the old optimum when the new point is observed, or, more completely, the change in the region around the optimum as the new point is observed. The change in the NLML hyperparameter surface is simply the difference between NLML$(x)$ and NLML$(x, \bar{x})$, and the same holds for the derivatives.

Each new point is a discrete event, so you have to look at differences rather than analytic gradients.
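The difference-based view above can be sketched numerically: compute the NLML on the old and augmented datasets and compare the lengthscale gradients at the old lengthscale (here via finite differences; the function names, toy data, and the choice $l = 1$ are my own assumptions):

```python
import numpy as np

def nlml(X, y, sigma=1.0, lengthscale=1.0, noise=0.1):
    """Negative log marginal likelihood of a zero-mean GP with SE kernel."""
    sq = (np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :]
          - 2.0 * X @ X.T)
    K = sigma**2 * np.exp(-sq / (2.0 * lengthscale**2)) \
        + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)           # stable factorization of K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.sum(np.log(np.diag(L)))
            + 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(15, 1))
y = np.sin(X[:, 0])
x_new, y_new = np.array([[0.5]]), np.array([np.sin(0.5)])
X2, y2 = np.vstack([X, x_new]), np.concatenate([y, y_new])

# Finite-difference gradient of NLML w.r.t. l, before and after adding x_new
h = 1e-5
grad_before = (nlml(X, y, lengthscale=1.0 + h)
               - nlml(X, y, lengthscale=1.0 - h)) / (2 * h)
grad_after = (nlml(X2, y2, lengthscale=1.0 + h)
              - nlml(X2, y2, lengthscale=1.0 - h)) / (2 * h)
print(grad_after - grad_before)  # the shift the new point induces at l = 1
```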

Finally, if you care about the change of NLML$(x, \bar{x})$ with respect to the position of $\bar{x}$, that derivative can be computed analytically fairly easily (but I'll wait for feedback from you before I write it all out).