Gaussian Process Regression

Observations: $$ X= \begin{pmatrix} x_1 \\ x_2 \\ \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0.5 & 2 \\ \end{pmatrix} $$ $$ y= \begin{pmatrix} y_1 \\ y_2 \\ \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ \end{pmatrix} $$

and $x_*=(0.5, 1)$ is a data point with unknown function value $f_*=f(x_*)$. The distribution of $f$ is given by a Gaussian process with mean $m(x)=0$ and covariance function $K(x,x')=\exp\left(-\tfrac{1}{2}(x-x')^T(x-x')\right)$.

Assuming a noise-free scenario, how do I write down the joint distribution of $f= \begin{pmatrix} y_1 \\ y_2 \\ f(x_*) \end{pmatrix}$?

Great to see you are interested in Gaussian processes! First off, if you want an in-depth read about using GPs, I would refer you to GPML (Rasmussen and Williams, Gaussian Processes for Machine Learning), but I should be able to answer your question in this post.

Step 1: Construct the Covariance Matrix

You already know the kernel you want to use, so this is straightforward: we simply plug your data into the formula you provided. (P.S. for anyone curious, this is the RBF, or squared exponential, kernel with length scale $l=1$ and signal variance $\sigma^2=1$; an ARD RBF kernel reduces to it when every input dimension shares the same length scale.) For each pair of inputs, apply the formula as in this example:

$$K(x_1,x_2) = \exp\left(-\frac{(x_1 - x_2)^T(x_1 - x_2)}{2}\right)$$

giving us

$$K(x_1,x_2) \approx 0.535$$
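
Spelling out the arithmetic for this entry, using the observations from the question: $x_1 - x_2 = (0, 1) - (0.5, 2) = (-0.5, -1)$, so

$$(x_1 - x_2)^T(x_1 - x_2) = 0.25 + 1 = 1.25, \qquad \exp\left(-\frac{1.25}{2}\right) = \exp(-0.625) \approx 0.535$$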

In most programming languages you can compute all entries $K(x_i,x_j)$ at once in vectorized form. Ordering the points as $(x_1, x_2, x_*)$, the resulting covariance matrix is:

$$K = \left( \begin{array}{ccc} 1 & 0.535 & 0.882 \\ 0.535 & 1 & 0.607 \\ 0.882 & 0.607 & 1 \end{array} \right)$$
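
As a rough illustration of the vectorized computation, here is a minimal sketch in Python with NumPy. The choice of NumPy and the names `rbf_kernel`, `points` and `x_star` are my own assumptions for this example; they are not part of the question.

```python
import numpy as np

# Training inputs x_1, x_2 and the test input x_*, one point per row.
X = np.array([[0.0, 1.0],
              [0.5, 2.0]])
x_star = np.array([[0.5, 1.0]])
points = np.vstack([X, x_star])  # ordering: x_1, x_2, x_*


def rbf_kernel(A, B):
    """K(a, b) = exp(-0.5 * ||a - b||^2) for every pair of rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]   # shape (len(A), len(B), dim)
    sq_dist = np.sum(diff ** 2, axis=-1)   # squared Euclidean distances
    return np.exp(-0.5 * sq_dist)


K = rbf_kernel(points, points)
print(np.round(K, 3))
```

Rounded to three decimals, this should reproduce the matrix above.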

For notational purposes I will split this matrix into three parts: the covariance between training data and training data ($k$), the covariance between training data and testing data ($k_*$), and the covariance between testing data and testing data ($k_{**}$):

$$k = \left( \begin{array}{cc} 1 & 0.535 \\ 0.535 & 1 \end{array} \right)$$

$$k_* = \left( \begin{array}{c} 0.882 \\ 0.607 \end{array} \right)$$

$$k_{**} = \left( 1 \right)$$
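
With these blocks, the joint distribution asked about in the question can be written out explicitly. Under the zero-mean prior and with no observation noise, the three function values are jointly Gaussian (entries rounded to three decimals):

$$\begin{pmatrix} y_1 \\ y_2 \\ f(x_*) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} k & k_* \\ k_*^T & k_{**} \end{pmatrix} \right) = \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.535 & 0.882 \\ 0.535 & 1 & 0.607 \\ 0.882 & 0.607 & 1 \end{pmatrix} \right)$$

The remaining steps follow from conditioning this joint Gaussian on the observed values $y_1$ and $y_2$.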

Step 2: Compute the Mean Expectation for the Observation

The predictive mean $\bar{f}(x_*)$ is computed as follows:

$$\bar{f}(x_*) = k_*^Tk^{-1}y$$

For your data it would be as follows:

$$\bar{f}(x_*) = \left( \begin{array}{c} 0.882 \\ 0.607 \end{array} \right)^T\left( \begin{array}{cc} 1 & 0.535 \\ 0.535 & 1 \end{array} \right)^{-1}\left( \begin{array}{c} 2 \\ 5 \end{array} \right) \approx 2.5$$
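
To check the number, here is a small sketch in Python with NumPy. The library choice and the names `rbf_kernel`, `k_star` and `alpha` are assumptions of mine for illustration. It solves the linear system $k\,\alpha = y$ instead of forming $k^{-1}$ explicitly, which is a common choice for numerical stability.

```python
import numpy as np


def rbf_kernel(A, B):
    """K(a, b) = exp(-0.5 * ||a - b||^2) for every pair of rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


X = np.array([[0.0, 1.0],
              [0.5, 2.0]])        # training inputs x_1, x_2
y = np.array([2.0, 5.0])          # observed values y_1, y_2
x_star = np.array([[0.5, 1.0]])   # test input x_*

k = rbf_kernel(X, X)              # 2x2 train/train covariance
k_star = rbf_kernel(X, x_star)    # 2x1 train/test covariance

# Predictive mean k_*^T k^{-1} y, computed via a linear solve.
alpha = np.linalg.solve(k, y)
mean = (k_star.T @ alpha).item()
print(round(mean, 3))             # approximately 2.5
```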

Step 3: Compute the Variance for the Observation

While the model assumes zero noise on the observations, there is still uncertainty in our prediction, because infinitely many functions pass through the two data points provided. As we observe more data we expect the variance to shrink, i.e. we become more confident in our predictions. The predictive variance is computed as follows:

$$V = k_{**} - k_*^Tk^{-1}k_*$$

Filling in your data:

$$V = 1 - \left( \begin{array}{c} 0.882 \\ 0.607 \end{array} \right)^T\left( \begin{array}{cc} 1 & 0.535 \\ 0.535 & 1 \end{array} \right)^{-1} \left( \begin{array}{c} 0.882 \\ 0.607 \end{array} \right) \approx 0.2$$
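
The variance can be checked the same way. The sketch below uses Python with NumPy, and again the names `rbf_kernel`, `k_star`, `k_star_star` and `v` are illustrative assumptions rather than anything fixed by the question.

```python
import numpy as np


def rbf_kernel(A, B):
    """K(a, b) = exp(-0.5 * ||a - b||^2) for every pair of rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


X = np.array([[0.0, 1.0],
              [0.5, 2.0]])                   # training inputs x_1, x_2
x_star = np.array([[0.5, 1.0]])              # test input x_*

k = rbf_kernel(X, X)                         # 2x2 train/train covariance
k_star = rbf_kernel(X, x_star)               # 2x1 train/test covariance
k_star_star = rbf_kernel(x_star, x_star)     # 1x1 test/test covariance

# Predictive variance k_** - k_*^T k^{-1} k_*, computed via a linear solve.
v = np.linalg.solve(k, k_star)
variance = (k_star_star - k_star.T @ v).item()
print(round(variance, 3))                    # approximately 0.2
```

Since the model is noise-free, evaluating the same expression at a training input instead of $x_*$ would give a variance of zero up to floating-point error.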

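NOTE: The numbers above are rounded, so the calculations are not exact. To three decimals, the mean is 2.504 and the variance is 0.196.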