Help in understanding Derivation of Posterior in Gaussian Process


According to the textbook Gaussian Process in Machine Learning, it is given that \begin{align*} p(w\mid X,y) &\propto \exp\left(-\frac{1}{2\sigma_n^2}(y-X^Tw)(y-X^Tw)\right)\exp\left(-\frac{1}{2}w^T\Sigma_{p}^{-1}w\right) \\ &\propto \exp\left(-\frac{1}{2}(w-\bar{w})^T\left(\frac{1}{\sigma_n^2}XX^T + \Sigma_p^{-1}\right)(w-\bar{w})\right) \end{align*} where $\bar{w} = \sigma_n^{-2}(\sigma_n^{-2}XX^T + \Sigma_p^{-1})^{-1}Xy$.

I can't really understand how the first step leads to the second step. Can someone kindly show me how the derivation is done? Thanks


There are 2 answers below.

Accepted answer:

You need to show \begin{align} & \frac1{\sigma_n^2}(y-X^Tw)^T(y-X^Tw) + w^T\Sigma_{p}^{-1}w \\[10pt] = {} & (w-\bar{w})^T\left(\frac{1}{\sigma_n^2}XX^T + \Sigma_p^{-1}\right)(w-\bar{w}) + \text{constant} \end{align} and bear in mind that "constant" means not depending on $w.$

You had a typographical error: $(y-X^Tw)^T$ was needed where you have $y-X^Tw$.

You need this: \begin{align} & \frac1{\sigma_n^2}(y-X^Tw)^T(y-X^Tw) + w^T\Sigma_{p}^{-1}w \\[10pt] = {} & \frac 1 {\sigma_n^2} \left( y^Ty - y^T X^T w - w^T Xy + w^TXX^T w \right) + w^T\Sigma_p^{-1} w \\[10pt] = {} & w^T A w - b^T w - w^T b + \text{constant} \tag 1 \\[10pt] \overset{\Large\text{?}}= {} & (w-\bar{w})^T\left(\frac{1}{\sigma_n^2}XX^T + \Sigma_p^{-1}\right)(w-\bar{w}) + \text{constant} \end{align} where $$ A = \frac 1 {\sigma_n^2} XX^T + \Sigma_p^{-1} \quad \text{and} \quad b = \frac 1 {\sigma_n^2} Xy. $$
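A quick NumPy sanity check of the identity $(1)$ may help here (a sketch, not from the text; the variable names and the random test data are mine). With $A = \sigma_n^{-2}XX^T + \Sigma_p^{-1}$ and $b = \sigma_n^{-2}Xy$, the expanded exponent and $w^TAw - b^Tw - w^Tb$ should differ only by a $w$-independent constant, namely $\sigma_n^{-2}y^Ty$:

```python
# Sanity check of (1): lhs(w) - rhs(w) is constant in w (= y^T y / sigma^2).
import numpy as np

rng = np.random.default_rng(0)
D, n, sigma2 = 3, 5, 0.5
X = rng.standard_normal((D, n))          # columns are inputs, as in GPML
y = rng.standard_normal(n)
Sigma_p_inv = np.eye(D)                  # any positive-definite prior works

A = X @ X.T / sigma2 + Sigma_p_inv
b = X @ y / sigma2

def lhs(w):  # (1/sigma^2)(y - X^T w)^T (y - X^T w) + w^T Sigma_p^{-1} w
    r = y - X.T @ w
    return r @ r / sigma2 + w @ Sigma_p_inv @ w

def rhs(w):  # w^T A w - b^T w - w^T b  (b^T w = w^T b, both scalars)
    return w @ A @ w - 2 * b @ w

w1, w2 = rng.standard_normal(D), rng.standard_normal(D)
assert np.isclose(lhs(w1) - rhs(w1), lhs(w2) - rhs(w2))
assert np.isclose(lhs(w1) - rhs(w1), y @ y / sigma2)
```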

So the question is: How do you complete the square in an expression like $(1)$?

Here we need the fact that the matrix $A$ is a nonnegative-definite symmetric matrix with real entries, and that such matrices can be diagonalized by orthogonal matrices, and the diagonal entries (which are the eigenvalues) are nonnegative, and by taking square roots of the diagonal entries one can find a nonnegative-definite symmetric square root of $A$, which let us call $A^{1/2}$.

Here I will assume $X$ is a matrix with linearly independent rows (and of course it then has at least as many columns as rows). It follows that $A$ and $A^{1/2}$ are invertible, so we may speak of $A^{-1/2}$, which is also a positive-definite symmetric matrix.

Then we have \begin{align} & w^T A w -b^T w - w^T b \\[10pt] = {} & (A^{1/2} w)^T (A^{1/2} w) - (A^{-1/2}b)^T (A^{1/2}w) - (A^{1/2} w)^T (A^{-1/2} b) \\[15pt] \text{and so } & (A^{1/2} w)^T (A^{1/2} w) - (A^{-1/2}b)^T (A^{1/2}w) - (A^{1/2} w)^T (A^{-1/2} b) + b^T A^{-1} b \\[10pt] = {} & (A^{1/2} w - A^{-1/2} b)^T (A^{1/2} w - A^{-1/2} b). \end{align} The last line equals $(w - A^{-1}b)^T A (w - A^{-1}b)$, so the center of the completed square is $\bar{w} = A^{-1}b = \sigma_n^{-2}\left(\sigma_n^{-2}XX^T + \Sigma_p^{-1}\right)^{-1}Xy$, matching the formula in the question.
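The completed square can be checked numerically (a sketch; the matrices are random and the names are mine). The symmetric square root $A^{1/2}$ is built exactly as described above, from an orthogonal diagonalization of $A$:

```python
# Numeric check of the completing-the-square identity:
#   w^T A w - b^T w - w^T b + b^T A^{-1} b
#     == (A^{1/2} w - A^{-1/2} b)^T (A^{1/2} w - A^{-1/2} b)
#     == (w - A^{-1} b)^T A (w - A^{-1} b)
import numpy as np

rng = np.random.default_rng(1)
D = 4
M = rng.standard_normal((D, D))
A = M @ M.T + np.eye(D)                  # positive-definite symmetric A
b = rng.standard_normal(D)
w = rng.standard_normal(D)

vals, vecs = np.linalg.eigh(A)           # A = Q diag(vals) Q^T
A_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T          # A^{1/2}
A_inv_half = vecs @ np.diag(1 / np.sqrt(vals)) @ vecs.T  # A^{-1/2}

left = w @ A @ w - 2 * b @ w + b @ np.linalg.solve(A, b)
diff = A_half @ w - A_inv_half @ b
assert np.isclose(left, diff @ diff)

wbar = np.linalg.solve(A, b)             # the completed-square center A^{-1} b
assert np.isclose(left, (w - wbar) @ A @ (w - wbar))
```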

Thereafter proceed according to the incomplete answer by Ulfgard.

Another answer:

What did you try yourself? The steps involved are:

  1. Expand the quadratic term (easiest in the log domain, to get rid of the exp).
  2. Gather all terms that involve $y$ (they will form $\bar{w}$).
  3. Create the quadratic term by completing the square.
  4. Afterwards you are left with a superfluous term that does not depend on $w$; it is absorbed by the $\propto$.

You can also take a look at the standard example of multiplying two Gaussian densities, as this is essentially the same calculation.
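The four steps above can be checked end to end with a small numeric sketch (random data; the variable names are mine, not from either answer). Gathering the $y$-terms gives $\bar{w} = \sigma_n^{-2}A^{-1}Xy$, and the gradient of the exponent vanishes there, confirming it is the center of the completed square:

```python
# End-to-end check: wbar = sigma^{-2} A^{-1} X y makes the gradient of
#   (1/sigma^2)(y - X^T w)^T (y - X^T w) + w^T Sigma_p^{-1} w
# vanish, i.e. sigma^{-2} X (y - X^T wbar) = Sigma_p^{-1} wbar.
import numpy as np

rng = np.random.default_rng(2)
D, n, sigma2 = 3, 6, 0.25
X = rng.standard_normal((D, n))          # columns are inputs, as in GPML
y = rng.standard_normal(n)
Sigma_p_inv = 2.0 * np.eye(D)            # an arbitrary positive-definite prior

A = X @ X.T / sigma2 + Sigma_p_inv
wbar = np.linalg.solve(A, X @ y) / sigma2   # sigma^{-2} A^{-1} X y

assert np.allclose(X @ (y - X.T @ wbar) / sigma2, Sigma_p_inv @ wbar)
```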