In a linear regression model ${\bf y} = {\bf x}^T {\bf w} + \epsilon$, assuming Gaussian noise ($\epsilon\sim N(0,\sigma_n^2)$) and a Gaussian prior on the weights (${\bf w}\sim N({\bf 0}, \Sigma_p)$), one can show that the posterior distribution of the weights given data $X,{\bf y}$ is also Gaussian: $$p({\bf w}|X,{\bf y}) = N\left(\bar{\bf w}=\frac{1}{\sigma_n^2}A^{-1}X{\bf y},\; A^{-1}\right),$$ where $A = \sigma_n^{-2}X X^T + \Sigma_p^{-1}$. In order to use this posterior to find the probability distribution $p(f({\bf x_*}))$ at some new point ${\bf x_*}$, we need to average over all possible weights (this is sometimes termed the Bayesian estimator): $$p(f({\bf x_*})|{\bf x_*},X,{\bf y})=\int p(f({\bf x_*})|{\bf x_*},{\bf w})\,p({\bf w}|X,{\bf y})\,d{\bf w}.$$ I'd like to prove that this is just the Gaussian $$N\left(\frac{1}{\sigma_n^2}{\bf x_*}^T A^{-1}X {\bf y},\; {\bf x_*}^T A^{-1}{\bf x_*}\right).$$ However, something in the integration doesn't seem to work out.
Can someone show how this is done?
Unless I am missing something, it should just be a straightforward application of the following standard result on marginalizing a linear-Gaussian model.
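The result being invoked is the marginal of a linear-Gaussian model: if $$ p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,\boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}), \qquad p(\mathbf{f}\,|\,\mathbf{w}) = \mathcal{N}(\mathbf{f}\,|\,\mathbf{P}\mathbf{w} + \mathbf{b}, \mathbf{L}^{-1}), $$ then the marginal distribution of $\mathbf{f}$ is $$ p(\mathbf{f}) = \int p(\mathbf{f}\,|\,\mathbf{w})\,p(\mathbf{w})\,d\mathbf{w} = \mathcal{N}\left(\mathbf{f}\,|\,\mathbf{P}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{L}^{-1} + \mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^T\right). $$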
Now apply the above to the case that $$ p(\mathbf{w}) = \mathcal{N}\left(\mathbf{w}\,\bigg|\,\frac{1}{\sigma_n^2}\mathbf{A}^{-1}\mathbf{Xy}, \mathbf{A}^{-1}\right), $$ and so for this particular instance we have $$ \boldsymbol{\mu}:=\frac{1}{\sigma_n^2}\mathbf{A}^{-1}\mathbf{Xy}, \qquad \boldsymbol{\Lambda}:=\mathbf{A}, $$ and $$ \mathbf{P}:=\mathbf{x}_*^T, \qquad \mathbf{b}=\mathbf{0}, \qquad \mathbf{L}^{-1}:= \sigma_n^2\mathbf{I}. $$ Putting this together you get $$ p(\mathbf{f}^*)=\mathcal{N}\left(\mathbf{f}^* \; \bigg| \; \frac{1}{\sigma_n^2}\mathbf{x}_*^T\mathbf{A}^{-1}\mathbf{Xy}, \; \sigma_n^2\mathbf{I}+\mathbf{x}_*^T\mathbf{A}^{-1}\mathbf{x}_* \right); $$ note the presence of the independent noise term, $\sigma_n^2\mathbf{I}$, in the predictive distribution. If you instead want the noise-free $f(\mathbf{x}_*) = \mathbf{x}_*^T\mathbf{w}$, take $\mathbf{L}^{-1} = \mathbf{0}$, which drops the $\sigma_n^2\mathbf{I}$ term and recovers exactly the Gaussian you wrote down.
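As a quick numerical sanity check (a toy sketch: the dimensions, noise level, and the convention that columns of $X$ are training inputs are my choices, not from the question), sampling weights from the posterior and evaluating $\mathbf{x}_*^T\mathbf{w}$ should reproduce the noise-free predictive mean $\frac{1}{\sigma_n^2}\mathbf{x}_*^T\mathbf{A}^{-1}\mathbf{Xy}$ and variance $\mathbf{x}_*^T\mathbf{A}^{-1}\mathbf{x}_*$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative sizes): d features, n training points.
d, n = 3, 50
sigma_n = 0.5
Sigma_p = np.eye(d)             # prior covariance of the weights
X = rng.normal(size=(d, n))     # columns are training inputs
w_true = rng.normal(size=d)
y = X.T @ w_true + sigma_n * rng.normal(size=n)

# Posterior over weights: N(w_bar, A^{-1}), A = sigma_n^{-2} X X^T + Sigma_p^{-1}
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X @ y / sigma_n**2

# Closed-form distribution of the noise-free f(x_*) = x_*^T w at a test point
x_star = rng.normal(size=d)
mean_cf = x_star @ w_bar
var_cf = x_star @ A_inv @ x_star

# Monte Carlo: sample weights from the posterior and evaluate f(x_*)
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_samples = w_samples @ x_star

print(mean_cf, f_samples.mean())   # these two should be close
print(var_cf, f_samples.var())     # and so should these
```

Adding $\sigma_n^2$ to `var_cf` gives the variance of a noisy observation $y_*$ rather than of $f(\mathbf{x}_*)$, which is exactly the $\sigma_n^2\mathbf{I}$ term above.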
I have left the conditioning on $\mathbf{x}_*, \mathbf{X}, \mathbf{y}$ unstated but you can then go back and put that in by writing $p(\cdot) = p(\cdot | \mathbf{x}_*, \mathbf{X}, \mathbf{y})$ if you like.
As an aside, I am a little confused in this instance as to whether $\mathbf{y}=\mathbf{y}(\mathbf{x})$, and therefore whether the noise $\epsilon$ is intended to be vector- or scalar-valued, but you can clean that up and make everything conform to the model you have in mind.