Mean of the posterior distribution in bayesian linear regression with infinitely broad prior


I am currently reading the section of Christopher Bishop's Pattern Recognition and Machine Learning on the parameter distribution in Bayesian linear regression.

On page 153, the author deduces that the posterior distribution over the weights of a model of the form $t = y({\bf x}, {\bf w}) + \varepsilon$ after $N$ observations, with $y({\bf x}, {\bf w}) = {\bf w}^T{\boldsymbol \phi}({\bf x})$, $p(\varepsilon) = \mathcal{N}(\varepsilon \mid 0, \beta^{-1})$ ($\beta$ is the noise precision), and a conjugate prior of the form

$$ p({\bf w}) = \mathcal{N}({\bf w} | {\bf m}_0, {\bf S}_0) $$

is $p({\bf w} \mid {\bf t}) = \mathcal{N}({\bf w} \mid {\bf m}_N, {\bf S}_N)$, where

$$ \begin{align} {\bf m}_N &= {\bf S}_N({\bf S}_0^{-1}{\bf m}_0 + \beta{\bf \Phi}^T{\bf t}) \\ {\bf S}_N^{-1} &= {\bf S}_0^{-1} + \beta{\bf \Phi}^T{\bf \Phi} \end{align} $$
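As a concrete illustration (not from the book), this update can be computed directly with NumPy; the design matrix, targets, and hyperparameter values below are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed values): N = 20 observations, M = 3 basis functions
Phi = rng.normal(size=(20, 3))            # design matrix, rows are phi(x_n)^T
w_true = np.array([1.0, -2.0, 0.5])       # weights used to generate the data
beta = 25.0                               # noise precision, so noise variance 1/beta
t = Phi @ w_true + rng.normal(scale=beta ** -0.5, size=20)

# Prior p(w) = N(w | m0, S0)
m0 = np.zeros(3)
S0 = np.eye(3)

# Posterior: S_N^{-1} = S_0^{-1} + beta Phi^T Phi,
#            m_N      = S_N (S_0^{-1} m0 + beta Phi^T t)
SN_inv = np.linalg.inv(S0) + beta * Phi.T @ Phi
SN = np.linalg.inv(SN_inv)
mN = SN @ (np.linalg.inv(S0) @ m0 + beta * Phi.T @ t)
print(mN)  # posterior mean; close to w_true for this much data
```

With this much data and a weak prior, the posterior mean lands near the generating weights, which is the behavior the limit in the question formalizes.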

Then, he continues by saying that

If we consider an infinitely broad prior ${\bf S}_0 = \alpha^{-1}{\bf I}$ with $\alpha \to 0$, the mean ${\bf m}_N$ of the posterior distribution reduces to the maximum likelihood value ${\bf w}_{\text{ML}}$

where ${\bf w}_{\text{ML}} = ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t}$.

When I try to verify this, I arrive at $$ {\bf m}_N = {\bf m}_0 + \alpha^{-1}\beta{\bf \Phi}^T{\bf t} + \alpha\beta^{-1}({\bf \Phi}^T{\bf \Phi})^{-1}{\bf m}_0 + ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t} $$

for which the only way to get rid of ${\bf m}_0$ is if it is the zero vector. Thus we should take a zero-mean isotropic Gaussian as the prior. Doing so, I arrive at

$$ {\bf m}_N = \alpha^{-1}\beta{\bf \Phi}^T{\bf t} + ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t} $$

Finally, if I take the limit $\alpha \to 0$ in ${\bf m}_N$, then $\lim_{\alpha \to 0^+} 1/\alpha = \infty$ and $\lim_{\alpha \to 0^-} 1/\alpha = -\infty$, so the expression does not converge and ${\bf m}_N$ cannot reduce to ${\bf w}_{\text{ML}}$. Yet since we want an infinitely broad prior, ${\bf S}_0$ has to be ${\bf S}_0 = \alpha^{-1}{\bf I}$ with $\alpha \to 0$.

What's the argument I need to conclude that ${\bf m}_N$ does indeed converge to ${\bf w}_{\text{ML}}$?

1 Answer

The key is to take the limit using the precision ${\bf S}_0^{-1} = \alpha{\bf I}$ directly, rather than expanding ${\bf S}_N$: then no $1/\alpha$ term ever appears, and ${\bf S}_0^{-1}{\bf m}_0 = \alpha{\bf m}_0 \to {\bf 0}$ even for ${\bf m}_0 \neq {\bf 0}$. As $\alpha \to 0$: $${\bf S}_0^{-1} = \alpha {\bf I} \to {\bf 0}$$ $${\bf S}_N^{-1} = {\bf S}_0^{-1} + \beta {\bf \Phi}^T{\bf \Phi} \to \beta {\bf \Phi}^T{\bf \Phi}$$ $${\bf m}_N = {\bf S}_N({\bf S}_0^{-1}{\bf m}_0 + \beta{\bf \Phi}^T{\bf t}) \to (\beta{\bf \Phi}^T{\bf \Phi})^{-1}\beta{\bf \Phi}^T {\bf t} = ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T {\bf t} = {\bf w}_{\text{ML}}$$
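This limit is easy to check numerically (a sketch with a made-up design matrix, targets, and a deliberately nonzero prior mean): as $\alpha$ shrinks, ${\bf m}_N$ approaches ${\bf w}_{\text{ML}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(10, 2))    # toy design matrix (assumed)
t = rng.normal(size=10)           # toy targets (assumed)
beta = 2.0                        # assumed noise precision
m0 = np.array([5.0, -5.0])        # deliberately nonzero prior mean

# Maximum likelihood solution: (Phi^T Phi)^{-1} Phi^T t
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

for alpha in [1.0, 1e-3, 1e-6]:
    S0_inv = alpha * np.eye(2)                      # S_0 = alpha^{-1} I
    SN = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)
    print(alpha, np.linalg.norm(mN - w_ml))         # gap shrinks with alpha
```

Note that the loop never forms $1/\alpha$: only the prior precision $\alpha{\bf I}$ enters, which is exactly why the nonzero ${\bf m}_0$ washes out.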