I am currently reading Christopher Bishop's *Pattern Recognition and Machine Learning*, specifically the section on the parameter distribution in Bayesian linear regression.
On page 153, the author deduces that the posterior distribution over the weights for a model of the form $t = y({\bf x}, {\bf w}) + \varepsilon$ after $N$ observations, with $y({\bf x}, {\bf w}) = {\bf w}^T{\bf \phi}({\bf x})$, $p(\varepsilon) = \mathcal{N}(\varepsilon \mid 0, \beta^{-1})$ (where $\beta$ is the noise precision), and a conjugate prior of the form
$$ p({\bf w}) = \mathcal{N}({\bf w} | {\bf m}_0, {\bf S}_0) $$
is $p({\bf w} \mid {\bf t}) = \mathcal{N}({\bf w} \mid {\bf m}_N, {\bf S}_N)$, where
$$ \begin{align} {\bf m}_N &= {\bf S}_N({\bf S}_0^{-1}{\bf m}_0 + \beta{\bf \Phi}^T{\bf t}) \\ {\bf S}_N^{-1} &= {\bf S}_0^{-1} + \beta{\bf \Phi}^T{\bf \Phi} \end{align} $$
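As a sanity check, these update equations can be evaluated directly. Below is a small NumPy sketch with synthetic data; the basis functions, hyperparameter values, and variable names are illustrative choices, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: N noisy observations, polynomial basis of size M
N, M = 20, 3
beta = 25.0                               # noise precision (assumed value)
x = rng.uniform(-1, 1, N)
Phi = np.vander(x, M, increasing=True)    # design matrix: columns 1, x, x^2
w_true = np.array([0.5, -1.0, 2.0])
t = Phi @ w_true + rng.normal(0, beta ** -0.5, N)

# Conjugate Gaussian prior N(w | m0, S0)
m0 = np.zeros(M)
S0 = np.eye(M)

# Posterior: S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
#            m_N      = S_N (S_0^{-1} m_0 + beta * Phi^T t)
SN_inv = np.linalg.inv(S0) + beta * Phi.T @ Phi
SN = np.linalg.inv(SN_inv)
mN = SN @ (np.linalg.inv(S0) @ m0 + beta * Phi.T @ t)
print(mN)
```

With enough data relative to the prior precision, the posterior mean lands close to the weights that generated the data.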
Then, he continues by saying that
If we consider an infinitely broad prior ${\bf S}_0 = \alpha^{-1}{\bf I}$ with $\alpha \to 0$, the mean ${\bf m}_N$ of the posterior distribution reduces to the maximum likelihood value ${\bf w}_{\text{ML}}$
Here ${\bf w}_{\text{ML}} = ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t}$.
In doing so, I arrive at $$ {\bf m}_N = {\bf m}_0 + \alpha^{-1}\beta{\bf \Phi}^T{\bf t} + \alpha\beta^{-1}({\bf \Phi}^T{\bf \Phi})^{-1}{\bf m}_0 + ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t} $$
The only way to get rid of ${\bf m}_0$ here is to take it to be the zero vector, so we should consider a zero-mean isotropic Gaussian prior. In doing so, I arrive at
$$ {\bf m}_N = \alpha^{-1}\beta{\bf \Phi}^T{\bf t} + ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T{\bf t} $$
Finally, if I take the limit $\alpha \to 0$ in ${\bf m}_N$, then $\lim_{\alpha \to 0^+} 1/\alpha = \infty$ and $\lim_{\alpha \to 0^-} 1/\alpha = -\infty$, so the first term diverges and ${\bf m}_N$ cannot reduce to ${\bf w}_{\text{ML}}$. Yet since we want an infinitely broad prior, ${\bf S}_0$ has to be ${\bf S}_0 = \alpha^{-1}{\bf I}$ with $\alpha \to 0$.
What's the argument I need to conclude that ${\bf m}_N$ does indeed converge to ${\bf w}_{\text{ML}}$?
The mistake is in the expansion: $(\alpha{\bf I} + \beta{\bf \Phi}^T{\bf \Phi})^{-1}$ is not $\alpha^{-1}{\bf I} + \beta^{-1}({\bf \Phi}^T{\bf \Phi})^{-1}$; the inverse of a sum of matrices is not the sum of their inverses. Instead, take the limit of ${\bf S}_N^{-1}$ as a whole. As $\alpha \to 0$: $${\bf S}^{-1}_0 = \alpha {\bf I} \to 0$$ $${\bf S}_N^{-1} = {\bf S}_0^{-1} + \beta {\bf \Phi}^T{\bf \Phi} \to \beta {\bf \Phi}^T{\bf \Phi}$$ $${\bf m}_N = {\bf S}_N({\bf S}_0^{-1}{\bf m}_0 + \beta{\bf \Phi}^T{\bf t}) \to (\beta{\bf \Phi}^T{\bf \Phi})^{-1}\beta{\bf \Phi}^T{\bf t} = ({\bf \Phi}^T{\bf \Phi})^{-1}{\bf \Phi}^T {\bf t} = {\bf w}_{\text{ML}}$$ Note that ${\bf S}_0^{-1}{\bf m}_0 = \alpha{\bf m}_0 \to 0$ for any ${\bf m}_0$, so no assumption of a zero-mean prior is needed, and the factor $\beta$ cancels in the last step.
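This cancellation can also be checked numerically: with a zero-mean prior, the exact posterior mean $(\alpha{\bf I} + \beta{\bf \Phi}^T{\bf \Phi})^{-1}\beta{\bf \Phi}^T{\bf t}$ approaches ${\bf w}_{\text{ML}}$ as $\alpha$ shrinks. A minimal sketch with synthetic data (all names and values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data and design matrix
N, M = 30, 4
beta = 10.0                               # noise precision (assumed value)
x = rng.uniform(-1, 1, N)
Phi = np.vander(x, M, increasing=True)
t = Phi @ rng.normal(size=M) + rng.normal(0, beta ** -0.5, N)

# Maximum likelihood solution: w_ML = (Phi^T Phi)^{-1} Phi^T t
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

def posterior_mean(alpha):
    # m_N = (alpha I + beta Phi^T Phi)^{-1} beta Phi^T t
    # (zero-mean prior with S_0 = alpha^{-1} I)
    return np.linalg.solve(alpha * np.eye(M) + beta * Phi.T @ Phi,
                           beta * Phi.T @ t)

for alpha in (1.0, 1e-3, 1e-8):
    print(alpha, np.max(np.abs(posterior_mean(alpha) - w_ml)))
```

The printed gap between ${\bf m}_N$ and ${\bf w}_{\text{ML}}$ shrinks with $\alpha$, matching the limit argument above.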