I am currently reading the book Pattern Recognition and Machine Learning by Christopher Bishop, and while trying to understand the basics of probability theory I got stuck on the maximum a posteriori (MAP) technique.
The equation says:
$$ p(\mathbf{w}\mid\mathbf{x},\mathbf{t},\alpha,\beta) \propto p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta)\,p(\mathbf{w}\mid\alpha) $$
Given is:
$$ \ln p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta) = -\frac{\beta}{2} \sum_{n=1}^N (y(x_n,\mathbf{w}) - t_n)^2 + \frac{N}{2} \ln \beta - \frac{N}{2}\ln(2\pi) $$
I was able to follow the derivation up to this point. But as the chapter proceeds to $ p(\mathbf{w}\mid\alpha) $, it is simply stated that
$$ p(\mathbf{w}\mid\alpha) = \mathcal{N}(\mathbf{w}\mid\mathbf{0},\alpha^{-1}\mathbf{I}) = \left(\frac{\alpha}{2\pi}\right)^{\frac{M+1}{2}} \exp\left\{-\frac{\alpha}{2}\mathbf{w}^T\mathbf{w}\right\} $$
but sadly the book does not explain why this is the case. I tried to work it out myself but did not get far, since I am not much of a mathematician.
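For what it's worth, the part I can at least reproduce mechanically is the constant in front: writing out the general $D$-dimensional Gaussian density (my own notation here, with $D = M+1$ being the number of components of $\mathbf{w}$)

$$ \mathcal{N}(\mathbf{w}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = (2\pi)^{-D/2}\,|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\frac{1}{2}(\mathbf{w}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{w}-\boldsymbol{\mu})\right\} $$

and plugging in $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\Sigma} = \alpha^{-1}\mathbf{I}$, so that $|\boldsymbol{\Sigma}| = \alpha^{-(M+1)}$ and $\boldsymbol{\Sigma}^{-1} = \alpha\mathbf{I}$, does give the stated expression. So my confusion is less about the algebra and more about why this particular distribution is assumed in the first place.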
Could someone explain to me why the prior $ p(\mathbf{w}\mid\alpha) $ takes this form?
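In case it helps rule out a typo on my side, here is a quick numerical sanity check (with made-up values for $M$ and $\alpha$) that the closed-form expression from the book matches the generic multivariate Gaussian density with mean $\mathbf{0}$ and covariance $\alpha^{-1}\mathbf{I}$:

```python
import numpy as np

# Hypothetical values: M is the polynomial degree, so w has M + 1 components.
M = 3
alpha = 2.0
rng = np.random.default_rng(0)
w = rng.standard_normal(M + 1)

# Closed-form expression from the book:
# (alpha / (2*pi))^((M+1)/2) * exp(-alpha/2 * w^T w)
book = (alpha / (2 * np.pi)) ** ((M + 1) / 2) * np.exp(-0.5 * alpha * (w @ w))

# Generic multivariate Gaussian density N(w | 0, Sigma) with Sigma = I / alpha:
# (2*pi)^(-D/2) * |Sigma|^(-1/2) * exp(-1/2 * w^T Sigma^{-1} w)
D = M + 1
Sigma = np.eye(D) / alpha
generic = (
    (2 * np.pi) ** (-D / 2)
    * np.linalg.det(Sigma) ** -0.5
    * np.exp(-0.5 * (w @ np.linalg.solve(Sigma, w)))
)

print(np.isclose(book, generic))  # the two expressions agree
```

So numerically the formula checks out; what I am missing is the motivation for it.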
Thank you!