I want to prove that the ridge regression estimator is the mean of the posterior distribution under a Gaussian prior.
$$y \sim N(X\beta,\sigma^2I),\quad \text{prior }\beta \sim N(0,\gamma^2 I).$$
$$\hat{\beta} = \left(X^TX + \frac{\sigma^2}{\gamma^2}I\right)^{-1}X^Ty.$$
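As a quick numerical sanity check (the toy data, variable names, and parameter values below are my own, not from the problem), the ridge estimator and the posterior mean $\frac{1}{\sigma^2}\Sigma_b X^Ty$ with $\Sigma_b = \left(\frac{1}{\sigma^2}X^TX + \frac{1}{\gamma^2}I\right)^{-1}$ do coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
sigma2, gamma2 = 2.0, 0.5  # illustrative noise and prior variances

X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Ridge estimator: (X^T X + (sigma^2/gamma^2) I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + (sigma2 / gamma2) * np.eye(p), X.T @ y)

# Posterior mean: (1/sigma^2) Sigma_b X^T y,
# with Sigma_b = (X^T X / sigma^2 + I / gamma^2)^{-1}
Sigma_b = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / gamma2)
beta_post = Sigma_b @ (X.T @ y) / sigma2

print(np.allclose(beta_ridge, beta_post))  # prints True
```

(Multiplying $\Sigma_b^{-1}$ through by $\sigma^2$ turns one expression into the other, which is what the check confirms numerically.)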
What I want to show is that $\mu = \hat{\beta}$, where $\mu$ is the vector appearing in the posterior's exponent $$-\frac{1}{2}(\beta - \mu)^T\Sigma^{-1}(\beta - \mu),$$ and $\Sigma$ is the covariance matrix of the posterior distribution $p(\beta\mid X,y)$.
There is a solution to this question in the last couple of lines on page 3 of http://ssli.ee.washington.edu/courses/ee511/HW/hw3_solns.pdf, but I'm baffled as to how it gets there. (The problem is exercise 3.6.)
Edit: $\mu$ is the mean of the posterior.
Edit2: The last couple of lines of problem 3.6 say "which is the single $\beta$ term in the $p(\beta\mid y, X)$ equation." What is the single $\beta$ term? This sentence makes no sense to me, and I don't see how identifying the single $\beta$ term is relevant to the proof.
Edit2 continued: For convenience, the solution defines "$m_b = \frac{1}{\sigma^2}\Sigma_bX^Ty$, and $\beta^T\Sigma_b^{-1}m_b = \frac{1}{\sigma^2}\beta^TX^Ty$, which is the single $\beta$ term in the $p(\beta\mid X,y)$ equation." (Me: okay, how is this helpful to the proof?)
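If I'm reconstructing the step correctly (this is only my attempt at reading the solution, not something it states explicitly), the point of isolating the single $\beta$ term seems to be a matching-of-coefficients argument. Expanding the posterior's exponent,

$$-\frac{1}{2}(\beta - \mu)^T\Sigma_b^{-1}(\beta - \mu) = -\frac{1}{2}\beta^T\Sigma_b^{-1}\beta + \beta^T\Sigma_b^{-1}\mu - \frac{1}{2}\mu^T\Sigma_b^{-1}\mu,$$

the only term linear in $\beta$ is $\beta^T\Sigma_b^{-1}\mu$. Equating it with the single $\beta$ term $\frac{1}{\sigma^2}\beta^TX^Ty$ coming from likelihood times prior forces

$$\Sigma_b^{-1}\mu = \frac{1}{\sigma^2}X^Ty \quad\Longrightarrow\quad \mu = \frac{1}{\sigma^2}\Sigma_bX^Ty = m_b,$$

and substituting $\Sigma_b^{-1} = \frac{1}{\sigma^2}X^TX + \frac{1}{\gamma^2}I$ then gives $\mu = \left(X^TX + \frac{\sigma^2}{\gamma^2}I\right)^{-1}X^Ty = \hat{\beta}$.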