The derivation of Maximum A Posteriori estimation doesn't make any sense to me


I have spent the last 3 days deriving the MAP estimate. I started from the Wikipedia article, but its derivation doesn't seem self-contained: it never derives the posterior variance, which is needed to perform MAP estimation. https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation

Then I tried to read through a classical machine learning book, *Machine Learning: A Bayesian and Optimization Perspective* by Sergios Theodoridis. It states the result (see the screenshot below), but the derivation itself is Problem 3.23, and the solution appears only in the solutions manual, so I bought that online. The derivation is still not convincing to me. I don't see why equation 1 can be expanded directly into equation 2 without any $x_k^2$ term, unless we say that this term is hidden inside $\alpha_2$. I also can't understand how $\sigma_N^2$ is obtained by comparing equation 4 with equation 3. Again, if we expand $(\mu-\mu_N)^2$ and ignore the $\mu_N^2$ term, everything can be explained by ignoring $\alpha_2$ and $\alpha_3$; or we could once more attribute the missing squared term to another coefficient that is combined with $\alpha_3$, as in equation 5.
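To make the question concrete, here is the full expansion I believe the book is compressing (assuming the standard setup of a Gaussian likelihood with known variance $\sigma^2$ and a Gaussian prior $\mathcal{N}(\mu_0,\sigma_0^2)$ on the mean), with every $\mu$-free term kept explicit:

$$\begin{aligned}
p(\mu\mid X) &\propto \exp\!\Big(-\frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k-\mu)^2\Big)\,
\exp\!\Big(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\Big)\\
&= \exp\!\Big[-\frac{\mu^2}{2}\Big(\frac{N}{\sigma^2}+\frac{1}{\sigma_0^2}\Big)
+\mu\Big(\frac{\sum_k x_k}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\Big)
\;\underbrace{-\,\frac{\sum_k x_k^2}{2\sigma^2}-\frac{\mu_0^2}{2\sigma_0^2}}_{\text{independent of }\mu}\Big]\\
&\propto \exp\!\Big(-\frac{(\mu-\mu_N)^2}{2\sigma_N^2}\Big),\qquad
\frac{1}{\sigma_N^2}=\frac{N}{\sigma^2}+\frac{1}{\sigma_0^2},\quad
\mu_N=\sigma_N^2\Big(\frac{\sum_k x_k}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\Big).
\end{aligned}$$

If I understand correctly, the underbraced $\mu$-free terms (including the $x_k^2$ sum) are what get folded into the multiplicative constants.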

Is there an online course or post that explains MAP with a detailed mathematical derivation?

Thanks!

[screenshot of the book's derivation, equations 1–5]

There is 1 answer below.


For the step from equation 1 to equation 2, the terms containing $x_k^2$ are irrelevant because they do not contain $\mu$; they are absorbed in the change from the multiplicative factor $\alpha_1$ to $\alpha_2$. If you were to write these factors out explicitly, you would see that your $x_k^2$ terms have gone into $\alpha_2/\alpha_1$. The key point is that a factor (or, in the exponent, an additive term) that does not depend on $\mu$ cannot change where the maximum over $\mu$ occurs. The same trick is used in the step from equation 3 to equation 4.
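You can check this numerically. The sketch below (my own illustration, assuming a Gaussian likelihood with known variance $\sigma^2$ and a Gaussian prior $\mathcal{N}(\mu_0,\sigma_0^2)$, which matches the book's setup) maximizes the full log-posterior, $x_k^2$ terms and all, and confirms it peaks at the closed-form $\mu_N$; the curvature at the peak recovers $\sigma_N^2$:

```python
import numpy as np

# Assumed setting: Gaussian likelihood N(mu, sigma^2) with KNOWN sigma,
# and a Gaussian prior N(mu0, sigma0^2) on the unknown mean mu.
rng = np.random.default_rng(0)
sigma, mu0, sigma0 = 2.0, 1.0, 3.0
x = rng.normal(5.0, sigma, size=50)
N, xbar = len(x), x.mean()

def log_posterior(mu):
    """Full log-posterior up to an additive constant (log-likelihood + log-prior).
    The sum of x_k^2 and every other mu-free term only shifts this curve
    vertically, so it cannot move the argmax -- that is what "absorbed into
    alpha_2" means."""
    return (-np.sum((x - mu) ** 2) / (2 * sigma**2)
            - (mu - mu0) ** 2 / (2 * sigma0**2))

# Closed-form posterior mode/mean and variance from completing the square:
mu_N = (sigma**2 * mu0 + N * sigma0**2 * xbar) / (N * sigma0**2 + sigma**2)
sigma_N2 = (sigma0**2 * sigma**2) / (N * sigma0**2 + sigma**2)

# 1) Brute-force grid search over the FULL log-posterior lands on mu_N.
grid = np.linspace(mu_N - 1.0, mu_N + 1.0, 20001)
mu_map_numeric = grid[np.argmax([log_posterior(m) for m in grid])]
assert abs(mu_map_numeric - mu_N) < 1e-3

# 2) The log-posterior is an exact quadratic in mu, so a central second
# difference at the mode gives the curvature -1/sigma_N^2 exactly.
h = 1e-2
curv = (log_posterior(mu_N + h) - 2 * log_posterior(mu_N)
        + log_posterior(mu_N - h)) / h**2
assert abs(-1.0 / curv - sigma_N2) < 1e-6

print(f"mu_N={mu_N:.4f}, numeric MAP={mu_map_numeric:.4f}, sigma_N^2={sigma_N2:.4f}")
```

Nothing here depends on the dropped constants: shifting `log_posterior` by any fixed amount leaves both the argmax and the curvature unchanged, which is why the book can silently move the $x_k^2$ terms into $\alpha_2$.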