I have spent the last three days trying to derive MAP estimation. I started from Wikipedia, but the derivation there does not seem self-contained, because it never derives the posterior variance that is needed to carry out the MAP estimation: https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation
Then I tried to read through a classic machine learning book, *Machine Learning: A Bayesian and Optimization Perspective* by Sergios Theodoridis.
It gives the result, but the derivation is deferred to Problem 3.23, whose solution appears only in the solutions manual, so I bought the manual online.
However, the derivation there is still not convincing to me:

I don't see why equation 1 can be expanded directly into equation 2 without any $x_k^2$ term appearing, although one could argue that this term is hidden inside $\alpha_2$.

I also can't see why $\sigma_N^2$ can be derived by comparing equation 4 with equation 3. Again, if we expand $(\mu - \mu_N)^2$ and drop the squared $\mu_N$ term, everything can be explained, but only by ignoring what goes into $\alpha_2$ and $\alpha_3$. Of course, we could once more attribute the missing squared term to another constant that gets combined into $\alpha_3$, which is what equation 5 does.
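For reference, I believe the setting is the standard Gaussian-mean case: $N$ i.i.d. observations $x_k \sim \mathcal{N}(\mu, \sigma^2)$ with known $\sigma^2$ and a Gaussian prior $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$, with each $\alpha_i$ collecting every factor that does not depend on $\mu$. This is my reconstruction of the steps, so the numbering may not line up with the book exactly:

$$p(\mu \mid X) = \alpha_1 \exp\!\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k - \mu)^2\right)\exp\!\left(-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right)$$

$$= \alpha_2 \exp\!\left(-\frac{1}{2}\left[\left(\frac{N}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{\sum_k x_k}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}\right)\mu\right]\right)$$

Since the exponent is quadratic in $\mu$, the posterior must itself be Gaussian,

$$p(\mu \mid X) = \alpha_3 \exp\!\left(-\frac{(\mu - \mu_N)^2}{2\sigma_N^2}\right),$$

and matching the coefficients of $\mu^2$ and $\mu$ between the two forms gives

$$\frac{1}{\sigma_N^2} = \frac{N}{\sigma^2} + \frac{1}{\sigma_0^2}, \qquad \mu_N = \sigma_N^2\left(\frac{\sum_k x_k}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}\right).$$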
Is there any online course or post that explains MAP estimation with a detailed mathematical derivation?
Thanks!

For step 1 to step 2, the terms containing $x_k^2$ are irrelevant because they do not contain $\mu$, so they get absorbed in the change of multiplicative factor from $\alpha_1$ to $\alpha_2$. If you write these factors out explicitly, you will see that your $x_k^2$ terms have gone into the ratio $\alpha_2/\alpha_1$. The same trick happens in the step from 3 to 4.
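Written out explicitly (a sketch, assuming the standard setup with likelihood $\prod_k \mathcal{N}(x_k \mid \mu, \sigma^2)$ and prior $\mathcal{N}(\mu \mid \mu_0, \sigma_0^2)$, and assuming the prior's $\mu_0^2$ term is absorbed at the same step), the absorbed factors are just $\mu$-free exponentials:

$$\frac{\alpha_2}{\alpha_1} = \exp\!\left(-\frac{1}{2\sigma^2}\sum_{k=1}^{N}x_k^2 - \frac{\mu_0^2}{2\sigma_0^2}\right),$$

and in the step from 3 to 4, completing the square leaves behind a constant $\mu_N^2$ term, so

$$\alpha_3 = \alpha_2 \exp\!\left(\frac{\mu_N^2}{2\sigma_N^2}\right).$$

Neither factor depends on $\mu$, so neither one moves the location of the posterior's peak; they only keep the density normalized.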