I just proved to myself why the regularization term is added to, rather than multiplied with, the loss function.
I did so by taking the MLE formula...
$$\arg\max_{\Theta}\sum_{i} \log P(x_{i}\mid\Theta)$$
and since we know that MAP uses a prior belief distribution...
$$P(\Theta | x) = \frac{P(x|\Theta )P(\Theta )}{P(x)}$$
We can write MAP as...
$$\arg\max_{\Theta}\, \log\Big(P(\Theta)\prod_{i} P(x_{i}\mid\Theta)\Big)$$
If we distribute the log, we can see that $\log P(\Theta)$ is the regularization term, as shown below...
$$\sum_{i}\log P(x_{i}\mid\Theta) + \log P(\Theta)$$
But now I would like to see how L2 itself is derived. L2 is defined as...
$$\lambda \sum_{k}\sum_{l}W^{2}_{k,l}$$
which is $\lambda$ times the sum of the squared weights (the squared Frobenius norm of $W$). Where did this equation come from? What values for $P(x_{i}|\Theta)$ and $P(\Theta)$, for example, do I need to use to derive this L2 formula? Can someone please explain it to me step-by-step?
Take the prior $\mathbb{P}(\Theta)$ to be multivariate normal $\mathcal{N}_l(\mathbf{0},\sigma^2 I),$ where $l$ is the dimension of the parameter $\Theta,$ $I$ is the $l\times l$ identity matrix, and $\sigma^2$ is the prior variance.
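To make the connection to L2 explicit, here is a sketch of the step, assuming an isotropic Gaussian prior $\mathcal{N}_l(\mathbf{0},\sigma^2 I)$ over the parameter vector $\Theta$:

$$\log P(\Theta) = \log\left[(2\pi\sigma^{2})^{-l/2}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{j}\Theta_{j}^{2}\right)\right] = -\frac{1}{2\sigma^{2}}\sum_{j}\Theta_{j}^{2} + \text{const.}$$

The constant does not depend on $\Theta$, so it drops out of the $\arg\max$. Identifying the components $\Theta_j$ with the entries $W_{k,l}$ of the weight matrix, maximizing the log-posterior is the same as minimizing the negative log-likelihood plus $\lambda \sum_{k}\sum_{l} W_{k,l}^{2}$ with $\lambda = \frac{1}{2\sigma^{2}}$: a small prior variance corresponds to strong regularization.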
Here is a reference
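A quick numerical sanity check of this correspondence (my own sketch, not part of the derivation above): the negative log-density of a $\mathcal{N}(\mathbf{0},\sigma^2 I)$ prior equals an L2 penalty with $\lambda = 1/(2\sigma^2)$ plus a weight-independent constant. The weight values and `sigma` below are arbitrary illustration values.

```python
import math

def neg_log_gaussian_prior(weights, sigma):
    """-log p(W) for W ~ N(0, sigma^2 I), computed component-wise."""
    d = len(weights)
    const = 0.5 * d * math.log(2 * math.pi * sigma**2)
    return const + sum(w**2 for w in weights) / (2 * sigma**2)

def l2_penalty(weights, lam):
    """lambda times the sum of squared weights."""
    return lam * sum(w**2 for w in weights)

weights = [0.5, -1.2, 3.0]   # arbitrary example weights
sigma = 2.0                  # prior standard deviation
lam = 1 / (2 * sigma**2)     # the induced regularization strength

# The two quantities differ only by a constant that does not
# depend on the weights, so they have the same minimizer.
const = 0.5 * len(weights) * math.log(2 * math.pi * sigma**2)
print(neg_log_gaussian_prior(weights, sigma) - const)
print(l2_penalty(weights, lam))
```

Both printed values coincide, which is exactly why adding $\lambda \sum W^2$ to the loss and using a Gaussian prior in MAP are the same thing up to a constant.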