The best example to demonstrate my question is the elastic net, whose risk (here for linear regression) is the following. For some dataset $D=\{(x_i,y_i)\}_{i=1}^n$ with $x_i\in \mathbb{R}^d$, $y_i\in\mathbb{R}$, and some $\lambda_1, \lambda_2 \geq 0$:
$$R(w) = \sum_{i=1}^n (w^Tx_i -y_i)^2 + \lambda_1 ||w||_1 + \lambda_2 ||w||_2^2$$
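To make the definition concrete, here is a small sketch computing $R(w)$ on toy data (the data, the function name `elastic_net_risk`, and the parameter values are my own, purely for illustration):

```python
import numpy as np

# Toy data, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # n = 5 samples, d = 3 features (rows are x_i^T)
y = rng.normal(size=5)
w = rng.normal(size=3)
lam1, lam2 = 0.1, 0.5

def elastic_net_risk(w, X, y, lam1, lam2):
    """R(w) = sum_i (w^T x_i - y_i)^2 + lam1 * ||w||_1 + lam2 * ||w||_2^2"""
    residuals = X @ w - y
    return (residuals @ residuals
            + lam1 * np.abs(w).sum()   # L1 penalty, not squared
            + lam2 * (w @ w))          # squared L2 penalty

print(elastic_net_risk(w, X, y, lam1, lam2))
```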
Why is the $L_2$ norm squared? Is it just because of the derivative, or is there some other reason behind it?
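To illustrate the derivative part of my question: the gradient of $||w||_2^2$ is $2w$, which is smooth and vanishes at the origin, while the gradient of $||w||_2$ is $w/||w||_2$, which is undefined at $w=0$ and has unit length arbitrarily close to it. A quick numerical check (function names are mine):

```python
import numpy as np

def grad_l2_squared(w):
    # gradient of ||w||_2^2: smooth everywhere, linear in w
    return 2 * w

def grad_l2(w):
    # gradient of ||w||_2: undefined at w = 0
    return w / np.linalg.norm(w)

w = np.array([1e-8, 0.0])          # a point very close to the origin
print(grad_l2_squared(w))          # vanishes smoothly near 0
print(grad_l2(w))                  # stays at unit length near 0
```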
I also asked this question on a forum at my university. One thing a teaching assistant mentioned:
The norm $||w||_1$ is linear in the $|w_i|$ and $||w||_2^2$ is linear in the $|w_i|^2$, which makes the non-squared $L_1$ norm and the squared $L_2$ norm computationally and theoretically more convenient and easier to handle.
Note that $||w||_1^2 = ||w||_2^2 + \sum_{j}\sum_{i\neq j}|w_j|\,|w_i|$ is not linear in the $|w_i|$, and $||w||_2$ itself is also not linear in the $|w_i|$. Thus the squared $L_1$ norm and the non-squared $L_2$ norm are harder to handle.
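The identity above, and the separability point behind it, can be checked numerically (the example vector is my own choice):

```python
import numpy as np

w = np.array([1.0, -2.0, 3.0])  # arbitrary example vector
d = len(w)

l1 = np.abs(w).sum()
l2_sq = (w ** 2).sum()
cross = sum(abs(w[j]) * abs(w[i])
            for j in range(d) for i in range(d) if i != j)

# ||w||_1^2 = ||w||_2^2 + sum_j sum_{i != j} |w_j| |w_i|
assert np.isclose(l1 ** 2, l2_sq + cross)

# Separability: ||w||_1 and ||w||_2^2 are plain sums of
# per-coordinate terms, so each decomposes coordinatewise ...
assert np.isclose(l1, sum(abs(wi) for wi in w))
assert np.isclose(l2_sq, sum(wi ** 2 for wi in w))
# ... while the non-squared ||w||_2 couples the coordinates:
assert not np.isclose(np.linalg.norm(w), sum(abs(wi) for wi in w))

print("all identities verified")
```

This coordinatewise decomposition is exactly what makes the squared $L_2$ and non-squared $L_1$ penalties easy to optimize (e.g. one coordinate at a time), whereas $||w||_1^2$ and $||w||_2$ couple the coordinates through cross terms or a square root.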