Why use an un-squared L2-norm regularizer?

Presumably it would be useful if, relative to the squared L2-norm, you want to penalize small values more and large values less. But the L1-norm shouldn't work too badly in those cases either, and you get the benefit of sparsity. The only advantage I can think of is rotational invariance, but the squared L2-norm is rotationally invariant too, so why not just take the square and make life easier?
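
To make the comparison concrete, here is a minimal sketch in plain Python. The helper names (`l1`, `l2`, `l2_squared`, `rotate`, and the two gradient functions) are my own, chosen only for illustration, and the example is restricted to 2-D vectors. It checks which penalties survive a rotation of the weight vector and how strongly each L2 variant pulls on a small versus a large weight vector.

```python
import math

def l1(w):
    """L1 norm: sum of absolute values."""
    return sum(abs(x) for x in w)

def l2(w):
    """Un-squared L2 norm: Euclidean length."""
    return math.sqrt(sum(x * x for x in w))

def l2_squared(w):
    """Squared L2 norm: sum of squares."""
    return sum(x * x for x in w)

def rotate(w, theta):
    """Rotate a 2-D vector by theta radians."""
    x, y = w
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def l2_grad(w):
    """Gradient of the un-squared L2 norm, w / ||w||; undefined at w = 0."""
    n = l2(w)
    return tuple(x / n for x in w)

def l2_squared_grad(w):
    """Gradient of the squared L2 norm, 2 * w."""
    return tuple(2 * x for x in w)

w = (3.0, 4.0)
w_rot = rotate(w, math.pi / 4)

# Both L2 variants are unchanged by the rotation; L1 is not.
print("L2:        ", l2(w), l2(w_rot))
print("L2 squared:", l2_squared(w), l2_squared(w_rot))
print("L1:        ", l1(w), l1(w_rot))

# Size of the penalty gradient for a small and a large weight vector.
small, large = (0.003, 0.004), (300.0, 400.0)
for name, v in (("small", small), ("large", large)):
    print(name, l2(l2_grad(v)), l2(l2_squared_grad(v)))
```

Away from the origin, the gradient of the un-squared norm always has length 1, so the pull toward zero does not fade as the weights shrink, whereas the squared norm's pull is proportional to the size of the weights. That is the "small values more, large values less" behaviour mentioned above, and the rotation check shows that squaring does not cost you the invariance.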