In this paper they discuss the Sqrt-LASSO, which minimizes $\|Ax-b\|_2 + \lambda\|x\|_1$ instead of the regular LASSO objective $\|Ax-b\|_2^2 + \lambda\|x\|_1$.
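To make the comparison concrete, here is a minimal NumPy sketch of both objectives and a simple solver for each. It assumes a dense matrix `A`, and all function names are hypothetical; this is not the method from the paper. The regular LASSO is solved with plain proximal gradient (ISTA). The Sqrt-LASSO is solved by alternating minimization, using the identity $\min_{\sigma>0} \frac{\|r\|_2^2}{2\sigma} + \frac{\sigma}{2} = \|r\|_2$ (attained at $\sigma = \|r\|_2$). For a fixed $\sigma$, the $x$-step is then an ordinary LASSO with penalty $2\lambda\sigma$, so the $\ell_1$ term still produces exact zeros through soft-thresholding.

```python
import numpy as np


def lasso_objective(A, b, x, lam):
    """Regular LASSO: ||Ax - b||_2^2 + lam * ||x||_1."""
    r = A @ x - b
    return float(r @ r + lam * np.abs(x).sum())


def sqrt_lasso_objective(A, b, x, lam):
    """Sqrt-LASSO: ||Ax - b||_2 + lam * ||x||_1 (residual norm is not squared)."""
    r = A @ x - b
    return float(np.linalg.norm(r) + lam * np.abs(x).sum())


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1; entries with |v_i| <= t become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(A, b, lam, x0=None, n_iter=500):
    """Approximately minimize ||Ax - b||^2 + lam * ||x||_1 with ISTA (proximal gradient)."""
    # Gradient of ||Ax - b||^2 is 2 A^T (Ax - b); its Lipschitz constant is 2 * sigma_max(A)^2.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x


def sqrt_lasso_alternating(A, b, lam, n_outer=50):
    """Approximately minimize ||Ax - b|| + lam * ||x||_1.

    Alternates over the joint objective ||Ax - b||^2 / (2s) + s/2 + lam * ||x||_1,
    whose minimum over s > 0 equals the Sqrt-LASSO objective.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        s = np.linalg.norm(A @ x - b)  # exact minimizer over s for the current x
        if s == 0.0:  # perfect fit; the identity above requires s > 0
            break
        # For fixed s, minimizing ||r||^2/(2s) + lam*||x||_1 is a LASSO with penalty 2*lam*s.
        x = lasso_ista(A, b, 2.0 * lam * s, x0=x)
    return x
```

Both solvers run a fixed number of iterations instead of checking a convergence criterion, so their output should be treated as approximate.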
Can anyone point out the theoretical differences between the two? For example, is one more robust to outliers than the other, and does the Sqrt-LASSO still produce sparse solutions? And in practice, do implementations of the two behave very differently?