From a robust statistics perspective, are there any advantages of the Huber loss over the L1 loss (apart from differentiability at the origin)? Specifically, if I don't care about gradients (e.g. when using tree-based methods), does the Huber loss offer any other advantages with respect to robustness?
Moreover, are there any guidelines for choosing the value of the change point between the linear and quadratic pieces of the Huber loss? Thanks.
The Huber loss is less sensitive to small errors than the $\ell_1$ loss, while remaining linear in the error for large errors. To see why, note that $|\cdot|$ has a kink at the origin and penalizes small residuals at the same unit rate as large ones, whereas the Huber loss is quadratic near the origin and therefore penalizes small residuals much more gently. As a result, the Huber loss is often preferred to $\ell_1$ when the data contain both large outliers and small (ideally Gaussian) perturbations.
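To see this numerically, here is a minimal sketch (assuming the standard parameterization with cutoff $\delta = 1$): for small residuals the Huber loss grows quadratically, i.e. much more slowly than $|r|$, while for large residuals both grow at the same linear rate.

```python
import numpy as np

def l1_loss(r):
    """Absolute (L1) loss: linear everywhere, with a kink at the origin."""
    return np.abs(r)

def huber_loss(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    quadratic = 0.5 * r**2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quadratic, linear)

residuals = np.array([0.1, 0.5, 1.0, 5.0, 50.0])
print("residual      L1   Huber(delta=1)")
for r, l1, hb in zip(residuals, l1_loss(residuals), huber_loss(residuals)):
    print(f"{r:8.1f} {l1:7.2f} {hb:12.3f}")
```

At $r = 0.1$ the Huber loss is $0.005$ versus $0.1$ for $\ell_1$ (far less sensitive), while at $r = 50$ the two are nearly identical ($49.5$ vs. $50$), so both downweight outliers at the same linear rate.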
Where to put the transition point between the quadratic and linear pieces depends on how often outliers or large shocks occur in your data (e.g. "outliers constitute 1% of the data"). In practice it is common to set this cutoff using a robust estimate of the standard deviation of the residuals, so that the cutoff itself is not inflated by the very outliers it is meant to guard against.
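One common recipe (a sketch, not the only option): estimate the residual scale robustly with the normalized MAD and set the cutoff at $k$ times that scale. The textbook choice $k = 1.345$ gives roughly 95% asymptotic efficiency relative to least squares under Gaussian errors while still downweighting outliers.

```python
import numpy as np

def huber_delta(residuals, k=1.345):
    """Choose the Huber cutoff from a robust scale estimate.

    The MAD (median absolute deviation) stands in for the standard
    deviation; the factor 1.4826 makes it consistent for sigma under
    Gaussian data. k = 1.345 is the classical choice giving ~95%
    efficiency at the normal distribution.
    """
    mad = np.median(np.abs(residuals - np.median(residuals)))
    sigma_hat = 1.4826 * mad
    return k * sigma_hat

# Example: mostly Gaussian residuals with 1% gross outliers.
rng = np.random.default_rng(0)
r = np.concatenate([rng.normal(0, 1, 990), rng.normal(0, 20, 10)])
print(huber_delta(r))  # cutoff near 1.3-1.4; the outliers land in the linear region
```

Because the median and MAD are themselves resistant to outliers, the resulting cutoff reflects the scale of the "clean" bulk of the data rather than the contamination.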
Huber's monograph, Robust Statistics, discusses the theoretical properties of his estimator. For more practical matters (implementation and rules of thumb), check out Faraway's very accessible text, Linear Models with R.