I'm reading through a paper which at some point presents an optimization step for a function of the form:
$$ E = \sum_i \left|\alpha_i - \beta_i \right| $$
where $\alpha_i$ and $\beta_i$ are themselves functions; their specific form doesn't really matter here. What is claimed is that the problem is solved using an L-BFGS method. I thought there was a mistake in the formula, but they took inspiration from another paper where the same formula is shown; the difference is that the latter paper mentions a quasi-Newton method instead.
Now... as far as I know, for such methods the objective needs to be at least twice differentiable, and the function $E$ isn't even differentiable wherever $\alpha_i = \beta_i$.
Is there some form of trick that is usually applied? I know you can approximate $|\cdot|$ by a differentiable function (such as $f_n(x) = \frac{1}{n} \ln(\cosh(nx))$), but there's no mention of any approximation.
The other "trick" I can think of is just to take the derivative of the absolute value to be the signum function (regardless of what happens at 0).
I would do something like what I mentioned; my question is: is this what would have been done in practice?
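The sign-as-derivative idea is essentially the classical subgradient method. A minimal sketch on a hypothetical 1-D instance $E(x) = \sum_i |x - b_i|$ (the data $b$, the starting point, and the diminishing step sizes $1/\sqrt{k}$ are my own choices; the minimizer is the median of $b$, here $0.7$):

```python
import math

# Toy instance: E(x) = sum_i |x - b_i|, minimized at the median of b.
b = [0.1, 0.4, 0.7, 1.0, 2.5]

def subgrad(x):
    # "Derivative" of |.| taken as sign(.), ignoring the kink at 0.
    return sum((x > bi) - (x < bi) for bi in b)

x = 10.0
for k in range(1, 2001):
    x -= (1.0 / math.sqrt(k)) * subgrad(x)  # diminishing step sizes

print(x)  # hovers near the median 0.7
```

With a fixed step size the iterates would only oscillate around the kink; the diminishing steps are what make the plain sign trick converge, which is one reason it is rarely presented as-is in papers.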
The reasons are not yet fully understood, and no general proof exists, but it turns out that in practice BFGS-style algorithms do converge on many non-smooth problems as well, as Prof. Adrian Lewis points out in this excellent talk (slides 3-4).
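For illustration, this empirical behavior can be reproduced with a bare-bones BFGS (Armijo backtracking plus the standard curvature safeguard) applied directly to a toy sum-of-absolute-values objective, feeding it the sign of each term as the "gradient". Everything here (the objective, the starting point, the iteration budget) is my own construction, not the paper's setup:

```python
# Toy nonsmooth objective: f(x) = |x0 - 1| + |x1 + 2|, minimized at (1, -2).
def f(x):
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)

def subgrad(x):
    sgn = lambda t: float((t > 0) - (t < 0))
    return [sgn(x[0] - 1.0), sgn(x[1] + 2.0)]

def bfgs(x, iters=200):
    H = [[1.0, 0.0], [0.0, 1.0]]  # inverse-Hessian approximation
    for _ in range(iters):
        gr = subgrad(x)
        d = [-(H[0][0]*gr[0] + H[0][1]*gr[1]),
             -(H[1][0]*gr[0] + H[1][1]*gr[1])]
        # Backtracking (Armijo) line search on the nonsmooth f.
        t, fx, slope = 1.0, f(x), gr[0]*d[0] + gr[1]*d[1]
        while t > 1e-12 and f([x[0] + t*d[0], x[1] + t*d[1]]) > fx + 1e-4*t*slope:
            t *= 0.5
        xn = [x[0] + t*d[0], x[1] + t*d[1]]
        s = [xn[0] - x[0], xn[1] - x[1]]
        gn = subgrad(xn)
        y = [gn[0] - gr[0], gn[1] - gr[1]]
        sy = s[0]*y[0] + s[1]*y[1]
        if sy > 1e-10:  # curvature safeguard: skip the update otherwise
            r = 1.0 / sy
            # BFGS update: H <- (I - r s y^T) H (I - r y s^T) + r s s^T
            A = [[1.0 - r*s[0]*y[0], -r*s[0]*y[1]],
                 [-r*s[1]*y[0], 1.0 - r*s[1]*y[1]]]
            HAt = [[H[i][0]*A[j][0] + H[i][1]*A[j][1] for j in range(2)]
                   for i in range(2)]
            H = [[A[i][0]*HAt[0][j] + A[i][1]*HAt[1][j] + r*s[i]*s[j]
                  for j in range(2)] for i in range(2)]
        x = xn
    return x

x = bfgs([3.0, 0.3])
print(f(x))  # driven down towards 0 despite the kinks
```

This matches the empirical picture in the talk: the quasi-Newton iterates track the nonsmooth minimizer well even though none of the standard smoothness assumptions hold.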
Update
It seems that some results explaining why quasi-Newton methods work in this case are now available. See, for example, this paper.