Levenberg-Marquardt stalling

179 Views Asked by At

I am trying to solve an optimization problem of the form $$\min_q E(q) = \min_q \frac{1}{2}f(q)\cdot f(q),$$ for $f:\mathbb{R}^n \to \mathbb{R}^m$, using Levenberg-Marquardt, i.e. starting from an initial $q$ I iteratively solve

$$\left(\nabla f^T\nabla f + \lambda \operatorname{diag}(\nabla f^T\nabla f)\right)x = -\nabla f^T f, \tag{1}$$ increasing $\lambda$ until $E(q+x) < E(q)$, and then set $$q \leftarrow q+x.$$

In some cases this is working fine, but in others the algorithm stalls: I end up in a situation where

1) $\|\nabla f^T f\|$ is well away from zero, and decreasing very slowly, so I have not converged to a critical point of $E$;

2) Each iteration requires $\lambda$ to become larger and larger before $E(q+x) < E(q)$, so that $E(q)$ stops noticeably decreasing;

3) The residual to the linear system above (1) is tiny (<$10^{-15}$), so nothing is going wrong with the linear solve as far as I can tell.

What might be going wrong with the optimization? Do the above offer any clues, other than that $\nabla f^T \nabla f$ is a very bad approximation to the full Hessian $\nabla f^T\nabla f + Hf f$ at the stall point? Is there anything I might try to improve convergence?

EDIT: Here is more information:

4) Gradient descent works fine at the stall point: with step size of about $10^{-2}$ the objective decreases a nontrivial amount.

5) $\nabla f^T\nabla f$ is nonsingular, but poorly conditioned, at the stall point: the largest singular value is about $10^8$ and the smallest $0.01$.