Levenberg-Marquardt: which is preferable, (A + mu I) or (A + mu diag(A))?

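In Levenberg-Marquardt, the step h is computed by solving

(A + mu I) h = -g

where A is the (approximate) Hessian, g is the gradient, and mu > 0 is the damping parameter.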

Some of the literature instead computes the step by solving

(A + mu A') h = -g, where A' = diag(A)

This is said to help with error-valley problems, where the error surface near the minimum is long and flat. I cannot decide whether diag(A) should be used in general, or whether the identity matrix I is more appropriate.
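To make the comparison concrete, here is a minimal sketch in Python/NumPy of the two damped solves on a toy quadratic with very different curvature along its two axes. The function name `lm_step`, the `scaling` argument, and the test matrix are my own illustrative choices, not taken from any particular library.

```python
import numpy as np

def lm_step(A, g, mu, scaling="identity"):
    """Solve (A + mu * D) h = -g for the step h.

    scaling="identity" uses D = I; scaling="diagonal" uses D = diag(A).
    (Hypothetical helper, for illustration only.)
    """
    if scaling == "identity":
        D = np.eye(A.shape[0])
    elif scaling == "diagonal":
        D = np.diag(np.diag(A))
    else:
        raise ValueError("scaling must be 'identity' or 'diagonal'")
    return np.linalg.solve(A + mu * D, -g)

# Assumed toy problem: f(x) = 0.5 * x^T A x, steep along x0 and flat along x1.
A = np.diag([1e4, 1e-2])
x = np.array([1.0, 1.0])
g = A @ x          # gradient of f at x
mu = 1.0

h_identity = lm_step(A, g, mu, "identity")
h_diagonal = lm_step(A, g, mu, "diagonal")

print("identity damping:", h_identity)
print("diagonal damping:", h_diagonal)
```

In this example the undamped Newton step is (-1, -1). With the same mu, identity damping leaves the steep direction almost untouched but shrinks the step along the flat direction to roughly -0.01, while diag(A) damping scales both components by the same factor 1 / (1 + mu). That matches the "long, flat valley" argument, but it is only one diagonal toy case. Note also that diag(A) adds no damping in any direction where the diagonal entry of A is zero.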