In the Levenberg-Marquardt method, the step h is computed by solving
(A + mu I) h = -g
In some of the literature, the step is instead computed by solving
(A + mu A') h = -g, where A' = diag(A)
It is said that this diagonal scaling helps with "error valley" problems, where the error surface near the minimum is long and flat along some directions. I cannot decide whether diag(A) should be used in general for all cases, or whether the identity matrix I is more appropriate.
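To illustrate the difference, here is a minimal sketch (assuming A is the Gauss-Newton approximation to the Hessian and g the gradient) that computes both damped steps with NumPy on a deliberately ill-conditioned 2x2 problem mimicking a long flat valley:

```python
import numpy as np

def lm_step(A, g, mu, scaling="identity"):
    """Solve the damped system for the step h.

    scaling="identity":  (A + mu*I) h = -g          (Levenberg)
    scaling="diagonal":  (A + mu*diag(A)) h = -g    (Marquardt)
    """
    if scaling == "identity":
        D = np.eye(A.shape[0])
    else:
        D = np.diag(np.diag(A))
    return np.linalg.solve(A + mu * D, -g)

# Ill-conditioned Hessian: steep along x, nearly flat along y (a "valley").
A = np.array([[100.0, 0.0],
              [0.0,   0.01]])
g = np.array([1.0, 1.0])
mu = 1.0

h_identity = lm_step(A, g, mu, "identity")   # ~[-0.0099, -0.9901]
h_diagonal = lm_step(A, g, mu, "diagonal")   # [-0.005, -50.0]
```

Note how identity damping caps the step along the flat direction at roughly -g/mu, while the diagonal scaling damps each axis relative to its own curvature and so takes a much longer step along the flat valley direction. This is the behavior the literature attributes to Marquardt's scaling; the trade-off is that very small diagonal entries can make the diagonally scaled step large and less robust.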