I am looking for some tips regarding the BFGS method. I am solving an optimization problem with 10K parameters: I minimize the functional $ \pi(\mu)=\|u(\mu)-u_{\text{measured}}\|^2 $ over the parameter vector $\mu$. I have tried two methods:
- A self-implemented Gauss-Newton (G-N) method, which uses $J^TJ$, where $J=\partial u / \partial \mu$ is the Jacobian matrix.
- SciPy's `fmin_l_bfgs_b` (which is actually the limited-memory variant, L-BFGS-B, rather than full BFGS), which takes the functional and its gradient. I use it in conjunction with the adjoint method to compute the gradient efficiently.
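To make the comparison concrete, here is a minimal sketch of the Gauss-Newton iteration on a small toy model (the exponential model, the matrix `A`, and all sizes are made up purely for illustration; my real $u(\mu)$ comes from a PDE solve):

```python
import numpy as np

# Hypothetical toy forward model: u(mu) = exp(A @ mu), entrywise.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
mu_true = 0.1 * rng.standard_normal(5)
u_measured = np.exp(A @ mu_true)

def residual(mu):
    return np.exp(A @ mu) - u_measured

def jacobian(mu):
    # J_ij = d u_i / d mu_j = exp(A @ mu)_i * A_ij
    return np.exp(A @ mu)[:, None] * A

mu = np.zeros(5)
for _ in range(10):
    r = residual(mu)
    J = jacobian(mu)
    # Gauss-Newton step: solve the normal equations (J^T J) d = -J^T r
    d = np.linalg.solve(J.T @ J, -J.T @ r)
    mu += d
```

For this zero-residual toy problem the iteration converges in a handful of steps, mirroring the fast convergence I see on the real problem; the expensive part in my case is forming $J$.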
I find that Gauss-Newton converges super-fast (3-5 iterations), whereas L-BFGS-B is terribly slow, requiring thousands (!) of iterations. I would prefer L-BFGS-B, since the Jacobian needed for G-N is very expensive to compute, but with so many iterations it is not practical. I also found `fmin_l_bfgs_b` to be very sensitive to its input parameters. For example, I needed very tight tolerances (much tighter than those suggested on the SciPy page) to keep it from exiting prematurely.
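For reference, this is roughly how I call it, on a linear toy problem standing in for my PDE model (the model, sizes, and the specific tolerance values are illustrative assumptions, not my actual settings):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Hypothetical linear toy model: u(mu) = A @ mu.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
mu_true = rng.standard_normal(10)
u_measured = A @ mu_true

def functional(mu):
    r = A @ mu - u_measured
    return 0.5 * r @ r

def gradient(mu):
    # In the real problem this gradient comes from the adjoint method.
    return A.T @ (A @ mu - u_measured)

# Tightened stopping criteria: factr scales machine epsilon in the
# relative f-decrease test, pgtol bounds the max projected-gradient
# component (SciPy defaults are factr=1e7, pgtol=1e-5).
mu_opt, f_opt, info = fmin_l_bfgs_b(
    functional, np.zeros(10), fprime=gradient,
    factr=10.0, pgtol=1e-10, maxiter=20000)
```

With the default `factr=1e7` the routine would often declare convergence while $\mu$ was still visibly far from the G-N answer; only with values this tight does it keep iterating.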
I also noticed that L-BFGS-B tends to "smooth out" the solution: for the same value of the functional, the G-N method gives qualitatively much nicer results (for $\mu$) than L-BFGS-B does.
I am looking for suggestions on how to get L-BFGS-B to converge faster, and perhaps on how to keep it from moving toward smoothed-out solutions (I have already tried playing around with the `fmin_l_bfgs_b` parameters, but did not get much improvement).