In my numerics class we were told to implement a globalized inverse BFGS method. Because BFGS is a quasi-Newton method, it can happen that the computed direction is not a descent direction (i.e., it points away from the minimum rather than towards it).
In this case, so we were told, we should reset the approximation $B$ of the inverse Hessian. My way of dealing with this is to set $B = I$ and $d = -\nabla f(x)$, like this (Matlab syntax):
d = -( B * grad(x) );
if ( grad(x)' * d > -norm(d) )
    d = -grad(x);
    B = eye(dim);
end
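For reference, this is roughly what the safeguard looks like as a Python/NumPy sketch. Note that it uses the common angle test $\nabla f(x)^T d \le -c\,\|\nabla f(x)\|\,\|d\|$ with a small constant $c$ (my own choice here, not part of the assignment) instead of comparing against $-\|d\|$:

```python
import numpy as np

def bfgs_direction(B, x, grad, c=1e-4):
    """Compute a quasi-Newton direction with a reset safeguard.

    B    : current approximation of the inverse Hessian
    grad : callable returning the gradient at x
    c    : angle-test constant (an assumption, not from the assignment)
    """
    g = grad(x)
    d = -B @ g
    # Descent test: require g'd <= -c * ||g|| * ||d||.
    # If it fails, fall back to steepest descent and reset B to the identity.
    if g @ d > -c * np.linalg.norm(g) * np.linalg.norm(d):
        d = -g
        B = np.eye(len(x))
    return B, d
```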
But somehow this drastically reduces performance.
Two examples (precision $10^{-8}$):
- $f(x) = 100 (x_1 - 2)^4 + (x_1 - 2 x_2)^2$ with $x_0 = \begin{bmatrix}0\\-1\end{bmatrix}$
- $g(x) = \sum_{i = 1}^{N-1} \left[ (1 - x_i)^2 + 100 (x_{i+1} - x_i^2)^2 \right]$ for $N = 100$ and $x_0 = [-1, -1, \dots, -1]^T$
$f(x)$ needed $23$ steps without resetting the Hessian approximation and $64086$ with resetting ($\begin{bmatrix}2\\1\end{bmatrix}$ being the result).
$g(x)$ needed $166$ steps without and $28979$ steps with resetting ($[1, 1, \dots, 1]^T$ being the result).
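In case anyone wants to reproduce this, the two test functions and their analytic gradients can be written in NumPy like this (a sketch I put together for illustration, not my actual Matlab code):

```python
import numpy as np

def f(x):
    # f(x) = 100*(x1 - 2)^4 + (x1 - 2*x2)^2, minimizer at (2, 1)
    return 100.0 * (x[0] - 2.0)**4 + (x[0] - 2.0 * x[1])**2

def grad_f(x):
    return np.array([400.0 * (x[0] - 2.0)**3 + 2.0 * (x[0] - 2.0 * x[1]),
                     -4.0 * (x[0] - 2.0 * x[1])])

def g(x):
    # Extended Rosenbrock: sum_{i=1}^{N-1} (1 - x_i)^2 + 100*(x_{i+1} - x_i^2)^2
    return np.sum((1.0 - x[:-1])**2 + 100.0 * (x[1:] - x[:-1]**2)**2)

def grad_g(x):
    grad = np.zeros_like(x)
    # Contribution of each term to x_i ...
    grad[:-1] += -2.0 * (1.0 - x[:-1]) - 400.0 * x[:-1] * (x[1:] - x[:-1]**2)
    # ... and to x_{i+1}
    grad[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return grad
```

Both gradients vanish at the minimizers $[2, 1]^T$ and $[1, \dots, 1]^T$, which is a quick sanity check on the implementation.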
Because I get the correct results with my implementation, I assume I got it right (I also get correct results with other test functions).
Any ideas from you guys what could be wrong here?
Thank you
Edit: I just figured out that I probably used the wrong comparison. If I change it to this:
if ( grad(x)' * d <= -norm(d) )
    d = -grad(x);
    B = eye(dim);
end
I get $26$ for $f(x)$ and $632$ for $g(x)$. Nevertheless, it is still noticeably worse for $g(x)$.
Maybe resetting $B = I$ is just a bad idea in general?