Gradient descent stopping criteria

193 Views Asked by At

One criteria for gradient descent stopping is using $reltol$ - stopping gradient descent when improvement drops below a threshold.

In practice, is there any difference between stopping when empirical improvement approaches zero v.s. stopping when $|\nabla f(x)|$ approaches zero?

1

There are 1 best solutions below

0
On BEST ANSWER

The gradient criterion $$||\nabla f(x_{k})|| \leq \epsilon$$ is scale-sensitive, i.e. it matters in which units your data is measured (e.g., meter vs. kilometer). In principle, both $x$ and its gradient can be made arbitrarily large or small by adjusting the measurement unit. On the other hand, the relative-parameter-change criterion $$|| x_{k} - x_{k-1} || \leq \epsilon$$ is scale-invariant. Keep in mind, however, that since $x_{k}$ is (most likely) a vector, each of its elements are treated the same here. Ideally, you would like to weigh each parameter with some curvature information, i.e. the Hessian matrix $\nabla^{2} f(x_{k})$, but that's usually not computationally feasible. For a more detailed discussion on stopping criteria, have a look at pages 305-312 in Gill, Murray & Wright.