Gradient descent stopping criteria

193 Views Asked by Bumbble Comm At 28 Mar 2026 - 10:03

One criteria for gradient descent stopping is using $reltol$ - stopping gradient descent when improvement drops below a threshold.

In practice, is there any difference between stopping when empirical improvement approaches zero v.s. stopping when $|\nabla f(x)|$ approaches zero?

Original Q&A

There are 1 best solutions below

Bumbble Comm On 16 Oct 2020 - 9:41 BEST ANSWER

The gradient criterion $$||\nabla f(x_{k})|| \leq \epsilon$$ is scale-sensitive, i.e. it matters in which units your data is measured (e.g., meter vs. kilometer). In principle, both $x$ and its gradient can be made arbitrarily large or small by adjusting the measurement unit. On the other hand, the relative-parameter-change criterion $$|| x_{k} - x_{k-1} || \leq \epsilon$$ is scale-invariant. Keep in mind, however, that since $x_{k}$ is (most likely) a vector, each of its elements are treated the same here. Ideally, you would like to weigh each parameter with some curvature information, i.e. the Hessian matrix $\nabla^{2} f(x_{k})$, but that's usually not computationally feasible. For a more detailed discussion on stopping criteria, have a look at pages 305-312 in Gill, Murray & Wright.

Gradient descent stopping criteria

There are 1 best solutions below

Related Questions in MACHINE-LEARNING

Trending Questions

Popular # Hahtags

Popular Questions