The update rule in Newton's method is $ x \leftarrow x - \mathbf{H}^{-1} \nabla $, where $ \mathbf{H} $ is the Hessian and $ \nabla $ is the gradient.
In the case of gradient descent, the update rule is $ x \leftarrow x - \alpha \nabla $, where $ \alpha $ is the step size or learning rate.
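As a concrete illustration of the two update rules, here is a minimal 1-D sketch. The function $f(x) = x^2$ and its hand-coded derivatives are just an assumption for illustration; they are not part of the original post.

```python
# Minimal 1-D sketch of the two update rules, assuming f(x) = x**2.

def grad(x):
    return 2 * x    # f'(x) for f(x) = x**2

def hess(x):
    return 2.0      # f''(x) for f(x) = x**2

def newton_step(x):
    # x <- x - H^{-1} * gradient (H is just a scalar in 1-D)
    return x - grad(x) / hess(x)

def gradient_descent_step(x, alpha):
    # x <- x - alpha * gradient
    return x - alpha * grad(x)

x0 = 4.0
print(newton_step(x0))                  # 0.0: reaches the minimum in one step
print(gradient_descent_step(x0, 0.1))   # 3.2: a small step toward the minimum
```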
I created the visual below to understand the impact of the learning rate in gradient descent. The orange circle indicates the initial value of $x$, and the 'Insights' table shows the first 10 steps of gradient descent for the chosen step size.
For this particular problem, Newton's method suggests a step size (the reciprocal of the second derivative) of 0.5. Some observations from this interactive visualization (a short numerical sketch reproducing them follows the list):
- A step size of 0.5 with gradient descent will reach the minimum in one step. (Just like Newton's method would for a convex quadratic function).
- Any learning rate lower than 0.5 leads to slower convergence.
- Any learning rate higher than 0.5 (but less than 1) leads to oscillations around the minimum, until the iterates eventually converge to it.
- With a learning rate of $\alpha = 1$, the iterates never converge for this function; they just keep oscillating between two points.
- A learning rate greater than 1 leads to the iterations moving away from the minimum.
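These observations are easy to reproduce numerically. A short sketch, again assuming $f(x) = x^2$ (so $f''(x) = 2$ and the Newton step size is $0.5$), which may differ from the function in the actual visual:

```python
# Run gradient descent on f(x) = x**2 for several learning rates.
def gd(alpha, x0=1.0, steps=10):
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - alpha * 2 * x   # x <- x - alpha * f'(x)
        trajectory.append(x)
    return trajectory

for alpha in [0.25, 0.5, 0.75, 1.0, 1.25]:
    print(alpha, [round(v, 3) for v in gd(alpha)])

# alpha=0.25: 1.0, 0.5, 0.25, ...    (monotone, slower than Newton)
# alpha=0.5 : 1.0, 0.0, 0.0, ...     (minimum in one step)
# alpha=0.75: 1.0, -0.5, 0.25, ...   (oscillates but converges)
# alpha=1.0 : 1.0, -1.0, 1.0, ...    (oscillates forever)
# alpha=1.25: 1.0, -1.5, 2.25, ...   (diverges)
```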
Now the question: 1 is twice the Newton step size for this optimization problem. Is it generally the case that a learning rate that is twice (or some other multiple of) the reciprocal of the second-order derivative is undesirable for optimization with gradient descent? Is anyone familiar with any research along these lines?
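For the purely quadratic 1-D case, a quick derivation shows where the factor of two comes from (a sketch, assuming $f(x) = \tfrac{c}{2}(x - x^*)^2$ with constant curvature $c = f'' > 0$, so the Newton step size is $1/c$):

$$ x_{t+1} - x^* \;=\; x_t - \alpha f'(x_t) - x^* \;=\; (1 - \alpha c)\,(x_t - x^*). $$

The error is multiplied by $(1 - \alpha c)$ at every step, so it shrinks only if $|1 - \alpha c| < 1$, i.e. $0 < \alpha < 2/c$. At exactly $\alpha = 2/c$ (twice the Newton step) the factor is $-1$ and the iterates oscillate forever; beyond it they diverge. Whether and how this carries over to general, non-quadratic functions is what the question above is getting at.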