I have just learned how gradient descent works, and for the two-dimensional case (which I'm going to use here for simplicity) we know that to update the value of $x$ we do the following (I drop the iteration-number subscripts for simplicity):
$$x=x- \alpha \frac{df}{dx} $$
But I think we really only use the derivative for its sign, i.e. to know in which direction we should go and when to stop.
So is there anything special about the value of the derivative, or can we also use:
$$x = x - \alpha \, \operatorname{sign}\!\left(\frac{df}{dx}\right)$$
I know that using the value of the derivative will make the process take larger steps when the derivative is large and smaller steps when it is small, but does that improve the accuracy or the speed of gradient descent?
Note: I use the two-dimensional case here, so for higher dimensions my question is about applying the sign function to each parameter separately, not to the gradient of the higher-dimensional function at once (i.e. applying the sign function to the gradient elementwise).
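To make the elementwise version concrete, here is a minimal sketch of the two update rules. The function names and the example objective $f(x, y) = x^2 + 10y^2$ are my own illustration, not something from the question:

```python
import numpy as np

def grad_descent_step(x, grad, alpha):
    # standard update: the step length scales with the gradient magnitude
    return x - alpha * grad

def sign_descent_step(x, grad, alpha):
    # proposed variant: elementwise sign, so every coordinate
    # moves by exactly alpha (unless its partial derivative is zero)
    return x - alpha * np.sign(grad)

# example objective: f(x, y) = x**2 + 10*y**2, gradient = (2x, 20y)
x = np.array([3.0, 1.0])
g = np.array([2 * x[0], 20 * x[1]])

print(grad_descent_step(x, g, 0.1))  # each coordinate steps proportionally to its partial derivative
print(sign_descent_step(x, g, 0.1))  # each coordinate steps by exactly 0.1
```

Note how the standard update moves much farther along the $y$ axis (where the partial derivative is large), while the sign update treats both coordinates identically.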
The gradient is a vector in $\mathbf{R}^n$. Its sign is a scalar. You simply can't replace one with the other when $n > 1$. Maybe you saw examples in your class in which the descent direction is $1$ or $-1$, but those are univariate examples ($n = 1$).
Besides, the value of the gradient indicates how fast the function changes locally. Gradient descent tends to compute larger steps when the gradient is large in magnitude, and smaller steps when the gradient is small in magnitude.