I have just learned how gradient descent works, and for the two-dimensional case (which I'm going to use here for simplicity) we know that to update the value of $x$ we do the following (I drop the iteration-number subscripts for simplicity):
$$x=x- \alpha \frac{df}{dx} $$
But I think we really only use the derivative for its sign, i.e. to know in which direction we should go and when to stop.
So is there anything special about the value of the derivative, or can we also use:
$$x = x - \alpha \, \operatorname{sign}\!\left(\frac{df}{dx}\right)$$
I know that using the value of the derivative will make the process take larger steps when the derivative is large and smaller steps when it is small, but does that improve the accuracy or the speed of gradient descent?
Note: I use the two-dimensional case here, so for higher dimensions my question is about applying the sign function to each parameter separately, not to the gradient of the higher-dimensional function at once (i.e. applying the sign function to the gradient elementwise).
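To make the elementwise version concrete, here is a minimal sketch of the two update rules. The function names and the example objective $f(x, y) = x^2 + 10y^2$ are my own illustration, not something from the question:

```python
import numpy as np

def grad_descent_step(x, grad, alpha):
    # standard update: the step length scales with the gradient magnitude
    return x - alpha * grad

def sign_descent_step(x, grad, alpha):
    # proposed variant: elementwise sign, so every coordinate
    # moves by exactly alpha (unless its partial derivative is zero)
    return x - alpha * np.sign(grad)

# example objective: f(x, y) = x**2 + 10*y**2, gradient = (2x, 20y)
x = np.array([3.0, 1.0])
g = np.array([2 * x[0], 20 * x[1]])

print(grad_descent_step(x, g, 0.1))  # each coordinate steps proportionally to its partial derivative
print(sign_descent_step(x, g, 0.1))  # each coordinate steps by exactly 0.1
```

Note how the standard update moves much farther along the $y$ axis (where the partial derivative is large), while the sign update treats both coordinates identically.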
The gradient is a vector in $\mathbf{R}^n$. Its sign is a scalar. You simply can't replace one with the other when $n > 1$. Maybe you saw examples in your class in which the descent direction is $1$ or $-1$, but those are univariate examples ($n = 1$).
Besides, the value of the gradient indicates how fast the function changes locally. Gradient descent tends to compute larger steps when the gradient is large in magnitude, and smaller steps when the gradient is small in magnitude.