Gradient descent is a numerical optimization method for finding a local (or global) minimum of a function. It is given by the following formula: $$ x_{n+1} = x_n - \alpha \nabla f(x_n) $$
For the sake of simplicity, let us take a function of one variable, $f(x)$. In that case, the gradient becomes the derivative $\frac{df}{dx}$, and the formula for gradient descent becomes: $$ x_{n+1} = x_n - \alpha \frac{df}{dx}(x_n) $$
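For concreteness, the one-variable iteration can be sketched in a few lines of code. This is only an illustration, using the assumed example $f(x) = x^2$ with derivative $f'(x) = 2x$ and minimum at $x = 0$; the function names and values of `alpha` and `steps` are my own choices, not part of the question.

```python
def gradient_descent(df, x0, alpha=0.1, steps=50):
    """Iterate x_{n+1} = x_n - alpha * f'(x_n) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = x - alpha * df(x)  # the update formula above
    return x

# Example: f(x) = x^2, so f'(x) = 2x; starting from x = 5.0
x_min = gradient_descent(lambda x: 2 * x, x0=5.0)
```

Starting from $x_0 = 5$, the iterates converge toward the minimizer $x = 0$.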
My question is: How can we get the new iterate $x_{n+1}$ from the change in the value of $f$? The gradient gives both the direction and the magnitude of the steepest increase of $f$ at a given point, not how much $x$ should change, so it makes no sense to me that we use it in a formula to compute new values of $x$.
The intuitive notion behind this choice is that a graph is usually steeper farther away from an extreme and flatter closer to an extreme.
So if the gradient is large (as in far away from $0$), then we are presumably far away from an extreme, and we can take a large step without worrying too much about stepping past the extreme we're looking for.
If the gradient is small (as in close to $0$), then presumably we are getting close to a local extreme, so we reduce our step size accordingly so that we don't stray too far away.
Using the gradient directly, rather than trying to make a more qualified guess as to how far away the extreme is, makes this algorithm easy to compute, and at the same time it turns out to give decent results. A lot rides on the choice of $\alpha$, though, which technically doesn't have to be constant.
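The shrinking-step behaviour described above is easy to see numerically. Here is a small sketch, again on the assumed example $f(x) = x^2$ (so $f'(x) = 2x$, minimum at $x = 0$); the variable names and the fixed $\alpha = 0.1$ are illustrative choices, not anything from the question.

```python
alpha = 0.1
x = 5.0
step_sizes = []
for _ in range(10):
    step = alpha * 2 * x       # alpha * f'(x): the actual amount x moves
    step_sizes.append(abs(step))
    x -= step

# Even though alpha is constant, the step sizes decrease as x
# approaches the minimum, because the gradient itself shrinks.
```

Each step here is $0.8$ times the previous one, so the steps decay geometrically as the iterates close in on the minimum.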