I can't quite grasp why the gradient of a function points in the direction of steepest ascent. Thinking about it led me back to the basic notion of a derivative, but first, here is an example:
$$f(x) = x^2$$
$$f'(x) = 2x$$ $$f'(-1) = -2, f'(1) = 2$$
The derivative basically says how much our function changes in response to a small change in its argument. I prefer to think of the derivative as the velocity (rate) at which the function changes at a given point.
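To make the "velocity" picture concrete, here is a small sketch in Python (my choice of language; the question itself has no code). It approximates $f'(x)$ for $f(x) = x^2$ with a symmetric difference quotient. The helper name `central_difference` and the step `h = 1e-5` are illustrative assumptions, not anything standard.

```python
def f(x: float) -> float:
    """The example function f(x) = x^2."""
    return x * x


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Approximate func'(x) by (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# The estimates should be close to f'(-1) = -2 and f'(1) = 2
for x0 in (-1.0, 1.0):
    print(x0, central_difference(f, x0))
```

The printed values come out approximately $-2$ and $2$, matching $f'(x) = 2x$ up to floating-point rounding.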
So the absolute value of the derivative tells us how fast the function changes, and the sign of the derivative tells us in which direction we should change the argument to increase the function. (We can take $x_{next} = x + hf'(x)$, where $h > 0$ is a small enough number; as long as $f'(x) \neq 0$, this always increases the value of the function.)
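A minimal sketch of that update rule for $f(x) = x^2$, assuming a fixed step size `h = 0.01` (an arbitrary illustrative value). It takes one step $x_{next} = x + hf'(x)$ from a few starting points and checks whether $f$ went up.

```python
def f(x: float) -> float:
    return x * x


def f_prime(x: float) -> float:
    # Analytic derivative of x^2
    return 2 * x


def ascent_step(x: float, h: float = 0.01) -> float:
    """One step x_next = x + h * f'(x) with a small step size h > 0."""
    return x + h * f_prime(x)


for x0 in (-1.0, 1.0, 0.3):
    x1 = ascent_step(x0)
    print(f"x={x0:+.2f}  f'(x)={f_prime(x0):+.2f}  x_next={x1:+.4f}  increased={f(x1) > f(x0)}")
```

At $x = -1$ the step moves left (to $-1.02$) and at $x = 1$ it moves right (to $1.02$): the sign of $f'$ alone picks the direction. At $x = 0$, where $f'(0) = 0$, the step does nothing. For this particular $f$ any $h > 0$ works; for a general function $h$ has to be small enough.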
So I do not understand why the derivative (or partial derivative, same thing) always points in the direction in which we should change the argument to increase the function, given that its value tells us the speed at which the function changes?
In multiple dimensions, if a function is differentiable, you can prove that the directional derivative along a direction $v$ is
$$ D_v f = \nabla f \cdot v$$
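The formula can be checked numerically. The sketch below uses a hypothetical example function $f(x, y) = x^2 + 3xy$ (gradient $(2x + 3y,\ 3x)$) at the arbitrary point $(1, 2)$. It compares a finite-difference estimate of the rate of change along a unit vector $v$ with $\nabla f \cdot v$. The helper names are made up for illustration.

```python
import math


def f(x: float, y: float) -> float:
    """Hypothetical example function f(x, y) = x^2 + 3xy."""
    return x * x + 3 * x * y


def grad_f(x: float, y: float) -> tuple[float, float]:
    # Analytic gradient: (df/dx, df/dy) = (2x + 3y, 3x)
    return (2 * x + 3 * y, 3 * x)


def directional_derivative_fd(x: float, y: float, v: tuple[float, float], t: float = 1e-6) -> float:
    """Central-difference estimate of the rate of change of f at (x, y) along unit vector v."""
    vx, vy = v
    return (f(x + t * vx, y + t * vy) - f(x - t * vx, y - t * vy)) / (2 * t)


def dot(a: tuple[float, float], b: tuple[float, float]) -> float:
    return a[0] * b[0] + a[1] * b[1]


p = (1.0, 2.0)
for theta in (0.0, math.pi / 4, 2.0):
    v = (math.cos(theta), math.sin(theta))  # unit vector at angle theta
    fd = directional_derivative_fd(*p, v)
    exact = dot(grad_f(*p), v)
    print(f"theta={theta:.3f}  finite difference={fd:.6f}  grad.v={exact:.6f}")
```

For each direction the two columns agree to roughly six decimal places.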
(Note: for a directional derivative we care only about the direction, not the norm of the vector, so to simplify the treatment we assume $\|v\| = 1$.) So you want to find the unit vector $v$ that maximizes that scalar product. Writing $\theta$ for the angle between $\nabla f$ and $v$,
$$\nabla f \cdot v = \|\nabla f\|\,\|v\|\cos\theta = \|\nabla f\|\cos\theta,$$
which is largest when $\cos\theta = 1$, i.e. when $v$ and $\nabla f$ have the same direction. Since $v$ needs to have unit norm, the only choice (for $\nabla f \neq 0$) is $\displaystyle v= \frac{\nabla f}{\|\nabla f\|}$, and the largest rate of increase is $\|\nabla f\|$.
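If you want to see this maximization happen numerically rather than by argument, here is a brute-force sketch. It evaluates $\nabla f \cdot v$ for many unit vectors $v = (\cos\theta, \sin\theta)$ and reports which one wins. The function $f(x, y) = x^2 + 3xy$, the point $(1, 2)$ and the grid size are arbitrary choices for illustration.

```python
import math


def grad_f(x: float, y: float) -> tuple[float, float]:
    # Gradient of the hypothetical f(x, y) = x^2 + 3xy
    return (2 * x + 3 * y, 3 * x)


g = grad_f(1.0, 2.0)  # (8.0, 3.0)
norm_g = math.hypot(*g)

# Evaluate D_v f = grad f . v on a fine grid of unit vectors v = (cos theta, sin theta)
n = 100_000
best_theta, best_value = 0.0, -math.inf
for k in range(n):
    theta = 2 * math.pi * k / n
    value = g[0] * math.cos(theta) + g[1] * math.sin(theta)
    if value > best_value:
        best_theta, best_value = theta, value

# atan2 returns an angle in (-pi, pi]; here it is positive, so it is directly comparable
print("best angle on grid:", best_theta)
print("angle of grad f   :", math.atan2(g[1], g[0]))
print("max D_v f         :", best_value, "  ||grad f|| =", norm_g)
```

The winning angle matches the angle of the gradient up to the grid spacing, and the maximum value matches $\|\nabla f\| = \sqrt{73}$.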
So you have proved that the direction in which the directional derivative is largest is the direction of the gradient.
This of course also works in one dimension, but it's less intuitive there, since there are only two possible unit directions, $v = +1$ and $v = -1$.
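To connect this back to the example in the question, here is the one-dimensional case worked out. The unit "vectors" are just $v = \pm 1$, and the directional derivative reduces to $f'(x)\,v$:
$$D_v f(x) = f'(x)\,v, \qquad v \in \{-1, +1\},$$
so (for $f'(x) \neq 0$) the maximizing choice is
$$v = \frac{f'(x)}{|f'(x)|} = \operatorname{sign} f'(x).$$
For $f(x) = x^2$ at $x = -1$: $D_{+1} f = -2$ and $D_{-1} f = 2$, so the best direction is $v = -1$, which is exactly the sign of $f'(-1) = -2$. In other words, the sign rule from the question is the one-dimensional special case of "follow the gradient".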