It was a while ago that I studied multivariable calculus, so I need to refresh certain results.
Given $ f:R^n\to R $, at a stationary point $ x $ the gradient is $ \nabla f(x) = 0 $. However, given that the gradient points in the direction in which $f$ increases the most, how come the gradient is zero at a local minimum?
Also, gradient descent uses that fact to find a local minimum: if $ \nabla f$ points in the direction of maximum increase, then $-\nabla f $ points in the direction of maximum decrease.
How is that equivalent?
You are right in saying that the gradient points in the direction in which $f$ (not $\nabla f$) increases the most, and consequently $-\nabla f(x)$ points in the direction in which $f$ decreases the most.
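And $\nabla f(x) = 0$ at a local minimum or a local $\bf{maximum}$ (or a saddle point, but we can ignore that for now)! The length of the gradient is the rate of steepest increase, and at such a point $f$ is flat to first order, so there is no direction of steepest increase left to point in.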
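A short derivation may make this concrete. It is only a sketch, assuming $f$ is differentiable at $x$; the unit vector $u$ and the helper function $g$ are introduced here for illustration. The rate of change of $f$ in a unit direction $u$ is
$$D_u f(x) = \nabla f(x)\cdot u \le \|\nabla f(x)\|,$$
with equality when $\nabla f(x) \neq 0$ and $u = \nabla f(x)/\|\nabla f(x)\|$. If $x$ is a local minimum, then for every $u$ the one-variable function $g(t) = f(x + tu)$ has a local minimum at $t = 0$, hence
$$g'(0) = \nabla f(x)\cdot u = 0 \quad \text{for every unit vector } u,$$
which forces $\nabla f(x) = 0$. The same argument works at a local maximum.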
Why does gradient descent take us to a local minimum?
Because each update pushes in the direction of $-\nabla f$, evaluated at the current point:
$$a_{n+1} = a_n - \lambda \nabla f(a_n)$$
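Each subsequent iterate $a_{n+1}$ is a step from $a_n$ in the direction opposite to the steepest increase, or in other words in the direction of steepest decrease. The step has length $\lambda \|\nabla f(a_n)\|$, so the steps shrink as the gradient approaches zero near a minimum.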
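The update rule can be tried out numerically. Below is a minimal sketch in plain Python, under some assumptions of mine: the test function $f(x, y) = (x-1)^2 + 2(y+2)^2$, the names `grad_f` and `gradient_descent`, the step size $\lambda = 0.1$, and the starting point are all chosen for illustration and are not part of the question.

```python
def grad_f(a):
    """Gradient of the illustrative function f(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = a
    return (2.0 * (x - 1.0), 4.0 * (y + 2.0))


def gradient_descent(grad, a0, lam, steps):
    """Iterate a_{n+1} = a_n - lam * grad(a_n) for a fixed number of steps."""
    a = tuple(a0)
    for _ in range(steps):
        g = grad(a)
        # Move against the gradient, i.e. in the direction of steepest decrease.
        a = tuple(ai - lam * gi for ai, gi in zip(a, g))
    return a


if __name__ == "__main__":
    a_final = gradient_descent(grad_f, (5.0, 5.0), lam=0.1, steps=200)
    print("final iterate:", a_final)
    print("gradient there:", grad_f(a_final))
```

For this particular function the minimum is at $(1, -2)$, where `grad_f` returns zero, so the iterates stop moving once they get there. With a step size that is too large (here, $\lambda \ge 0.5$) the iteration would no longer converge, so $\lambda$ has to be chosen with some care.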