The gradient of a scalar field $f(\textbf{x})$ at a point $\textbf{x}_{0}$, $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}$, points in the direction of steepest ascent. By a Taylor approximation, we have $$f(\textbf{x}_{0}+\Delta\textbf{x})\approx f(\textbf{x}_{0})+\bigg [\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot\frac{\Delta \textbf{x}}{||\Delta \textbf{x}||}\bigg ]||\Delta \textbf{x}|| $$ or, $$f(\textbf{x}_{0}+\Delta\textbf{x})\approx f(\textbf{x}_{0})+\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot \Delta \textbf{x} ~.$$
But $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot \Delta \textbf{x}$ (and thus $f(\textbf{x}_{0}+\Delta\textbf{x})$) is maximized only when $\Delta \textbf{x}$ points in the same direction as $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}$.
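This can be checked numerically: for a hypothetical scalar field $f(x,y)=x^2+3y^2$ (my own choice for illustration), sample many unit directions and compare the directional derivative $\nabla f \cdot u$ against the one obtained along the normalized gradient — no sampled direction does better.

```python
import numpy as np

# Hypothetical example field f(x, y) = x^2 + 3 y^2 and its gradient.
def f(x):
    return x[0]**2 + 3 * x[1]**2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x0 = np.array([1.0, 2.0])
g = grad_f(x0)

# Sample random unit directions and compute the directional derivatives g . u.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(1000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
dd = dirs @ g

# Directional derivative along the normalized gradient: ||g||.
dd_grad = np.linalg.norm(g)

# No sampled direction exceeds the gradient direction.
assert dd.max() <= dd_grad + 1e-12
```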
So would it be wrong to say that the gradient also points toward the next greatest value?
$f(x+\Delta x) = f(x)+\nabla f(x)\cdot \Delta x+\frac{1}{2}\Delta x \cdot \nabla^2f(x)\,\Delta x + O(\Vert\Delta x\Vert^3)$
To find the maximum we move along $\Delta x$ until we reach a point where $\nabla f(x)\cdot\Delta x = 0, \forall \Delta x$, which indicates that $\nabla f(x)=0$. At such a stationary point the approximation becomes
$f(x+\Delta x) = f(x)+\frac{1}{2}\Delta x \cdot \nabla^2f(x)\,\Delta x + O(\Vert\Delta x\Vert^3)$
But at a maximum $x$ we must have $\forall \Delta x: f(x+\Delta x)\le f(x)$, which indicates that $\forall \Delta x:\frac{1}{2}\Delta x \cdot \nabla^2f(x)\,\Delta x \le 0$: since we assume $\Vert \Delta x\Vert \rightarrow 0$ (very small changes), the dominant term is $\frac{1}{2}\Delta x \cdot \nabla^2f(x)\,\Delta x$ while the $O(\Vert\Delta x\Vert^3)$ remainder vanishes faster. Therefore the Hessian must be negative (semi-)definite in order to have a maximum in a region.
Equivalently, to have a minimum the Hessian must be positive (semi-)definite, which means $\forall \Delta x:\frac{1}{2}\Delta x \cdot \nabla^2f(x)\,\Delta x \ge 0$.
$\nabla^2f(x)$ is the Hessian matrix, which carries the information about the curvature of the scalar field; it is defined by $\left[\nabla^2f(x)\right]_{ij} = \frac{\partial^2 f(x)}{\partial x_i\partial x_j}$, where $x_i$ is the $i$-th element of the vector $x$.
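As a sanity check of the definiteness condition, one can approximate the Hessian by central finite differences and inspect its eigenvalues. Here is a sketch for the hypothetical field $f(x)=-(x_1^2+2x_2^2)$, which has its maximum at the origin, so the Hessian there should come out negative (semi-)definite:

```python
import numpy as np

# Hypothetical example: f(x) = -(x1^2 + 2 x2^2) has a maximum at the origin.
def f(x):
    return -(x[0]**2 + 2 * x[1]**2)

def hessian_fd(f, x, h=1e-4):
    """Central finite-difference estimate of [d^2 f / dx_i dx_j]."""
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

H = hessian_fd(f, np.zeros(2))
eigs = np.linalg.eigvalsh(H)
# All eigenvalues <= 0: negative semi-definite, consistent with a maximum.
assert np.all(eigs <= 1e-6)
```

The exact Hessian here is $\mathrm{diag}(-2,-4)$, so the eigenvalues are $-4$ and $-2$ up to discretization error.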
If one knows that $f(x)$ is bounded above, proves that the Hessian is negative (semi-)definite (i.e. $f$ is concave), starts from an arbitrary initial guess $x_0$, and updates the guess iteratively according to $x_{n+1}=x_{n}+G\,\nabla f(x_n)$, $n=0,1,\dots$, then for any tolerance a finite number of steps brings the iterate arbitrarily close to the maximum. In its most general form $G$ is a matrix that varies from step to step (iteration $n$ to iteration $n+1$), but you can choose a constant scalar $\epsilon$ in its place; the difference shows up in the convergence speed. You do not need to follow the gradient exactly: every direction that makes an angle of less than $\frac{\pi}{2}$ with the gradient (angle measured in the $n$-dimensional space) is still a good direction. For maximization such directions are called "ascent" directions (the analogous directions for minimization are called "descent" directions).
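A minimal sketch of this iteration with a constant scalar step in place of $G$, on a hypothetical concave field $f(x) = -\Vert x - \mathbf{1}\Vert^2$ whose maximizer is $x^\ast = (1,1)$ (both the field and the step size $\epsilon=0.1$ are my own choices for illustration):

```python
import numpy as np

# Gradient of the hypothetical concave field f(x) = -||x - 1||^2.
def grad_f(x):
    return -2.0 * (x - 1.0)

eps = 0.1                      # constant scalar step in place of the matrix G
x = np.array([5.0, -3.0])      # arbitrary initial guess x_0
for _ in range(200):
    x = x + eps * grad_f(x)    # x_{n+1} = x_n + eps * grad f(x_n)

# The iterates approach the maximizer x* = (1, 1).
assert np.allclose(x, [1.0, 1.0], atol=1e-6)
```

With this field each step shrinks the error by a constant factor ($x_{n+1}-1 = (1-2\epsilon)(x_n-1)$), which is the geometric convergence the paragraph above alludes to; a step-dependent matrix $G$ (e.g. a Newton-like choice) would only change the rate.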