Would it be wrong to say that, since the gradient points toward the steepest ascent, it then also points toward the next greatest value?


The gradient of a scalar field $f(\textbf{x})$ at a point $\textbf{x}_{0}$, $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}$, points toward the direction of steepest ascent. By a Taylor approximation, we have $$f(\textbf{x}_{0}+\Delta\textbf{x})\approx f(\textbf{x}_{0})+\bigg [\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot\frac{\Delta \textbf{x}}{||\Delta \textbf{x}||}\bigg ]||\Delta \textbf{x}|| $$ or, $$f(\textbf{x}_{0}+\Delta\textbf{x})\approx f(\textbf{x}_{0})+\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot \Delta \textbf{x} ~.$$

But $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}\cdot \Delta \textbf{x}$ (and thus $f(\textbf{x}_{0}+\Delta\textbf{x})$) is maximized precisely when $\Delta \textbf{x}$ points in the same direction as $\nabla f(\textbf{x}) |_{\textbf{x}_{0}}$.
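As a quick numerical sanity check of this claim (the function $f(x,y)=x^2+3y^2$ and the base point $(1,1)$ below are my own illustrative choices, not from the question), we can sample many unit directions and verify that the one producing the largest increase in $f$ is the gradient direction:

```python
import math

# Example scalar field (my own choice): f(x, y) = x^2 + 3y^2,
# with gradient (2x, 6y).
def f(x, y):
    return x**2 + 3*y**2

x0, y0 = 1.0, 1.0
gx, gy = 2*x0, 6*y0  # gradient at (1, 1) is (2, 6)

eps = 1e-4
# Sample unit directions u = (cos t, sin t) and record the increase
# f(x0 + eps*u) - f(x0) for each.
best_angle, best_gain = None, -float("inf")
for k in range(3600):
    theta = 2*math.pi*k/3600
    ux, uy = math.cos(theta), math.sin(theta)
    gain = f(x0 + eps*ux, y0 + eps*uy) - f(x0, y0)
    if gain > best_gain:
        best_gain, best_angle = gain, theta

# The winning direction should align with the gradient direction.
grad_angle = math.atan2(gy, gx)
print(abs(best_angle - grad_angle) < 0.01)  # steepest ascent = gradient direction
```

The agreement is only up to the angular sampling step and the $O(\varepsilon^2)$ term of the expansion, which is why the comparison uses a small tolerance.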

So would it be wrong to say that the gradient also points toward the next greatest value?


1 Answer


$f(x+\Delta x) = f(x)+\nabla f(x)\cdot \Delta x+\frac{1}{2}\Delta x \cdot \nabla^2f(x)\cdot\Delta x + O(\|\Delta x\|^3)$

To find the maximum we move along $\Delta x$ until we reach a point where $\nabla f(x)\cdot\Delta x =0, \forall \Delta x$, which indicates that $\nabla f(x)=0$. There the approximation becomes

$f(x+\Delta x) = f(x)+\frac{1}{2}\Delta x \cdot \nabla^2f(x)\cdot\Delta x + O(\|\Delta x\|^3)$

For a maximum at the point $x$ we must have $\forall \Delta x:f(x+\Delta x)\le f(x)$, which indicates that $\forall \Delta x:\frac{1}{2}\Delta x \cdot \nabla^2f(x)\cdot\Delta x \le 0$: since we assume $\Vert \Delta x\Vert \rightarrow0$ (very small changes), the dominant term is $\frac{1}{2}\Delta x \cdot \nabla^2f(x)\cdot\Delta x$ while $O(\|\Delta x\|^3) \rightarrow0$. Therefore the Hessian must be negative (semi)definite in order to have a maximum in a region.

Equivalently, to have a minimum the Hessian must be positive (semi)definite, which means $\forall \Delta x:\frac{1}{2}\Delta x \cdot \nabla^2f(x)\cdot\Delta x \ge0$.

$\nabla^2f(x)$ is the Hessian matrix, which contains the information about the curvature of the scalar field; it is defined by $\left[\nabla^2f(x)\right]_{ij} = \frac{\partial^2 f(x)}{\partial x_i\partial x_j}$, where $x_i$ is the $i$-th component of the vector $x$.
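A small self-contained check of the definiteness criterion (the function and its Hessian below are my own example, not from the question): $f(x,y) = -(x^2 + 2y^2)$ has a maximum at the origin, and its constant Hessian is negative definite, as the eigenvalue signs confirm.

```python
# Example (my own choice): f(x, y) = -(x^2 + 2y^2) has a maximum at (0, 0).
# Its Hessian is constant: [[-2, 0], [0, -4]].
H = [[-2.0, 0.0], [0.0, -4.0]]

# Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]] in closed form.
a, b, c = H[0][0], H[0][1], H[1][1]
disc = ((a - c)**2 / 4 + b**2) ** 0.5
eig1 = (a + c) / 2 + disc
eig2 = (a + c) / 2 - disc

# All eigenvalues negative -> negative definite -> local maximum.
print(eig1, eig2)             # -2.0 -4.0
print(eig1 < 0 and eig2 < 0)  # True
```

With all eigenvalues strictly negative, every quadratic form $\frac{1}{2}\Delta x \cdot \nabla^2 f(x)\cdot \Delta x$ is negative, matching the maximum condition above.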

If one knows $f(x)$ is bounded above, shows that the Hessian is negative (semi)definite everywhere (i.e., $f$ is concave), starts from an arbitrary initial guess $x_0$, and updates the guess iteratively according to $x_{n+1}=x_{n}+G\nabla f(x_n),~ n=0,1,\dots$, then in a finite number of steps one can get arbitrarily close to the maximum. In its most general form $G$ is a matrix that varies from step to step (iteration $n$ to iteration $n+1$), but you can use a constant scalar $\epsilon$ in its place; the difference shows up in the convergence speed. You do not need to follow the gradient exactly: any direction that makes an angle of less than $\frac{\pi}{2}$ with the gradient (angle measured in the $n$-dimensional space) still increases $f$. Such directions are called ascent directions (the mirror-image notion, for minimization, is a "descent" direction).
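The iteration above can be sketched in a few lines (the objective $f(x,y) = -(x-1)^2 - 2(y+3)^2$, the starting point, and the scalar step $\epsilon = 0.1$ standing in for $G$ are my own illustrative choices):

```python
# Gradient-ascent sketch of the update x_{n+1} = x_n + eps * grad f(x_n),
# with a constant scalar step eps in place of the matrix G.
# Example objective (my own choice, bounded above, maximum at (1, -3)):
#   f(x, y) = -(x - 1)^2 - 2*(y + 3)^2
def grad_f(x, y):
    return (-2.0 * (x - 1.0), -4.0 * (y + 3.0))

x, y = 0.0, 0.0  # arbitrary initial guess x_0
eps = 0.1        # constant scalar step size in place of G
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x + eps * gx, y + eps * gy

print(round(x, 4), round(y, 4))  # converges near the maximizer (1, -3)
```

Each coordinate error shrinks geometrically here (by factors $0.8$ and $0.6$ per step), so 200 iterations land well within any practical tolerance of the maximizer.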