Gradient direction / descent

Question

153 Views Asked by Bumbble Comm At 15 May 2026 - 5:03

It has been a while since I studied multivariable calculus, so I need to refresh certain results.

Given $ f:R^n\to R $, at a stationary point $ x $ the gradient is $ \nabla f(x) = 0 $. However, given that the gradient points in the direction in which $f$ increases the most, how come the gradient is zero at a local minimum?

Also, in gradient descent we use that fact to find a local minimum: if $ \nabla f$ points in the direction of maximum increase, then $-\nabla f $ points in the direction of maximum decrease.

How is that equivalent?

Bumbble Comm On 24 Jan 2021 - 6:23

Main Answer

the gradient points at the direction which $f$ increases the most

…

$-\nabla f $ points in the direction of maximum decrease

Those statements are a good rule of thumb, but they are only guaranteed to hold under four conditions:

1. $\nabla f$ is not zero at the point.
2. You're looking "close enough" to the point.
3. $f$ is differentiable at the point.
4. You're sort of "averaging" $f$'s behavior near the point (or the derivative of $f$ is continuous at the point).

If $f$ has a nice formula, then typically the last two conditions hold. And you might have already had the intuition about the second condition. But the first condition is the crux of your question.

1. Nonzero Gradient

Single Variable Functions

To understand the basic idea of why we need $\nabla f\ne\boldsymbol 0$, it suffices to look at $f:\mathbb R\to\mathbb R$. The derivative at a point then tells you the slope of the line that the function approximately looks like as you zoom in arbitrarily far. (See, for instance, the question "How is the derivative truly, literally the 'best linear approximation' near a point?")

If $f'(a)>0$, then the line has positive slope, and the value of $f$ would generally increase if you move to the right (the direction of $\nabla f=\langle f'(a)\rangle$) from $a$ a small amount. Similarly, if $f'(a)<0$, then the line has negative slope, and the value of $f$ would generally increase if you move to the left from $a$ a small amount.

But if $f'(a)=0$ a lot of things could happen. For example, $f(x)=x^4-3x^2$ has a local maximum at $x=0$ that's not a global maximum. $f(x)=7$ has no direction of increase at all. $f(x)=x^3$ always has a direction of increase to the right, but still looks approximately like a horizontal line when you zoom in to the origin, so $f'(0)=0$.
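To see numerically that a zero derivative says nothing about the direction of increase, here is a small sketch comparing the three functions just mentioned. The helper names `central_difference` and `behavior_near_zero` and the step sizes are illustrative choices, not part of the original answer.

```python
def central_difference(f, a, h=1e-6):
    """Approximate f'(a) with a symmetric difference quotient."""
    return (f(a + h) - f(a - h)) / (2 * h)


def behavior_near_zero(f, step=0.1):
    """Return the change in f when moving `step` to the left and to the right of 0."""
    return (f(-step) - f(0.0), f(step) - f(0.0))


# The three examples from the text; all have f'(0) = 0.
examples = {
    "x^4 - 3x^2": lambda x: x**4 - 3 * x**2,
    "7": lambda x: 7.0,
    "x^3": lambda x: x**3,
}

for name, f in examples.items():
    slope = central_difference(f, 0.0)
    left, right = behavior_near_zero(f)
    print(f"{name}: f'(0) ~ {slope:.1e}, change left {left:+.4f}, change right {right:+.4f}")
```

All three slopes come out (numerically) zero, yet the first function decreases on both sides, the second does not change, and the third decreases to the left and increases to the right.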

Even if the function is given by a nice formula, the idea is that "looks approximately like a horizontal line" doesn't tell you which of these things the function acts like. To determine the direction of increase of a nice function with $f'(a)=0$, you need more information than the first derivative provides.

If the second derivative were a nonzero value, you'd know whether both directions increase or both decrease, according to the second derivative test. If $f''(a)=0$ as well, it might be the case that one direction increases and the other does not. If we then have, say, $f'''(a)<0$, then Taylor's theorem tells us that the function is approximately $\dfrac{f'''(a)}{6}(x-a)^3+f(a)$ near $a$, which we know increases to the left of $a$ since it basically looks like $-x^3$.

Higher Dimensions

In $2\text{D}$, the gradient isn't just a number, so there are many directions it can point. But (assuming differentiability), it still tells us about a good local approximation: $f(x,y)$ is approximately $\nabla f(\mathbf a)\bullet\left(\langle x,y\rangle-\mathbf a\right)+f(\mathbf a)$ near $\mathbf a$.

If $\nabla f(\mathbf a)\ne\boldsymbol 0$, then among displacements $\langle x,y\rangle-\mathbf a$ of a fixed small length, this dot product is maximized when the displacement goes in the same direction as $\nabla f(\mathbf a)$. So that explains why it would be the "direction of greatest increase".
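As a rough numerical check of this claim, the sketch below scans unit vectors around the circle and picks the one with the largest dot product against a gradient. The gradient value `(3.0, 4.0)` and the grid of 3600 angles are assumptions made up for illustration.

```python
import math


def linear_increase(grad, angle):
    """Dot product of grad with the unit vector pointing at the given angle."""
    return grad[0] * math.cos(angle) + grad[1] * math.sin(angle)


grad = (3.0, 4.0)  # hypothetical nonzero gradient at the point a
angles = [2 * math.pi * k / 3600 for k in range(3600)]  # unit directions on a grid

# Direction on the grid that maximizes the linear approximation's increase.
best_angle = max(angles, key=lambda t: linear_increase(grad, t))
gradient_angle = math.atan2(grad[1], grad[0])

print(best_angle, gradient_angle)
print(linear_increase(grad, best_angle), math.hypot(*grad))
```

Up to the grid resolution, the best angle matches the angle of the gradient, and the maximal rate of increase matches the gradient's length.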

But if $\nabla f(\mathbf a)=\boldsymbol 0$, then the best local linear approximation is the horizontal plane through $f(\mathbf a)$. And just like in $1\text{D}$, this doesn't really tell you anything about where nearby the function might increase/decrease.

The $n\text{D}$ case is completely analogous.

2. "Close Enough" and Gradient Ascent/Descent

As mentioned above, $f(x)=x^4-3x^2$ has a local maximum at the origin. So if you start in the interval $\left(-\sqrt{\dfrac32},\sqrt{\dfrac32}\right)$ and follow the gradient/sign of the derivative, you'll end up at the local maximum value $0$. But if you start outside of that interval, you'll reach values with $|x|>\sqrt3$ where the function is positive, and $f$ (and $|f'|$) will increase without bound. So "greatest increase" has to be interpreted in a local sense for this sort of reason.
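The two behaviors can be reproduced with a minimal gradient ascent sketch on $f(x)=x^4-3x^2$. The step size `rate`, the iteration cap, and the `escape` threshold used to stop a diverging run are arbitrary assumptions, not values from the answer.

```python
def f_prime(x):
    """Derivative of f(x) = x^4 - 3x^2."""
    return 4 * x**3 - 6 * x


def gradient_ascent(x, rate=0.01, max_steps=10_000, escape=10.0):
    """Repeatedly step in the direction of the derivative; stop early if |x| escapes."""
    for _ in range(max_steps):
        if abs(x) > escape:
            break  # the iterate is running off to infinity
        x += rate * f_prime(x)
    return x


inside = gradient_ascent(1.0)   # starts inside (-sqrt(3/2), sqrt(3/2))
outside = gradient_ascent(1.5)  # starts outside that interval

print(inside, outside)
```

The run started at `1.0` settles at the local maximum at $0$, while the run started at `1.5` leaves every bounded region.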

3. Differentiability

A lot of the intuition here works out if $f$ is differentiable. But things can break down quite badly if that doesn't hold.

Directions of Ascent

For example, consider the function $f(x,y)=\sqrt[3]{xy}+x$. The gradient at the origin is $\langle1,0\rangle$. But the directional derivative doesn't even exist in any direction not aligned with the axes.
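A quick way to see the failure is to evaluate difference quotients at the origin for shrinking $h$. This is only a sketch: the helper names and the particular values of `h` are illustrative assumptions.

```python
import math


def cube_root(t):
    """Real cube root that also works for negative t."""
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


def f(x, y):
    return cube_root(x * y) + x


def quotient(u, v, h):
    """Difference quotient of f at the origin along the unit vector (u, v)."""
    return (f(h * u, h * v) - f(0.0, 0.0)) / h


diag = 1 / math.sqrt(2)
steps = [1e-3, 1e-6, 1e-9]

along_x = [quotient(1.0, 0.0, h) for h in steps]        # stays at 1
along_y = [quotient(0.0, 1.0, h) for h in steps]        # stays at 0
along_diagonal = [quotient(diag, diag, h) for h in steps]  # keeps growing

print(along_x, along_y, along_diagonal)
```

Along the axes the quotients agree with the gradient $\langle1,0\rangle$, but along the diagonal they grow like $h^{-1/3}$ instead of converging.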

The Wrong Direction

We can still have problems even if we require that the function is continuous around the point and all directional derivatives exist. Consider:

$$f(x,y)=\cases{\dfrac{x^2y}{x^2+y^2} + \dfrac{y}2 & if $(x,y)\ne(0,0)$\\0 & if $(x,y)=(0,0)$}$$

The gradient at the origin is $\langle0,\dfrac12\rangle$. We might hope the direction of greatest increase is $\langle0,1\rangle$, with directional derivative $\dfrac12$. But a larger directional derivative is attained in, for instance, the direction $\left\langle\dfrac1{\sqrt2},\dfrac1{\sqrt2}\right\rangle$. There the difference quotient is $\left(\dfrac{\left(\dfrac{h}{\sqrt{2}}\right)^{3}}{2\left(\dfrac{h}{\sqrt{2}}\right)^{2}}+\dfrac{\dfrac{h}{\sqrt{2}}}{2}\right)/h=\dfrac{1}{\sqrt2}>\dfrac12$ for every $h\ne0$, so the directional derivative is $\dfrac1{\sqrt2}$.
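The comparison can be checked numerically with one-sided difference quotients at the origin. The helper name `directional_derivative` and the step `h` are assumptions for illustration; because this $f$ scales linearly along rays through the origin, the quotient does not actually depend on `h`.

```python
import math


def f(x, y):
    """The piecewise function from the text."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y) + y / 2


def directional_derivative(u, v, h=1e-6):
    """Difference quotient of f at the origin along the unit vector (u, v)."""
    return (f(h * u, h * v) - f(0.0, 0.0)) / h


along_gradient = directional_derivative(0.0, 1.0)
along_diagonal = directional_derivative(1 / math.sqrt(2), 1 / math.sqrt(2))

print(along_gradient, along_diagonal)
```

The diagonal direction gives a larger rate of increase than the gradient direction, which is what cannot happen for a differentiable function.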

4. "Averaging"

Even if the first three conditions hold, it needn't be the case that $f$ is actually increasing on a tiny line segment with one end at the point in question. $f$ is just well-approximated by something that is increasing.

Scattered Points

Consider the function $f(x)=\cases{x^2+x & if $x\in\mathbb Q$\\x & if $x\not\in\mathbb Q$}$ and note that $f'(0)=1$. For $x$ near zero, the points on the graph of $f$ jump up and down between the parabola $x^2+x$ and the line $x$. Because of this, the sense in which $f$ is "increasing to the right of $0$" is a bit strained.

Infinite Oscillation

For a function with a more connected graph, consider $f(x)=\cases{x^2\sin\left(\dfrac1x\right) - \dfrac{x}2 & if $x\ne0$\\0 & if $x=0$}$. We have $f'(0)=-\dfrac12$. However, at points of the form $\dfrac{1}{(2k+1)\pi}$, the derivative is $\dfrac12>0$. So even though the function "should" be increasing as we move leftwards from zero, it actually decreases near zero on infinitely many intervals.
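These derivative values can be verified with a short sketch. For $x\ne0$ the product and chain rules give $f'(x)=2x\sin\left(\dfrac1x\right)-\cos\left(\dfrac1x\right)-\dfrac12$; the range of `k` and the step `h` below are arbitrary illustrative choices.

```python
import math


def f(x):
    """The oscillating function from the text."""
    if x == 0:
        return 0.0
    return x * x * math.sin(1 / x) - x / 2


def f_prime(x):
    """Derivative of f for x != 0, from the product and chain rules."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) - 0.5


# Points of the form 1 / ((2k + 1) * pi), on both sides of zero.
points = [1 / ((2 * k + 1) * math.pi) for k in range(-3, 3)]
slopes = [f_prime(x) for x in points]

# Difference quotient at 0, which should be close to f'(0) = -1/2.
h = 1e-8
slope_at_zero = (f(h) - f(0.0)) / h

print(slopes, slope_at_zero)
```

Every sampled slope is $\dfrac12$ up to rounding, even though the slope at zero itself is $-\dfrac12$.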

**Bumbble Comm** · Accepted Answer

You are right in saying that the gradient points in the direction in which $f$ increases the most and, in the single-variable case, that where $f(x)$ is decreasing the derivative $f'(x)$ is negative.

We have $\nabla f(x) = 0$ at a local minimum or a local $\bf{maximum}$ (or an inflection point, but we can ignore that for now)!

Why does gradient descent take us to a local minimum?

Well, because gradient descent pushes in the direction of $-\nabla f(x)$!

$$a_{n+1} = a_n + \lambda (-\nabla f(a_n))$$

Each subsequent iterate $a_{n+1}$ is a stride, scaled by $\lambda$, in the direction opposite to the steepest increase, or in other words in the direction of steepest decrease.
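Here is a minimal sketch of that update rule, reading the gradient as evaluated at the current iterate $a_n$. The test function $f(x,y)=(x-1)^2+2(y+3)^2$, the step size `rate` (playing the role of $\lambda$), and the number of steps are all made-up assumptions for illustration.

```python
def gradient_descent(grad, start, rate=0.1, steps=200):
    """Iterate a_{n+1} = a_n - rate * grad(a_n) for a fixed number of steps."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - rate * gi for p, gi in zip(point, g)]
    return point


def grad_bowl(p):
    """Gradient of the hypothetical f(x, y) = (x - 1)^2 + 2 (y + 3)^2."""
    x, y = p
    return [2 * (x - 1), 4 * (y + 3)]


minimum = gradient_descent(grad_bowl, [5.0, 5.0])
print(minimum)
```

For this bowl-shaped function the iterates approach the unique minimum at $(1,-3)$. With a step size that is too large, or on a function with several critical points, the same loop can overshoot or stall, which is why the "close enough" caveat matters.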
