Why is gradient the direction of steepest ascent?


$$f(x_1,x_2,\dots, x_n):\mathbb{R}^n \to \mathbb{R}$$ The definition of the gradient is $$ \frac{\partial f}{\partial x_1}\hat{e}_1 +\ \cdots +\frac{\partial f}{\partial x_n}\hat{e}_n$$

which is a vector.

Reading this definition, I take it that each component of the gradient gives the rate of change of my objective function when I move along the direction $\hat{e}_i$.

But I can't see why this vector has anything to do with the steepest ascent.

Why is the rate of increase maximal if I move in the direction of the gradient?


There are 6 best solutions below

BEST ANSWER

Each component of the gradient tells you how fast your function is changing with respect to the standard basis. It's not too far-fetched, then, to ask how fast the function might be changing with respect to some arbitrary direction. Letting $\vec v$ denote a unit vector, we can project onto this direction in the natural way, namely via the dot product $\text{grad}( f(a))\cdot \vec v$. This is a fairly common definition of the directional derivative.

We can then ask: in what direction is this quantity maximal? You'll recall that $$\text{grad}( f(a))\cdot \vec v = |\text{grad}( f(a))|| \vec v|\cos(\theta)$$

Since $\vec v$ is a unit vector, this reduces to $|\text{grad}( f(a))|\cos(\theta)$, which is maximal when $\cos(\theta)=1$, in particular when $\vec v$ points in the same direction as $\text{grad}(f(a))$.
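This argument can be checked numerically. Below is a minimal sketch, assuming a made-up test function $f(x,y)=\sin(x)+xy$ and an arbitrary base point; the finite-difference step and sampled directions are illustrative choices:

```python
import math
import random

def f(x, y):
    # Arbitrary smooth test function (an assumption for illustration)
    return math.sin(x) + x * y

def grad_f(x, y):
    # Analytic gradient of f
    return (math.cos(x) + y, x)

def directional_derivative(x, y, v, h=1e-6):
    # Central finite difference of f along the unit vector v
    return (f(x + h * v[0], y + h * v[1])
            - f(x - h * v[0], y - h * v[1])) / (2 * h)

a = (1.0, 2.0)                       # arbitrary base point
g = grad_f(*a)
norm = math.hypot(*g)
g_hat = (g[0] / norm, g[1] / norm)   # unit vector along the gradient

# The directional derivative along the gradient equals |grad f| ...
along_grad = directional_derivative(*a, g_hat)

# ... and is at least as large as the rate of change in any other
# unit direction, as the cos(theta) argument predicts.
random.seed(0)
for _ in range(1000):
    theta = random.uniform(0, 2 * math.pi)
    v = (math.cos(theta), math.sin(theta))
    assert directional_derivative(*a, v) <= along_grad + 1e-6
```

The random directions only sample the circle, so this is a sanity check of the inequality, not a proof; the proof is exactly the $\cos(\theta)\le 1$ argument above.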

ANSWER

Consider a Taylor expansion of this function, $$f({\bf r}+{\bf\delta r})=f({\bf r})+(\nabla f)\cdot{\bf\delta r}+\ldots$$ The linear correction term $(\nabla f)\cdot{\bf\delta r}$ is maximized when ${\bf\delta r}$ is in the direction of $\nabla f$.
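The claim that the linear term dominates can also be sketched numerically; the function, base point, direction, and step sizes below are arbitrary illustrative assumptions. Halving $|{\bf\delta r}|$ should roughly quarter the error left over after the linear term:

```python
import math

def f(x, y):
    return x * math.exp(y)          # arbitrary smooth test function

def grad_f(x, y):
    return (math.exp(y), x * math.exp(y))

r = (1.5, 0.3)                      # arbitrary base point
g = grad_f(*r)
direction = (0.6, 0.8)              # a unit vector for delta r

# Compare the true change of f with the linear term (grad f) . delta r
# for shrinking step sizes; the leftover error should be O(|delta r|^2).
errors = []
for eps in (1e-2, 5e-3, 2.5e-3):
    dr = (eps * direction[0], eps * direction[1])
    delta_f = f(r[0] + dr[0], r[1] + dr[1]) - f(*r)
    linear = g[0] * dr[0] + g[1] * dr[1]
    errors.append(abs(delta_f - linear))

# Each halving of eps should shrink the error roughly 4x (quadratically).
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```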

ANSWER

The question you're asking can be rephrased as "In which direction is the directional derivative $\nabla_{\hat{u}}f$ a maximum?".

Assuming differentiability, $\nabla_{\hat{u}}f$ can be written as:

$$\nabla_{\hat{u}}f = \nabla f(\textbf{x}) \cdot \hat{u} =|\nabla f(\textbf{x})||\hat{u}|\cos \theta = |\nabla f(\textbf{x})|\cos \theta$$

which is a maximum when $\theta =0$, i.e. when $\hat{u}$ points in the same direction as $\nabla f(\textbf{x})$.
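As a concrete worked example (the function and point are chosen purely for illustration): for $f(x,y) = x^2 + y^2$ at $\textbf{x} = (1,2)$ we have $\nabla f(\textbf{x}) = (2,4)$ and $|\nabla f(\textbf{x})| = 2\sqrt{5}$, so $$\nabla_{\hat{u}}f = 2\sqrt{5}\cos \theta,$$ which attains its maximum $2\sqrt{5}$ precisely for $\hat{u} = \frac{1}{\sqrt{5}}(1,2)$, the unit vector along the gradient.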

ANSWER

Each component of the derivative $$ \frac{\partial f}{\partial x_1}\ \cdots \ \frac{\partial f}{\partial x_n}$$ tells you how fast your function is changing with respect to the standard basis.
It is now possible to make a change of basis to an orthogonal basis consisting of $n-1$ basis directions with $0$ ascent plus the gradient direction. In such a basis, the gradient direction must be the steepest, since adding any other basis direction adds length but no ascent.

For a 3-dimensional vector space the basis could look like this (here $\partial x_i$ is shorthand for $\frac{\partial f}{\partial x_i}$): $$ \left( \left( \begin{matrix} \partial x_2 \\ -\partial x_1 \\ 0 \end{matrix} \right) \left( \begin{matrix} \partial x_1 \\ \partial x_2 \\ -\dfrac{(\partial x_1)^2+(\partial x_2)^2}{\partial x_3} \end{matrix} \right) \left( \begin{matrix} \partial x_1 \\ \partial x_2 \\ \partial x_3 \end{matrix} \right) \right) $$ By induction it can be shown that such a basis is constructible for an $n$-dimensional vector space. $$ \left( \left( \begin{matrix} \partial x_2 \\ -\partial x_1 \\ 0 \\ 0 \end{matrix} \right) \left( \begin{matrix} \color{blue}{\partial x_1 \\ \partial x_2} \\ -\dfrac{(\partial x_1)^2+(\partial x_2)^2}{\partial x_3} \\ 0 \end{matrix} \right) \left( \begin{matrix} \color{blue}{\partial x_1 \\ \partial x_2} \\ \color{green}{\partial x_3} \\ -\dfrac{(\partial x_1)^2+(\partial x_2)^2+(\partial x_3)^2}{\partial x_4} \end{matrix} \right) \left(\begin{matrix} \color{blue}{\partial x_1 \\ \partial x_2} \\ \color{green}{\partial x_3} \\ \color{orange}{\partial x_4} \end{matrix} \right) \right) $$ One can see here that the first basis vector forces the first 2 elements of the following basis vectors to be $\partial x_1$ and $\partial x_2$ because of the orthogonality condition;
similarly the 2nd vector forces all the 3rd elements of the following vectors to be $\partial x_3$,
as does the 3rd vector for the 4th elements, which must be $\partial x_4$.

If another dimension is added, the $(n+1)$th element of the $n$th vector needs to be $$-\dfrac{(\partial x_1)^2+\dots+(\partial x_n)^2}{\partial x_{n+1}}$$ to meet the $0$-ascent condition, which in turn forces the new $(n+1)$th vector to be of the form $$\left(\begin{matrix}\partial x_1 \\ \vdots \\ \partial x_{n+1}\end{matrix}\right)$$ for it to be orthogonal to the rest.
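The construction can be sanity-checked numerically. A minimal sketch for $n = 3$, assuming an arbitrary gradient $(\partial x_1, \partial x_2, \partial x_3) = (2, -1, 0.5)$ (any values with $\partial x_3 \neq 0$ would do):

```python
# Arbitrary nonzero gradient components (p1, p2, p3) standing in for
# (df/dx1, df/dx2, df/dx3); p3 must be nonzero for this construction.
p1, p2, p3 = 2.0, -1.0, 0.5

b1 = (p2, -p1, 0.0)                       # claimed zero-ascent direction
b2 = (p1, p2, -(p1**2 + p2**2) / p3)      # claimed zero-ascent direction
b3 = (p1, p2, p3)                         # the gradient itself

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Pairwise orthogonality of the basis...
assert abs(dot(b1, b2)) < 1e-12
assert abs(dot(b1, b3)) < 1e-12
assert abs(dot(b2, b3)) < 1e-12

# ...and the first two directions have zero ascent (zero directional
# derivative along them), so all the ascent lives in the gradient direction.
grad = (p1, p2, p3)
assert abs(dot(b1, grad)) < 1e-12
assert abs(dot(b2, grad)) < 1e-12
assert dot(b3, grad) > 0
```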

ANSWER

Sorry for posting so late, but a few more details added to the first post made it easier for me to understand, so I thought I would post them here as well.

Let $\vec{n}$ be a unit vector oriented in an arbitrary direction and $T(x_{0}, y_{0}, z_{0})$ a scalar function which describes the temperature at the point $(x_{0}, y_{0}, z_{0})$ in space. The directional derivative of $T$ along this direction would be $$\frac{\partial T}{\partial \vec{n}} = \nabla T \cdot \vec{n} = \| \nabla T \| \| \vec{n} \| \cos(\theta) = \| \nabla T \| \cos(\theta),$$ where $\theta$ is the angle between the gradient vector and the unit vector $\vec{n}$ (the last step uses $\| \vec{n} \| = 1$).

Now, consider three cases:

  1. $\theta =0$ - steepest increase. In this case, $$\nabla T \cdot \vec{n} = \| \nabla T \|$$ Since $\theta=0$ means $\vec{n}$ is parallel to $\nabla T$ with the same orientation, we can write $\vec{n} = \lambda \nabla T$ for some $\lambda > 0$; substituting gives $\lambda \| \nabla T \| ^{2} = \| \nabla T \|$, so $$ \vec{n}= \frac{\nabla T}{\| \nabla T \|}$$ Let's look at that for a moment: the direction in space ($\vec{n}$) for which you get the steepest increase ($\theta=0$) has the same direction and orientation as the gradient vector (since the multiplying factor is just a positive constant). That means that the gradient's orientation coincides with the direction of steepest increase (steepest increase because the directional derivative has the maximum value it can have).

  2. $\theta=\pi$ - steepest decrease. In this case you get $$ \vec{n}= -\frac{\nabla T}{\| \nabla T \|}$$ So the gradient's orientation is opposite to that of steepest decrease (steepest decrease because the directional derivative takes its most negative value).

  3. $\theta=\pi /2$ - no change. Here the dot product between the direction defined by $\vec{n}$ and the gradient is $0$, so you have no change in the field (because the directional derivative is $0$). Interestingly, along the direction perpendicular to the gradient vector the scalar function $T$ is constant, which makes sense, since the gradient field is perpendicular to the contour lines.
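The three cases can be verified numerically. A minimal sketch, assuming a made-up temperature field $T(x,y) = x^2 + y^2$ and an arbitrary point (both purely illustrative):

```python
import math

def T(x, y):
    return x**2 + y**2            # hypothetical temperature field

p = (3.0, 4.0)                    # arbitrary point
grad_T = (2 * p[0], 2 * p[1])     # analytic gradient: (6, 8)
norm = math.hypot(*grad_T)        # |grad T| = 10

n_up = (grad_T[0] / norm, grad_T[1] / norm)   # theta = 0
n_down = (-n_up[0], -n_up[1])                 # theta = pi
n_perp = (-n_up[1], n_up[0])                  # theta = pi/2

def dT(v, h=1e-6):
    # Central finite difference of T along the unit vector v at p
    return (T(p[0] + h * v[0], p[1] + h * v[1])
            - T(p[0] - h * v[0], p[1] - h * v[1])) / (2 * h)

# theta = 0:    steepest increase, rate +|grad T|
# theta = pi:   steepest decrease, rate -|grad T|
# theta = pi/2: no change; T is constant along the contour direction
```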

ANSWER

The key is the linear approximation of the function $f$.

First you must realize that near a given point $(x_1, x_2, \dots, x_n)$, the change of $f$ is dominated by its first-order partial derivatives: $f$ can be approximated by a (hyper)plane near that point,

$$ \Delta f = f(x_1+\Delta x_1, \dots , x_n+\Delta x_n) - f(x_1, \dots, x_n) \approx \frac{\partial f}{\partial x_1}\Delta x_1 + \dots + \frac{\partial f}{\partial x_n}\Delta x_n $$

which is just the dot product between the gradient vector $\nabla f$ and the "change vector" $(\Delta x_1, \dots, \Delta x_n)$. So for a given length of the "change vector", $\Delta f$ is greatest when the change vector points in the same direction as the gradient.