Understanding why $\nabla f$ points in the direction of the greatest rate of change of a function, and why the derivative of $f(\mathbf r + \mathbf{\delta r})$ is $\nabla f$


The most helpful answer I could find was Jonathan's answer here. I decided not to comment and bump a six-year-old thread to ask for clarification, partly because I believe my question is not a direct duplicate (it is a follow-up to that answer), and partly because it would not fit within the word limit of a comment.

His explanation, in "Why is gradient the direction of steepest ascent?", is this:

Consider a Taylor expansion of this function, $$f({\bf r}+{\bf\delta r})=f({\bf r})+(\nabla f)\cdot{\bf\delta r}+\ldots$$ The linear correction term $(\nabla f)\cdot{\bf\delta r}$ is maximized when ${\bf\delta r}$ is in the direction of $\nabla f$.

I find this a very graceful answer, but one thing confuses me.

The linear term implies that the derivative of $f(\mathbf r + \mathbf{\delta r})$ with respect to $\mathbf{\delta r}$ is $\nabla f$, and I can't figure out why this is. It also seems that the Taylor series for a function of a vector variable replaces ordinary multiplication with its vector analog, the dot product. I have no rigorous understanding of why, beyond it being roughly what I would guess if I had to write down a Taylor series for such a function.


There are 2 best solutions below

14
On

Without Taylor series or polynomials, it follows directly from the directional derivative formula for a differentiable function. The directional derivative (instantaneous rate of change) of $f$ at $\mathbf a$ in the direction of a unit vector $\mathbf v$ is given by $$D_{\mathbf v}f(\mathbf a) = \nabla f(\mathbf a)\cdot\mathbf v,$$ and so you get the maximum rate of change when $\mathbf v$ points in the direction of $\nabla f(\mathbf a)$ and a zero rate of change when you move orthogonal (perpendicular) to $\nabla f(\mathbf a)$. (This is why the gradient vector is normal to the level sets of $f$.)
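
One way to see where the directional derivative formula comes from, and why a dot product appears in the linear Taylor term, is the following sketch. It assumes $f$ is differentiable at $\mathbf a$; the helper function $g$ and the coordinates $x_i$, $v_i$ are notation introduced here only for illustration. Restrict $f$ to the line through $\mathbf a$ in the direction $\mathbf v$ by setting $g(t) = f(\mathbf a + t\mathbf v)$, which is an ordinary function of one variable. The multivariable chain rule gives

$$g'(0) = \sum_i \frac{\partial f}{\partial x_i}(\mathbf a)\, v_i = \nabla f(\mathbf a)\cdot\mathbf v = D_{\mathbf v}f(\mathbf a).$$

The one-variable expansion $g(t) = g(0) + g'(0)\,t + \ldots$ then reads, with $\mathbf{\delta r} = t\mathbf v$,

$$f(\mathbf a + \mathbf{\delta r}) = f(\mathbf a) + \nabla f(\mathbf a)\cdot\mathbf{\delta r} + \ldots$$

So the dot product is just the sum of each partial derivative times the displacement along the corresponding coordinate.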

0
On

Not sure if this is of help anymore, but I've been trying to answer the same question myself, to get the intuition behind why the gradient is related to the maximum rate of change of a function. I believe it is covered very satisfactorily in https://tutorial.math.lamar.edu/classes/calciii/directionalderiv.aspx

For completeness, and to summarize the link above, I would say the following:

  • The directional derivative captures the rate of change of a function, say $z=f(x,y)$ for simplicity, in a chosen direction: its definition accounts for how much $x$ changes relative to $y$ when moving from an initial point $(x,y)$ to $(x+ah, y+bh)$ along the vector $\langle a,b \rangle$.
  • The gradient is a vector defined as $\nabla f = \langle f_x,f_y \rangle$, so that the directional derivative can be written $D_{\vec{u}}f = \nabla f \cdot \vec{u}$, where $\vec{u}$ is a unit vector in the direction of interest.
  • The last formula can be rewritten in terms of the magnitudes of the vectors and the angle between them. Since $\lVert\vec{u}\rVert = 1$, ${ D_{\vec{u}}f = \lVert \nabla f \rVert \lVert\vec{u}\rVert \cos\theta = \lVert \nabla f \rVert \cos\theta }$, with $\theta$ the angle between the gradient vector and the unit direction vector.
  • Since $\cos\theta \le 1$ with equality at $\theta=0$, the maximum directional derivative occurs when $\vec{u}$ points along $\nabla f$. Thus the direction of the gradient is the direction of the greatest rate of change of the function, and its magnitude $\lVert \nabla f \rVert$ is the value of that rate.
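
The points above can be checked numerically. The following is a minimal sketch, not part of the linked tutorial: the function `f`, its gradient `grad_f`, and the evaluation point are hypothetical choices made only for illustration. It sweeps unit vectors around the circle, estimates each directional derivative by a central difference, and compares the result with $\nabla f \cdot \vec{u}$.

```python
import numpy as np

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3.0 * x * y

def grad_f(x, y):
    # Analytic gradient <f_x, f_y> of the example function above.
    return np.array([2.0 * x + 3.0 * y, 3.0 * x])

point = np.array([1.0, 2.0])   # arbitrary evaluation point (an assumption)
g = grad_f(*point)
g_norm = np.linalg.norm(g)

# Unit vectors u = <cos(phi), sin(phi)> covering every direction in the plane.
phis = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
units = np.stack([np.cos(phis), np.sin(phis)], axis=1)

# Directional derivative along each u, estimated by a central difference.
h = 1e-6
numeric = np.array([
    (f(*(point + h * u)) - f(*(point - h * u))) / (2.0 * h)
    for u in units
])

# The same quantity from the formula D_u f = grad f . u
analytic = units @ g

# Direction on the grid with the largest estimated rate of change.
best = units[np.argmax(numeric)]

print("steepest direction found:", best)
print("normalized gradient:     ", g / g_norm)
print("largest rate found:      ", numeric.max())
print("gradient magnitude:      ", g_norm)
```

Up to the resolution of the angle grid, the steepest direction found should agree with the normalized gradient, and the largest rate with $\lVert \nabla f \rVert$.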

Also, the gradient vector is perpendicular to the level curves, and the level curves lie closest together where the rate of change of the function is greatest; this may give further intuition on the matter.