What is the intuition behind the direction of the derivative of a function?


I can't quite grasp why the gradient of a function points in the direction of steepest ascent. Thinking about it led me back to the basic notion of the derivative, but here is an example first:

$$f(x) = x^2$$

$$f'(x) = 2x$$ $$f'(-1) = -2, f'(1) = 2$$

The derivative basically says how much our function will change with respect to a little change in its argument. I prefer to think of the derivative as the velocity with which the function changes at a given point.

So the absolute value of the derivative tells us how much the function will change, and the sign of the derivative corresponds to the direction in which we should change the argument to get an increase in the function (we can do $x_{next} = x + hf'(x)$, where $h$ is a small number, and we will always get an increase in the value of the function).
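A tiny numerical check of this update rule (my own sketch, using the $f(x) = x^2$ from the question): for a small $h > 0$, the step $x_{next} = x + hf'(x)$ increases the function value whether we start at a negative or a positive $x$.

```python
# f(x) = x^2 and its derivative, as in the question above.
def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

h = 0.01  # a small step size
for x in (-1.0, 1.0, 3.0):
    x_next = x + h * f_prime(x)
    # At x = -1 the derivative is -2, so the step moves x further negative,
    # and f still increases; at positive x the step moves x further positive.
    print(x, f(x), "->", f(x_next), f(x_next) > f(x))
```

The only exception is a point where $f'(x) = 0$, where the step does nothing.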

So I do not understand why the derivative (or partial derivative, same idea) always points in the direction in which we should change the argument to increase the function, given that the value of the derivative tells us the speed of change of the function.


There are 3 best solutions below


In multiple dimensions, if a function is differentiable, then you can prove that the directional derivative along a direction $v$ is

$$ D_v f = \nabla f \cdot v$$

(Note: for a directional derivative we care only about the direction; the norm of the vector does not matter. To simplify the treatment, then, it is assumed that $\|v\| = 1$.) So you want to find the direction $v$ that maximizes that scalar product. By the Cauchy–Schwarz inequality it is maximized when $v$ and $\nabla f$ have the same direction, and since $v$ needs to have unit norm, the only choice is $\displaystyle v= \frac{\nabla f}{\|\nabla f\|}$.

So you have proved that the direction in which the directional derivative is maximized is exactly the direction of the gradient.

This of course also works in one dimension, but it's less intuitive as there's only one possible choice.
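A quick numerical sketch of this maximization (the gradient value $(3, 4)$ is an example of my own, not from the answer): sample unit vectors $v$ around the circle and pick the one maximizing $\nabla f \cdot v$; it lands on $\nabla f / \|\nabla f\|$.

```python
import math

grad = (3.0, 4.0)  # assume grad f(x0) = (3, 4), so ||grad f|| = 5

def directional_derivative(v):
    # D_v f = grad f . v for a unit vector v
    return grad[0] * v[0] + grad[1] * v[1]

# Unit vectors at 1-degree resolution around the circle.
candidates = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
              for k in range(360)]
best = max(candidates, key=directional_derivative)
unit_grad = (grad[0] / 5.0, grad[1] / 5.0)
print(best, unit_grad)  # best sampled direction is close to grad / ||grad||
```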


I will try to show some kind of geometric intuition.

The derivative, in the one-variable case, points in the direction in which the function is changing as "time" increases.

In the multivariable case, assuming that $f:X\subset\Bbb R^n\to\Bbb R$, the derivative at $x_0$ defines the tangent (hyper)plane at the point $(x_0,f(x_0))$

$$H(f(x_0)):=\{(x_0+h,f(x_0)+\nabla f(x_0)\cdot h)\in\Bbb R^{n+1}: h\in\Bbb R^n\}\tag1$$

of the (hyper)surface defined by the graph of $f$, $$G(f):=\{(x,f(x))\in\Bbb R^{n+1}:x\in X\}\tag2$$

As in the one-variable case, the (hyper)plane $H(f(x_0))$ approximates $G(f)$ linearly (and optimally) as $h\to 0$. This is the meaning of the derivative as the direction of the tangent at a point.
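This "optimal linear approximation" can be checked numerically. A sketch with an example function of my own choosing, $f(x,y) = x^2 + 3y$ with $\nabla f = (2x, 3)$: the tangent-plane error, divided by $\|h\|$, shrinks to 0 as $h \to 0$.

```python
# Example function and its gradient (my own choice for illustration).
def f(x, y):
    return x * x + 3 * y

def grad_f(x, y):
    return (2 * x, 3.0)

x0, y0 = 1.0, 2.0
g = grad_f(x0, y0)

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (t, t)  # shrink the displacement h toward 0
    linear = f(x0, y0) + g[0] * h[0] + g[1] * h[1]  # tangent-plane value
    error = abs(f(x0 + h[0], y0 + h[1]) - linear)
    norm_h = (h[0] ** 2 + h[1] ** 2) ** 0.5
    ratios.append(error / norm_h)
print(ratios)  # the ratio error / ||h|| tends to 0
```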

Now the basic intuition is that the tangent (hyper)plane at $(x_0,f(x_0))$ is defined locally by the changes of $G(f)$ around $(x_0,f(x_0))$. Then it makes sense that $\nabla f(x_0)$ points in some significant direction that reflects how the function changes in an (arbitrarily small) neighborhood of $x_0$.

Because it must hold that

$$\lim_{h\to 0}\frac{\|f(x_0+h)-f(x_0)-\nabla f(x_0)\cdot h\|}{\|h\|}=0$$

then if $\|f(x_0+h_1)-f(x_0)\|\ge\|f(x_0+h_2)-f(x_0)\|$ (for displacements of equal length, $\|h_1\|=\|h_2\|$) it is intuitive to think that $\|\nabla f(x_0)\cdot h_1\|\ge\|\nabla f(x_0)\cdot h_2\|$. This implies that $\nabla f(x_0)$ points in the direction of maximum change of $f$ in an (arbitrarily small) neighborhood of $x_0$.

Then the direction of $\nabla f(x_0)$ can be thought of as the direction of instantaneous maximum change of $f$.
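A small experiment of my own (with an example function) supporting this intuition: among all small steps of a fixed length $\varepsilon$, the one that increases $f$ the most points almost exactly along $\nabla f(x_0)$.

```python
import math

def f(x, y):
    return x * x + y * y * y  # example function; grad f = (2x, 3y^2)

x0, y0 = 1.0, 1.0  # so grad f(x0, y0) = (2, 3)
eps = 1e-4         # fixed small step length

# Try steps of length eps in 720 directions; keep the angle with largest f.
angles = [2 * math.pi * k / 720 for k in range(720)]
best = max(angles, key=lambda t: f(x0 + eps * math.cos(t), y0 + eps * math.sin(t)))

grad_angle = math.atan2(3.0, 2.0)  # angle of the gradient (2, 3)
print(best, grad_angle)            # nearly equal
```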


Another way to think about it: the gradient is built from the partial derivatives of the function at a point. Each partial derivative acts as a one-variable derivative, and the partial derivatives are just directional derivatives along directions orthogonal to each other.

And all partial derivatives together in matrix form represent the derivative of the function at a point. In the case of a functional (a functional is a function from a vector space to its field) this matrix is just a vector, named the gradient.
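A sketch of that construction, approximating each partial derivative with a one-variable difference quotient (the function $f:\Bbb R^3\to\Bbb R$ below is an arbitrary example of mine):

```python
def f(x):
    # Example functional: f(x) = x0^2 + 2*x1*x2, grad f = (2*x0, 2*x2, 2*x1)
    return x[0] ** 2 + 2 * x[1] * x[2]

def numerical_gradient(f, x, h=1e-6):
    # Build the gradient one coordinate at a time: each entry is the
    # difference quotient obtained by bumping only that coordinate.
    grad = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += h
        grad.append((f(bumped) - f(x)) / h)
    return grad

g = numerical_gradient(f, [1.0, 2.0, 3.0])
print(g)  # close to the analytic gradient [2, 6, 4]
```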


I'll write down the thoughts I've ended up with; maybe someone will find them useful. Thanks to everyone who answered!

So as I see it, which is kind of mind-boggling: the gradient is not a value of the function, nor an argument of the function; it is a relation between both. But the gradient gets treated like an argument of the function, in the sense that it is placed as a vector in the same space where the arguments of the function "live". This is what confused me for a while, because I was trying to connect it intuitively with both a change in the function and a change in the argument.

And for a while I thought of the derivative as the speed of change of a function (Feynman's lectures), which has its full intuitive meaning for physical equations because change in time is only positive there - we cannot go back in time (i.e. if a function depends on time, we cannot decrease the time parameter to get an increase in the function), so thinking in this fashion does not raise the questions which appear in a more general setting.

My current intuition: given a single parameter of a multivariable function, we take the partial derivative of the function with respect to that parameter and get the rate of change of the function with respect to a little change in that parameter. If $f(x + \Delta x) - f(x)$ is positive, the function increases given a positive change in its parameter; if it is negative, the function decreases given a positive change in its parameter. And so does the ratio $(f(x + \Delta x) - f(x)) / \Delta x$.

Then if the ratio is positive, we can say that an increase in $x$ gives an increase in the function value; if the ratio is negative, an increase in $x$ gives a decrease in the function value. So $\operatorname{sign}((f(x + \Delta x) - f(x)) / \Delta x)$ can be used as the direction in which our parameter $x$ needs to be adjusted to get an increase in the function. We basically have only 2 directions in which we can adjust our parameter $x$, so we either get an increase or a decrease in the function (or in some special cases neither, for example the constant $f(x) = 1 + x - x$), and this holds for every parameter of a multivariable function.
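A minimal sketch of this sign rule (the function below is an example of my own): stepping $x$ by the sign of the difference quotient moves the function uphill from either side.

```python
def f(x):
    return (x - 2) ** 2  # example: minimum at x = 2, uphill on both sides

def quotient_sign(f, x, dx=1e-6):
    # Sign of the difference quotient (f(x + dx) - f(x)) / dx: -1, 0, or +1.
    q = (f(x + dx) - f(x)) / dx
    return (q > 0) - (q < 0)

step = 0.01
results = []
for x in (0.0, 5.0):
    x_new = x + step * quotient_sign(f, x)  # adjust x in the quotient's direction
    results.append(f(x_new) > f(x))
print(results)  # [True, True]: the function increased from both starting points
```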

So for $f(x_1,x_2,\dots,x_n)$ we get a bunch of directions, $\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}$, and each of them gives the direction of change in one parameter that increases the function. Combined into a vector, they tell us the direction in parameter space in which we should move to increase the function. And it is the steepest direction because each single-parameter derivative points in its direction of increase (the second and only other direction for a single parameter gives a decrease in the function value), so by how the partial derivative is defined we get the direction of steepest ascent.
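For what it's worth, a sketch (example function of my own) showing that the magnitudes of the partial derivatives matter too, not just their signs: for small steps of equal length, the step along the gradient gains more than the step along the "all signs" direction.

```python
import math

def f(x, y):
    return 5 * x + 0.1 * y  # example linear function; grad f = (5, 0.1)

grad = (5.0, 0.1)
norm = math.hypot(grad[0], grad[1])
eps = 1e-3  # common step length for a fair comparison

# Step of length eps along the (normalized) gradient direction.
grad_step = f(eps * grad[0] / norm, eps * grad[1] / norm)
# Step of length eps along the normalized sign vector (+1, +1).
sign_step = f(eps / math.sqrt(2), eps / math.sqrt(2))

print(grad_step > sign_step)  # True: the gradient step climbs faster
```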

Though I do not yet see how the dot product comes in handy.

It sounds and looks quite simple now, though it was hard for me to grasp; I hope someone might find it helpful.