Intuition regarding the directional derivative

The directional derivative is defined as,

$$ D_{\vec{v}} f(x,y,z) = \nabla f \cdot \vec{v}$$

Now, this gives a scalar indicating how much a scalar function changes in the direction of some unit vector. What confuses me is that sometimes I see it written as

$$ D_{v} f( \vec{x}) = \nabla f \cdot v$$

That is, the function is given a vector as its input. What is the idea behind this? From what I know, vectors and points are completely different kinds of mathematical objects (though related).

And at other times, in physics, I see it written as

$$ D_{\vec{v}} \phi = \frac{d\,\phi(\vec{r}(t))}{dt}$$

Now, how exactly are all these different definitions connected?

Best answer

On a very technical level, points and vectors are different, that's true. Vectors are elements of a vector space, and points are elements of an affine space (the definition of which is not important for this answer). Derivatives of all kinds are at first only defined on vector spaces. To rigorously define derivatives on affine spaces, we have to do some more legwork, which is usually done in differential geometry (affine spaces are an especially simple kind of manifold, the central object of study in differential geometry). But affine spaces have the nice property that they are almost like a vector space: we can just pick a point, declare it to be the origin, and then identify every point with the vector connecting the origin to that point. So any function on a set of "points" can be thought of as a function on a set of vectors, and any calculus we do on the vector function translates perfectly to calculus on the point function. In that sense, it really makes no difference whether we write a function using the coordinates of a point as arguments, or vectors. In many ways, using vectors is actually nicer: coordinate-free math usually tells us a lot more about the actual structure of the theory, because it doesn't depend on arbitrary choices (like picking a coordinate system).
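To make the "pick an origin" idea concrete, here is a small Python sketch. It is an illustration added here, not part of the original answer; the field `f_on_points`, the point, the direction and both origins are arbitrary choices. It treats a field defined on points as a function of position vectors relative to a chosen origin. It then checks numerically, with a central finite difference, that the directional derivative at a fixed point does not depend on which origin was picked.

```python
def f_on_points(p):
    # Illustrative scalar field on points of 3-space: f(x, y, z) = x^2 + y*z.
    x, y, z = p
    return x**2 + y * z

def add(a, b, s=1.0):
    # Componentwise a + s*b for coordinate tuples.
    return tuple(ai + s * bi for ai, bi in zip(a, b))

def directional_derivative(func, at, v, h=1e-6):
    # Central difference of func along the line at + t*v, evaluated at t = 0.
    return (func(add(at, v, h)) - func(add(at, v, -h))) / (2 * h)

def as_function_of_vectors(func, origin):
    # Declare `origin` to be the origin: the vector w stands for the point origin + w.
    return lambda w: func(add(origin, w))

p = (1.0, 2.0, 3.0)   # the point where we differentiate
v = (0.0, 1.0, -1.0)  # the direction

print(directional_derivative(f_on_points, p, v))  # working with the points directly
for origin in [(0.0, 0.0, 0.0), (5.0, -2.0, 7.0)]:
    g = as_function_of_vectors(f_on_points, origin)
    w = add(p, origin, -1.0)  # the vector from this origin to p
    print(origin, directional_derivative(g, w, v))
```

All three printed values should be close to 1: the gradient of this $f$ at $(1,2,3)$ is $(2,3,2)$, and $(2,3,2)\cdot(0,1,-1)=1$. Changing the origin only relabels the points, so the derivative is unaffected.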

And about the directional derivative: the first two are not good definitions, in my opinion. They are formulas which, under specific circumstances, can be derived from a definition which actually captures the essence of a directional derivative: the instantaneous rate of change at a point if that point is approached along a specific path (the "direction"). In the "physical" definition, $r(t)$ is the path, and $\phi(r(t))$ is the function evaluated along the path. The rate of change along the path at $r_0:=r(t_0)$ is then

$$\left.\frac{\mathrm d\phi(r(t))}{\mathrm dt}\right\vert_{t=t_0}.$$

So that's a good definition of the directional derivative along the path $r$. And if $r(t)=r_0+v(t-t_0)$, then the path is a straight line through $r_0$ in the direction $v$, and we can say that the directional derivative at $r_0$ in direction $v$ is the directional derivative along the path $r(t):=r_0+v(t-t_0)$ at $r(t_0)=r_0$.
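The definition via paths can be tried out directly. The following Python sketch is an illustration, not part of the original answer; the field `phi`, the helix path and the step size `h` are assumptions. It approximates $\left.\frac{\mathrm d\phi(r(t))}{\mathrm dt}\right\vert_{t=t_0}$ with a central difference, first along a curved path and then along the straight line $r(t)=r_0+v(t-t_0)$.

```python
import math

def phi(p):
    # Illustrative scalar field phi(x, y, z) = x*y + z.
    x, y, z = p
    return x * y + z

def rate_along_path(phi, r, t0, h=1e-6):
    # d/dt phi(r(t)) at t = t0, approximated by a central difference.
    return (phi(r(t0 + h)) - phi(r(t0 - h))) / (2 * h)

def helix(t):
    # A curved path, chosen only for illustration.
    return (math.cos(t), math.sin(t), t)

def straight_line(r0, v, t0):
    # The path r(t) = r0 + v*(t - t0), which passes through r0 at t = t0.
    return lambda t: tuple(a + b * (t - t0) for a, b in zip(r0, v))

def directional_derivative(phi, r0, v, t0=0.0):
    # Directional derivative at r0 in direction v: the rate of change along the straight line.
    return rate_along_path(phi, straight_line(r0, v, t0), t0)

# Along the helix, phi(r(t)) = sin(2t)/2 + t, so the exact rate is 1 + cos(2*t0).
print(rate_along_path(phi, helix, 0.0))            # close to 2

r0, v = (1.0, 2.0, 3.0), (1.0, 0.0, 0.0)
print(directional_derivative(phi, r0, v))          # close to 2 (the y-coordinate of r0)
print(directional_derivative(phi, r0, v, t0=5.0))  # same value: t0 only shifts the parameter
```

Nothing here requires $v$ to be a unit vector. Replacing $v$ by $2v$ doubles the result, which is why the direction is usually normalized when one wants a rate of change per unit distance.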

Only now can we show that if a function is totally differentiable (which it doesn't need to be, even if all directional derivatives exist!), then the directional derivative in the direction $v$ can be calculated as $\mathrm D_v f=\nabla f\cdot v$. It's just a special case of the multivariable chain rule applied to $f\circ r$ with $r(t)=r_0+v(t-t_0)$. Since $r'(t_0)=v$, we get $\left.\frac{\mathrm d f(r(t))}{\mathrm dt}\right\vert_{t=t_0}=\nabla f(r_0)\cdot r'(t_0)=\nabla f(r_0)\cdot v$, and this form of the chain rule only holds for totally differentiable functions.
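The gap between "all directional derivatives exist" and "totally differentiable" can be seen numerically. The Python sketch below is an illustration added here, not part of the original answer. It approximates derivatives with central finite differences, and the step `h` is an assumed value.

For a polynomial, which is totally differentiable, the rate along the line matches $\nabla f\cdot v$. For the standard counterexample $f(x,y)=\frac{x^2y}{x^2+y^2}$ with $f(0,0)=0$, every directional derivative at the origin exists: in direction $(a,b)$ it equals $\frac{a^2b}{a^2+b^2}$. Both partial derivatives there are $0$. Yet along $(1,1)$ the directional derivative is $\frac12$, not $\nabla f\cdot v=0$. The function cannot be totally differentiable at the origin, since otherwise $\mathrm D_v f$ would depend linearly on $v$.

```python
def add(a, b, s=1.0):
    # Componentwise a + s*b for coordinate tuples.
    return tuple(ai + s * bi for ai, bi in zip(a, b))

def directional_derivative(f, p, v, h=1e-6):
    # Rate of change of f along the line p + t*v at t = 0 (central difference).
    return (f(add(p, v, h)) - f(add(p, v, -h))) / (2 * h)

def gradient(f, p, h=1e-6):
    # Partial derivatives: directional derivatives along the coordinate axes.
    n = len(p)
    axes = [tuple(1.0 if j == i else 0.0 for j in range(n)) for i in range(n)]
    return [directional_derivative(f, p, e, h) for e in axes]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def smooth(p):
    # A polynomial, hence totally differentiable everywhere.
    x, y = p
    return x**2 * y + y**3

def counterexample(p):
    # x^2 y / (x^2 + y^2), extended by 0 at the origin.
    x, y = p
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

p, v = (1.0, 2.0), (1.0, 1.0)
print(directional_derivative(smooth, p, v), dot(gradient(smooth, p), v))  # both close to 17

o, u = (0.0, 0.0), (1.0, 1.0)
print(directional_derivative(counterexample, o, u))  # close to 0.5
print(dot(gradient(counterexample, o), u))           # 0.0: the formula grad f . v fails here
```

The last two lines disagree. This is exactly the situation the parenthetical remark warns about: the directional derivatives exist, but $\mathrm D_v f=\nabla f\cdot v$ does not apply.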