Why is the directional derivative a dot product of a vector and a gradient?


In this question I'm looking for an intuitive explanation, which could provide me some "a-ha!" moments.


As we all know, the derivative is just the $\tan$ of an angle in the right triangle whose legs are $df$ and $dx$.

The formula for the directional derivative of $f$ along $\vec{v}$ is $\nabla f \cdot \vec{v} = \vec{v}_x \frac{\partial f}{\partial x} + \vec{v}_y \frac{\partial f}{\partial y} + \vec{v}_z \frac{\partial f}{\partial z} + \dots$

If we start to read this formula out loud, we'll end up getting this:

We take the $\tan$ of an angle of a triangle whose sides are $\partial f$ and $\partial x$, then we multiply this $\tan$ by $\vec{v}_x$. What this gives us is the $\tan$ of a new triangle, whose sides are $\partial f \cdot \vec{v}_x$ and $\partial x$. This feels kind of arbitrary, but I guess it may still make some sense.

But then we take the $\tan$ of this triangle and add it to the $\tan$ of another triangle, whose sides are $\partial f \cdot \vec{v}_y$ and $\partial y$. What does this operation even mean? Why do we add two tangents, and what geometrical meaning does this operation even have?


If we try to shift our perspective towards the geometrical interpretation of the dot product operation, it still barely makes any sense to me.

We take the gradient of a function at some point, which is a vector $\nabla f$ whose coordinates are numerically equal to the $\tan$ values of infinitely small triangles in the $x, y, z, \dots$ planes. If we pause and ponder here, it may look like a pretty strange vector, whose properties are not obvious at all.

Then we project our vector $\vec{v}$ onto the vector $\nabla f$, and multiply the length of the projected vector by the length of $\nabla f$. And then we somehow end up with the $\tan$ of an infinitely small triangle in the plane spanned by the vector $\vec{v}$ and the function-value axis.

To me it is not obvious how this could lead us to the result we wanted.


There are 2 best solutions below


Think of the dot product as a weighted projection of one vector onto the other (it is commutative, incidentally, so it does not matter which is projected onto which). In the 2D case, also think of $\nabla f(x,y)$ as being defined by an infinitesimal tilted square in 3D, sitting at $(x,y,f(x,y))$ and tangent to the surface. Projecting your direction vector onto $\nabla f$ is the same as projecting it vertically onto that hillslope (the square) and seeing how much higher it takes you than your starting point. The magnitude of $\nabla f$ is the rate of height increase in the square's uphill direction (which is unique for a non-zero gradient). The further you move along the square's plane in some direction, the higher you go, in proportion to the projection ratio ($\cos$ of the angle) between the directions of $\nabla f$ and $\vec{v}$. Make sure you understand that $\nabla f$ is itself one of these direction vectors.
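The hillslope picture can be checked numerically. The following is only a minimal sketch: the surface `f`, the point `p`, and the direction `v` are hypothetical choices made for illustration, and the helper names are not from any library. It compares the height gained on the tangent plane (the dot product) with the slope measured directly on the surface, and with $|\nabla f| \cos\theta$.

```python
import math

def f(x, y):
    # Hypothetical example surface, chosen only for illustration.
    return x**2 + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the example surface above.
    return (2.0 * x, 3.0)

def tangent_plane_rise(p, v):
    # Height gained on the tangent plane at p when stepping by v in the (x, y) plane.
    gx, gy = grad_f(*p)
    return gx * v[0] + gy * v[1]

def finite_difference_slope(p, v, h=1e-6):
    # Slope of f along v, estimated directly from the surface.
    return (f(p[0] + h * v[0], p[1] + h * v[1]) - f(p[0], p[1])) / h

p = (1.0, 2.0)                      # assumed base point
v = (math.cos(0.7), math.sin(0.7))  # assumed unit direction

rise = tangent_plane_rise(p, v)
slope = finite_difference_slope(p, v)

# Same quantity written as |grad f| * cos(angle between grad f and v).
g = grad_f(*p)
g_norm = math.hypot(g[0], g[1])
cos_angle = (g[0] * v[0] + g[1] * v[1]) / (g_norm * math.hypot(v[0], v[1]))
projected = g_norm * cos_angle

print(rise, slope, projected)
```

The three printed numbers should agree up to the finite-difference error, which is what the projection argument predicts for a unit direction.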


The dot product (and inner products in general) is a measure of how well two vectors are aligned. If the product is negative, the vectors point in opposite directions, in some sense; the vectors are most aligned when one is a positive multiple of the other (the Cauchy-Schwarz inequality formalizes this). The dot-product formula for the directional derivative then follows from thinking of it as a projection: it measures how well the direction $\vec{v}$ is aligned with the direction of the gradient, which is best thought of as the direction of maximal increase.
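The alignment claim can be illustrated with a small sweep over unit directions. This is a sketch under stated assumptions: the gradient value `grad` is a made-up example, and the grid of 3600 angles is an arbitrary resolution. It checks that $\nabla f \cdot \vec{u}$ never exceeds $|\nabla f|$ in absolute value (Cauchy-Schwarz), that it peaks when $\vec{u}$ points along the gradient, and that it is most negative in the opposite direction.

```python
import math

grad = (2.0, 3.0)  # hypothetical gradient at some point, for illustration only
grad_norm = math.hypot(grad[0], grad[1])

def directional_derivative(theta):
    # Dot product of the gradient with the unit vector at angle theta.
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]

# Sample unit directions around the full circle.
n = 3600
angles = [2.0 * math.pi * k / n for k in range(n)]
values = [directional_derivative(t) for t in angles]

# Direction on the grid with the largest rate of increase.
best_index = max(range(n), key=lambda k: values[k])
best_angle = angles[best_index]

# Angle of the gradient itself, for comparison.
grad_angle = math.atan2(grad[1], grad[0])

# Rate of change when moving directly against the gradient.
opposite = directional_derivative(grad_angle + math.pi)

print(best_angle, grad_angle, values[best_index], grad_norm, opposite)
```

The best sampled angle lands within one grid step of the gradient's own angle, and the maximal rate matches $|\nabla f|$ up to the grid resolution.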