Why isn't the directional derivative generally scaled down to the unit vector?


I'm starting to learn how to intuitively interpret the directional derivative, and I can't understand why you wouldn't scale down your direction vector $\vec{v}$ to be a unit vector.

Currently, my intuition is the idea of slicing the 3D graph of the function with a vertical plane through the direction vector, and then computing the slope of the curve created by that intersection.

But I can't really understand how the directional derivative would be a directional derivative if it were not scaled down to be a change in unit length in the direction of $\vec{v}$. Is there an intuitive understanding I can grasp onto? I'm just starting out so maybe I haven't gotten there yet.

Note: I think there may be a nice analogy to linearization. If you take "twice as big of a step" in the direction of $\vec{v}$, then the change in the function due to this step is twice as big. Is this an okay way to think about it?

There are 7 answers below.

Best answer (6 votes)

The intuition I think of for a directional derivative in the direction of $\overrightarrow{v}$ is that it is how fast the function changes if the input changes with a velocity of $\overrightarrow{v}$. So if you move the input across the domain twice as fast, the function changes twice as fast.

More precisely, this corresponds to the following process that relates calculus in multiple variables to calculus in a single variable. In particular, we can define a line based at a point $\overrightarrow{p}$ with velocity $\overrightarrow{v}$ parametrically as a curve: $$\gamma(t)=\overrightarrow{p}+t\overrightarrow{v}.$$ This is a map from $\mathbb R$ to $\mathbb R^n$. However, if $f:\mathbb R^n\rightarrow \mathbb R$ is another map, we can define the composite $$(f\circ \gamma)(t)=f(\gamma(t))$$ and observe that this is a map $\mathbb R\rightarrow\mathbb R$ so we can study its derivative! In particular, we define the directional derivative of $f$ at $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$ to be the derivative of $f\circ\gamma$ at $0$.

However, when we do this, we only see a "slice" of the domain of $f$ - in particular, we only see the line passing through $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$. This corresponds to the notion of slicing you bring up in your question. In particular, we do not see any values of $f$ outside of the image of $\gamma$, so we are only studying $f$ on some restricted set.
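This process is easy to check numerically. Below is a minimal sketch (the particular $f$, $p$, and $v$ are illustrative choices of my own, not taken from the answer), using a central difference to differentiate $t \mapsto f(\gamma(t))$ at $t = 0$:

```python
# Minimal numerical sketch of the composite-curve definition above.
# The specific f, p, and v are my own illustrative choices.
def f(x, y):
    return x**2 + 3.0 * x * y          # a sample f : R^2 -> R

def directional_derivative(f, p, v, h=1e-6):
    # Derivative of t -> f(gamma(t)) at t = 0, via a central difference,
    # where gamma(t) = p + t v.
    gamma = lambda t: (p[0] + t * v[0], p[1] + t * v[1])
    return (f(*gamma(h)) - f(*gamma(-h))) / (2.0 * h)

p, v = (1.0, 2.0), (3.0, -1.0)
d1 = directional_derivative(f, p, v)
d2 = directional_derivative(f, p, (2.0 * v[0], 2.0 * v[1]))
print(d1, d2)   # doubling the velocity doubles the rate of change
```

Doubling $\overrightarrow{v}$ doubles the result, matching the "move twice as fast, change twice as fast" intuition.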

Answer (1 vote)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and (if the limit exists) $$D_v f(x) = \lim_{h \to 0} \frac{f(x+hv)-f(x)}{h}$$ be the directional derivative in the direction $v$. This way, if the function is differentiable, $$ D_{au+bv} f(x) = a\, D_{u} f(x)+b\, D_{v} f(x) \qquad (a,b) \in \mathbb{R}^2,$$ i.e., the directional derivative is linear in the direction. Indeed $$D_{v} f(x) = J_x v$$ where $J_x$ is the Jacobian matrix.

You'll have trouble stating and understanding these facts if you restrict to $\|v\|=1$, and worse still if you normalize $D_vf(x)$.
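The linearity above can be verified numerically. This is a sketch with an example map and directions of my own choosing (central differences stand in for the exact derivative):

```python
import math

# An example differentiable map f : R^2 -> R^2 (my own choice).
def f(x, y):
    return (x * y, math.sin(x) + y * y)

def D(f, x, v, h=1e-6):
    # Central-difference directional derivative, componentwise.
    fp = f(x[0] + h * v[0], x[1] + h * v[1])
    fm = f(x[0] - h * v[0], x[1] - h * v[1])
    return tuple((a - b) / (2.0 * h) for a, b in zip(fp, fm))

x = (0.5, 1.5)
u, v = (1.0, 0.0), (0.0, 1.0)
a, b = 2.0, -3.0

# D_{au+bv} f(x) versus a D_u f(x) + b D_v f(x): they agree.
lhs = D(f, x, (a * u[0] + b * v[0], a * u[1] + b * v[1]))
rhs = tuple(a * du + b * dv for du, dv in zip(D(f, x, u), D(f, x, v)))
print(lhs, rhs)
```

Note that $D_u f(x)$ and $D_v f(x)$ here are exactly the columns of the Jacobian $J_x$, so the check is the same as computing $J_x (au + bv)$.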

Answer (3 votes)

I used to feel uncomfortable about this also. One point is that there is no harm in allowing $\vec v$ not to be a unit vector, and it is arguably simpler to omit this requirement because it's not necessary anyway. Another point is that it is sometimes interesting and useful to think of the directional derivative $D_{\vec v}f(x)$ as a function of $\vec v$, with $x$ held fixed. This function has the nice property that if you scale the input, the output gets scaled the same way. But in order to make this statement, we must not require $\vec v$ to be a unit vector.

Answer (0 votes)

I originally left comments on other answers, but perhaps they deserve to be combined into an answer of their own.


To make the reasoning in Milo's answer less abstract, imagine the function $f$ that we're interested in gives the altitude at a given point of land, and we're driving around. Then our velocity as we pass through the point $p$ is given by some vector $v$, and we can work out how fast our altitude is changing by finding the directional derivative of $f$ in the direction of $v$ (at point $p$).

You should really think of directional derivatives in terms of a function $\nabla_p$, the gradient of $f$ (at $p$), which takes any vector $v$ based at $p$ as input and gives the directional derivative of $f$ in the direction of $v$ (at the point $p$) as output. As a function of vectors based at $p$, $\nabla_p$ is linear (as user1952009 indicated), and this is what makes it so useful: for example, it follows that for any two vectors $v$, $w$, $\nabla_p(v+w) = \nabla_p(v) + \nabla_p(w)$. And, as you noted, $\nabla_p(av) = a\nabla_p(v)$ for any scalar $a$.

In general, the reason derivatives are useful in the first place is precisely because they allow us to approximate arbitrary differentiable functions near a given point using only linear functions. The latter are far simpler, with the nice behaviour illustrated above, which enables many useful constructions - first in single- and multi-variable calculus, and later in differential and Riemannian geometry. For example, the fundamental theorem of calculus (that differentiation and integration are "inverse" operations) generalizes to Stokes' theorem for manifolds, a result which is both beautiful and used in an incredibly diverse range of settings.

Answer (7 votes)

Unit vectors are vastly overrated — the notion of vector is far more computationally convenient when treated as a whole rather than decomposed into separate notions of direction and magnitude.

I claim it leads to better understanding as well.

Thus, one should not introduce unit vectors by habit — such a manipulation should be reserved for those circumstances when it does something useful.

Similarly, a good definition or computational tool shouldn't force unit vectors upon the user, unless there is a very good reason for doing so.


Algebraically, the directional derivative is not the main idea — the main idea is the differential of a function: in usual terms, $\nabla f$ is the row vector given by

$$ \nabla f(\vec{x}) = \begin{pmatrix} f_1(\vec{x}) & f_2(\vec{x}) & f_3(\vec{x}) \end{pmatrix} $$

where by $f_k$, I mean the derivative of the function $f$ in its $k$-th place. The directional derivative is simply the product of the differential with the given (column) vector:

$$ \nabla_{\vec{v}} f = (\nabla f) \vec{v} $$

As such, restricting to unit vectors is an unnatural thing to do. Rescaling the input vector to be a unit vector is extremely unnatural.

Note that some people use $\nabla f$ to refer to a column vector, or even treat row and column vectors as the same thing. This is unfortunate, because it is computationally awkward when you change variables, and gets in the way of understanding the difference between vectors and covectors, and the close relationship between the inner product and the transpose operation.
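The row-vector view above is short to sketch numerically. Here the particular $f$ is my own illustrative choice; `grad_f` builds the row of partials by central differences, and the directional derivative is just the product $(\nabla f)\,\vec{v}$, with no normalization anywhere:

```python
# A sample f : R^3 -> R (my own choice).
def f(x):
    return x[0]**2 + x[1] * x[2]

def grad_f(x, h=1e-6):
    # The row vector ( f_1(x)  f_2(x)  f_3(x) ) of place-wise derivatives.
    row = []
    for k in range(len(x)):
        xp = list(x); xp[k] += h
        xm = list(x); xm[k] -= h
        row.append((f(xp) - f(xm)) / (2.0 * h))
    return row

x = [1.0, 2.0, 3.0]
v = [1.0, 0.0, 2.0]                             # a (column) vector, not a unit vector
Dv = sum(g * c for g, c in zip(grad_f(x), v))   # the product (∇f) v
print(Dv)
```

Scaling `v` simply scales `Dv` by the same factor, as the algebra says it must.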


Finally, it's worth noting that derivatives — even directional derivatives — make sense in contexts where there is no notion of length, and thus there is no notion of a "unit" vector that can be applied.

Answer (1 vote)

Two reasons not to normalize:

  1. This will fail for the zero vector.

  2. Do you normalize derivatives for the 1-dimensional case? Should you? Most would say no.

Answer (2 votes)

Here's a decent reason to have the thing we call the directional derivative, and not to require the reference vector to have unit magnitude.

Start with the case of $f:\mathbb{R}\to\mathbb{R}$. Write the directional derivative with respect to some arbitrary constant $h\ne0$, and apply first principles

$D_{h}f\left[a\right]=\lim_{t\to0}\frac{f[a+th]-f[a]}{t}$

$=h\lim_{t\to0}\frac{f[a+th]-f[a]}{th}$

$=hf^{\prime}\left[a\right]$.

Neither surprising nor particularly interesting. It's just another way of saying what we already knew, since the chain rule gives the same result:

$x\left[t\right]=a+th$

$g\left[t\right]=f\left[x\left[t\right]\right]$

$g^{\prime}\left[0\right]=f^{\prime}\left[x\left[0\right]\right]x^{\prime}\left[0\right]=hf^{\prime}\left[a\right].$
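As a quick sanity check, the first-principles quotient from the derivation above can be evaluated at a small $t$; the sample $f$, $a$, and $h$ here are my own choices:

```python
import math

# Check that D_h f[a] = h f'(a), using f = sin, so f'(a) = cos(a).
def f(x):
    return math.sin(x)

a, h = 1.2, 2.5

def D_h(f, a, h, t=1e-6):
    # The first-principles quotient (f[a + t h] - f[a]) / t at small t.
    return (f(a + t * h) - f(a)) / t

lhs = D_h(f, a, h)
rhs = h * math.cos(a)
print(lhs, rhs)   # the two agree up to truncation error
```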

But it gets a bit more interesting if we use it to build up a Taylor series for a real-valued function of a real argument. I'll leave the notation to speak for itself.

$D_{h}D_{h}f\left[a\right]=D_{h}^{2}f\left[a\right]=h^{2}f^{\prime\prime}\left[a\right]$.

$D_{h}^{r}f\left[a\right]=h^{r}f^{(r)}\left[a\right]$.

$D_{h}^{0}f\left[a\right]=f\left[a\right]$.

$f(a+h)=\sum_{r=0}^{k}\frac{D_{h}^{r}f\left[a\right]}{r!}+\frac{D_{h}^{k+1}f\left[\zeta\right]}{(k+1)!}$ for some $\zeta$ between $a$ and $a+h$.

I think that's kind of pretty.
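One can watch the expansion converge numerically. This sketch uses $f = \exp$ (my own choice), for which $D_{h}^{r}f\left[a\right]=h^{r}e^{a}$ since every derivative of $\exp$ is $\exp$:

```python
import math

# Partial sums of sum_r D_h^r f[a] / r! with f = exp converge to f(a + h).
a, h = 0.3, 0.7

def D_h_power(r):
    # D_h^r f[a] = h^r f^(r)(a); for exp, f^(r)(a) = e^a for every r.
    return h**r * math.exp(a)

P = sum(D_h_power(r) / math.factorial(r) for r in range(15))
print(P, math.exp(a + h))   # the degree-14 Taylor sum matches f(a + h)
```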

Now let's take $f:\mathbb{R}^{n}\to\mathbb{R}$. I use this non-standard notation as a shorthand:

$\{\mathbb{J}\}=\left\{ \left\{ j_{i}\geq0\right\} _{n}|\sum_{i=1}^{n}j_{i}=k\right\} $.

Think of the $\mathbb{J}$'s as multi-indices. Each $\mathbb{J}$ is a set of non-negative integers $\{j_{1},\dots,j_{n}\}$ such that

$j_{1}+\dots+j_{n}=k$.

Now use those indices to write the multinomial expansion of a sum of $n$ terms:

$\left(x_{1}+\dots+x_{n}\right)^{k}=\sum_{\mathbb{J}}\begin{pmatrix}k\\ j_{1}\dots j_{n} \end{pmatrix}x_{1}^{j_{1}}\dots x_{n}^{j_{n}}$.

Here the coefficients are the multinomial generalization of the binomial coefficients:

$\begin{pmatrix}k\\ j_{1}\dots j_{n} \end{pmatrix}=\frac{k!}{j_{1}!\dots j_{n}!}=\begin{pmatrix}k\\ \mathbb{J} \end{pmatrix}$.
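A brute-force check of the multinomial expansion, for $n=3$, $k=4$, with sample values of my own choosing:

```python
import math
from itertools import product

def multinomial(k, js):
    # The coefficient k! / (j_1! ... j_n!).
    c = math.factorial(k)
    for j in js:
        c //= math.factorial(j)
    return c

x = (1.5, -0.5, 2.0)
k = 4

lhs = sum(x)**k
rhs = sum(
    multinomial(k, js) * x[0]**js[0] * x[1]**js[1] * x[2]**js[2]
    for js in product(range(k + 1), repeat=3)
    if sum(js) == k                     # these js are exactly the multi-indices J
)
print(lhs, rhs)
```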

Now write the Taylor series as before, but using $\mathfrak{h}\in\mathbb{R}^{n}$.

$f(\mathfrak{a}+\mathfrak{h})=\sum_{r=0}^{k}\frac{D_{\mathfrak{h}}^{r}f\left[\mathfrak{a}\right]}{r!}+\frac{D_{\mathfrak{h}}^{k+1}f\left[\vec{\zeta}\right]}{(k+1)!}$

$=P_{k}\left[\mathfrak{h}\right]+R_{k}\left[\mathfrak{h}\right].$

The directional derivative is:

$D_{\mathfrak{h}}f\left[\mathfrak{a}\right]=\mathfrak{h}\cdot\nabla\left[f\left[\mathfrak{a}\right]\right]=\left(h_{1}D_{1}+\dots+h_{n}D_{n}\right)\left[f\left[\mathfrak{a}\right]\right].$

So:

$D_{\mathfrak{h}}^{r}f\left[\mathfrak{a}\right]=\left(h_{1}D_{1}+\dots+h_{n}D_{n}\right)^{r}\left[f\left[\mathfrak{a}\right]\right].$

So the Taylor polynomial of degree $k$ is:

$P_{k}\left[\mathfrak{h}\right]=\sum_{r=0}^{k}\frac{1}{r!}\sum_{\mathbb{J}}\begin{pmatrix}r\\ j_{1}\dots j_{n} \end{pmatrix}\left(h_{1}D_{1}\right)^{j_{1}}\dots\left(h_{n}D_{n}\right)^{j_{n}}\left[f\right]$.

The $\left[f\right]$ indicates that the beast to the left is an operator being applied to $f$ (and then evaluated at $\mathfrak{a}$).
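The polynomial $P_{k}$ can be assembled exactly as in the Taylor formula above. This sketch uses a concrete $f:\mathbb{R}^{2}\to\mathbb{R}$ of my own choosing, $f(x,y)=e^{x}\cos y$, whose mixed partials are known in closed form (so no numerical differentiation is needed):

```python
import math
from itertools import product

def f(x, y):
    return math.exp(x) * math.cos(y)

def partial(j1, j2, x, y):
    # D_1^{j1} D_2^{j2} f at (x, y), analytically: x-derivatives leave e^x
    # alone, and y-derivatives cycle cos -> -sin -> -cos -> sin.
    cyc = (math.cos(y), -math.sin(y), -math.cos(y), math.sin(y))
    return math.exp(x) * cyc[j2 % 4]

a = (0.2, 0.4)      # base point  (my choice)
hv = (0.3, -0.2)    # step vector (my choice, not a unit vector)
k = 10

P = 0.0
for r in range(k + 1):
    for js in product(range(r + 1), repeat=2):
        if sum(js) != r:
            continue                     # keep only multi-indices J with |J| = r
        coeff = math.factorial(r) // (math.factorial(js[0]) * math.factorial(js[1]))
        term = coeff * hv[0]**js[0] * hv[1]**js[1] * partial(js[0], js[1], *a)
        P += term / math.factorial(r)    # the 1/r! from the Taylor series

print(P, f(a[0] + hv[0], a[1] + hv[1]))  # P_k[h] approximates f(a + h)
```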