Definition of directional derivative: Why does it work?


The definition of the directional derivative in my textbook is

$$ \nabla_{\vec{v}} f = \lim\limits_{h\to 0}\frac{f(\vec{x} + h \vec{v} )-f(\vec{x})}{h} $$

with $\vec{x} = (x_1, x_2)$ and $\vec{v} = (v_1, v_2)$ (I first want to consider the two variable case).

However, before looking at this definition I tried to come up with it on my own. Say I'm given a function $f(x_1, x_2)$ and want to evaluate the directional derivative along a vector $\vec{v}$ at $x_1 = 0$ and $x_2 = 0$. Similar to the one-variable case, I would consider the slope of the secant $\Delta f$ $$ \Delta f = \frac{f(x_1 + v_1, x_2 + v_2) -f(x_1, x_2)}{\Vert \vec{v}\Vert} $$ so, when introducing some real number $h > 0$ that allows us to make the change in $(x_1, x_2)$ arbitrarily small, we can consider the limit $$ \nabla_{\vec{v}} f = \lim\limits_{h\to 0}\frac{f(\vec{x} + h \vec{v} )-f(\vec{x})}{h\Vert\vec{v}\Vert} $$ where $\Vert\cdot\Vert$ denotes the standard $\mathbb{R}^2$ norm.

Is the latter definition also correct? I doubt it, because if

$$ \nabla_{\vec{v}} f = \lim\limits_{h\to 0}\frac{f(\vec{x} + h \vec{v} )-f(\vec{x})}{h} = L \neq 0 $$ then

$$ \nabla_{\vec{v}} f = \lim\limits_{h\to 0}\frac{f(\vec{x} + h \vec{v} )-f(\vec{x})}{h\Vert\vec{v}\Vert} = \frac{1}{\Vert \vec{v}\Vert} \lim\limits_{h\to 0}\frac{f(\vec{x} + h \vec{v} )-f(\vec{x})}{h} = \frac{1}{\Vert \vec{v}\Vert} L \neq L. $$

I hope someone can clarify this for me.

Edit: I know that the first definition is more general because it doesn't require the existence of a norm but I still don't see why it is correct. Also, in my attempt I don't assume that $\Vert\vec{v}\Vert = 1$ because I just think of it as the length of the vector. Why is this incorrect?


There are 4 best solutions below

0
On BEST ANSWER

Your definition of the directional derivative is indeed correct, or at least it is one way one could define it. In fact, your definition is equivalent to the one in terms of a unit vector. Intuitively, this is because we are only interested in the direction of the vector rather than its magnitude: once we divide by $\|\vec{v}\|$, the magnitude of the vector no longer affects the limit. Moreover, using your definition (provided a norm exists) one obtains the same results as using the definition in terms of the unit vector.

Let $\overline{v} = \begin{bmatrix} a\\b \end{bmatrix} $ denote the unit vector and $\vec{v} = \begin{bmatrix} v_1\\v_2 \end{bmatrix} = \lambda \cdot \overline{v}$, with $\lambda = \|\vec{v}\| > 0$, some non-normalized vector.

First, consider only $$ \nabla_{\overline{v}}f(x, y) = \lim_{h \to 0} \frac{f(x + h a, y + hb) - f(x, y)}{h} $$ for which it can be shown that, for differentiable $f$ (if you want me to clarify this please say so), $$ \nabla_{\overline{v}}f(x, y) = f_x(x, y) \cdot a + f_y(x, y) \cdot b $$

so $$\lim_{h \to 0} \frac{f(x + h v_1, y + hv_2) - f(x, y)}{h\cdot \|\vec{v}\|} = \frac{1}{\|\vec{v}\|}\cdot \lim_{h \to 0} \frac{f(x + h v_1, y + hv_2) - f(x, y)}{h} = \frac{1}{\|\vec{v}\|} \cdot \left( f_x(x, y) \cdot v_1 + f_y(x, y) \cdot v_2 \right) $$

$$ =\frac{1}{\|\vec{v}\|} \cdot \left( f_x(x, y) \cdot a\cdot \lambda + f_y(x, y) \cdot b\cdot \lambda \right) = \frac{\lambda}{\|\vec{v}\|} \left( f_x(x, y) \cdot a + f_y(x, y) \cdot b \right) $$

$$ = f_x(x, y) \cdot a + f_y(x, y) \cdot b = \nabla_{\overline{v}}f(x, y), $$ where the last step uses $\lambda = \|\vec{v}\|$.
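In case the skipped step is useful, here is a short sketch of why the formula for $\nabla_{\overline{v}}f$ holds, assuming $f$ is differentiable at $(x, y)$. The helper function $g$ is introduced here only for illustration. Set $g(h) = f(x + ha, y + hb)$, so that

$$ \lim_{h \to 0} \frac{f(x + ha, y + hb) - f(x, y)}{h} = \lim_{h \to 0} \frac{g(h) - g(0)}{h} = g'(0), $$

and the chain rule gives

$$ g'(0) = f_x(x, y) \cdot \frac{d}{dh}(x + ha) + f_y(x, y) \cdot \frac{d}{dh}(y + hb) = f_x(x, y) \cdot a + f_y(x, y) \cdot b. $$

Nothing in this step uses $a^2 + b^2 = 1$, so the same formula holds with $(v_1, v_2)$ in place of $(a, b)$.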

Now, when no norm is available, things get more abstract: we no longer have this intuitive interpretation of the directional derivative and hence no longer "care" about the potential factors. Notice, however, that with the first definition you mentioned the directional derivative is not determined by the direction alone, since rescaling $\vec{v}$ by $\lambda$ rescales the result by $\lambda$; this is an important distinction.
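A quick numerical sanity check of this may help. The sketch below is only an illustration: the test function `f`, the point `p`, and the helper names `textbook_dd` and `normalized_dd` are all assumptions made up for this example, and the limit is approximated by a central difference rather than computed exactly.

```python
import math

def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def textbook_dd(f, p, v, h=1e-6):
    # Central difference (f(p + h v) - f(p - h v)) / (2h); for a differentiable f
    # this approximates lim_{h -> 0} (f(p + h v) - f(p)) / h.
    (x, y), (v1, v2) = p, v
    return (f(x + h * v1, y + h * v2) - f(x - h * v1, y - h * v2)) / (2 * h)

def normalized_dd(f, p, v, h=1e-6):
    # The variant from the question: the same limit divided by the Euclidean norm of v.
    return textbook_dd(f, p, v, h) / math.hypot(v[0], v[1])

p = (1.0, 2.0)
unit = (0.6, 0.8)  # a unit vector, since 0.6**2 + 0.8**2 == 1

for lam in (1.0, 2.0, 5.0):
    v = (lam * unit[0], lam * unit[1])
    # First column grows with lam; second column stays (approximately) fixed.
    print(lam, textbook_dd(f, p, v), normalized_dd(f, p, v))
```

Up to rounding, the normalized value should not change with `lam`, while the textbook value scales with it.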

2
On

You pointed it out yourself: the difference is marginal and just involves a constant. In particular, the limit in one definition exists if and only if the limit in the other does, and the two differ only by the factor $\|v\|$ (which in fact is one if one restricts directions to unit vectors). Note that if one defines the directional derivative via the gradient, i.e. as the inner product $(\nabla f(x),v)$, then this equals only the first expression.
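To see the last remark numerically, here is a minimal sketch. The function `f`, its hand-computed gradient `grad_f`, the point `p`, and the vector `v` are assumptions picked purely for illustration. It evaluates the literal difference quotient for shrinking `h` and compares it with the inner product.

```python
import math

def f(x, y):
    # Hypothetical differentiable test function, chosen only for illustration.
    return x * y**2 + math.exp(x)

def grad_f(x, y):
    # Gradient of the f above, computed by hand: (f_x, f_y).
    return (y**2 + math.exp(x), 2 * x * y)

def quotient(f, p, v, h):
    # The difference quotient (f(p + h v) - f(p)) / h from the first definition.
    (x, y), (v1, v2) = p, v
    return (f(x + h * v1, y + h * v2) - f(x, y)) / h

p = (0.5, -1.0)
v = (3.0, 4.0)  # deliberately not a unit vector; its norm is 5
g = grad_f(*p)
inner = g[0] * v[0] + g[1] * v[1]
norm_v = math.hypot(v[0], v[1])

for h in (1e-2, 1e-4, 1e-6):
    q = quotient(f, p, v, h)
    # q approaches the inner product; q / norm_v approaches inner / norm_v.
    print(h, q, q / norm_v)

print("inner product:", inner, "inner product / norm:", inner / norm_v)
```

As `h` shrinks, the plain quotient should approach the inner product itself, whereas the quotient divided by $\|v\|$ approaches the inner product with the unit vector $v/\|v\|$.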

5
On

The two definitions are the same if $v$ is a unit vector. So it comes down to what you want "directional derivative" to mean when $v$ is not a unit vector. The textbook definition will give different results for different positive scalar multiples of $v$ (it depends on both the direction and the magnitude of $v$), and yours will not (yours depends only on the direction of $v$). Both conventions are reasonable, and I've seen both used in textbooks.

This quote from Ted Shifrin's Multivariable Mathematics may be of interest. (He uses the same definition as your textbook.)

Note: Ted is an MSE user; hopefully he won't mind me transcribing this excerpt. I'm making this post a CW because the explanation is his, not mine.


Remark Our terminology [directional derivative] may be a bit misleading. Note that since

$$\begin{aligned} D_{cv}f(a) &= \lim_{t \to 0}\frac{f(a + t(cv)) - f(a)}{t} \\ &= \lim_{t \to 0}\frac{f(a + (ct)v) - f(a)}{t} \\ &= c \lim_{t \to 0}\frac{f(a + (ct)v) - f(a)}{ct} \\ &= c\lim_{s \to 0}\frac{f(a + sv) - f(a)}{s} \\ &= cD_v f(a), \\ \end{aligned}$$ the directional derivative depends not only on the direction of $v$, but also on its magnitude. It is for this reason that many calculus books require that one specify a unit vector $v$. It makes more sense to think of $D_v f(a)$ as the rate of change of $f$ as experienced by an observer moving with instantaneous velocity $v$.
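The "observer" reading in the quote can be tried out numerically. The following is only a sketch under assumptions: the function `f`, the point `a`, the vectors `v` and `w`, and the helper names are all invented for this illustration. It measures the rate of change of `f` along a path at time `t = 0`, for a straight path with velocity `v`, a straight path with velocity `3v`, and a curved path whose velocity at `t = 0` is also `v`.

```python
import math

def f(x, y):
    # Hypothetical differentiable test function, chosen only for illustration.
    return math.sin(x) * y + x**2

def rate_along(f, path, h=1e-6):
    # Central-difference approximation of d/dt f(path(t)) at t = 0.
    return (f(*path(h)) - f(*path(-h))) / (2 * h)

a = (1.0, 2.0)   # base point
v = (1.0, -2.0)  # instantaneous velocity of the observer
w = (5.0, 7.0)   # arbitrary extra term that bends the path

def straight(c):
    # Straight-line path t -> a + t * (c v), i.e. constant velocity c v.
    return lambda t: (a[0] + t * c * v[0], a[1] + t * c * v[1])

def curved(t):
    # Curved path t -> a + t v + t^2 w; its velocity at t = 0 is still v.
    return (a[0] + t * v[0] + t * t * w[0], a[1] + t * v[1] + t * t * w[1])

D_v = rate_along(f, straight(1.0))
D_3v = rate_along(f, straight(3.0))
D_curved = rate_along(f, curved)
print(D_v, D_3v, D_curved)
```

For a differentiable `f`, the second number should be about three times the first, and the third should agree with the first: only the instantaneous velocity matters, not the rest of the path.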

5
On

As already observed in the other answers and comments, the two definitions agree up to a harmless multiplicative constant, so they can be used more or less interchangeably.

There is, however, an important property which makes the textbook's definition $$ \nabla_v f(x)=\lim_{h\to 0} \frac{ f(x+hv)-f(x)}{h}$$ superior to the other, in certain fields of mathematics. With this definition, there is the following linearity property: $$ \nabla_{av+bw} f(x)=a\nabla_v f(x)+ b\nabla_w f(x), $$ provided that $f$ is differentiable at $x$. This is one of the building blocks of differential geometry; see here.
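The linearity property, and its failure for the normalized variant, can be checked on a small example. This is a sketch only: the function `f`, the point `p`, the vectors, the coefficients, and the helper names are hypothetical choices for illustration, and limits are approximated by central differences.

```python
import math

def f(x, y):
    # Hypothetical differentiable test function, chosen only for illustration.
    return x**3 - 2 * x * y + math.cos(y)

def textbook_dd(f, p, v, h=1e-6):
    # Central-difference approximation of lim_{h -> 0} (f(p + h v) - f(p)) / h.
    (x, y), (v1, v2) = p, v
    return (f(x + h * v1, y + h * v2) - f(x - h * v1, y - h * v2)) / (2 * h)

def normalized_dd(f, p, v, h=1e-6):
    # Same limit divided by the Euclidean norm of v.
    return textbook_dd(f, p, v, h) / math.hypot(v[0], v[1])

def combine(a, v, b, w):
    # The linear combination a v + b w of two plane vectors.
    return (a * v[0] + b * w[0], a * v[1] + b * w[1])

p = (1.0, 0.5)
v, w = (1.0, 0.0), (0.0, 1.0)
a, b = 2.0, -3.0
u = combine(a, v, b, w)

lhs = textbook_dd(f, p, u)
rhs = a * textbook_dd(f, p, v) + b * textbook_dd(f, p, w)
print("textbook:  ", lhs, rhs)      # these two should agree

n_lhs = normalized_dd(f, p, u)
n_rhs = a * normalized_dd(f, p, v) + b * normalized_dd(f, p, w)
print("normalized:", n_lhs, n_rhs)  # these two generally differ
```

With the normalized definition the left-hand side is divided by $\lVert av + bw \rVert$, which is not a linear function of $a$ and $b$, so the identity breaks.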

On the other hand, I have the impression that the definition $$\nabla_v f(x)=\lim_{h\to 0} \frac{ f(x+hv)-f(x)}{h\lVert v \rVert}$$ is more widely used in applied mathematics. This is probably due to dimensional analysis; if $x$ and $v$ have the units of length, then $\nabla_v$ has the correct units of $\text{length}^{-1}$, unlike the previous definition.
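A rough bookkeeping of the units behind this remark, as a sketch under the assumption that $x$ and $v$ both carry units of length (so $h$ must be dimensionless for $x + hv$ to make sense), and writing $[\,\cdot\,]$ for "units of", a notation introduced here only for illustration:

$$ \left[\frac{f(x+hv)-f(x)}{h}\right] = [f], \qquad \left[\frac{f(x+hv)-f(x)}{h\lVert v \rVert}\right] = \frac{[f]}{\text{length}}. $$

Only the second quotient has the units of a rate of change per unit length, matching those of the partial derivatives $\partial f/\partial x_i$.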