How do I make sense of the total derivative in the limit case of $\Bbb R \to \Bbb R$ functions?


In my notes it is stated as a proposition that the total derivative of a linear map $T: V \to W$ at every point $v \in V$ is $T$ itself: $DT(v)=T$. The notes also say that in the particular case of $\Bbb R \to \Bbb R$ functions it is just the ordinary single-variable calculus derivative (see also, for instance, Wikipedia: "when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function", https://en.wikipedia.org/wiki/Total_derivative).

But how can that be?

From the given proposition I can deduce that all derivatives of order $k \ge 2$ are equal to the map $T$ itself: $D^kT=T$, because every time I get the same linear map. Right? Although this is a little odd when I think about it: in plain words it says that the derivatives of a linear map are the same linear map.

But then if $T:\Bbb R \to \Bbb R$, $T(x)=ax$, single-variable calculus says that $DT(x)=a$ and that all derivatives of order $k\ge 2$ are $0$: $D^kT=0$ for $k \ge 2$. What is going on? Why is there (apparently) no match?


There are 4 best solutions below

7
On BEST ANSWER

Here’s our definition.

Definition.

Let $V,W$ be finite-dimensional normed vector spaces, $A\subset V$ open, $f:A\to W$ a given function and $a\in A$ a given point. We say $f$ is Fréchet differentiable at $a$ if there exists a linear transformation $T:V\to W$ such that \begin{align} \lim\limits_{h\to 0}\frac{\|f(a+h)-f(a)-T(h)\|_W}{\|h\|_V}&=0. \end{align} The linear map $T$ appearing in the definition above is unique. This therefore gives us the right to denote this linear transformation as $Df_a$ (or $df_a$ or $Df(a)$ or $df(a)$, but I don’t like each of these for some small reason or other).

If $f$ is differentiable at each point $a\in A$, then we simply say $f$ is differentiable on $A$.
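To see the definition at work numerically, here is a minimal sketch. The function `f`, the point `a`, and the candidate map `candidate_T` are my own illustrative choices (not from the question), and the Euclidean norm is assumed on both spaces. For this particular `f` the quotient works out to exactly $t$ along the direction $h=(t,-t)$, so it should shrink with $t$.

```python
import math

def f(x, y):
    # Hypothetical example map f : R^2 -> R^2, f(x, y) = (x*y, x + y^2).
    return (x * y, x + y * y)

def candidate_T(a, h):
    # Candidate linear map at a = (a1, a2): h -> (a2*h1 + a1*h2, h1 + 2*a2*h2).
    a1, a2 = a
    h1, h2 = h
    return (a2 * h1 + a1 * h2, h1 + 2.0 * a2 * h2)

def norm(v):
    # Euclidean norm of a pair (assumed norm on both domain and target).
    return math.hypot(v[0], v[1])

def frechet_quotient(a, h):
    # ||f(a+h) - f(a) - T(h)|| / ||h||, the quantity whose limit must be 0.
    fa = f(a[0], a[1])
    fah = f(a[0] + h[0], a[1] + h[1])
    th = candidate_T(a, h)
    remainder = (fah[0] - fa[0] - th[0], fah[1] - fa[1] - th[1])
    return norm(remainder) / norm(h)

a = (1.0, 2.0)
# Shrink h along the direction (1, -1) and record the quotient each time.
quotients = [frechet_quotient(a, (t, -t)) for t in (1e-1, 1e-2, 1e-3)]
print(quotients)
```

A decreasing list is of course only evidence along one direction, not a proof; the definition requires the limit over all $h\to 0$.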

Now, I’m going to make a bunch of statements (all true). You tell me which one you take issue with (also, notice that I’m careful to keep a distinction between $Df_a$ as a linear map vs $f'(a)$ as a matrix representation… something which people often don’t maintain).

  1. If $f:A\subset V\to W$ is differentiable on $A$, then for each $a\in A$, $Df_a:V\to W$ is a linear map, i.e. $Df_a\in \text{Hom}(V,W)$.
  2. In the special case $V=\Bbb{R}^n,W=\Bbb{R}^m$, we have that $Df_a:\Bbb{R}^n\to\Bbb{R}^m$ is a linear map, i.e. $Df_a\in\text{Hom}(\Bbb{R}^n,\Bbb{R}^m)$.
  3. Continuing from 2, if we choose the standard basis $\sigma_n=\{e_1,\dots, e_n\}$ on the domain, and $\sigma_m=\{e_1,\dots, e_m\}$ on the target, then we can assign an $m\times n$ matrix representation $[Df_a]_{\sigma_n}^{\sigma_m}$. It is common to denote this matrix as $f'(a)$. So, $f'(a)$ is then by definition the $m\times n$ matrix representation of $Df_a$, relative to the standard ordered bases of $\Bbb{R}^n,\Bbb{R}^m$.
  4. Specializing to $m=n=1$, $Df_a:\Bbb{R}\to\Bbb{R}$ is a linear map, and its matrix representation $f'(a)$ is a $1\times 1$ matrix, i.e. simply $f'(a)\in\Bbb{R}$ is a real number.
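The passage from a linear map to its matrix in statements 3 and 4 can be sketched in a few lines: apply the map to each standard basis vector and use the results as columns. The helper `matrix_of` and the sample map `example_map` are hypothetical names introduced only for this illustration.

```python
def matrix_of(linear_map, n):
    # Column j of the matrix is the image of the j-th standard basis vector.
    columns = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(list(linear_map(e_j)))
    m = len(columns[0])
    # Transpose the list of columns into a list of m rows with n entries each.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def example_map(h):
    # Hypothetical linear map R^2 -> R^3, so its matrix is 3 x 2.
    return [2.0 * h[0] + h[1], h[0] - h[1], 3.0 * h[1]]

print(matrix_of(example_map, 2))               # a 3 x 2 matrix
print(matrix_of(lambda h: [3.0 * h[0]], 1))    # the 1 x 1 case: a single number
```

In the $1\times 1$ case the "matrix" carries exactly one real number, which is why the linear map and the number are so easily conflated.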

Now, let us fix a linear map $T:V\to W$.

  1. For each point $a\in V$, $T$ is differentiable at $a$, and $DT_a=T$, i.e. for all $h\in V$, we have $DT_a(h)=T(h)$.
  2. $DT$ is a function from $V\to \text{Hom}(V,W)$.
  3. $DT:V\to\text{Hom}(V,W)$ is a constant function with constant value $T$, i.e. $DT_a=T$ for all $a\in V$.
  4. $DT:V\to\text{Hom}(V,W)$ is a constant function, so its derivative is identically zero, i.e. $D^2T=D(DT):V\to\text{Hom}(V,\text{Hom}(V,W))$ is the zero function.
  5. For all $k\geq 2$, $D^kT=0$ identically (the only thing to be mindful of is that they all have different target spaces).
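Statements 1 and 4 of this list can be checked directly against the definition; the following is just that check written out. For linear $T$ and any $a,h\in V$ with $h\neq 0$, $$\frac{\|T(a+h)-T(a)-T(h)\|_W}{\|h\|_V}=\frac{\|T(a)+T(h)-T(a)-T(h)\|_W}{\|h\|_V}=0,$$ so $T$ itself satisfies the definition and $DT_a=T$. Applying the same definition to the map $DT:V\to\text{Hom}(V,W)$, with the zero map as candidate derivative and some norm on $\text{Hom}(V,W)$ (an assumption on my part: say the operator norm, though any norm works in finite dimensions), gives $$\frac{\|DT_{a+h}-DT_a-0\|}{\|h\|_V}=\frac{\|T-T\|}{\|h\|_V}=0,$$ so $D^2T=0$.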

Consider now the function $f:\Bbb{R}\to\Bbb{R}$ given by $f(x)=3x$.

  1. $f$ is a linear function.
  2. The matrix representation (relative to the basis $\{1\}$ on $\Bbb{R}$) of $f$ as a linear map is $(3)$, i.e. the $1\times 1$ matrix with single entry $3$, i.e. $[f]= (3)$.
  3. $f'(x)=3$ for all $x\in\Bbb{R}$.
  4. For all $x\in\Bbb{R}$, we thus have $Df_x=f$, i.e. for all $x\in\Bbb{R}$ and all $h\in\Bbb{R}$, we have $Df_x(h)=f(h)=3h$.
  5. Taking the matrix representation of $Df_x=f$ from statement 4, we get $[Df_x]=[f]$, and thus $f'(x)=3$. So, statements 3 and 4 are consistent with each other.
  6. $f''(x)=0$ for all $x\in\Bbb{R}$ because $f':\Bbb{R}\to\Bbb{R}$ is a constant function (with value $3$).
  7. Since $Df:\Bbb{R}\to\text{Hom}(\Bbb{R},\Bbb{R})$, $x\mapsto f$ (or even more explicitly $Df_x=3\text{id}_{\Bbb{R}}$ for all $x$) is a constant mapping, its derivative $D(Df)$ is identically zero (as a mapping $\Bbb{R}\to\text{Hom}(\Bbb{R},\text{Hom}(\Bbb{R},\Bbb{R}))$).
  8. Statements 6 and 7 are completely consistent with each other.
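If it helps to see the types involved, the same example can be mimicked with functions that return functions. This is only a sketch: the names `Df` and `f_prime` are mine, and I identify $\text{Hom}(\Bbb{R},\Bbb{R})$ with $\Bbb{R}$ through the single matrix entry.

```python
def f(x):
    # The linear map f(x) = 3x from the example.
    return 3.0 * x

def Df(x):
    # Df takes a POINT x and returns a LINEAR MAP, here h -> 3h for every x.
    def Df_x(h):
        return 3.0 * h
    return Df_x

def f_prime(x):
    # The 1 x 1 matrix entry of Df_x relative to the basis {1}: apply it to 1.
    return Df(x)(1.0)

# Df_x agrees with f as a map, whatever the base point x is.
same_map = all(Df(x)(h) == f(h) for x in (-2.0, 0.0, 5.0) for h in (-1.0, 0.5, 4.0))

# The number-valued function x -> f'(x) is constant, so its difference
# quotient vanishes; this is the single-variable statement f''(x) = 0.
step = 1e-3
second_derivative_estimate = (f_prime(2.0 + step) - f_prime(2.0)) / step

print(same_map, f_prime(2.0), second_derivative_estimate)
```

Note that `Df(x)` ignores `x` entirely, which is the code-level version of saying that $Df$ is a constant map.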

Finally, the question "How does the idea of a differential dx work if derivatives are not fractions?" might serve as a helpful side-answer.

0
On

(1) Total derivatives are defined at a point: for a fixed $x \in V$, the total derivative $DF_x : V \to W$ of $F : V \to W$ at $x$ is "the best linear approximation to $F$ near $x$" in the sense that $$ F(x+h) \approx F(x) + DF_x(h). \tag{$*$} $$ Giving a rigorous definition for $\approx$ gives a rigorous definition of $DF_x$.

When $F$ is linear, then for all $x$ we have $DF_x(h) = F(h)$.

(2) When $F : \mathbb R \to \mathbb R$, then $x, h \in \mathbb R$ and $$ DF_x(h) = \frac{\mathrm dF(x)}{\mathrm dx}h. $$ Notice the similarity the following familiar equation shares with ($*$): $$ F(x + h) \approx F(x) + \frac{\mathrm dF(x)}{\mathrm dx}h. $$
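As a small numerical illustration of (2), one can measure how good the approximation is. This is a sketch under my own choices: $F=\sin$ with derivative $\cos$, base point $x=1$, and the hypothetical helper `linearization_error`. What "$\approx$" should mean is that the error is small even after dividing by $|h|$.

```python
import math

def linearization_error(F, dF, x, h):
    # |F(x + h) - (F(x) + F'(x) * h)|, the gap left by the linear approximation.
    return abs(F(x + h) - (F(x) + dF(x) * h))

x = 1.0
steps = (1e-1, 1e-2, 1e-3)
# Error relative to |h|; this is what must tend to 0 as h -> 0.
ratios = [linearization_error(math.sin, math.cos, x, h) / h for h in steps]
print(ratios)
```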

1
On

Let us just look at what a total derivative is. The Wikipedia entry you mention (which works in the spaces $\mathbb R^d$, which is just fine) states: Let $f\colon U\to\mathbb R^m$ be a function defined on an open subset $U\subseteq\mathbb R^n$. We say it's totally differentiable at $a\in U$ if there exists a linear map $\mathrm df_a\colon\mathbb R^n\to\mathbb R^m$ such that $$\lim_{x\to a}\frac{\|f(x)-f(a)-\mathrm df_a(x-a)\|}{\|x-a\|}=0.$$ In this case, we call $\mathrm df_a$ the total differential of $f$ at $a$.
Now it is a direct check that for linear $f$ we can choose $\mathrm df_a=f$ for every $a\in U$ (by linearity).
Hence the total derivative of $T\colon\mathbb R\to\mathbb R,x\mapsto cx$ (writing $c$ for the coefficient, since $a$ now denotes the base point) at $a\in\mathbb R$ is $\mathrm dT_a\colon\mathbb R\to\mathbb R,x\mapsto cx$ and not $\mathrm dT_a\colon\mathbb R\to\mathbb R,x\mapsto c$, as you suggest (which is not even linear unless $c=0$).
So the total differential and the single-variable calculus derivative do not directly agree in the sense that they give you the same object (linear map vs. real number), but the total differential of a linear map $\mathbb R\to\mathbb R$ is just given by multiplication with the single-variable calculus derivative of that map (without specification of a point since it's constant).
However, this is "close" (and unambiguous) enough to justify the quote "when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function", at least in the opinion of some people.

0
On

You have to be very careful what it is you are taking the derivative of, since there are many maps at play here.

  1. The originally given map $f:\mathbb R\to\mathbb R$.
  2. Its derivative at a point $x\in\mathbb R$, $\mathrm Df(x)$, is also a map $\mathbb R\to\mathbb R$. Note that $x$ is not the argument of the map. There are multiple derivatives, each at a different point, and $x$ just specifies which of these derivatives we are talking about, each of which is itself a map.
  3. The derivative map $\mathrm Df:\mathbb R\to\mathrm{Map}(\mathbb R,\mathbb R)$, which takes a point $x$ as its argument and sends it to the derivative $\mathrm Df(x)$ at that point. As such, it maps numbers to maps. Note that the $x$ in $\mathrm Df(x)$ is the argument of the derivative map, but not of the derivative. These are different maps!

Now the following holds:

  1. If $f$ is linear, then $\mathrm Df(x)=f$ for all $x$.
  2. As a corollary: the derivative of the map $\mathrm Df(x)$ (taken at any point) is equal to $\mathrm Df(x)$ itself, since $\mathrm Df(x)$ is always linear.
  3. The derivative of the derivative map $\mathrm Df$ is $0$ if $f$ is linear, since then $\mathrm Df$ is constant (it's always the same linear map!).

Comparing this to the 1d case to which you are used: Let's say $f(x)=3x$, which is linear. Then $f'(x)=3$, or $\mathrm Df(x)=(h\mapsto 3h)$. Now the derivative of the map $h\mapsto 3h$ is again $3$, or $h\mapsto 3h$. However, the derivative of the derivative map $f'$, which maps $x\mapsto 3$, is $0$.
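The difference between "the derivative of the map $\mathrm Df(x)$" and "the derivative of the derivative map" can also be seen in a short sketch. The helper `derivative_at` is a hypothetical name; it uses a central difference, which is exact for the linear and constant functions used here, so a coarse step is enough (for general functions this would only be an approximation).

```python
def derivative_at(g, x, step=0.5):
    # Returns the derivative of g at x AS A LINEAR MAP h -> slope * h.
    slope = (g(x + step) - g(x - step)) / (2.0 * step)
    return lambda h: slope * h

def f(x):
    return 3.0 * x

# Df(2): a linear map h -> 3h.
L = derivative_at(f, 2.0)

# Derivative of the linear map L itself (at the point 5): again h -> 3h.
DL = derivative_at(L, 5.0)

# The derivative map, reduced to numbers: x -> f'(x) = Df(x)(1).
def f_prime(x):
    return derivative_at(f, x)(1.0)

# Derivative of the derivative map at 2: the zero map.
D2f = derivative_at(f_prime, 2.0)

print(L(1.0), DL(1.0), D2f(1.0))
```

The first two printed values coincide because `L` is linear; the third vanishes because `f_prime` is constant. These are two different maps being differentiated.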