Suppose $v,p\in\mathbb{R}^n$. Then $D_v|_p:C^\infty(\mathbb{R}^n)\to\mathbb{R}$ is defined by $$D_v|_p(f):=\frac{d}{dt}|_{t=0}f(p+tv).$$
Now I'm looking at a proposition that states that the map $v\mapsto D_v|_p$ is a linear isomorphism from $\mathbb{R}^n\to T_p\mathbb{R}^n$. I understand most of the proof, but I'm confused by the linearity part:
$$\frac{d}{dt}|_{t=0}f(p+tv+\lambda tw) = \frac{d}{dt}|_{t=0}f(p+tv) + \lambda\cdot\frac{d}{dt}|_{t=0}f(p+tw)$$
for $v,w,p\in\mathbb{R}^n$ and $\lambda\in\mathbb{R}$.
This equality should immediately follow from the chain rule, but I don't see how. Probably I'm just a bit confused by the notation of (directional) derivatives... but I already searched quite a bit and just can't see how the chain rule is used here. Can someone explain this?
confusion about chain rule in linearity proof
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 3 answers below.
It is the chain rule.
The derivative of a multivariable function $\phi : \mathbb R^m \to \mathbb R^k$ at $\xi \in \mathbb R^m$ is a linear map $d\phi \mid_\xi : \mathbb R^m \to \mathbb R^k$. If $m = 1$, the linear map $d\phi\mid_\xi$ is uniquely determined by the vector $\phi'(\xi) = \frac{d}{dx}\mid_{x =\xi}\phi = d\phi\mid_\xi(1) \in \mathbb R^k$. Note that the components of $\phi'(\xi)$ are the usual derivatives $\phi_i'(\xi)$ of the coordinate functions $\phi_1,\ldots,\phi_k$ of $\phi$.
In your question we have a $C^\infty$-function $f : \mathbb R^n \to \mathbb R$. For $p,u \in \mathbb R^n$ we define $l_{p,u} : \mathbb R \to \mathbb R^n, l_{p,u}(t) = p + tu$, which is also a $C^\infty$-function. This gives a $C^\infty$-function $f \circ l_{p,u} : \mathbb R \to \mathbb R$. The chain rule says $$d(f \circ l_{p,u})\mid_0 = df\mid_{l_{p,u}(0)} dl_{p,u}\mid_0 = df\mid_p dl_{p,u}\mid_0 .$$ This shows $$\begin{aligned}\frac{d}{dt} \mid_{t=0}(f \circ l_{p,u}) &= d(f \circ l_{p,u})\mid_0(1) = (df\mid_p dl_{p,u}\mid_0)(1) = df\mid_p (dl_{p,u}\mid_0(1)) \\ &= df\mid_p \left(\frac{d}{dt} \mid_{t=0}l_{p,u}\right) = df\mid_p(u).\end{aligned}$$ In your question the LHS of this equation is denoted by $\frac{d}{dt} \mid_{t=0}f(p + tu)$.
Since $df\mid_p$ is linear, you get the desired result.
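If it helps to see the identity $\frac{d}{dt}\mid_{t=0}f(p+tu) = df\mid_p(u)$ with numbers, here is a minimal numerical sketch. The test function `f`, its hand-computed gradient `grad_f`, the point `p`, the direction `u`, and the step size `h` are all hypothetical choices made for illustration; nothing here comes from the original proof.

```python
import math

def f(x, y):
    # Hypothetical smooth test function f: R^2 -> R (an assumption, not from the question)
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of f computed by hand; this row vector represents df|_p
    return (2 * x * y, x * x + math.cos(y))

def directional_derivative(func, p, u, h=1e-6):
    # Central difference approximation of d/dt|_{t=0} func(p + t*u)
    forward = func(p[0] + h * u[0], p[1] + h * u[1])
    backward = func(p[0] - h * u[0], p[1] - h * u[1])
    return (forward - backward) / (2 * h)

p = (1.0, 2.0)
u = (3.0, -1.0)

# Left-hand side: derivative of t -> f(p + t*u) at t = 0
lhs = directional_derivative(f, p, u)

# Right-hand side: the linear map df|_p applied to u
g = grad_f(*p)
rhs = g[0] * u[0] + g[1] * u[1]

print(lhs, rhs)
```

The two printed values should agree up to the finite-difference error, which is what the chain rule predicts.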
The confusing part of your definition is that the function whose derivative you take is in fact a composite function. With the definition
$$g:\mathbb{R}\rightarrow\mathbb{R}^n,\quad g(t):=p+tv,$$
you find
$$D_{v}|_p(f):=\frac{d}{dt}|_{t=0}f(p+tv)=\frac{d}{dt}|_{t=0} (f\circ g)(t)$$
Of course, you need to use the multivariate chain rule in this situation.
Let me write it down for your specific situation:
Let $g:\mathbb{R}\rightarrow \mathbb{R}^n$ be differentiable at $0$ and $f:\mathbb{R}^n\rightarrow \mathbb{R}$ be differentiable at $p=g(0)$. Then $f\circ g:\mathbb{R}\rightarrow\mathbb{R}$ is differentiable at $0$, and $$\frac{d}{dt}|_{t=0} (f\circ g)(t)=D(f\circ g)(0)=(Df)(g(0))\cdot(Dg)(0)$$
Note that the left-hand side of this equation makes sense because the domain and codomain of $f\circ g$ are both $\mathbb{R}$, i.e. $f\circ g:\mathbb{R}\rightarrow \mathbb{R}$.
Now let's define the curves we need for the calculation: $$ \begin{aligned} g_{v+\lambda w}&:\mathbb{R}\rightarrow\mathbb{R}^n,\quad g_{v+\lambda w}(t):=p+t(v+\lambda w),\\ g_{v}&:\mathbb{R}\rightarrow\mathbb{R}^n,\quad g_{v}(t):=p+tv,\\ g_{ w}&:\mathbb{R}\rightarrow\mathbb{R}^n,\quad g_{ w}(t):=p+t w. \end{aligned}$$ These are clearly differentiable. Note that $p=g_{v+\lambda w}(0)=g_{v}(0)=g_{ w}(0)$ and that the Jacobian matrices of these functions at the point $0$ are
$$(Dg_{v+\lambda w})(0)=v+\lambda w\in\mathbb{R}^{n\times 1},\quad (Dg_{v})(0)=v\in\mathbb{R}^{n\times 1},\quad (Dg_{ w})(0)= w\in\mathbb{R}^{n\times 1}. $$
The Jacobian matrix of $f$ at the point $p$ is $(Df)(p)\in\mathbb{R}^{1\times n}$.
With this and the multivariate chain rule above, you find
$$ \begin{aligned} D_{v+\lambda w}|_p(f)=\frac{d}{dt}|_{t=0} (f\circ g_{v+\lambda w})(t)&=(Df)(g_{v+\lambda w}(0))\cdot(v+\lambda w)\\ &=(Df)(g_{v}(0))\cdot v+\lambda (Df)(g_{ w}(0))\cdot w\\ &=\frac{d}{dt}|_{t=0} (f\circ g_{v})(t)+\lambda \frac{d}{dt}|_{t=0} (f\circ g_{ w})(t)\\ &=D_v|_p(f)+\lambda D_w|_p(f). \end{aligned}$$
The second equality uses $g_{v+\lambda w}(0)=g_{v}(0)=g_{ w}(0)=p$ together with the fact that for any matrix $A\in\mathbb{R}^{1\times n}$, vectors $v,w\in\mathbb{R}^{n\times 1}$ and $\lambda\in \mathbb{R}$ we have $A(v+\lambda w)=Av+ \lambda Aw$.
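As a sanity check on the linearity $D_{v+\lambda w}|_p(f)=D_v|_p(f)+\lambda D_w|_p(f)$, one can compare both sides numerically. This is only a rough sketch: the function `f`, the helper `D`, the vectors `p`, `v`, `w`, the scalar `lam`, and the step size `h` are hypothetical choices for illustration.

```python
import math

def f(x):
    # Hypothetical smooth test function f: R^3 -> R (an assumption for illustration)
    return x[0] * x[1] + math.exp(x[2]) * x[0]

def D(func, p, v, h=1e-6):
    # Central difference approximation of D_v|_p(func) = d/dt|_{t=0} func(p + t*v)
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return (func(plus) - func(minus)) / (2 * h)

p = [1.0, -2.0, 0.5]
v = [1.0, 0.0, 2.0]
w = [0.0, 3.0, -1.0]
lam = 2.5

# Direction v + lam*w, as used by the curve g_{v + lam*w}(t) = p + t*(v + lam*w)
combined = [vi + lam * wi for vi, wi in zip(v, w)]

lhs = D(f, p, combined)               # D_{v + lam*w}|_p(f)
rhs = D(f, p, v) + lam * D(f, p, w)   # D_v|_p(f) + lam * D_w|_p(f)

print(abs(lhs - rhs))
```

The printed difference should be tiny, limited only by the finite-difference and rounding error. That mirrors the computation above, where the whole effect comes from the row vector $(Df)(p)$ acting linearly on the column vectors.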
We have $$ D_v|_p(f):=\left.\dfrac{d}{dt}\right|_{t=0}f(p+tv). $$ Set $h(t)=p+tv.$ Then, by the chain rule, $$ D_p(f)(v):=\left.\dfrac{d}{dt}\right|_{t=0}f(p+tv)=\left.\dfrac{d}{dt}\right|_{t=0}f(h(t))=f'(h(0))\cdot h'(0)=f'(p)\cdot v. $$ Here $f'(p)\cdot v=D_p(f)\cdot v$ reads "derivative at point $x=p$ of function $f$ (= gradient = Jacobi matrix = linear approximation) in direction of $v$." Next, set $g(t)=p+tv+\lambda tw=p+t(v+\lambda w)$, so that $g(0)=p$ and $g'(0)=v+\lambda w.$ Then
\begin{align*} D_p(f)\cdot (v+\lambda w)&= \left. \frac{d}{dt}\right|_{t=0}f(p+t(v+\lambda w)) \\&=\left. \frac{d}{dt}\right|_{t=0}f(g(t)) \\ &=f'(g(0)) \cdot \left. \frac{d}{dt}\right|_{t=0}g(t)\\ &=f'(p)\cdot (v+\lambda w)\\ &=f'(p)\cdot v +\lambda \cdot f'(p)\cdot w\\ &=\left. \dfrac{d}{dt}\right|_{t=0}f(p+tv)+\lambda \cdot \left. \dfrac{d}{dt}\right|_{t=0}f(p+tw)\\ &=D_p(f)\cdot v +\lambda \cdot D_p(f)\cdot w\\ &=D_p(f)(v)+\lambda D_p(f)(w) \end{align*}