Chain rule using the Jacobian for general spaces


If $f:X \rightarrow Y$ is differentiable at $x$ and $g:Y \rightarrow Z$ is differentiable at $y = f(x)$, then $g \circ f: X \rightarrow Z$ is differentiable at $x$ and $$D(g\circ f)(x) = Dg(f(x))\, Df(x).$$

I am able to prove it for univariate functions or for $\mathbb{R}^n \rightarrow \mathbb{R}$ functions, but not for maps between more general spaces $X$, $Y$ and $Z$. Can somebody help me with that?

Best answer

I will assume that $X,Y,Z$ are Banach spaces. The "easiest" way to prove this, imho, is to use the characterization of differentiability that $f$ is differentiable at $x$ iff there exists a bounded linear map $\mathrm Df(x):X\to Y$ and a remainder $R_{f,x}:X\to Y$ such that

$$f(x+h)=f(x)+\mathrm Df(x)(h)+R_{f,x}(h)$$

and

$$\lim_{h\to0}\frac{R_{f,x}(h)}{\Vert h\Vert}=0.$$
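As a concrete finite-dimensional illustration of this characterization, the sketch below takes $X=Y=\mathbb R^2$, where $\mathrm Df(x)$ is just the Jacobian matrix. The map `f`, its hand-computed Jacobian `jacobian_f`, the point `x` and the direction are hypothetical choices, not part of the question. It evaluates $R_{f,x}(h)=f(x+h)-f(x)-\mathrm Df(x)(h)$ for shrinking $h$ along one direction and prints $\Vert R_{f,x}(h)\Vert/\Vert h\Vert$, which should tend to $0$. Checking a single direction numerically is only a sanity check, not a proof, since the limit is over all $h\to0$.

```python
import math

def f(v):
    # Hypothetical example map f: R^2 -> R^2
    x1, x2 = v
    return [x1 * x2, math.sin(x1) + x2 ** 2]

def jacobian_f(v):
    # Hand-computed Jacobian of f (rows = output components, columns = inputs)
    x1, x2 = v
    return [[x2, x1],
            [math.cos(x1), 2 * x2]]

def matvec(A, h):
    # Apply the linear map given by matrix A to the vector h
    return [sum(a * b for a, b in zip(row, h)) for row in A]

def norm(v):
    # Euclidean norm
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x, h):
    # ||R_{f,x}(h)|| / ||h||  with  R_{f,x}(h) = f(x+h) - f(x) - Df(x)h
    fx = f(x)
    fxh = f([xi + hi for xi, hi in zip(x, h)])
    lin = matvec(jacobian_f(x), h)
    r = [a - b - c for a, b, c in zip(fxh, fx, lin)]
    return norm(r) / norm(h)

x = [1.0, 2.0]
direction = [0.6, -0.8]  # unit vector, so ||h|| = t below
ratios = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    ratios.append(remainder_ratio(x, [t * d for d in direction]))
    print(f"||h|| = {t:.0e}  ratio = {ratios[-1]:.3e}")
```

Here the ratio shrinks roughly in proportion to $\Vert h\Vert$, as one expects for a twice continuously differentiable map.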

Similarly, $g$ is differentiable at $y$ iff there exists a bounded linear map $\mathrm Dg(y):Y\to Z$ and a remainder $R_{g,y}:Y\to Z$ such that

$$g(y+h)=g(y)+\mathrm Dg(y)(h)+R_{g,y}(h)$$

and

$$\lim_{h\to0}\frac{R_{g,y}(h)}{\Vert h\Vert}=0.$$

Now combine these to get

$$\begin{align}g\circ f(x+h)&=g(f(x)+\mathrm Df(x)(h)+R_{f,x}(h))\\ &=g(y+\mathrm Df(x)(h)+R_{f,x}(h))\\ &=g(y)+\mathrm Dg(y)[\mathrm Df(x)(h)+R_{f,x}(h)]+R_{g,y}[\mathrm Df(x)(h)+R_{f,x}(h)]\\ &=g(y)+\mathrm Dg(y)\mathrm Df(x)(h)+\left\{\mathrm D g(y)(R_{f,x}(h))+R_{g,y}[\mathrm Df(x)(h)+R_{f,x}(h)]\right\}. \end{align}$$

If you can show that

$$\lim_{h\to0}\frac{\mathrm D g(y)(R_{f,x}(h))+R_{g,y}[\mathrm Df(x)(h)+R_{f,x}(h)]}{\Vert h\Vert}=0,$$

then you're done (the numerator is the big term in curly brackets above): in that case $\mathrm Dg(y)\mathrm Df(x)$, a composition of bounded linear maps and hence again bounded and linear, is the required differential, and the term in curly brackets is the remainder for $g\circ f$. Consequently $g\circ f$ is differentiable at $x$ with differential $\mathrm Dg(y)\mathrm Df(x)=\mathrm Dg(f(x))\mathrm Df(x)$.
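One way to finish that estimate, sketched under the Banach-space assumption that $\mathrm Dg(y)$ and $\mathrm Df(x)$ are bounded with operator norms $\Vert\mathrm Dg(y)\Vert$ and $\Vert\mathrm Df(x)\Vert$, is to treat the two terms of the numerator separately. Write $k(h)=\mathrm Df(x)(h)+R_{f,x}(h)$. Since $g(y)=g(y)+\mathrm Dg(y)(0)+R_{g,y}(0)$, we have $R_{g,y}(0)=0$, so we may set $\varepsilon(k)=\Vert R_{g,y}(k)\Vert/\Vert k\Vert$ for $k\neq0$ and $\varepsilon(0)=0$; then $\Vert R_{g,y}(k)\Vert=\varepsilon(k)\Vert k\Vert$ for all $k$, and $\varepsilon(k)\to0$ as $k\to0$.

$$\begin{align}\frac{\Vert\mathrm Dg(y)(R_{f,x}(h))\Vert}{\Vert h\Vert}&\le\Vert\mathrm Dg(y)\Vert\,\frac{\Vert R_{f,x}(h)\Vert}{\Vert h\Vert}\xrightarrow[h\to0]{}0,\\ \Vert k(h)\Vert&\le\Vert\mathrm Df(x)\Vert\,\Vert h\Vert+\Vert R_{f,x}(h)\Vert\le\left(\Vert\mathrm Df(x)\Vert+1\right)\Vert h\Vert\quad\text{for small }\Vert h\Vert,\\ \frac{\Vert R_{g,y}(k(h))\Vert}{\Vert h\Vert}&=\varepsilon(k(h))\,\frac{\Vert k(h)\Vert}{\Vert h\Vert}\le\left(\Vert\mathrm Df(x)\Vert+1\right)\varepsilon(k(h))\xrightarrow[h\to0]{}0.\end{align}$$

The last limit uses $k(h)\to0$ as $h\to0$, which follows from the middle line. Adding the two bounds (triangle inequality) gives the required limit. This is also the only place where boundedness of the linear maps is needed; in finite dimensions it is automatic.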

Another answer

I like the Weierstraß formulation of differentiation for such cases:
$f$ is differentiable at $x$ if there is a bounded linear map $L$ and a remainder function $r$ such that $f(x+v)=f(x)+L(v)+r(v)$ with $\lim_{v\to 0}\frac{r(v)}{\|v\|}=0.$
This has the advantage that you do not have to bother with coordinates or specific directions. Plugging the expansion of $f$ into that of $g$, the linear parts compose (in coordinates, the Jacobian matrices multiply), and only the remainders need a little work.
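In the finite-dimensional case $X=\mathbb R^n$, $Y=\mathbb R^m$, $Z=\mathbb R^p$, the linear maps are Jacobian matrices and the chain rule reads $J_{g\circ f}(x)=J_g(f(x))\,J_f(x)$. The sketch below checks this numerically with central finite differences. The maps `f` and `g`, the point `x`, the step size `eps` and the helper names are hypothetical choices for illustration; finite differences only approximate the derivatives, so the comparison is up to a small tolerance.

```python
import math

def f(v):
    # Hypothetical f: R^2 -> R^3
    x1, x2 = v
    return [x1 * x2, math.exp(x1), x1 + x2 ** 2]

def g(w):
    # Hypothetical g: R^3 -> R^2
    y1, y2, y3 = w
    return [y1 + y2 * y3, math.sin(y1) * y3]

def numerical_jacobian(F, v, eps=1e-6):
    # Central differences: column j approximates dF/dv_j.
    # Returns a (number of outputs) x (number of inputs) matrix as a list of rows.
    cols = []
    for j in range(len(v)):
        vp = list(v)
        vm = list(v)
        vp[j] += eps
        vm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(F(vp), F(vm))])
    return [list(row) for row in zip(*cols)]

def matmul(A, B):
    # Plain matrix product of A (p x m) and B (m x n)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def compose(G, F):
    # The composition G o F
    return lambda v: G(F(v))

x = [0.5, -1.2]
lhs = numerical_jacobian(compose(g, f), x)                           # J_(g o f)(x), 2 x 2
rhs = matmul(numerical_jacobian(g, f(x)), numerical_jacobian(f, x))  # (2 x 3)(3 x 2)
max_err = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
print("max |J_(g o f) - J_g J_f| =", max_err)
```

Note that the $3\times2$ and $2\times3$ shapes are exactly what makes the product $J_g(f(x))\,J_f(x)$ well defined, mirroring the composition of linear maps $X\to Y\to Z$.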