Prove the chain rule for normed vector spaces


I'm trying to prove the chain rule. Could you please verify if my proof looks fine or contains logical gaps/errors? Thank you so much for your help!

Let $X, Y, G$ be normed vector spaces. Suppose $f: X \rightarrow Y$ is differentiable at $x_{0}$ and $g: Y \rightarrow G$ is differentiable at $y_{0}:=f\left(x_{0}\right)$. Then $g \circ f: X \rightarrow G$ is differentiable at $x_{0}$, and the derivative is given by $$\partial(g \circ f)\left(x_{0}\right) = \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right).$$


My attempt:

We have $f(x)=f\left(x_{0}\right) + \partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|$ for all $x \in X$ and $g(y) = g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(y-y_{0}\right)+s(y)\left\|y-y_{0}\right\|$ for $y \in Y$. Here $r: X \rightarrow Y$ and $s: Y \rightarrow G$ are continuous at $x_{0}$ and $y_{0}$ respectively. Moreover, $r\left(x_{0}\right)=0$ and $s\left(y_{0}\right)=0$.

Our goal is to find a function $t:X \to G$ that is continuous at $x_{0}$ with $t(x_0)=0$ and satisfies $$(g \circ f)(x) = (g \circ f) \left(x_{0}\right) + \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right) \left(x-x_{0}\right)+t(x)\left\|x-x_{0}\right\|$$ for all $x \in X$. We substitute $y=f(x)$ into the expansion of $g$ and get

$$\begin{aligned} (g \circ f)(x) &= g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(f\left(x_{0}\right)+\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|-y_{0}\right)\\ & \quad + s(y)\left\|y-y_{0}\right\|\\ &= g\left(y_{0}\right)+\partial g\left(y_{0}\right)\left(\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|\right)\\ & \quad+ s(y)\left\|y-y_{0}\right\|\\ &= g\left(f(x_0)\right)+\partial g\left(f(x_0)\right) \circ \partial f\left(x_{0}\right)\left(x-x_{0}\right) \\ &\quad+\partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\| \end{aligned}$$

Equating $$g\left(f(x_0)\right)+\partial g\left(f(x_0)\right) \circ \partial f\left(x_{0}\right)\left(x-x_{0}\right) + \partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\|$$ and $$(g \circ f) \left(x_{0}\right) + \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right) \left(x-x_{0}\right)+t(x)\left\|x-x_{0}\right\|$$ we get

$$t(x) \|x-x_0\| = \partial g\left(f(x_0)\right) \circ r(x)\left\|x-x_{0} \right\|+ s(y)\left\|y-y_{0}\right\|$$ and consequently $$\begin{aligned} t(x) &= \partial g\left(f(x_0)\right) \circ r(x) + s(y) \frac{\left\|y-y_{0}\right\|}{\left\|x-x_{0}\right\|}\\ &= \partial g\left(f(x_0)\right) \circ r(x) + s(f(x)) \left\| \frac{\partial f\left(x_{0}\right)\left(x-x_{0}\right)+r(x)\left\|x-x_{0}\right\|}{\|x-x_0\|} \right\| \\&= \partial g\left(f(x_0)\right) \circ r(x) + s(f(x)) \left\| \partial f\left(x_{0}\right) \frac{x-x_0}{\|x-x_0\|} +r(x) \right\|\end{aligned}$$ for all $x \neq x_0$. We further define $t(x_0)=0$. It is easy to check that $t$ satisfies our requirement. Hence $\partial(g \circ f)\left(x_{0}\right) = \partial g\left(f\left(x_{0}\right)\right) \circ \partial f\left(x_{0}\right)$.
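As a numerical sanity check of the formula (not a proof), here is a minimal Python sketch in the finite-dimensional case $X=Y=\mathbb{R}^2$, $G=\mathbb{R}$. The maps `f` and `g`, the base point, and the step size `eps` are illustrative assumptions, not part of the question. It compares the product $\partial g(f(x_0))\,\partial f(x_0)$, with both Jacobians computed by hand, against a central finite difference of $g\circ f$.

```python
import math

# Hypothetical example maps, chosen only for illustration.
def f(x, y):
    return (x * y, x + y * y)

def g(u, v):
    return math.sin(u) + u * v

def df(x, y):
    # Jacobian matrix of f at (x, y), computed by hand.
    return [[y, x], [1.0, 2.0 * y]]

def dg(u, v):
    # Derivative of g at (u, v) as a row vector, computed by hand.
    return [math.cos(u) + v, u]

def chain_rule_gradient(x, y):
    # dg(f(x0)) composed with df(x0): a row vector times a 2x2 matrix.
    u, v = f(x, y)
    row = dg(u, v)
    jac = df(x, y)
    return [row[0] * jac[0][j] + row[1] * jac[1][j] for j in range(2)]

def finite_difference_gradient(x, y, eps=1e-6):
    # Central differences of the composite g(f(x, y)) in each coordinate.
    def comp(a, b):
        return g(*f(a, b))
    return [
        (comp(x + eps, y) - comp(x - eps, y)) / (2 * eps),
        (comp(x, y + eps) - comp(x, y - eps)) / (2 * eps),
    ]

if __name__ == "__main__":
    x0 = (0.7, -1.3)
    print(chain_rule_gradient(*x0))
    print(finite_difference_gradient(*x0))
```

The two printed vectors should agree up to the finite-difference error; agreement at one point only illustrates the statement and says nothing about the general normed-space case.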

Best answer:

Looks OK to me. I'll try to rewrite it a little more neatly. There are a lot of symbols, so one thing you can do is make some reductions. By considering $\tilde f(x)= f(x+x_0)$ instead of $f(x)$ we can assume that $x_0=0$. Then by considering $\tilde g(y) = g(y+\tilde f(0))$ instead of $g$, and $\hat{f}(x) = \tilde f(x)-\tilde f(0)$ instead of $\tilde f$, we can assume that $f(0) = 0$; see [*] below. Finally, adding a constant to $g$ doesn't change its derivative, so we can assume $g(0)=0$.

So now we can assume that \begin{align} f(h)&=\partial f(0)h + r(h)\|h\|_X, & h\xrightarrow{X} 0 \\ g(v) &= \partial g(0)v + s(v)\|v\|_Y, & v \xrightarrow{Y} 0 \end{align} with $r$ and $s$ continuous at $0$ and $r(0)=0$, $s(0)=0$, as in the question. The full proof follows.

This implies that $f(h)\xrightarrow{Y}0$ when $h\xrightarrow{X} 0$, so for $h \neq 0$, \begin{align} (g\circ f)(h) &= g(f(h)) \\ &= \partial g(0) f(h) + s(f(h))\|f(h)\|_Y \\ &=\partial g(0)\big[\partial f(0)h + r(h)\|h\|_X\big ] + s(f(h))\|f(h)\|_Y \\ &= \partial g(0)\partial f(0)h +\left[ \partial g(0) r(h) + s(f(h))\frac{\|f(h)\|_Y}{\|h\|_X} \right]\|h\|_X \\ &= \partial g(0)\partial f(0)h +\left[ \partial g(0) r(h) + s(f(h))\left \|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y} \right]\|h\|_X \\ &= \partial g(0)\partial f(0)h + t(h)\|h\|_X\end{align} where $t:X\to G$ is defined by $$ t(h) = \begin{cases}0 & h=0 \\ \partial g(0) r(h) + s(f(h))\left \|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y} & h\neq 0\end{cases} $$

The definition of differentiability only requires $t$ to be continuous at $h=0$, and as $h\xrightarrow{X} 0$ we have $$ \| \partial g(0) r(h) \|_G \le \| \partial g(0) \|_{Y\to G} \|r(h) \|_Y \to 0,$$

$$\left\|s(f(h)) \left\|\partial f(0)\frac{h}{\|h\|_X} + r(h) \right\|_{Y}\right \|_{G}\le \|s(f(h))\|_G\left (\|\partial f(0)\|_{X\to Y} + \|r(h)\|_{Y}\right) \to 0,$$ so $t(h)\xrightarrow{G} 0=t(0)$, and hence $t$ is continuous at $0$, which concludes the proof.
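To watch the remainder shrink in a concrete case, here is a small Python sketch in the normalized setting $x_0=0$, $f(0)=0$, $g(0)=0$, with $X=Y=\mathbb{R}^2$ and $G=\mathbb{R}$. The polynomial maps `f` and `g`, the direction, and the scales are assumptions picked for illustration. It evaluates $t(h) = \big(g(f(h)) - \partial g(0)\partial f(0)h\big)/\|h\|_X$ along a fixed direction as $h$ shrinks.

```python
import math

# Hypothetical example maps with f(0) = 0 and g(0) = 0.
def f(h1, h2):
    # Derivative at 0 is the matrix [[1, 0], [0, 2]].
    return (h1 + h1 * h2, 2.0 * h2 + h1 * h1)

def g(v1, v2):
    # Derivative at 0 is the row vector [1, 3].
    return v1 + 3.0 * v2 + v1 * v2

def linear_part(h1, h2):
    # dg(0) applied to df(0) h: [1, 3] . (h1, 2*h2) = h1 + 6*h2.
    return h1 + 6.0 * h2

def remainder(h1, h2):
    # t(h) = (g(f(h)) - dg(0) df(0) h) / ||h|| for h != 0, and t(0) = 0.
    norm = math.hypot(h1, h2)
    if norm == 0.0:
        return 0.0
    return (g(*f(h1, h2)) - linear_part(h1, h2)) / norm

def remainders_along(direction, scales):
    # |t(s * direction)| for each scale s.
    d1, d2 = direction
    return [abs(remainder(s * d1, s * d2)) for s in scales]

if __name__ == "__main__":
    scales = [10.0 ** -k for k in range(1, 6)]
    for s, t in zip(scales, remainders_along((0.6, 0.8), scales)):
        print(s, t)
```

For these particular maps the remainder is of order $\|h\|_X$, so the printed values should decrease roughly in proportion to the scale. This only illustrates the estimate along one direction; the proof above is what covers every $h \to 0$.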

(Also compare with the Carathéodory definition of the derivative in 1D.)


[*] Indeed, suppose we knew the result at $x=0$ for all functions $g,f$ with $f(0)=0$. Then for general functions $f,g$ we would have $$ \partial (g\circ f)(x_0) = \partial [g\circ (f(\bullet+x_0))](0) = \partial (g\circ \tilde f)(0) $$ and $$g(\tilde f(x)) = g(\tilde f(x)-\tilde f(0) + \tilde f(0)) = g(\hat f(x)+\tilde f(0)) = (\tilde g \circ \hat f)(x),$$ so $$ \partial (g\circ \tilde f)(0) = \partial (\tilde g\circ \hat f)(0) = \partial \tilde g(0)\circ \partial \hat f(0)= \partial g(\tilde f(0)) \partial \tilde f(0)=\partial g(f(x_0))\partial f(x_0).$$