Chain rule proof by definition


I was trying to prove the following:

Let $f \colon \mathbb{R}^n \rightarrow \mathbb{R}^m $, $g \colon \mathbb{R}^m \rightarrow \mathbb{R}^k $ and $h \colon \mathbb{R}^n \rightarrow \mathbb{R}^k $ be such that $h = g \circ f$. If $f$ is differentiable at $p \in \mathbb{R}^n$ and $g$ is differentiable at $q = f(p) \in \mathbb{R}^m$, then $h$ is differentiable at $p$ and $Dh(p) = Dg(q) \circ Df(p)$.
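As a quick numerical sanity check of the statement (not part of any proof), the sketch below compares a finite-difference Jacobian of $h = g \circ f$ with the product of the finite-difference Jacobians of $g$ at $q$ and $f$ at $p$. The maps `f` and `g`, the point `p`, and the helper names `jacobian` and `matmul` are illustrative assumptions and do not come from the question.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^2
    x1, x2 = x
    return [x1 * x2, math.sin(x1) + x2 ** 2]

def g(y):
    # Hypothetical example map g: R^2 -> R^1
    y1, y2 = y
    return [math.exp(y1) + y1 * y2]

def h(x):
    # h = g o f
    return g(f(x))

def jacobian(F, p, eps=1e-6):
    """Central-difference approximation of the Jacobian of F at p (rows = outputs)."""
    m = len(F(p))
    J = [[0.0] * len(p) for _ in range(m)]
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        F_plus, F_minus = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (F_plus[i] - F_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

p = [0.7, -0.3]
q = f(p)

Dh = jacobian(h, p)                               # approximates Dh(p)
DgDf = matmul(jacobian(g, q), jacobian(f, p))     # approximates Dg(q) Df(p)

# Largest entrywise difference between the two matrices.
max_gap = max(abs(Dh[i][j] - DgDf[i][j])
              for i in range(len(Dh)) for j in range(len(Dh[0])))
print("max entrywise gap:", max_gap)
```

The gap is limited only by finite-difference and rounding error, so a small value is consistent with $Dh(p) = Dg(q) \circ Df(p)$ for this particular example; it says nothing about the general case.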


My attempt: Since $f$ is differentiable at $p$ and $g$ is differentiable at $q$, we have

\begin{align*}
f(x) &= f(p) + Df(x - p) + R_f(x),\text{ with }\frac{\Vert R_f(x)\Vert}{\Vert x-p\Vert} \rightarrow 0\text{ as } x \rightarrow p \\
g(x) &= g(q) + Dg(x - q) + R_g(x),\text{ with }\frac{\Vert R_g(x)\Vert}{\Vert x-q\Vert} \rightarrow 0\text{ as } x \rightarrow q
\end{align*}

Then,

\begin{align*}
h(x)&= g(f(x)) = g(f(p)) + Dg(f(x) - f(p)) + R_g(f(x)) \\
&= g(f(p)) + Dg(Df(x-p) + R_f(x)) + R_g(f(x)) \\
&= (g \circ f)(p) + (Dg \circ Df)(x-p) + Dg(R_f(x)) + R_g(f(x))
\end{align*}

If $R(x) = Dg(R_f(x)) + R_g(f(x))$, proving $\frac{\Vert R(x)\Vert}{\Vert x - p\Vert} \rightarrow 0$ as $x \rightarrow p$ would be enough to complete the proof.

\begin{align*} \frac{\Vert R(x)\Vert}{\Vert x - p\Vert} = \frac{\Vert Dg(R_f(x)) + R_g(f(x))\Vert}{\Vert x - p\Vert} \leq \frac{\Vert Dg(R_f(x))\Vert}{\Vert x - p\Vert} + \frac{\Vert R_g(f(x))\Vert}{\Vert x - p\Vert} \end{align*}

So, I need to bound both terms. For the first one, I was thinking of $\frac{\Vert Dg(R_f(x))\Vert}{\Vert x - p\Vert} \leq \frac{\Vert Dg\Vert \Vert R_f(x)\Vert}{\Vert x - p\Vert} \leq \frac{(\Vert Dg\Vert + \epsilon_1) \Vert R_f(x)\Vert}{\Vert x - p\Vert}$ for some $\epsilon_1 > 0$. Then, given $\epsilon_2 > 0$, take $\delta_1 > 0$ such that $\frac{\Vert R_f(x)\Vert}{\Vert x - p\Vert} < \frac{\epsilon_2}{2(\Vert Dg\Vert + \epsilon_1)}$ if $0 < \Vert x - p\Vert < \delta_1$, so that $\frac{(\Vert Dg\Vert + \epsilon_1) \Vert R_f(x)\Vert}{\Vert x - p\Vert} < \frac{\epsilon_2}{2}$ if $0 < \Vert x - p \Vert < \delta_1$.

Yet, I don't know how to bound the second term. Any help? Thanks in advance.


There are 2 best solutions below

On BEST ANSWER

In the proof it is convenient to express the remainders in Peano's form:

  • $R_f(x)=\Vert x-p\Vert \cdot\omega_f(x-p)$ with $\omega_f(x-p) \to 0\text{ as } x \rightarrow p$
  • $R_g(x)=\Vert x-q\Vert \cdot\omega_g(x-q)$ with $\omega_g(x-q) \to 0\text{ as } x \rightarrow q$

Then

$$h(x)= g(f(x)) = g(f(p)) + Dg(f(x) - f(p)) + R_g(f(x))$$

with

$$R_g(f(x))=\Vert f(x)-f(p)\Vert \cdot\omega_g(f(x)-f(p))$$

thus

$$h(x)= g(f(x)) = g(f(p)) + Dg(f(x) - f(p)) + \Vert f(x)-f(p)\Vert \cdot\omega_g(f(x)-f(p))$$

$$= g(f(p)) + Dg(Df(x-p) + \Vert x-p\Vert \cdot\omega_f(x-p)) + \Vert f(x)-f(p)\Vert \cdot\omega_g(f(x)-f(p))$$

$$= (g \circ f)(p) + (Dg \circ Df)(x-p) + Dg(R_f(x)) + R_g(f(x))$$

where, writing $Jg$ for the Jacobian matrix representing $Dg$,

  • $Dg(R_f(x))=\Vert x-p\Vert Jg \cdot \omega_f(x-p)$
  • $R_g(f(x))=\Vert f(x)-f(p)\Vert \cdot\omega_g(f(x)-f(p))$

and finally we obtain

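$$\frac{\Vert Dg(R_f(x))\Vert}{\Vert x-p\Vert}\le \Vert Jg\Vert \, \Vert\omega_f(x-p)\Vert\to0$$

$$\frac{\Vert R_g(f(x))\Vert}{\Vert x-p\Vert}=\frac{\Vert f(x)-f(p)\Vert}{\Vert x-p\Vert} \cdot \Vert\omega_g(f(x)-f(p))\Vert\to 0$$

The second limit holds because $f$ is continuous at $p$, so $f(x)-f(p)\to 0$ and hence $\omega_g(f(x)-f(p))\to 0$ (with the convention $\omega_g(0)=0$), while the quotient stays bounded near $p$: $\frac{\Vert f(x)-f(p)\Vert}{\Vert x-p\Vert}\le \Vert Df\Vert+\Vert\omega_f(x-p)\Vert$.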
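To see the two remainder quotients shrink on a concrete case, here is a minimal numerical sketch. The maps `f` and `g`, their hand-computed Jacobians `Df` and `Dg`, the point `p`, and the direction `u` are all illustrative assumptions. It only samples $x \to p$ along one direction, so it illustrates the limits rather than proving them.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^2
    x1, x2 = x
    return [x1 * x2, math.sin(x1) + x2 ** 2]

def g(y):
    # Hypothetical example map g: R^2 -> R^1
    y1, y2 = y
    return [math.exp(y1) + y1 * y2]

def Df(x):
    # Exact Jacobian of the example f (2x2), computed by hand.
    x1, x2 = x
    return [[x2, x1], [math.cos(x1), 2 * x2]]

def Dg(y):
    # Exact Jacobian of the example g (1x2), computed by hand.
    y1, y2 = y
    return [[math.exp(y1) + y2, y1]]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

p = [0.7, -0.3]
q = f(p)
u = [0.6, 0.8]  # fixed unit direction along which x approaches p

first_term, second_term = [], []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = [p[0] + t * u[0], p[1] + t * u[1]]
    dx = sub(x, p)
    # R_f(x) = f(x) - f(p) - Df(x - p)
    R_f = sub(sub(f(x), q), matvec(Df(p), dx))
    # R_g(f(x)) = g(f(x)) - g(q) - Dg(f(x) - q)
    R_g = sub(sub(g(f(x)), g(q)), matvec(Dg(q), sub(f(x), q)))
    first_term.append(norm(matvec(Dg(q), R_f)) / norm(dx))
    second_term.append(norm(R_g) / norm(dx))
    print(t, first_term[-1], second_term[-1])
```

Both printed columns should decrease as `t` decreases, matching $\frac{\Vert Dg(R_f(x))\Vert}{\Vert x-p\Vert}\to 0$ and $\frac{\Vert R_g(f(x))\Vert}{\Vert x-p\Vert}\to 0$ for this example.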

On

I find it a bit easier to prove this using asymptotic notation. Let $DF_{\mathbf p}$ denote the differential of $F$ at $\mathbf p$, and let $\Delta F_{\mathbf p}[\mathbf x] = F(\mathbf p+\mathbf x)-F(\mathbf p)$, so that $\Delta F_\mathbf p[\mathbf x] = DF_{\mathbf p}[\mathbf x] + o(\mathbf x)$. Then

$$\begin{align}
\Delta h_{\mathbf p}[\mathbf x] &= g(f(\mathbf p+\mathbf x))-g(f(\mathbf p)) \\
&= g(f(\mathbf p)+\Delta f_{\mathbf p}[\mathbf x])-g(f(\mathbf p)) \\
&= \Delta g_{f(\mathbf p)}[\Delta f_{\mathbf p}[\mathbf x]] \\
&= Dg_{f(\mathbf p)}[\Delta f_{\mathbf p}[\mathbf x]] + \phi(\Delta f_{\mathbf p}[\mathbf x]) \\
&= Dg_{f(\mathbf p)}[Df_{\mathbf p}[\mathbf x]+o(\mathbf x)] + \phi(\Delta f_{\mathbf p}[\mathbf x]) \\
&= Dg_{f(\mathbf p)}[Df_{\mathbf p}[\mathbf x]] + Dg_{f(\mathbf p)}[o(\mathbf x)] + \phi\circ\psi(\mathbf x),
\end{align}$$

where $\phi\in o(\mathbb R^m,\mathbb R^k)$ is the error term of $\Delta g_{f(\mathbf p)}$, and $\psi = \Delta f_{\mathbf p}\in O(\mathbb R^n,\mathbb R^m)$, because if $F:V\to W$ is differentiable at $\mathbf p$, then $\Delta F_{\mathbf p}\in O(V,W)$. Also, $Dg_{f(\mathbf p)}$ is linear, so it is an element of $O(\mathbb R^m,\mathbb R^k)$.

The composition of a big-oh function and a little-oh function, in either order, is little-oh, so the last two terms above are both little-oh. Therefore

$$\Delta h_{\mathbf p}[\mathbf x] = \left(Dg_{f(\mathbf p)} \circ Df_{\mathbf p}\right)[\mathbf x]+o(\mathbf x)$$

and so

$$D(g\circ f)_{\mathbf p} = Dg_{f(\mathbf p)}\circ Df_{\mathbf p}.$$
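The fact that a little-oh function composed with a big-oh function is again little-oh can be illustrated numerically in one dimension. In the sketch below, `phi` and `psi` are hypothetical stand-ins chosen only for illustration: `phi` satisfies $|\phi(y)|/|y| \to 0$ and `psi` satisfies $|\psi(x)| \le C|x|$ near $0$.

```python
def phi(y):
    # Little-oh example: |phi(y)| / |y| = |y| ** 0.5 -> 0 as y -> 0.
    return abs(y) ** 1.5

def psi(x):
    # Big-oh example: |psi(x)| <= 4 |x| for |x| <= 1.
    return 3.0 * x + x ** 2

ts = [10.0 ** (-k) for k in range(1, 7)]
ratios = [abs(phi(psi(t))) / abs(t) for t in ts]

for t, r in zip(ts, ratios):
    print(t, r)
```

The printed ratios $|\phi(\psi(t))|/|t|$ shrink as $t \to 0$, which is what $\phi\circ\psi \in o$ means for this pair; the general statement still needs the short $\epsilon$-$\delta$ argument.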