Proving that Jacobian of Composition is equal to Composition of Jacobians using epsilon-delta


Let us have functions $\mathrm{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m$ and $\mathrm{g}: \mathbb{R}^m \rightarrow \mathbb{R}^k$ such that $\mathrm{f}$ is differentiable at some point $\mathrm{a} \in \mathbb{R}^n$ and $\mathrm{g}$ is differentiable at the point $\mathrm{f(a)} \in \mathbb{R}^m$. Let us denote the Jacobian of $\mathrm{f}$, evaluated at $\mathrm{a}$, as $\mathrm{T_a f}$ and the Jacobian of $\mathrm{g}$, evaluated at $\mathrm{f(a)}$, as $\mathrm{T_{f(a)} g}$.

I am aware that the function composition $\mathrm{g\circ f}$ is differentiable at $\mathrm{a}$ and the Jacobian of $\mathrm{g\circ f}$, evaluated at $\mathrm{a}$, is equal to:

$\mathrm{T_a (g\circ f)=T_{f(a)} g \, (T_a f)}$

As we all know, derivatives are defined using the concept of a limit, which involves epsilon-delta. When it comes to the chain rule for scalar-valued functions, i.e. when $k=1=m$, I am aware of how to prove it on the epsilon-delta level. However, I have yet to see the same done for vector-valued functions: sources like Rudin (Theorem 9.15 in Principles of Mathematical Analysis) and Apostol (Theorem 12.9 in Multivariable Differential Calculus) end up using remainder functions without rigorously proving that these fulfill the epsilon-delta condition.

I have decided to take up the task myself and have run into a roadblock that I'm not sure how to get past. First, let me show you what I've done so far:

We have two situations we can deal with:

  • there exists a neighborhood of $\mathrm{a}$ on which $\mathrm{f}$ is constant
  • there does NOT exist such a neighborhood

I have successfully dealt with the first situation, with my proof being as follows:

Let $\delta$ be the radius of a neighborhood of $\mathrm{a}$ on which $\mathrm{f}$ is constant, and let $\mathrm{x}$ be a point within this neighborhood. Obviously, the following is true:

$0 < \lvert \mathrm{x - a} \rvert < \delta \Rightarrow \mathrm{f(x) = f(a)}$, hence $\lvert \mathrm{f(x) - f(a)} \rvert = 0 < \lvert \mathrm{x - a} \rvert$

Because $\mathrm{f}$ is constant in this neighborhood, that also means: $\mathrm{\lvert T_a f(x-a) \rvert} = 0 < \lvert \mathrm{x - a} \rvert$
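The vanishing of $\mathrm{T_a f}$ here deserves a one-line justification (a sketch, using only the epsilon-delta definition): since $\mathrm{f(x) = f(a)}$ throughout the neighborhood, differentiability of $\mathrm{f}$ at $\mathrm{a}$ gives, for every $\epsilon > 0$ and all $\mathrm{x}$ close enough to $\mathrm{a}$,

$\mathrm{\lvert T_a f(x - a) \rvert = \lvert f(x) - f(a) - T_a f(x - a) \rvert < \epsilon \lvert x - a \rvert}$

Letting $\epsilon \to 0$ forces $\mathrm{T_a f(h) = \boldsymbol{0}}$ for all sufficiently small $\mathrm{h}$, and by linearity $\mathrm{T_a f}$ is the zero map.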

Now then, consider the expression $\mathrm{g(y) - g(f(a)) - T_{f(a)}g(y - f(a))}$. If we define $\mathrm{y:=f(x)}$, the expression becomes $\mathrm{g(f(x)) - g(f(a)) - T_{f(a)}g(f(x) - f(a))}$, and since both $\mathrm{f(x) - f(a)}$ and $\mathrm{T_{a}f(x - a)}$ vanish here, this equals $\mathrm{g(f(x)) - g(f(a)) - T_{f(a)}g(T_{a}f(x - a))}$. For any $\mathrm{x}$ inside the $\delta$ neighborhood of $\mathrm{a}$, this expression evaluates to $\mathrm{g(f(a)) - g(f(a)) - T_{f(a)}g(\boldsymbol{0})} = \boldsymbol{0} - \boldsymbol{0} = \boldsymbol{0}$. As a result, we have:

$\mathrm{(\forall \epsilon > 0)(\exists \delta > 0)(\forall x \in \mathbb{R}^n):0 < \lvert x - a \rvert <\delta \Rightarrow \lvert g(f(x)) - g(f(a)) - T_{f(a)}g(T_{a}f(x - a)) \rvert = 0 < \epsilon \lvert x - a \rvert}$

This ends the proof for the first situation.

Now moving on to the second situation, which is where I'm struggling:

Because $\mathrm{g}$ is differentiable at $\mathrm{f(a)}$, there is a $\delta_1$ neighborhood around $\mathrm{f(a)}$ where $\mathrm{g}$ is defined.

Because $\mathrm{f}$ is differentiable at $\mathrm{a}$, it is also continuous there, and as a result, for any value of $\delta_1$ there must exist a $\delta_2$ neighborhood of $\mathrm{a}$ such that $\mathrm{\lvert f(x) - f(a) \rvert < \delta_1}$ whenever $\mathrm{\lvert x - a \rvert < \delta_2}$. Choose $\delta_2$ so that the image of this neighborhood under $\mathrm{f}$ is a subset of the domain of $\mathrm{g}$. Then we have:

$\mathrm{(\forall \epsilon > 0)(\exists \delta_2 > 0)(\forall x \in \mathbb{R}^n):0 < \lvert x - a \rvert <\delta_2 \Rightarrow \lvert g(f(x)) - g(f(a)) - T_{f(a)}g(f(x) - f(a)) \rvert \leq \epsilon \lvert f(x) - f(a) \rvert}$

(with "$\leq$" rather than "$<$", because even in this second situation $\mathrm{f(x) = f(a)}$ may still occur at individual points, in which case both sides are $0$)

And this is the roadblock I have yet to overcome. I do not know how to get from the above statement to the below statement:

$\mathrm{(\forall \epsilon > 0)(\exists \delta_2 > 0)(\forall x \in \mathbb{R}^n):0 < \lvert x - a \rvert <\delta_2 \Rightarrow \lvert g(f(x)) - g(f(a)) - T_{f(a)}g(T_{a}f(x - a)) \rvert < \epsilon \lvert x - a \rvert}$

There were two things I thought of but ultimately either cannot work or lead to further dead ends:

  • I first thought of using the property $\lvert a \rvert < b \Leftrightarrow -b < a < b$. However, we are not dealing with scalars inside the norm here, we are dealing with vectors, so this goes right out the window.
  • I thought of using the inequality $\mathrm{\lvert T_{a}f(x - a) \rvert \leq \lvert f(x) - f(a) \rvert}$. However, I can't break open the norm of a sum/difference of vectors, so even if this inequality were usable, I'm not sure how to use it. It turns out this inequality is false to begin with.
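For what it's worth, here is a sketch (under the assumptions already in play, not a complete argument) of the standard way such a gap is bridged: insert and subtract the term $\mathrm{T_{f(a)}g(f(x) - f(a))}$, then apply the triangle inequality together with linearity of $\mathrm{T_{f(a)}g}$:

$\mathrm{\lvert g(f(x)) - g(f(a)) - T_{f(a)}g(T_{a}f(x - a)) \rvert \leq \lvert g(f(x)) - g(f(a)) - T_{f(a)}g(f(x) - f(a)) \rvert + \lVert T_{f(a)}g \rVert \, \lvert f(x) - f(a) - T_{a}f(x - a) \rvert}$

Here $\mathrm{\lVert \cdot \rVert}$ denotes the operator norm. The first summand is controlled by the differentiability of $\mathrm{g}$ together with the local bound $\mathrm{\lvert f(x) - f(a) \rvert \leq (\lVert T_{a}f \rVert + 1)\lvert x - a \rvert}$, valid for $\mathrm{x}$ close enough to $\mathrm{a}$, and the second by the differentiability of $\mathrm{f}$.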

This leads me to the following two questions:

  • Is this a fruitful approach to proving the multivariable chain rule on the epsilon-delta level, or is the use of asymptotic remainder functions to prove it unavoidable?
  • If the answer to the above question is yes, what am I missing?

Finally, to anyone who wonders why I do not like the use of remainder functions: it is because I often see them used as a way to pay lip service to epsilon-delta while ultimately evading the concept. Considering how fundamental the chain rule is to differentiation, I consider the use of remainder functions to "prove" it unacceptable, and I am extremely disappointed that people like Rudin and Apostol would do such a thing. The reason I am so uptight about epsilon-delta is that a lot of wishy-washy Calculus I infinitesimal tricks completely fall apart when dealing with vector-valued functions, and I'm doing this to try to rescue my understanding of the subject.


1 Answer


Here is a literal translation (from Spanish) of the proof of the chain rule in my book "Fundamentos de cálculo", which I can post here since I own the rights. In the vector case, the $\varepsilon$-$\delta$ method is superseded by topological arguments, since we are working not with intervals but with balls. The little-oh notation is precisely this, and it is not at all informal, as you believe; mathematics does not need you to find the notation congenial in order to remain true.

Theorem. Let $U$, $V$ and $W$ be three normed spaces, $A$ a subset of $U$ and $B$ one of $V$, $u$ a point in $A$ and $v$ one in $B$. Suppose $f:A \to V$ is differentiable at $u$, that $f(u) = v$, and that $g:B \to W$ is differentiable at $v.$ Then $g \circ f$ is differentiable at $u$ and, moreover, $$ D(g \circ f)(u) = Dg(f(u)) \circ Df(u). $$

Proof. Since $f$ is differentiable at $u$, there are $r_f > 0$ and a function $h \mapsto \tilde\epsilon_f(u; h)$ from $B_U(0;r_f)$ to $V$ such that for all $h \in B_U(0;r_f)$, $$ f(u + h) = f(u) + Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h) = v + Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h), $$ where $\lim_{h \to 0} \tilde\epsilon_f(u; h) = 0.$ Analogously, for $g$ at $v$ there exist $r_g > 0$ and $\tilde\epsilon_g:B_V(0;r_g) \to W$ such that $\lim_{k \to 0} \tilde\epsilon_g(v; k) = 0$ and, for all $k \in B_V(0;r_g)$, $$ g(v + k) = g(v) + Dg(v) \cdot k + \|k\| \tilde\epsilon_g(v; k). $$ By virtue of the triangle inequality and the fundamental inequality of linear functions, \begin{align*} \|Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h)\| &\leq \|Df(u)\| \|h\| + \|h\| \|\tilde\epsilon_f(u; h)\|\\ &= \left( \|Df(u)\| + \|\tilde\epsilon_f(u; h)\| \right) \|h\|. \end{align*} As $\lim_{h \to 0} \|\tilde\epsilon_f(u; h)\| = 0$, there exists $\delta > 0$ such that $h \in B_U(0;\delta)$ implies $\|\tilde\epsilon_f(u; h)\| \leq 1.$ Define $$ r = \dfrac{1}{2}\min\left\{ \dfrac{r_g}{\|Df(u)\| + 1}, \delta, r_f \right\}. $$ Then the relation $\|h\| < r$ implies \begin{align*} \|Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h)\| &\leq \left( \|Df(u)\| + \|\tilde\epsilon_f(u; h)\| \right) \|h\| \\ &\leq \left( \|Df(u)\| + 1 \right) \|h\| \leq \dfrac{1}{2} r_g < r_g, \end{align*} and therefore $Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h)$ belongs to $B_V(0;r_g).$ Hence, for $h \in B_U(0;r)$, writing $k(h) = Df(u) \cdot h + \|h\| \tilde\epsilon_f(u; h)$, one has \begin{align*} (g \circ f)(u + h) &= g(f(u + h)) = g(v + k(h))\\ &= g(v) + Dg(v) \cdot k(h) + \|k(h)\| \tilde\epsilon_g(v; k(h)) \\ &= g(v) + Dg(v) \cdot \Big( Df(u) \cdot h \Big) + \|h\| Dg(v) \cdot \tilde\epsilon_f(u; h) + \|k(h)\| \tilde\epsilon_g(v; k(h)) \\ &= g(v) + \Big( Dg(v) \circ Df(u) \Big) \cdot h + \Gamma(h), \end{align*} where $\Gamma:B_U(0;r) \to W$ is defined by $$ \Gamma(h) = \|h\| Dg(v) \cdot \tilde\epsilon_f(u; h) + \|k(h)\| \tilde\epsilon_g(v; k(h)). $$ To conclude the theorem, one must show that $Dg(v) \circ Df(u)$ is a continuous linear function and that one can write $\Gamma(h) = \|h\| \tilde{\Gamma}(h)$, where $\tilde\Gamma(h) \to 0$ as $h \to 0.$ The first of these two claims is a consequence of the fact that the composition of continuous functions is continuous, and likewise for linear functions. To see the second, define $\tilde{\Gamma}(h) = \dfrac{\Gamma(h)}{\|h\|}$ when $h \neq 0$ and $\tilde{\Gamma}(0) = 0;$ to see that $\tilde{\Gamma}(h) \to 0$ as $h \to 0$, we use again the triangle and fundamental inequalities: \begin{align*} \|\tilde{\Gamma}(h)\| &\leq \|Dg(v)\| \|\tilde\epsilon_f(u;h)\| + \dfrac{\|k(h)\|}{\|h\|} \|\tilde\epsilon_g(v;k(h))\| \\ &\leq \|Dg(v)\| \|\tilde\epsilon_f(u;h)\| + \left( \|Df(u)\| + 1 \right) \|\tilde\epsilon_g(v;k(h))\|. \end{align*} By hypothesis, $\|\tilde\epsilon_f(u;h)\| \to 0$, and since $k(h) \to 0$, we also have $\|\tilde\epsilon_g(v;k(h))\| \to 0$; hence $\tilde\Gamma(h) \to 0.$ $\blacksquare$
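As a quick numerical sanity check of the identity the theorem proves (a sketch only, not a substitute for the proof; the maps `f` and `g` below are arbitrary smooth examples of my own choosing), one can compare a finite-difference Jacobian of $g \circ f$ against the product of the finite-difference Jacobians of $g$ and $f$:

```python
import math

def jacobian(func, point, h=1e-6):
    """Central-difference Jacobian of func at point (a list of floats)."""
    n = len(point)
    m = len(func(point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point);  plus[j] += h
        minus = list(point); minus[j] -= h
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Arbitrary smooth example maps: f: R^2 -> R^3 and g: R^3 -> R^2.
f = lambda x: [x[0] * x[1], math.sin(x[0]), x[0] + x[1] ** 2]
g = lambda y: [y[0] + y[1] * y[2], y[0] ** 2 - y[2]]

a = [0.7, -1.3]
lhs = jacobian(lambda x: g(f(x)), a)             # Jacobian of g o f at a
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))  # Dg(f(a)) times Df(a)

# The two 2x2 matrices should agree up to discretization error.
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(err < 1e-5)
```

This only probes the identity at one point and to finite-difference accuracy, of course; the proof above is what establishes it in general.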