In Kolk's *Multidimensional Real Analysis I: Differentiation*,
the author gives a very detailed version of the chain rule at a point $a$, as follows:
Now I want to use Chain rule to prove the following corollary:
Here is my proof:
Step 1:
We define $f: \mathbf{R}^n \rightarrow \mathbf{R}^p \times \mathbf{R}^p$ by $f(x) = (f_1(x), f_2(x))$. We know $f$ is differentiable at $a$, with $Df(a)h = (Df_1(a)h, Df_2(a)h)$ for all $h \in \mathbf{R}^n$. We define $g: \mathbf{R}^p \times \mathbf{R}^p \rightarrow \mathbf{R}^p$ by $g(y_1, y_2) = \lambda y_1 + y_2$. Then $g \circ f = \lambda f_1 + f_2$.
Step 2:
\begin{align*} D(\lambda f_1+f_2)(a) &= D(g\circ f)(a) \\ &= Dg(f_1(a),f_2(a)) \circ Df(a) && \text{(by the chain rule)} \\ &= g \circ Df(a) && \text{(because we fortunately know $g$ is linear, so $g$ is its own derivative everywhere)} \end{align*}
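For completeness, here is the one-line check behind that last step; it is not part of the book's statement quoted above, but it is the standard argument that a linear map is its own derivative at every point:

```latex
% If g is linear, then for any point b and any increment h,
% g(b+h) - g(b) - g(h) = 0 by linearity, so the defining limit is trivially zero:
\[
  \lim_{h \to 0} \frac{\lVert g(b+h) - g(b) - g(h) \rVert}{\lVert h \rVert}
  = \lim_{h \to 0} \frac{0}{\lVert h \rVert} = 0,
  \qquad \text{hence } Dg(b) = g \text{ for every } b.
\]
```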
Step 3:
We are not done yet, since we want to prove that two linear maps are equal. So we introduce a vector $h \in \mathbf{R}^n$ and check that they agree on every $h$. Applying the above equation to $h$: $g \circ Df(a) h = g(Df_1(a)h, Df_2(a)h) = \lambda Df_1(a)h + Df_2(a)h = (\lambda Df_1(a) + Df_2(a))(h)$. This completes the proof.
I have three questions:
- Is my proof strictly/rigorously correct?
- Is it necessary to introduce this $h$ to prove that "two derivatives are equal"?
- Can my proof be simplified further? The author just says it's obvious, but it seems quite complicated to me.


So, if it were me, I would have just written the following proof:
So, the only theorems being invoked are the chain rule and that linear maps have derivatives equal to themselves.
Of course, having now proven this theorem as a consequence of the chain rule, you should also prove it directly from the definition: i.e. prove (using the triangle inequality) that \begin{align} \frac{\lVert(\lambda f_1+f_2)(a+h) - (\lambda f_1+f_2)(a) - [\lambda (Df_1)_a(h)+(Df_2)_a(h)]\rVert}{\lVert h\rVert}\to 0 \end{align} as $h\to 0$.
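One way to sketch that estimate (the grouping of terms below is my choice, not something taken from the book):

```latex
% Split the numerator into the lambda f_1 part and the f_2 part,
% then apply the triangle inequality and homogeneity of the norm:
\begin{align*}
  &\frac{\lVert (\lambda f_1+f_2)(a+h) - (\lambda f_1+f_2)(a)
        - [\lambda (Df_1)_a(h) + (Df_2)_a(h)] \rVert}{\lVert h \rVert} \\
  &\quad\le
  |\lambda|\,\frac{\lVert f_1(a+h) - f_1(a) - (Df_1)_a(h) \rVert}{\lVert h \rVert}
  + \frac{\lVert f_2(a+h) - f_2(a) - (Df_2)_a(h) \rVert}{\lVert h \rVert},
\end{align*}
% and each term on the right tends to 0 as h -> 0,
% by differentiability of f_1 and f_2 at a.
```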
By the way, one can prove things in a different order. Usually, one proves linearity of the derivative directly, and then proves the chain rule. From here, all other facts can be derived, for example, the fact that if $f_1,f_2$ are differentiable at $a$ and $f(x):=(f_1(x)f_2(x))$ then $f$ is also differentiable at $a$ and $Df_a(h)=((Df_1)_a(h),(Df_2)_a(h))$ can be proven as follows:
define $\iota_1:\Bbb{R}^{p_1}\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ and $\iota_2:\Bbb{R}^{p_2}\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ by $\iota_1(x)=(x,0)$ and $\iota_2(y)=(0,y)$. Then given two mappings $f_1:\Bbb{R}^n\to\Bbb{R}^{p_1}$ and $f_2:\Bbb{R}^{n}\to\Bbb{R}^{p_2}$, we define $f:\Bbb{R}^n\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ as $f(x)=(f_1(x),f_2(x))$.
Then, it is easily verified that $f=\iota_1\circ f_1+\iota_2\circ f_2$, and that the $\iota$'s are linear transformations. So, \begin{align} Df_a(h)&=D(\iota_1\circ f_1+\iota_2\circ f_2)_a(h)\\ &=[\iota_1\circ (Df_1)_a](h)+[\iota_2\circ (Df_2)_a](h)\\ &=\bigg((Df_1)_a(h),(Df_2)_a(h)\bigg) \end{align} (of course, at the second equality I did several steps at once: I used additivity of derivatives, the chain rule, and the fact that $\iota_1,\iota_2$ are linear and hence their own derivatives).
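Spelling out the "easily verified" identity, for any $x\in\Bbb{R}^n$:

```latex
% Evaluate both injections at x and add componentwise:
\[
  (\iota_1\circ f_1 + \iota_2\circ f_2)(x)
  = (f_1(x),\, 0) + (0,\, f_2(x))
  = (f_1(x),\, f_2(x))
  = f(x).
\]
```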
The idea of introducing such "auxiliary" mappings $\iota$ is very common when proving that more complicated maps are differentiable (for example, one can formulate a very general product rule along these lines).
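For instance, here is one common formulation of that general product rule, stated as a sketch; I am assuming $B:\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}\to\Bbb{R}^q$ is a fixed bilinear map (automatically continuous in finite dimensions) and that $f_1,f_2$ are differentiable at $a$. Applying the chain rule to $B\circ f$, with $f=(f_1,f_2)$ as above:

```latex
% A bilinear B is differentiable, and its derivative at (y_1, y_2)
% acts on an increment (k_1, k_2) by
%   DB_{(y_1, y_2)}(k_1, k_2) = B(k_1, y_2) + B(y_1, k_2).
% Combining this with Df_a(h) = ((Df_1)_a(h), (Df_2)_a(h)) gives:
\[
  D\bigl(B(f_1, f_2)\bigr)_a(h)
  = B\bigl((Df_1)_a(h),\, f_2(a)\bigr) + B\bigl(f_1(a),\, (Df_2)_a(h)\bigr).
\]
% Taking B(x, y) = xy with p_1 = p_2 = q = 1 recovers the usual
% product rule: (f_1 f_2)'(a) = f_1'(a) f_2(a) + f_1(a) f_2'(a).
```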