Chain Rule: Let $U$, $V$, and $W$ be normed spaces, with $U'\subseteq U$ and $V'\subseteq V$ open. If $f:U'\rightarrow V'$ is differentiable at $a\in U'$ and $g:V'\rightarrow W$ is differentiable at $b=f(a)$, then the function $g \circ f:U'\rightarrow W$ is differentiable at $a$, and $$(g\circ f)' (a) = g'(f(a))\circ f'(a).$$
The following proof, which suffers from an unjustified step, is very similar to the one found in Spivak's Calculus on Manifolds, but it seems slightly more intuitive to me: we wish to find the best linear approximation $L:U\rightarrow W$ to $g\circ f$ at $a$. Since $f'(a)$ is the best linear approximation $U\rightarrow V$ to $f$ at $a$, and $g'(b)$ is the best linear approximation $V\rightarrow W$ to $g$ at $b=f(a)$, a plausible candidate for $L$ is $g'(b)\circ f'(a)$.
Proof: Writing $L_f=f'(a)$ and $L_g=g'(b)$, we have $$f(a+h)=f(a)+L_f(h)+\varepsilon_f(h) \ \text{, and}$$ $$g(b+k)=g(b)+L_g(k)+\varepsilon_g(k)$$ where $|\varepsilon_f(h)|/|h| \rightarrow 0$ as $|h|\rightarrow 0$ and $|\varepsilon_g(k)|/|k| \rightarrow 0$ as $|k|\rightarrow 0$. Wishing to approximate $g\circ f$ around $a$ with a linear function, we may continue as follows: \begin{equation} \begin{split} g(f(a+h)) & = g[f(a)+L_f(h)+\varepsilon_f(h)] \\ & = g(b)+L_g[L_f(h)+\varepsilon_f(h)]+\varepsilon_g[L_f(h)+\varepsilon_f(h)] \\ & = g(b)+L_g[L_f(h)]+L_g[\varepsilon_f(h)]+\varepsilon_g[L_f(h)+\varepsilon_f(h)] \end{split} \end{equation} and it remains to be shown that $$\frac{|L_g[\varepsilon_f(h)]+\varepsilon_g[L_f(h)+\varepsilon_f(h)]|}{|h|}\rightarrow 0\ \text{as} \ |h|\rightarrow 0.$$ By the triangle inequality, the LHS above is less than or equal to \begin{equation} \begin{split} & \frac{|L_g[\varepsilon_f(h)]|}{|h|}+\frac{|\varepsilon_g[L_f(h)+\varepsilon_f(h)]|}{|h|} \\ & = \left|L_g\left[\frac{\varepsilon_f(h)}{|h|}\right]\right| + \left(\frac{\left|\varepsilon_g[L_f(h)+\varepsilon_f(h)]\right|}{\left|L_f(h)+\varepsilon_f(h)\right|}\right)\left(\frac{\left|L_f(h)+\varepsilon_f(h)\right|}{\left|h\right|}\right)\\ & \le \left|L_g\left[\frac{\varepsilon_f(h)}{|h|}\right]\right| + \left(\frac{\left|\varepsilon_g[L_f(h)+\varepsilon_f(h)]\right|}{\left|L_f(h)+\varepsilon_f(h)\right|}\right)\left(\frac{|L_f(h)|}{|h|}+\frac{|\varepsilon_f(h)|}{|h|}\right). \end{split} \end{equation} As $h\rightarrow 0$, the first term tends to zero because $L_g$ is continuous and $|\varepsilon_f(h)|/|h|\rightarrow 0$; in the second term, the first factor tends to zero by the definition of $\varepsilon_g$ (since $L_f(h)+\varepsilon_f(h)\rightarrow 0$), while the last factor stays bounded because $L_f$ is bounded and $|\varepsilon_f(h)|/|h|\rightarrow 0$.
The problem comes at the end of the proof, when dividing by $|L_f(h)+\varepsilon_f(h)|$: I see no reason to believe this quantity must be non-zero for all small $h\neq 0$, so one may not be justified in dividing by it. Can the proof be salvaged somehow?
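That the denominator can genuinely vanish is easy to illustrate. As a sketch (the functions below are illustrative choices, not part of the argument above), take $U=V=\mathbb{R}$ and $a=0$. If $f$ is constant, then $L_f=0$ and $\varepsilon_f\equiv 0$, so $L_f(h)+\varepsilon_f(h)=0$ for every $h$. A less degenerate case is $f(x)=x^{2}\sin(1/x)$ for $x\neq 0$, with $f(0)=0$: $$ L_f=f'(0)=\lim_{h\rightarrow 0}h\sin(1/h)=0,\qquad \varepsilon_f(h)=h^{2}\sin(1/h),\qquad L_f(h_n)+\varepsilon_f(h_n)=0\ \text{ for } h_n=\frac{1}{n\pi},\ n=1,2,\dots $$ so the quantity vanishes along a sequence $h_n\rightarrow 0$, and the division is undefined arbitrarily close to $h=0$.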
Yes, you can salvage your proof. Note that $$ \lim_{k\rightarrow0}\frac{|\varepsilon_{g}(k)|}{|k|}=0 $$ implies that $$ \lim_{t\rightarrow0^{+}}\sup_{k^{\prime}\in\overline{B(0,1)}}\frac{|\varepsilon _{g}(tk^{\prime})|}{t}=0 $$ (it is actually equivalent, but we don't need that). To see this, observe that if $\lim_{k\rightarrow0}\frac{|\varepsilon_{g}(k)|}{|k|}=0$, then for every $\epsilon>0$ there exists $\delta>0$ such that $|\varepsilon_{g} (k)|\leq\epsilon|k|$ for all $k$ with $0<|k|\leq\delta$. Since $\varepsilon _{g}(0)=0$ (set $k=0$ in the expansion of $g$), this inequality also holds for $k=0$. In particular, if $k^{\prime }\in\overline{B(0,1)}$ and $0\leq t\leq\delta$, we have that $|tk^{\prime }|\leq\delta$ and so $|\varepsilon_{g}(tk^{\prime})|\leq\epsilon t$, or, equivalently, $\sup_{k^{\prime}\in\overline{B(0,1)}}|\varepsilon _{g}(tk^{\prime})|\leq\epsilon t$. This shows that $\lim_{t\rightarrow0^{+}} \sup_{k^{\prime}\in\overline{B(0,1)}}\frac{|\varepsilon_{g}(tk^{\prime})|} {t}=0$.
Since $|L_{f}(h)+\varepsilon_{f}(h)|\leq C|h|$ for all sufficiently small $h$, say $|h|\leq\delta_{0}$ (for instance with $C=\|L_{f}\|+1$, where $\|L_{f}\|$ is the operator norm), given $\epsilon>0$ take $0<\delta\leq C\delta_{0}$ such that $\sup_{k^{\prime}\in\overline{B(0,1)}}|\varepsilon _{g}(tk^{\prime})|\leq\frac{\epsilon}{C}t$ for all $0\leq t\leq\delta$. Then $$ |\varepsilon_{g}(L_{f}(h)+\varepsilon_{f}(h))|=\left\vert \varepsilon _{g}\left( \frac{L_{f}(h)+\varepsilon_{f}(h)}{C|h|}C|h|\right) \right\vert \leq\sup_{k^{\prime}\in\overline{B(0,1)}}|\varepsilon_{g}(C|h|k^{\prime} )|\leq\frac{\epsilon}{C}C|h|=\epsilon|h| $$ for all $0<|h|\leq\delta/C$. This proves that $$ \lim_{h\rightarrow0}\frac{|\varepsilon_{g}(L_{f}(h)+\varepsilon_{f}(h))|} {|h|}=0. $$
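For completeness, here is a sketch of how this limit closes the original argument, where $\|L_{g}\|$ denotes the operator norm of $L_{g}$ (notation assumed here, not used above). No division by $|L_{f}(h)+\varepsilon_{f}(h)|$ is needed: $$ \frac{|L_{g}[\varepsilon_{f}(h)]+\varepsilon_{g}[L_{f}(h)+\varepsilon_{f}(h)]|}{|h|}\leq\|L_{g}\|\,\frac{|\varepsilon_{f}(h)|}{|h|}+\frac{|\varepsilon_{g}(L_{f}(h)+\varepsilon_{f}(h))|}{|h|}\rightarrow0\ \text{as}\ h\rightarrow0, $$ where the first term tends to zero by the definition of $\varepsilon_{f}$ and the second by the limit just established. Therefore $(g\circ f)^{\prime}(a)=L_{g}\circ L_{f}=g^{\prime}(f(a))\circ f^{\prime}(a)$.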