I'm having trouble proving a composite of $\mathcal{C}^k$ functions is $\mathcal{C}^k$ without use of partial differentiation. To set up my question, consider the following $\mathcal{C}^k$ functions defined on open sets:
$\begin{matrix} U & \xrightarrow{f} & V & \xrightarrow{g} & \mathbb{R}^p, \\ \cap & & \cap & & \\ \mathbb{R}^n & & \mathbb{R}^m\end{matrix}$
To prove the composite $g \circ f:U \to \mathbb{R}^p$ is $\mathcal{C}^k$, it is usually suggested to start by considering a single application of chain rule:
$D(g \circ f)(p) = Dg(f(p)) \circ Df(p). \tag{1}$
I would like to then apply the chain rule again, but the operation $\circ$ above both is and isn't function composition. To explain, it is in the sense that if
If $\,\,\, \alpha = Df(p) \in L(\mathbb{R}^n,\mathbb{R}^m), \\ \,\, \beta = Dg(f(p)) \in L(\mathbb{R}^m,\mathbb{R}^p), \\ \,\, \gamma = D(g \circ f)(p) \in L(\mathbb{R}^n,\mathbb{R}^p) \\ \mathrm{then\,\,} \gamma = \beta \circ \alpha,$
but also isn't, because upon writing
$A = Df: U \to L(\mathbb{R}^n,\mathbb{R}^m) \\ B = (Dg) \circ f : U \to L(\mathbb{R}^m,\mathbb{R}^p) \\ C = D(g \circ f): U \to L(\mathbb{R}^n,\mathbb{R}^p),$
$(1)$ becomes $C(p) = B(p) \circ A(p)$, and $C = B \cdot A$, where $\cdot$ is a form of multiplication rather than function composition ($B \circ A$ makes no sense, as their source and target spaces make them incompatible with function composition). Apparently, a form of product rule is needed to find the total derivative of $C = B \cdot A$.
I'm aware that both $A,B$ are Frechet differentiable (presuming $k>1$), but I'm not aware of a product rule regarding total differentiation of anything other than a product of real valued functions. Is there such a rule?
I suspect there is a higher order chain rule as well, yet that which uses a product rule for certain vector valued functions.
If you want to see higher order derivatives in terms of linear maps you have to use multilinear maps (see the post Why is the the $k$-th derivative a symmetric multilinear map?).
I have never seen the higher order chain rule done for multilinear maps. If you use partial derivatives, then what you want is called Faa di Bruno's formula. I am taking $p=1$ (for $p\ge 1$ you apply the formula below to each component of $g$). Hope you are familiar with multi-indeces.
For every multi-index $\alpha\in\mathbb{N}_{0}^{n}$, with $0<|\alpha|\leq k$, $$ \frac{\partial^{|\alpha|}}{\partial y^{\alpha}}(g\circ f)(y)=\sum c_{\alpha,\beta,\gamma,l}\frac{\partial^{|\beta|}g}{\partial x^{\beta}% }(f(y))\prod_{i=1}^{|\beta|}\frac{\partial^{|\gamma_{i}|}f_{l_{i}}}{\partial y^{\gamma_{i}}}\left( y\right) , $$ where $c_{\alpha,\beta,\gamma,l}\in\mathbb{R}$ are constant, the sum is done over all $\beta\in\mathbb{N}_{0}^{m}$ with $1\leq|\beta|\leq|\alpha|$, $\gamma=(\gamma_{1},\ldots,\gamma_{|\beta|})$, $\gamma_{i}\in\mathbb{N} _{0}^{n}$, with $|\gamma_{i}|>0$ and $\sum_{i=1}^{|\beta|}\gamma_{i}=\alpha$, and $l=(l_{1},\ldots,l_{|\beta|})$, $l_{i}\in\{1,\ldots,m\}$, $i=1, \ldots, |\beta|$. The proof is by induction on $k$. You can find it in this paper https://pdfs.semanticscholar.org/d36c/4938b649ce364aceeb47224f677b3deab0d3.pdf
Having said that, to prove that the composition of functions of class $C^k$ is still $C^k$, I would just do induction on $k$. You prove it for $k=1$ and then assume true for $k-1$. Then you consider the first order partial derivatives of $g\circ f$ which are given in terms of sums products and compositions of partial derivatives of $f$ and of $g$, which are all of (at least) class $C^{k-1}$ and so you can apply the induction hypothesis to get that the first order partial derivatives of $g\circ f$ are of class $C^{k-1}$. In turn, $g\circ f$ is of class $C^k$