I am trying to get comfortable with gradients of vector functions. I constructed the following example. Could you please check whether my reasoning is correct? I know it is quite basic, but I have no one else to ask.
Let
$f:\mathbb{R}^n \rightarrow \mathbb{R}^m$
$g:\mathbb{R}^m \rightarrow \mathbb{R}^k$
$z=f(g(x))$
$g(x)=y$
I want to take the gradient of $z$ with respect to $x$, and this is how I formulated it:
\begin{align} \nabla_xz = \frac{\partial \mathbf{z}}{\partial\mathbf{x}} = \nabla_yz \cdot \nabla_xy \end{align}
where
\begin{align} \nabla_yz = \mathbf{J}_1 \in\mathbb{R}^{m\times n} \text{ and } \nabla_xy = \mathbf{J}_2 \in\mathbb{R}^{m\times k} \end{align}
with $\mathbf{J}_1$ and $\mathbf{J}_2$ being the respective Jacobians. Then I could reformulate $\nabla_xz$ as
\begin{align} \nabla_xz = {\mathbf{J}_1}^T \mathbf{J}_2 \in \mathbb{R}^{n\times k}. \end{align}
Is this formulation correct? If not, please guide me as to where I went wrong.
Your definitions of $f$ and $g$ lead to an invalid composition.
For $f(g(x))$ to make sense, the codomain of $g$ must lie in the domain of $f$. As written, $g$ outputs vectors in $\Bbb R^k$ while $f$ expects inputs in $\Bbb R^n$.
Define $g:\Bbb R^k\to \Bbb R^m$ and $f:\Bbb R^m\to\Bbb R^n$ so you will have $f\circ g:\Bbb R^k\to\Bbb R^n$.
So you will have $\vec x\in\Bbb R^k, \vec y=\vec g(\vec x)\in \Bbb R^m, \vec z=\vec f(\vec y)\in\Bbb R^n$.
The dimension of a matrix is read rows$\times$columns. The number of rows of the Jacobian equals the dimension of the "numerator" vector, and the number of columns equals the dimension of the "denominator" vector.
For example: $\dfrac{\partial\vec y}{\partial\vec x}=\begin{bmatrix}\dfrac{\partial y_1}{\partial x_1}&\cdots&\dfrac{\partial y_1}{\partial x_k}\\\vdots&\ddots&\vdots\\\dfrac{\partial y_m}{\partial x_1}&\cdots&\dfrac{\partial y_m}{\partial x_k}\end{bmatrix}\in\Bbb R^{m\times k}$
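If it helps to see the rows$\times$columns convention concretely, here is a minimal sketch in plain Python. The map `g` and the helper `numerical_jacobian` are hypothetical, chosen only for illustration with $k=2$ and $m=3$; the Jacobian is approximated by central differences rather than computed exactly.

```python
def g(x):
    # Hypothetical example map g: R^2 -> R^3 (so k = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def numerical_jacobian(func, x, h=1e-6):
    """Central-difference Jacobian: rows index outputs, columns index inputs."""
    n_rows, n_cols = len(func(x)), len(x)
    jac = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n_rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


J = numerical_jacobian(g, [1.0, 2.0])
shape = (len(J), len(J[0]))
print(shape)  # (3, 2), i.e. m x k
```

The three rows correspond to the components of $\vec y$ and the two columns to the components of $\vec x$, matching $\Bbb R^{m\times k}$.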
Thus we have:
$$\nabla_{\vec x}\vec y\in\Bbb R^{m\times k}\\\nabla_\vec y\vec z\in\Bbb R^{n\times m}\\\nabla_\vec x\vec z \in\Bbb R^{n\times k}$$
When multiplying matrices $\mathrm A\in\Bbb R^{n\times m}$ and $\mathrm B\in\Bbb R^{m\times k}$ the result is $\mathrm {AB}\in\Bbb R^{n\times k}$. So we require the order of multiplication to be: $$\begin{align}\nabla_\vec x \vec z &=(\nabla_\vec y \vec z)~(\nabla_\vec x\vec y)\\[2ex]& =\sum_{i=1}^m\begin{bmatrix}\dfrac{\partial z_1}{\partial y_i}\dfrac{\partial y_i}{\partial x_1}&\cdots&\dfrac{\partial z_1}{\partial y_i}\dfrac{\partial y_i}{\partial x_k}\\\vdots&\ddots&\vdots\\\dfrac{\partial z_n}{\partial y_i}\dfrac{\partial y_i}{\partial x_1}&\cdots&\dfrac{\partial z_n}{\partial y_i}\dfrac{\partial y_i}{\partial x_k}\end{bmatrix}\end{align}$$
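As a sanity check on the order of multiplication, the following sketch compares the product of the two Jacobians with a finite-difference Jacobian of the composition. Everything here is an assumption made for illustration: the maps `f` and `g`, their hand-derived Jacobians `jac_f` and `jac_g`, and the helpers `matmul` and `numerical_jacobian`, with $k=2$, $m=3$, $n=4$ so that all three dimensions differ.

```python
import math


def g(x):
    # Hypothetical g: R^2 -> R^3 (k = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]


def f(y):
    # Hypothetical f: R^3 -> R^4 (m = 3, n = 4).
    y1, y2, y3 = y
    return [y1 + y2, y2 * y3, math.sin(y1), y3 ** 2]


def jac_g(x):
    # Hand-derived Jacobian of g, shape m x k = 3 x 2.
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [2 * x1, 0.0]]


def jac_f(y):
    # Hand-derived Jacobian of f, shape n x m = 4 x 3.
    y1, y2, y3 = y
    return [
        [1.0, 1.0, 0.0],
        [0.0, y3, y2],
        [math.cos(y1), 0.0, 0.0],
        [0.0, 0.0, 2 * y3],
    ]


def matmul(A, B):
    # Plain matrix product; requires len(A[0]) == len(B).
    return [
        [sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def numerical_jacobian(func, x, h=1e-6):
    # Central-difference Jacobian: rows index outputs, columns index inputs.
    n_rows, n_cols = len(func(x)), len(x)
    jac = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(n_rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


x = [1.0, 2.0]
chain = matmul(jac_f(g(x)), jac_g(x))  # (n x m)(m x k) = n x k
direct = numerical_jacobian(lambda v: f(g(v)), x)
max_err = max(
    abs(chain[i][j] - direct[i][j]) for i in range(len(chain)) for j in range(len(chain[0]))
)
print(len(chain), len(chain[0]))  # 4 2
```

Here `max_err` should be tiny (limited only by finite-difference error). Swapping the factors to `matmul(jac_g(x), jac_f(g(x)))` would pair a $3\times 2$ matrix with a $4\times 3$ one, whose inner dimensions do not match.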
That is all.
$\blacksquare$