Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a $C^2$ function (that is, with continuous first and second derivatives). Define $g(z):=f(Az+b)$ for all $z \in \mathbb{R}^n$, where $A \in \mathbb{R}^{n\times n}$ is a matrix and $b \in \mathbb{R}^n$ is a vector. Show that $$\nabla g(z)=A^{\top} \nabla f(x), \quad \text{where } x := Az+b.$$
My try:
I know we can write $\nabla g(z)=\frac{\partial x}{\partial z}\nabla f(x)=A^{\top} \nabla f(x)$, but I want to show it using matrix manipulation. Please complete my derivation or comment on it.
$$ \nabla g(z) = \begin{bmatrix} \frac{\partial g(z)}{\partial z_1}\\ \vdots\\ \frac{\partial g(z)}{\partial z_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial }{\partial z_1}f(x)\\ \vdots\\ \frac{\partial}{\partial z_n}f(x) \end{bmatrix} = \begin{bmatrix} \frac{\partial }{\partial z_1}\frac{\partial x_1}{\partial x_1}f(x)\\ \vdots\\ \frac{\partial}{\partial z_n}\frac{\partial x_n}{\partial x_n}f(x) \end{bmatrix} = \begin{bmatrix} \frac{\partial x_1}{\partial z_1}\frac{\partial }{\partial x_1}f(x)\\ \vdots\\ \frac{\partial x_n}{\partial z_n}\frac{\partial }{\partial x_n}f(x) \end{bmatrix} \tag{1} $$
On the other hand,
$$ x_i=[x]_{i1}=[Az+b]_{i1}=\sum_{k=1}^n a_{ik}[z]_{k1}+[b]_{i1} $$ Therefore, $$ \frac{\partial x_i}{\partial z_i}=\frac{\partial [x]_{i1}}{\partial [z]_{i1}} = a_{ii} \tag{2} $$
I cannot match $(1)$ using $(2)$. Could you please revise my derivation or complete it?
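One possible way to complete the component-wise route, sketched with $x = Az+b$ and $A=(a_{kj})$: the last step of $(1)$ keeps only the $i$-th term, but every $x_k$ depends on $z_i$, so the multivariable chain rule has to sum over all $k$, and the partial derivatives needed are $\partial x_k/\partial z_i$ for all $k$, not only the diagonal ones in $(2)$.

$$
\frac{\partial g}{\partial z_i}(z)=\sum_{k=1}^n \frac{\partial f}{\partial x_k}(x)\,\frac{\partial x_k}{\partial z_i},
\qquad
\frac{\partial x_k}{\partial z_i}=\frac{\partial}{\partial z_i}\Big(\sum_{j=1}^n a_{kj}z_j+b_k\Big)=a_{ki}.
$$

Substituting the second identity into the first gives

$$
\frac{\partial g}{\partial z_i}(z)=\sum_{k=1}^n a_{ki}\,\frac{\partial f}{\partial x_k}(x)=\sum_{k=1}^n [A^{\top}]_{ik}\,[\nabla f(x)]_k=[A^{\top}\nabla f(x)]_i ,
$$

and stacking these entries for $i=1,\dots,n$ yields $\nabla g(z)=A^{\top}\nabla f(x)$.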
An easier way that avoids these messy calculations is to use the characterization $DF(p)(v)=\langle \nabla F(p),v\rangle$, which holds for every differentiable $F:\Bbb R^n\to \Bbb R$ and all $p, v \in \Bbb R^n$. The total derivative of a linear map is the map itself, so the derivative of $z \mapsto Az+b$ at any point is $v \mapsto Av$, and the chain rule gives $$Dg(z)(v) = Df(Az+b)(Av) = \langle \nabla f(Az+b), Av\rangle =\langle A^\top \nabla f(Az+b), v\rangle.$$ Since this holds for every $v$, $\nabla g(z) = A^\top \nabla f(Az+b)$.
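As a sanity check, the identity $\nabla g(z) = A^\top \nabla f(Az+b)$ can be compared against central finite differences. The sketch below is only an illustration: the test function $f(x)=\sum_i (\sin x_i + x_i^2/2)$, the random $A$, $b$, $z$, the step size, and all helper names are assumptions chosen for the example, not part of the question.

```python
import math
import random

def affine(A, b, z):
    """Return x = A z + b, with A stored as a list of rows."""
    return [sum(a * zk for a, zk in zip(row, z)) + bi for row, bi in zip(A, b)]

def f(x):
    """Illustrative smooth test function (an assumption, not from the question)."""
    return sum(math.sin(xi) + 0.5 * xi * xi for xi in x)

def grad_f(x):
    """Exact gradient of the test function f."""
    return [math.cos(xi) + xi for xi in x]

def analytic_grad_g(A, b, z):
    """A^T grad f(Az + b): entry i is sum_k a_ki * (df/dx_k)."""
    gf = grad_f(affine(A, b, z))
    return [sum(A[k][i] * gf[k] for k in range(len(gf))) for i in range(len(z))]

def numeric_grad_g(A, b, z, h=1e-6):
    """Central finite-difference gradient of g(z) = f(Az + b)."""
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += h
        z_minus[i] -= h
        diff = f(affine(A, b, z_plus)) - f(affine(A, b, z_minus))
        grad.append(diff / (2 * h))
    return grad

rng = random.Random(0)  # fixed seed so the check is reproducible
n = 4
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]

analytic = analytic_grad_g(A, b, z)
numeric = numeric_grad_g(A, b, z)
max_abs_error = max(abs(a - m) for a, m in zip(analytic, numeric))
print("max |analytic - numeric| =", max_abs_error)
```

The two gradients should agree up to finite-difference error. Replacing `A[k][i]` by `A[i][k]` (that is, using $A$ instead of $A^\top$) makes the check fail for a generic non-symmetric $A$, which is a quick way to see that the transpose is needed.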