I am relatively new to matrix calculus and I cannot figure out why the following equation holds. I have seen it used as a given in multiple answers on this site, but I cannot derive it myself.
Suppose $A$ is a matrix (a linear transformation applied to the vector $x$) and $g(x) = f(Ax)$. Then it follows that
$$ \nabla g(x) = A^{T}\nabla f(Ax) $$
I tried the following approach:
$$ \nabla g(x) = \begin{bmatrix} \frac{\partial g(x)}{\partial x_1} \\ \frac{\partial g(x)}{\partial x_2} \\ \vdots \\ \frac{\partial g(x)}{\partial x_n} \end{bmatrix} $$ where I get $$ \frac{\partial g(x)}{\partial x_i} = \frac{\partial}{\partial x_i}f(Ax)\frac{\partial}{\partial x_i}Ax $$ But I cannot see how this leads to the first equation.
Let's say that $A = (a_{ij})$ is an $m\times n$ matrix and $x = (x_{1},...,x_{n})\in \mathbb{R}^{n}$, regarded as a column vector. Write $y = Ax$, where $y = (y_{1},...,y_{m})\in \mathbb{R}^{m}$. Note that, for each $j = 1,...,m$, we have $$y_{j} = \sum_{k=1}^{n}a_{jk}x_{k} \Rightarrow \frac{\partial y_{j}}{\partial x_{i}} = a_{ji}.$$ Now, $g(x) = f(Ax) = f(y_{1},...,y_{m})$, and since every $y_{j}$ depends on $x_{i}$, the multivariable chain rule gives a sum over all $j$: $$\frac{\partial g(x)}{\partial x_{i}} = \sum_{j=1}^{m}\frac{\partial f}{\partial y_{j}}(y_{1},...,y_{m})\frac{\partial y_{j}}{\partial x_{i}} = \sum_{j=1}^{m}\frac{\partial f}{\partial y_{j}}(y_{1},...,y_{m})a_{ji}$$ But the last sum is precisely the $i$-th coordinate of $A^{T}\nabla f(Ax)$, since the $(i,j)$ entry of $A^{T}$ is $a_{ji}$. Here I'm considering $\nabla f(Ax)$ as a column vector, as you did.
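If it helps to see the identity numerically, here is a minimal sketch that compares $A^{T}\nabla f(Ax)$ with a finite-difference gradient of $g$. The matrix `A`, the point `x`, and the test function `f(y) = y_1^2 + \sin(y_2)` are made-up choices for illustration only, and the helper names are hypothetical; any differentiable $f$ would do.

```python
import math

# Hypothetical example data: A is 2x3, so m = 2 and n = 3.
A = [[1.0, 2.0, -1.0],
     [0.5, 0.0, 3.0]]
x = [0.3, -0.7, 1.1]

def matvec(M, v):
    # y_j = sum_k a_jk * x_k
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]

def f(y):
    # Assumed test function f: R^2 -> R
    return y[0] ** 2 + math.sin(y[1])

def grad_f(y):
    # Analytic gradient of f with respect to y
    return [2.0 * y[0], math.cos(y[1])]

def g(x):
    return f(matvec(A, x))

def chain_rule_gradient(x):
    # i-th entry: sum_j a_ji * (df/dy_j)(Ax), i.e. the i-th entry of A^T grad f(Ax)
    df = grad_f(matvec(A, x))
    return [sum(A[j][i] * df[j] for j in range(len(A))) for i in range(len(x))]

def finite_difference_gradient(func, x, eps=1e-6):
    # Central differences, perturbing one coordinate of x at a time
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2.0 * eps))
    return grad

analytic = chain_rule_gradient(x)
numeric = finite_difference_gradient(g, x)
print("A^T grad f(Ax):   ", analytic)
print("finite difference:", numeric)
```

The two printed vectors should agree up to finite-difference error. Note that `chain_rule_gradient` is written as the explicit sum over $j$ from the derivation rather than as a transposed matrix product, so the indices can be matched one to one.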