I'm trying to find the differential of the following multivariable function, and then use the external definition of the gradient to find its gradient:
$$ f(\bar{x})=\phi(A\bar{x}), $$
where $\bar{x}\in\mathbb{R}^{m}$, $A\in\mathbb{R}^{n\times m}$, $\phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and $f:\mathbb{R}^{m}\rightarrow\mathbb{R}$.
My calculation is:
Let $u=Ax$; then $$df\underbrace{=}_{definition\,of\,differential}\frac{d\phi(u)}{du}\cdot du\underbrace{=}_{substitution+chain\,rule}\frac{d\phi(Ax)}{dAx_{1}}\cdot dAx_{1}\cdot\underbrace{Adx_{1}}_{inner\,derivative}+\ldots+\frac{d\phi(Ax)}{dAx_{n}}\cdot dAx_{n}\cdot\underbrace{Adx_{n}}_{inner\,derivative}=\Sigma_{i=1}^{n}\frac{d\phi(Ax)}{dAx_{i}}\cdot dAx_{i}\cdot\underbrace{Adx_{i}}_{inner\,derivative}\underbrace{=}_{inner\,product\,definition}\langle A^{T}(\nabla\phi(Ax))^{T},dAxdx\rangle $$
What's my mistake here and how should it be done correctly?
Working in index notation makes things easier. Write the entries of $A$ as $A^j_{\:i}$ (upper index $j$ for the row, lower index $i$ for the column) and let $$u^j = \sum_i A^j_{\:i} \: x^i$$ so that
$$ f(x) = \phi(Ax) =\phi(u) \text{ .}$$ By definition of the differential, we have that
$$df = \sum_k \dfrac{\partial f}{\partial x^k} dx^k $$ and thus
$$ df = \sum_k \dfrac{\partial \phi(u(x))}{\partial x^k} dx^k \text{ .}$$
By the chain rule, we know that
$$\dfrac{\partial \phi(u(x))}{\partial x^k} = \sum_j\dfrac{\partial \phi}{\partial u^j} \dfrac{\partial u^j}{\partial x^k} $$ therefore
$$ df = \sum_k \sum_j\dfrac{\partial \phi}{\partial u^j} \dfrac{\partial u^j}{\partial x^k} dx^k \text{ .}$$
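The double sum is just a matrix product. As a restatement in matrix notation (not a separate result), write $J$ for the $n\times m$ Jacobian of $u(x)$, with entries $J_{jk} = \partial u^j/\partial x^k$, and $\nabla\phi(u)$ for the column vector of the $\partial\phi/\partial u^j$. Then the expression above reads

$$ df = \nabla\phi(u)^{T} \, J \, dx = \big\langle J^{T}\nabla\phi(u),\, dx \big\rangle \text{ ,} $$

which is the form $df = \langle \nabla f, dx\rangle$ that the external definition of the gradient asks for: the vector paired with $dx$ is $\nabla f$.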
Looking more closely at $\dfrac{\partial u^j}{\partial x^k}$, the product rule gives us that
$$ \dfrac{\partial u^j}{\partial x^k} = \dfrac{\partial (\sum_i A^j_{\:i} \: x^i)}{\partial x^k} = \sum_i\bigg(\dfrac{\partial A^j_{\:i} \:}{\partial x^k} x^i + A^j_{\: i}\dfrac{\partial x^i}{\partial x^k}\bigg)$$ where $\dfrac{\partial x^i}{\partial x^k}$ equals $1$ if $i=k$ and $0$ otherwise, so only the $i=k$ term of the second part survives:
$$\dfrac{\partial u^j}{\partial x^k} = \sum_i\dfrac{\partial A^j_{\:i}}{\partial x^k} x^i + A^j_{\: k} \text{ .}$$
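For instance, with $m = n = 2$ and the entries of $A$ constant, $u^1 = A^1_{\:1}x^1 + A^1_{\:2}x^2$, and differentiating with respect to $x^2$ keeps only the $i = 2$ term:

$$ \dfrac{\partial u^1}{\partial x^2} = A^1_{\:1}\cdot 0 + A^1_{\:2}\cdot 1 = A^1_{\:2} \text{ .}$$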
We can now substitute this into $df$ so
$$ df = \sum_k \sum_j \dfrac{\partial \phi}{\partial u^j}\bigg(\sum_i\dfrac{\partial A^j_{\:i}}{\partial x^k} x^i + A^j_{\: k}\bigg) dx^k \text{ .} $$
If you are using a Cartesian coordinate system, the components of $df$ are the components of the gradient. In other words, the gradient can be indexed as
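$$ (\nabla f)_k = \sum_j \dfrac{\partial \phi}{\partial u^j}\bigg(\sum_i\dfrac{\partial A^j_{\:i}}{\partial x^k} x^i + A^j_{\: k}\bigg) \text{ .}$$
In your problem $A$ is a constant matrix, so $\partial A^j_{\:i}/\partial x^k = 0$ and this reduces to
$$ (\nabla f)_k = \sum_j A^j_{\:k}\,\dfrac{\partial \phi}{\partial u^j}\bigg|_{u=Ax} \text{, i.e. } \nabla f(x) = A^{T}\,\nabla\phi(Ax) \text{ .}$$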
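For constant $A$ the closed form is $\nabla f(x) = A^{T}\nabla\phi(Ax)$, and it can be sanity-checked numerically against central finite differences. The following is a minimal sketch in plain Python; the choice $\phi(u) = \sum_j \sin(u_j) + \tfrac12\lVert u\rVert^2$, the particular $A$ and $x$, and all helper names are hypothetical, picked only for illustration.

```python
import math

# Hypothetical test function (illustration only):
# phi(u) = sum_j sin(u_j) + 0.5 * |u|^2, so dphi/du_j = cos(u_j) + u_j.

def matvec(A, x):
    """u = A x, with A stored as a list of n rows of length m."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose_matvec(A, v):
    """A^T v: component k is sum_j A[j][k] * v[j]."""
    m = len(A[0])
    return [sum(A[j][k] * v[j] for j in range(len(A))) for k in range(m)]

def phi(u):
    return sum(math.sin(uj) for uj in u) + 0.5 * sum(uj * uj for uj in u)

def grad_phi(u):
    return [math.cos(uj) + uj for uj in u]

def f(A, x):
    """f(x) = phi(A x)."""
    return phi(matvec(A, x))

def grad_f_formula(A, x):
    """(grad f)_k = sum_j A^j_k * dphi/du^j evaluated at u = A x."""
    return transpose_matvec(A, grad_phi(matvec(A, x)))

def grad_f_fd(A, x, h=1e-6):
    """Central finite-difference estimate of each partial derivative of f."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        g.append((f(A, xp) - f(A, xm)) / (2 * h))
    return g

# Arbitrary example with n = 2 rows and m = 3 columns.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
x = [0.5, -1.0, 2.0]

exact = grad_f_formula(A, x)
approx = grad_f_fd(A, x)
for e, a in zip(exact, approx):
    print(f"{e:.6f}  {a:.6f}")
```

The two columns should agree to several decimal places; central differences are used because their truncation error is $O(h^2)$ rather than the $O(h)$ of one-sided differences.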