I am trying to calculate the gradient of the following function
$$f(X) = \mbox{tr} \left( (AX)^t (AX) \right)$$
The chain rule gives
$$\nabla_X(f(X)) = \nabla_X (\mbox{tr}(AX))\nabla_X(AX)$$
However, I'm having trouble with those two derivatives.
What is $\nabla_X \mbox{tr}(AX)$? Is it $A^t$? I did the math and obtained $\frac{\partial(\mbox{tr}(AX))}{\partial x_{ij}} = a_{ji}$, but I'm not sure. Also, what is $\nabla_X (AX)$? Is it simply $A$? I tried differentiating this but could not tell whether it holds.
Thanks in advance
The gradient $\nabla_{X}f$ is defined as the unique element of $\mathcal{M}_{n}(\mathbb{R})$ such that, as $H \to 0$:
$$ f(X+H) = f(X) + \left\langle \nabla_{X}f, H \right\rangle + o(\Vert H \Vert) $$
where $\left\langle \cdot,\cdot \right\rangle$ is the usual inner product on $\mathcal{M}_{n}(\mathbb{R})$ (i.e. $\left\langle A,B \right\rangle = \mathrm{tr}(A^{\top}B)$). By expanding $f(X+H)$, you get:
$$ f(X+H) = f(X) + \underbrace{2\mathrm{tr}(H^{\top}A^{\top}AX)}_{= \; \left\langle 2A^{\top}AX,H \right\rangle} + \underbrace{\mathrm{tr}(H^{\top}A^{\top}AH)}_{= \; o(\Vert H \Vert)} $$
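In case the intermediate steps are useful, here is a sketch of how the expansion above can be obtained. It uses only $f(X) = \mathrm{tr}(X^{\top}A^{\top}AX)$, linearity of the trace, and $\mathrm{tr}(M) = \mathrm{tr}(M^{\top})$:
$$ f(X+H) = \mathrm{tr}\left((X+H)^{\top}A^{\top}A(X+H)\right) = \mathrm{tr}(X^{\top}A^{\top}AX) + \mathrm{tr}(X^{\top}A^{\top}AH) + \mathrm{tr}(H^{\top}A^{\top}AX) + \mathrm{tr}(H^{\top}A^{\top}AH) $$
The two middle terms are equal, because $X^{\top}A^{\top}AH$ is the transpose of $H^{\top}A^{\top}AX$, which gives the factor $2$. For the last term, taking $\Vert \cdot \Vert$ to be the norm induced by the inner product (an assumption, although any norm would do in finite dimension):
$$ \mathrm{tr}(H^{\top}A^{\top}AH) = \Vert AH \Vert^{2} \leq \Vert A \Vert^{2} \Vert H \Vert^{2} = o(\Vert H \Vert) $$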
Identifying the term that is linear in $H$ with $\left\langle \nabla_{X}f, H \right\rangle$ gives $\nabla_{X}f = 2A^{\top}AX$.
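If you want to sanity-check the result $2A^{\top}AX$ numerically, a finite-difference comparison is one option. The following is only a minimal sketch, assuming NumPy is available; the function names (`f`, `grad_f`, `numerical_grad`), the matrix sizes and the step `eps` are my own illustrative choices, not part of the question.

```python
import numpy as np

def f(A, X):
    """f(X) = tr((AX)^T (AX)), i.e. the squared Frobenius norm of AX."""
    AX = A @ X
    return np.trace(AX.T @ AX)

def grad_f(A, X):
    """Closed-form gradient from the derivation: 2 A^T A X."""
    return 2.0 * A.T @ A @ X

def numerical_grad(A, X, eps=1e-4):
    """Central finite differences, one entry of X at a time (illustrative helper)."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb only the (i, j) entry
            G[i, j] = (f(A, X + E) - f(A, X - E)) / (2 * eps)
    return G

# Arbitrary test matrices (sizes chosen for illustration only)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 3))

# Largest entrywise gap between the formula and the numerical estimate
max_err = np.max(np.abs(grad_f(A, X) - numerical_grad(A, X)))
print(max_err)
```

Since $f$ is quadratic in $X$, central differences carry no truncation error here, so the printed gap should be at the level of floating-point rounding.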