Let $f(A) := A^\top A$, where $A$ is an $m \times n$ matrix. I want to find the derivative of $f$ with respect to $A$, by which I mean the Jacobian matrix collecting all partial derivatives of the entries of $f(A)$ with respect to the entries of $A$. Here is how I proceed.
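The derivative of $f$ at $A$ is the linear map $D f(A): X \mapsto A^\top X + X^\top A$. Let $K_{n,n}$ be the commutation matrix, so that $K_{n,n}\operatorname{vec}(X^\top A) = \operatorname{vec}(A^\top X)$; since $K_{n,n}$ is its own inverse, this is the same as $\operatorname{vec}(X^\top A) = K_{n,n}\operatorname{vec}(A^\top X)$. Then,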
\begin{align}
\operatorname{vec}(A^\top X + X^\top A) & = \operatorname{vec}(A^\top X) + \operatorname{vec}(X^\top A) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + \operatorname{vec}(X^\top A) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + K_{n,n} \operatorname{vec}(A^\top X) \\
& = (I_n \otimes A^\top) \operatorname{vec}(X) + K_{n,n} (I_n \otimes A^\top) \operatorname{vec}(X)
\end{align}
It now follows that \begin{align} \frac{\partial f}{\partial A} & = (I_n \otimes A^\top) + K_{n, n} (I_n \otimes A^\top) \end{align}
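A quick sanity check on this formula, assuming the scalar case $m = n = 1$ (my own check, not from the linked sources): then $A = a$ is a number, $I_1 = K_{1,1} = 1$, and the expression should reduce to the ordinary derivative of $a^2$.
\begin{align}
\frac{\partial f}{\partial A}\bigg|_{m = n = 1} & = (1 \otimes a) + 1 \cdot (1 \otimes a) \\
& = 2a = \frac{d}{da}\, a^2
\end{align}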
In the derivation above I am using the fact that $\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X)$, where $\operatorname{vec}$ is the vectorization operator.
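To double-check the derivation numerically, here is a minimal NumPy sketch (my own, not taken from the linked sources). It assumes a column-stacking `vec`, builds the commutation matrix directly from its defining property, and compares the claimed Jacobian with central finite differences. The helper names `vec` and `commutation_matrix`, and the test matrices `P` and `B`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def vec(M):
    """Column-stacking vectorization (Fortran order)."""
    return M.reshape(-1, order="F")


def commutation_matrix(p, q):
    """Return K with K @ vec(M) == vec(M.T) for every p x q matrix M."""
    K = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(q):
            # M[i, j] sits at index i + j*p in vec(M) and at j + i*q in vec(M.T)
            K[j + i * q, i + j * p] = 1.0
    return K


rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, n))

# Identity vec(P X B) = (B^T kron P) vec(X), with arbitrary conformable P and B
P = rng.standard_normal((2, m))
B = rng.standard_normal((n, 5))
identity_ok = np.allclose(vec(P @ X @ B), np.kron(B.T, P) @ vec(X))

# Claimed Jacobian: (I_n kron A^T) + K_{n,n} (I_n kron A^T)
K_nn = commutation_matrix(n, n)
J = np.kron(np.eye(n), A.T) + K_nn @ np.kron(np.eye(n), A.T)

# It should map vec(X) to vec(A^T X + X^T A)
directional_ok = np.allclose(J @ vec(X), vec(A.T @ X + X.T @ A))

# Central finite differences, one entry of vec(A) at a time
eps = 1e-6
J_fd = np.zeros((n * n, m * n))
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = 1.0
    E = e.reshape((m, n), order="F")
    A_plus, A_minus = A + eps * E, A - eps * E
    J_fd[:, k] = vec(A_plus.T @ A_plus - A_minus.T @ A_minus) / (2 * eps)
fd_ok = np.allclose(J, J_fd, atol=1e-6)

print(identity_ok, directional_ok, fd_ok)
```

Because $f$ is quadratic in $A$, central differences are exact up to rounding, so agreement here is a fairly strong check of the formula, though of course not a proof.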
I was inspired by this answer and the corresponding equation under the section "Differentials of Quadratic Products" on this webpage.
My Questions:
Is this approach correct? If not, how do I go about finding the desired derivative?
Where can I find references regarding this type of manipulation? (I don't mean this particular manipulation, but a reference for derivatives of matrices in general.) I looked in Horn and Johnson's Matrix Analysis, but a "commutation matrix" is nowhere to be found. When I say reference, I mean a rigorous linear algebraic exposition.
Take the differential of the expression
$$\eqalign{ F &= A^TA \cr dF &= dA^T\,A + A^T\,dA \cr }$$
At this point, you can either use vectorizations, writing $f = {\rm vec}(F)$ and $a = {\rm vec}(A)$, with $K$ the commutation matrix satisfying $K\,{\rm vec}(dA) = {\rm vec}(dA^T)$,
$$\eqalign{ {\rm vec}(dF) &= {\rm vec}(dA^T\,A) + {\rm vec}(A^T\,dA) \cr df &= (A^T\otimes I)(K\,da) + (I\otimes A^T)\,da \cr \frac{\partial f}{\partial a} &= (A^T\otimes I)K + (I\otimes A^T) \cr }$$
or tensor methods
$$\eqalign{ dF &= (I{\mathcal E}A^T):({\mathcal K}:dA) + (A^T{\mathcal E}I):dA \cr \frac{\partial F}{\partial A} &= ({\mathcal E}A^T):{\mathcal K} + A^T{\mathcal E} \cr }$$
where a colon represents the double-contraction product, i.e.
$$(X:{\mathcal E})_{kl} = \sum_{ij} X_{ij} {\mathcal E}_{ijkl} $$
while juxtapositions represent single-contractions
$$(X{\mathcal E}Y)_{ikmr} = \sum_{jp} X_{ij} {\mathcal E}_{jkmp} Y_{pr} $$
The isotropic 4th order tensors have components
$$\eqalign{ {\mathcal E}_{ijkl} &= \delta_{ik} \delta_{jl} \cr {\mathcal K}_{ijkl} &= \delta_{il} \delta_{jk} \cr }$$
For references, try
"Matrix Differential Calculus" by Magnus and Neudecker
"Complex-Valued Matrix Derivatives" by Are Hjorungnes