I have seen several papers that state, more or less, the following:
If $f : \mathbb{R}^{n\times n} \to \mathbb{R}$ and $g(X) = f(XX^\top)$ for $X \in \mathbb{R}^{n\times k}$, then (under mild conditions) the gradient of $g$ is
$$\nabla g(X) = (\nabla f(X X^\top)+\nabla f(XX^\top)) X.$$
I tried to derive this expression using derivatives and the definition of the inner product on matrix spaces. As a first source I checked Magnus and Neudecker's fantastic book (as always), and I found a clue:
$$D(XX^\top) = 2 N_n (X\otimes I_n),$$
where $N_n = (I_{n^2}+ K_{nn})/2$ is a symmetric idempotent matrix and $K_{nn}$ is the commutation matrix, defined by $K_{nn}\, \text{vec}\, A = \text{vec}\, A^\top$.
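To make the definitions of $K_{nn}$ and $N_n$ concrete, here is a small NumPy sketch (NumPy is my choice, not something from the question). It assumes the column-stacking convention for $\text{vec}$ used by Magnus and Neudecker; the helper names `vec` and `commutation_matrix` are hypothetical. It builds the commutation matrix entry by entry and checks the properties just stated.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator (Magnus-Neudecker convention)."""
    return A.reshape(-1, order="F")

def commutation_matrix(m, n):
    """Return K_{mn}: the mn x mn permutation with K_{mn} vec(A) = vec(A^T) for A of shape (m, n)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at index i + j*m in vec(A) and at index j + i*n in vec(A^T)
            K[j + i * n, i + j * m] = 1.0
    return K

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
K = commutation_matrix(n, n)
N = 0.5 * (np.eye(n * n) + K)

assert np.allclose(K @ vec(A), vec(A.T))  # K_{nn} vec A = vec A^T
assert np.allclose(N, N.T)                # N_n is symmetric
assert np.allclose(N @ N, N)              # N_n is idempotent, since K_{nn}^2 = I
assert np.allclose(N @ vec(A), vec(0.5 * (A + A.T)))  # N_n symmetrizes its argument
```

The last check shows what $N_n$ does: it maps $\text{vec}\,A$ to $\text{vec}\big((A + A^\top)/2\big)$.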
So I innocently thought: let's apply the chain rule and see where I can use the properties of $\text{vec}$ to bring in the inner product and find the gradient. But blindly applying the chain rule, assuming that everything makes sense, yields
$$Dg(X) = 2 Df(XX^\top)N_n (X\otimes I_n).$$
So my question is: how can I get from the previous expression to the gradient, i.e., how can I see that
$$Dg(X)V = \langle \nabla g(X), V \rangle = \text{Tr}(V^\top \nabla g(X))$$
holds? I do not see it.
I would really appreciate it if you could help me out, or point me to an appropriate source for this kind of problem.
My intuition says that it is not true. But if so, what do the authors mean? (By the way, I've seen this in several papers, so maybe I am wrong.)
I also computed the differential of $XX^\top$ myself:
$$d(XX^\top) = (dX)X^\top + X(dX)^\top.$$
Vectorizing gives
$$d\,{\rm vec}(XX^\top) = \big((X \otimes I_n) + (I_n \otimes X) K_{nk}\big)\, d\,{\rm vec}(X).$$
So $D(XX^\top) = (X \otimes I_n) + (I_n \otimes X) K_{nk}$, and the chain rule leads to
$$Dg(X) = Df(XX^\top)\big((X \otimes I_n) + (I_n \otimes X) K_{nk}\big).$$
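This expression and the Magnus and Neudecker form $D(XX^\top) = 2N_n(X\otimes I_n)$ are the same matrix. Here is a short check, writing $K = K_{nk}$ and using only $K_{mn}\,\text{vec}\,A = \text{vec}\,A^\top$ for $A \in \mathbb{R}^{m\times n}$ and $\text{vec}(ABC) = (C^\top\otimes A)\,\text{vec}\,B$. For any $V \in \mathbb{R}^{n\times k}$:
$$\eqalign{
(I_n \otimes X)K_{nk}\,\text{vec}\,V &= (I_n \otimes X)\,\text{vec}(V^\top) = \text{vec}(XV^\top) \\
&= \text{vec}\big((VX^\top)^\top\big) = K_{nn}\,\text{vec}(VX^\top) = K_{nn}(X\otimes I_n)\,\text{vec}\,V,
}$$
so $(X\otimes I_n) + (I_n\otimes X)K_{nk} = (I_{n^2} + K_{nn})(X\otimes I_n) = 2N_n(X\otimes I_n)$, and both routes give the same $Dg(X)$.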
Let $Y=XX^T$. Then the differential of $f$ is
$$df = \left(\frac{\partial f}{\partial Y}\right):dY.$$
The function $g$ is the same function with a different parameterization, therefore
$$\eqalign{
dg &= \left(\frac{\partial f}{\partial Y}\right):d(XX^T) \\
&= \left(\frac{\partial f}{\partial Y}\right):(X\,dX^T+dX\,X^T) \\
&= \left(\left(\frac{\partial f}{\partial Y}\right)^T+\left(\frac{\partial f}{\partial Y}\right)\right)X:dX \\
\frac{\partial g}{\partial X} &= \left(\left(\frac{\partial f}{\partial Y}\right)^T+\left(\frac{\partial f}{\partial Y}\right)\right)X
}$$
Based on this result, I think you've simply misread those papers.
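The same gradient can also be read off the Kronecker-product form of the chain rule from the question. Write $F = \nabla f(XX^\top)$ and assume the usual identification $Df(XX^\top) = (\text{vec}\,F)^\top$; with $K = K_{nk}$, $(\text{vec}\,A)^\top\text{vec}\,B = \operatorname{Tr}(A^\top B)$ and the cyclic property of the trace, for any $V\in\mathbb{R}^{n\times k}$:
$$\eqalign{
Dg(X)\,\text{vec}\,V &= (\text{vec}\,F)^\top\big((X\otimes I_n) + (I_n\otimes X)K_{nk}\big)\,\text{vec}\,V \\
&= (\text{vec}\,F)^\top\,\text{vec}(VX^\top + XV^\top) \\
&= \operatorname{Tr}(F^\top VX^\top) + \operatorname{Tr}(F^\top XV^\top) \\
&= \operatorname{Tr}(V^\top FX) + \operatorname{Tr}(V^\top F^\top X) \\
&= \operatorname{Tr}\big(V^\top (F+F^\top)X\big).
}$$
Matching this with $\operatorname{Tr}(V^\top\nabla g(X))$ gives $\nabla g(X) = (F + F^\top)X$, which reduces to $2\nabla f(XX^\top)X$ only when $\nabla f(XX^\top)$ is symmetric.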
NB: A colon is being used as a convenient product notation for the trace, i.e. $$A:B = \operatorname{Tr}(A^TB)$$
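As a numerical sanity check, here is a NumPy sketch comparing the closed form $(F^\top + F)X$ with central finite differences. The test function `f` (with the deliberately non-symmetric gradient $A + Y^{\circ 2}$, where the square is elementwise) and all helper names are hypothetical choices for illustration. The script also checks that the row vector $(\text{vec}\,F)^\top\big((X\otimes I_n) + (I_n\otimes X)K_{nk}\big)$ from the question equals $\text{vec}$ of that gradient, assuming the column-stacking $\text{vec}$.

```python
import numpy as np

def vec(A):
    # column-stacking vec, as in Magnus and Neudecker
    return A.reshape(-1, order="F")

def commutation_matrix(m, n):
    # K_{mn} vec(A) = vec(A^T) for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + i * n, i + j * m] = 1.0
    return K

rng = np.random.default_rng(1)
n, k = 4, 3
A = rng.standard_normal((n, n))  # deliberately non-symmetric

def f(Y):
    # hypothetical test function with gradient A + Y**2 (not symmetric in general)
    return np.trace(A.T @ Y) + np.sum(Y ** 3) / 3.0

def grad_f(Y):
    return A + Y ** 2

def g(X):
    return f(X @ X.T)

def grad_g(X):
    # closed form from the answer: (F^T + F) X with F = grad f(X X^T)
    F = grad_f(X @ X.T)
    return (F.T + F) @ X

def numerical_grad(func, X, h=1e-6):
    # central finite differences, perturbing one entry of X at a time
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (func(X + E) - func(X - E)) / (2 * h)
    return G

X = rng.standard_normal((n, k))
G_closed = grad_g(X)
G_numeric = numerical_grad(g, X)
print(np.max(np.abs(G_closed - G_numeric)))  # small: finite-difference error only

# Kronecker form of Dg(X): a row of length n*k acting on vec(V)
F = grad_f(X @ X.T)
D = vec(F) @ (np.kron(X, np.eye(n)) + np.kron(np.eye(n), X) @ commutation_matrix(n, k))
print(np.allclose(D, vec(G_closed)))  # True: the row is vec of the gradient
```

Using a non-symmetric $\nabla f$ matters here: when the gradient is symmetric, $(F^\top + F)X$ and $2FX$ coincide and the check could not tell the two formulas apart.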