Just checking my math here and getting some help for the exponential part.
$\renewcommand{\v}[1]{\mathrm{vec}\left(#1\right)} \renewcommand{\m}[1]{\mathbf{#1}} \renewcommand{\trace}[1]{\mathrm{trace}\left(#1\right)} \renewcommand{\diag}[1]{\mathrm{diag}\left(#1\right)}$ Given that $\m X$ is an $n \times m$ matrix, what does the following partial derivative equal? $$\frac{\partial \trace{\exp(\mathbf X\mathbf X^\top)}}{\partial\m X}$$
We have that $\trace{\mathbf F(\mathbf X)} = \v{\mathbf F(\mathbf X)}^\top \v{\mathbf I}$, so its derivative is $\v{\partial \m F(\m X)}^\top\v{\m I}$.
We also have $\partial \exp f(x) / \partial x = \exp f(x)\partial f(x)/\partial x$.
I guess the partial derivative $\partial \exp \m X \m X^\top / \partial \m X$ must be something like $$\diag{\v{\exp \m X \m X^\top}}(\m I \otimes \m X+\m X \boxtimes \m I)$$
So in the end, is the derivative I'm looking for equal to $2\m X^\top\exp(\m X\m X^\top)$?
The final expression is correct in the sense that if $G:X\mapsto\mathrm{tr}(\exp(XX^T))$, then $$\langle\nabla G(X),Z\rangle=2\mathrm{tr}(ZX^T\exp(XX^T))=2\mathrm{tr}(\exp(XX^T)XZ^T).$$ Thus, if one identifies the gradient $\nabla G(X)$ with the matrix $M_X$ such that $$\langle\nabla G(X),Z\rangle=\mathrm{tr}(ZM_X),$$ then indeed, the gradient is $$2X^T\exp(XX^T)=2\exp(X^TX)X^T.$$

A more logical convention is to identify the gradient $\nabla G(X)$ with the matrix $M_X$ such that $$\langle\nabla G(X),Z\rangle=\mathrm{tr}(ZM_X^T)=\mathrm{tr}(M_XZ^T),$$ in which case the gradient is instead $$2\exp(XX^T)X=2X\exp(X^TX).$$

The intermediate steps proposed in the question are opaque to me. A direct proof of the result simply uses the characterization of the differential $\nabla G(X)$ at $X$ as the unique linear functional on the space of matrices of size $n\times m$ such that, for every such matrix $Z$ and real number $h$, when $h\to0$, $$G(X+hZ)=G(X)+\langle\nabla G(X),Z\rangle h+o(h).$$
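
As a sketch of how that expansion can be carried out (the shorthand $A=XX^T$ and $H=ZX^T+XZ^T$ is introduced here for convenience and is not part of the original notation): one has $(X+hZ)(X+hZ)^T=A+hH+O(h^2)$, and, by cyclicity of the trace, $\mathrm{tr}((A+hH)^k)=\mathrm{tr}(A^k)+hk\,\mathrm{tr}(A^{k-1}H)+O(h^2)$ for every $k\geqslant1$. Summing the exponential series term by term then suggests $$G(X+hZ)=G(X)+h\,\mathrm{tr}(\exp(A)H)+o(h),\qquad \mathrm{tr}(\exp(A)H)=2\mathrm{tr}(ZX^T\exp(XX^T)),$$ where the last equality uses the symmetry of $\exp(A)$.

The same identity can be checked numerically. The snippet below is only a minimal illustration in plain Python: the helper names (`expm`, `grad_G`, `directional_fd`), the truncated Taylor series used for the matrix exponential, and the test matrices are all assumptions chosen for the example, not anything prescribed above.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def expm(A, terms=40):
    """Matrix exponential via a truncated Taylor series (fine for small, well-scaled A)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result

def G(X):
    """G(X) = tr(exp(X X^T))."""
    return trace(expm(matmul(X, transpose(X))))

def grad_G(X):
    """Candidate gradient 2 exp(X X^T) X, for the convention <grad, Z> = tr(Z M^T)."""
    E = expm(matmul(X, transpose(X)))
    return [[2.0 * v for v in row] for row in matmul(E, X)]

def directional_fd(X, Z, h=1e-5):
    """Central finite-difference estimate of d/dh G(X + h Z) at h = 0."""
    Xp = [[x + h * z for x, z in zip(rx, rz)] for rx, rz in zip(X, Z)]
    Xm = [[x - h * z for x, z in zip(rx, rz)] for rx, rz in zip(X, Z)]
    return (G(Xp) - G(Xm)) / (2.0 * h)

# Arbitrary example with n = 2, m = 3.
X = [[0.3, -0.2, 0.5], [0.1, 0.4, -0.6]]
Z = [[1.0, 0.5, -0.3], [0.2, -0.7, 0.9]]

M = grad_G(X)
predicted = trace(matmul(Z, transpose(M)))  # tr(Z M^T)
numeric = directional_fd(X, Z)

# Second form of the same gradient: 2 X exp(X^T X).
alt = [[2.0 * v for v in row] for row in matmul(X, expm(matmul(transpose(X), X)))]
max_diff = max(abs(a - b) for ra, rb in zip(M, alt) for a, b in zip(ra, rb))

print("predicted directional derivative:", predicted)
print("finite-difference estimate:      ", numeric)
print("max |2 exp(XX^T) X - 2 X exp(X^T X)|:", max_diff)
```

The two printed directional derivatives should agree up to finite-difference error, and the transpose of `M` is the matrix $2X^T\exp(XX^T)$ of the first convention.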