Assume $\mathbf{X}$ is not a square matrix. If $\mathbf{M}=\mathbf{X}^T \mathbf{X}-\mathbf{I}$ and $f(\mathbf{X})=\operatorname{tr}(\mathbf{M}^T \mathbf{M})$, can I apply the chain rule to obtain $\frac{\partial f}{\partial \mathbf{X}}$, i.e., $$\frac{\partial \operatorname{tr}(\mathbf{M}^T\mathbf{M})}{\partial \mathbf{M}} \times \frac{\partial \mathbf{M}}{\partial \mathbf{X}}$$
We know that $\frac{\partial\operatorname{tr}(\mathbf{M}^T\mathbf{M})}{\partial \mathbf{M}}=2\mathbf{M}$, but what is $\frac{\partial \mathbf{M}}{\partial\mathbf{X}}$?
Additional note: From Eqs. 18-19 of http://www.sanjivk.com/SSH_CVPR10.pdf it seems that $\partial f / \partial \mathbf{X} = 2(\mathbf{X}\mathbf{X}^T-\mathbf{I})\mathbf{X}$, but I cannot see how.
First find the differential; the derivative then follows easily.
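In terms of the double-dot (Frobenius) product $A:B = \operatorname{tr}(A'.B)$, where $'$ denotes the transpose and $.$ the ordinary matrix product, $f = \operatorname{tr}(M'.M) = M:M$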
So $$\eqalign{df &= d(M:M) \cr &= 2 M:dM}$$
and $$\eqalign {dM &= d(X'.X -I) \cr &= dX'.X + X'.dX}$$
Substituting the second result into the first, and using the fact that $M$ is symmetric ($M' = M$) in the last step, yields $$\eqalign { df &= 2 (M:dX'.X + M:X'.dX) \cr &= 2 (X.M':dX + X.M:dX) \cr &= 4 (X.M) : dX }$$
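Finally, the derivative is $$ \frac{\partial f}{\partial X} = 4(X.M) = 4\,X.(X'.X - I) $$
Since $X.(X'.X - I) = (X.X' - I).X$, this has the same form as the expression quoted in the question and differs from it only by a constant factor.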
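As a sanity check, the closed form $4\,X.M$ can be compared against a finite-difference gradient. The following is only a minimal NumPy sketch: the function names (`f`, `grad_f`, `numerical_grad`), the $5 \times 3$ test matrix, and the step size `eps` are illustrative choices, not part of the derivation.

```python
import numpy as np

def f(X):
    """f(X) = tr(M^T M) with M = X^T X - I."""
    M = X.T @ X - np.eye(X.shape[1])
    return np.trace(M.T @ M)

def grad_f(X):
    """Closed-form gradient 4 X M from the differential above."""
    M = X.T @ X - np.eye(X.shape[1])
    return 4.0 * X @ M

def numerical_grad(func, X, eps=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (func(X + E) - func(X - E)) / (2 * eps)
    return G

# Arbitrary non-square test matrix (sizes chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Largest entrywise gap between the closed form and the numerical estimate.
max_err = np.max(np.abs(grad_f(X) - numerical_grad(f, X)))
print("max abs difference:", max_err)

# The equivalent form 4 (X X^T - I) X should give the same matrix.
alt = 4.0 * (X @ X.T - np.eye(X.shape[0])) @ X
print("forms agree:", np.allclose(grad_f(X), alt))
```

If the derivation is right, the printed difference should be tiny (limited by finite-difference and rounding error), and the two closed forms should agree.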