I would like to compute the following derivative:
$$\frac{d}{d\mathbf{X}} |\mathbf{X}^{T} \mathbf{X}|$$
Minka, in 'Old and New Matrix Algebra Useful for Statistics', says:
$$\frac{d}{d\mathbf{X}}|\mathbf{X}^{T} \mathbf{X}| = 2 |\mathbf{X}^{T}\mathbf{X}| (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} $$
His reasoning is that, because
$$ d|\mathbf{X}^{T}\mathbf{X}| = 2|\mathbf{X}^{T}\mathbf{X}|\text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) $$
we get the aforementioned derivative.
Question: How do we get rid of the trace? What justifies this?
Is this saying that:
$$ \text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}d\mathbf{X} $$
That doesn't seem right to me.
@greg has an approach that uses the Frobenius inner product, but I haven't seen it anywhere except in a few answers.
Could anyone provide some insight into what is going on?
Thanks
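Let
$$Y=X^TX$$
Then the gradient of the function $\det Y$ is a well-known result which can be looked up in the Matrix Cookbook or on Wikipedia.
$$\eqalign{ f &= \det Y \\ G = \frac{\partial f}{\partial Y} &= (\det Y)\;Y^{-T} \\ }$$
All that's needed to answer this question is to perform a change of variables from $Y\to X$. Since $Y$ is symmetric, so is $G$, which is what allows $G^T+G$ to be replaced by $2G$ below.
$$\eqalign{ df &= G:dY \\ &= G:(dX^TX + X^TdX) \\ &= G:dX^TX + G:X^TdX \\ &= G^T:X^TdX + G:X^TdX \\ &= (G^T+G):X^TdX \\ &= 2G:X^TdX \\ &= 2XG:dX \\ &= 2(\det Y)XY^{-T}:dX \\ &= 2(\det X^TX)X(X^TX)^{-1}:dX \\ \frac{\partial f}{\partial X} &= 2(\det X^TX)X(X^TX)^{-1} \\ }$$
where a colon is employed as a convenient product notation for the trace, i.e.
$$\eqalign{ A:B = {\rm Tr}(A^TB) \\ }$$
Written out with the trace, the second-to-last line is $df = 2(\det X^TX)\,{\rm Tr}\big((X^TX)^{-1}X^TdX\big)$, which is exactly Minka's differential. So the trace is never cancelled or set equal to its argument. The gradient is simply read off as the matrix that is paired with $dX$ inside the trace: $df = M:dX = {\rm Tr}(M^TdX)$ identifies $M$ as $\partial f/\partial X$. Minka's formula is the transpose of the one above because, as his two equations show, he uses the layout convention in which $df = {\rm Tr}(A\,dX)$ gives $df/dX = A$ rather than $A^T$.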
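As a sanity check, the closed-form gradient can be compared against central finite differences. The following is a minimal sketch using NumPy; the function names (`f`, `grad_f`, `numerical_grad`), the matrix size, the seed, and the step size `eps` are illustrative assumptions and not part of the derivation.

```python
import numpy as np

def f(X):
    # f(X) = det(X^T X)
    return np.linalg.det(X.T @ X)

def grad_f(X):
    # Closed form from the derivation: 2 det(X^T X) X (X^T X)^{-1}
    Y = X.T @ X
    return 2.0 * np.linalg.det(Y) * X @ np.linalg.inv(Y)

def numerical_grad(func, X, eps=1e-6):
    # Central finite differences, one entry of X at a time.
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (func(X + E) - func(X - E)) / (2.0 * eps)
    return G

# Arbitrary tall test matrix (assumed full column rank, so X^T X is invertible).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

analytic = grad_f(X)
numeric = numerical_grad(f, X)
print("gradient matches finite differences:",
      np.allclose(analytic, numeric, rtol=1e-5, atol=1e-6))

# Minka's layout is the transpose of the gradient above.
Y = X.T @ X
minka = 2.0 * np.linalg.det(Y) * np.linalg.inv(Y) @ X.T
print("transpose matches Minka's form:", np.allclose(analytic.T, minka))
```

The gradient here has the same shape as $X$, while Minka's expression has the shape of $X^T$; the second check confirms that the two differ only by that transpose.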