Computing the gradient of $\det \left( \mathbf{X}^{T} \mathbf{X} \right)$

I would like to compute the following derivative:

$$\frac{d}{d\mathbf{X}} |\mathbf{X}^{T} \mathbf{X}|$$

Minka, in 'Old and New Matrix Algebra Useful for Statistics', says:

$$\frac{d}{d\mathbf{X}}|\mathbf{X}^{T} \mathbf{X}| = 2 |\mathbf{X}^{T}\mathbf{X}| (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} $$

His reasoning is that, since

$$ d|\mathbf{X}^{T}\mathbf{X}| = 2|\mathbf{X}^{T}\mathbf{X}|\text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) $$

we obtain the derivative above.
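A quick numerical sanity check of this differential can be sketched in NumPy (an assumption; any linear-algebra library would do). The sketch draws a random tall $\mathbf{X}$, which has full column rank with probability one, perturbs it by a small random $d\mathbf{X}$, and compares the actual change in $|\mathbf{X}^T\mathbf{X}|$ with the trace expression. The sizes, seed and step size are arbitrary choices for illustration.

```python
import numpy as np

def det_gram(X):
    """f(X) = det(X^T X)."""
    return np.linalg.det(X.T @ X)

def differential_via_trace(X, dX):
    """First-order change predicted by 2|X^T X| tr((X^T X)^{-1} X^T dX)."""
    Y = X.T @ X
    # solve(Y, X^T dX) computes Y^{-1} X^T dX without forming Y^{-1} explicitly
    return 2.0 * np.linalg.det(Y) * np.trace(np.linalg.solve(Y, X.T @ dX))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # tall, so X^T X is invertible (almost surely)
dX = 1e-6 * rng.standard_normal((5, 3))    # small perturbation

actual = det_gram(X + dX) - det_gram(X)
predicted = differential_via_trace(X, dX)
print(actual, predicted)
```

The two numbers should agree up to a relative error on the order of the perturbation size, as expected for a first-order (differential) approximation.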


Question: How do we get rid of the trace? What justifies this?

Is this saying that:

$$ \text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}d\mathbf{X} $$

That doesn't seem right to me.
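One way to see what the trace is doing, sketched in NumPy with a random 5×3 $\mathbf{X}$ (sizes and seed are arbitrary assumptions): the proposed right-hand side is a matrix, whereas $\text{tr}(M\,d\mathbf{X})$ with $M = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$ is a scalar that depends linearly on $d\mathbf{X}$. Probing that scalar with elementary matrices recovers the coefficient of each entry of $d\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
Y = X.T @ X
M = np.linalg.solve(Y, X.T)          # (X^T X)^{-1} X^T, shape (3, 5)

# The right-hand side of the proposed equation is a 3x3 matrix, not a scalar.
dX = rng.standard_normal((5, 3))
print((M @ dX).shape)                # (3, 3)

# The trace is a scalar that is linear in dX. Probe it with the elementary
# matrices E_ij (a 1 in entry (i, j), zeros elsewhere) to read off its coefficients.
coeffs = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = 1.0
        coeffs[i, j] = np.trace(M @ E)

# The coefficient of dX[i, j] is M[j, i]: the trace pairs dX with M transposed.
print(np.allclose(coeffs, M.T))      # True
```

So nothing is simply discarded: calling $M$ "the derivative" is a convention for recording the coefficients of the linear map $d\mathbf{X} \mapsto \text{tr}(M\,d\mathbf{X})$, and those coefficients, arranged in the shape of $\mathbf{X}$, form $M^T$.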

@greg has an approach that uses the Frobenius inner product, but I haven't seen it anywhere except in a few answers.
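The Frobenius inner product is $A:B = \text{tr}(A^TB) = \sum_{ij} A_{ij}B_{ij}$. A small NumPy sketch (random matrices of arbitrarily chosen compatible shapes; the helper name `frob` is hypothetical) checks that definition together with the two rules for moving a factor across the colon, which are the kind of rearrangements used in the colon-notation derivation in the answer.

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A:B = tr(A^T B) = sum_ij A_ij B_ij (hypothetical helper)."""
    return float(np.sum(A * B))

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

# Definition: the elementwise sum equals the trace form.
print(np.isclose(frob(A, B @ C), np.trace(A.T @ (B @ C))))   # True
# Moving a left factor across the colon: A:BC = B^T A : C
print(np.isclose(frob(A, B @ C), frob(B.T @ A, C)))          # True
# Moving a right factor across the colon: A:BC = A C^T : B
print(np.isclose(frob(A, B @ C), frob(A @ C.T, B)))          # True
```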

Could anyone provide some insight into what is going on?

Thanks

Best answer:

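Let $$Y=X^TX.$$ Then the gradient of the function $\det Y$ is a well-known result, which can be looked up in the Matrix Cookbook or on Wikipedia: $$\eqalign{ f &= \det Y \\ G = \frac{\partial f}{\partial Y} &= (\det Y)\;Y^{-T} \\ }$$

All that's needed to answer this question is to perform a change of variables from $Y\to X$. Note that $Y$ is symmetric, hence so is $G$, which is what allows $G^T+G$ to be replaced by $2G$: $$\eqalign{ df &= G:dY \\ &= G:(dX^TX + X^TdX) \\ &= G:dX^TX + G:X^TdX \\ &= G^T:X^TdX + G:X^TdX \\ &= (G^T+G):X^TdX \\ &= 2G:X^TdX \\ &= 2XG:dX \\ &= 2(\det Y)XY^{-T}:dX \\ &= 2(\det X^TX)X(X^TX)^{-1}:dX \\ \frac{\partial f}{\partial X} &= 2(\det X^TX)X(X^TX)^{-1} \\ }$$ where a colon is employed as a convenient product notation for the trace, i.e. $$\eqalign{ A:B = {\rm Tr}(A^TB) \\ }$$

This is also where the trace "goes": it is never dropped. Writing the differential as $df = D:dX = {\rm Tr}(D^TdX)$ identifies the gradient $D$, which has the same shape as $X$. Minka's convention is instead that $df = {\rm Tr}(M\,dX)$ defines $\frac{df}{dX} = M$, so his $2|X^TX|(X^TX)^{-1}X^T$ is simply the transpose of the result above.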
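The closed form can be checked against central finite differences. This is a minimal NumPy sketch under arbitrary assumptions (a random 6×3 $X$, step size $h=10^{-6}$, and the hypothetical helper names `f`, `grad_f`, `grad_fd`); it also confirms that Minka's expression is the transpose of this gradient.

```python
import numpy as np

def f(X):
    """f(X) = det(X^T X)."""
    return np.linalg.det(X.T @ X)

def grad_f(X):
    """Closed form: 2 det(X^T X) X (X^T X)^{-1}, same shape as X."""
    Y = X.T @ X
    # X Y^{-1} = (Y^{-1} X^T)^T because Y is symmetric; solve avoids an explicit inverse.
    return 2.0 * np.linalg.det(Y) * np.linalg.solve(Y, X.T).T

def grad_fd(X, h=1e-6):
    """Central finite differences, perturbing one entry of X at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))
G = grad_f(X)
print(np.allclose(G, grad_fd(X), rtol=1e-5, atol=1e-6))

# Minka's d/dX is the transpose: 2|X^T X| (X^T X)^{-1} X^T.
Y = X.T @ X
minka = 2.0 * np.linalg.det(Y) * np.linalg.solve(Y, X.T)
print(np.allclose(G.T, minka))
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a design choice for numerical stability; for a well-conditioned $X^TX$ either gives the same result.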