Matrix derivatives of determinant and inverse related to $\mathbf{X}\mathbf{X}^{T}+\mathbf{C}$

I would like to calculate the derivatives, with respect to $\mathbf{X}$, of two scalar functions involving the determinant and the inverse of $\mathbf{X}\mathbf{X}^{T}+\mathbf{C}$, where $\mathbf{C}$ is a constant matrix.

Specifically, I need the derivative of the log-determinant, $$\frac{\partial \mathrm{ln}|\mathbf{X}\mathbf{X}^{T}+\mathbf{C}|}{\partial \mathbf{X}},$$ and the derivative of the quadratic form involving the inverse, $$\frac{\partial \mathbf{a}^{T}(\mathbf{X}\mathbf{X}^{T}+\mathbf{C})^{-1}\mathbf{a}}{\partial \mathbf{X}},$$ where $\mathbf{a}$ is a constant vector.

I checked the Matrix Cookbook and some other resources but did not find formulas specific to these cases, and it seems the chain rule does not directly apply to the derivative of a matrix with respect to a matrix. So I am posting the question here and look forward to any clues. Thanks!

Best answer:

Let $W = XX^T + C$, and write ${\rm sym}(M) = \tfrac{1}{2}(M + M^T)$ for the symmetric part of a square matrix. Then your two functions are

$$\eqalign { f &= {\rm log}({\rm det}(W)) = {\rm tr}({\rm log}(W)) \cr g &= aa^T:W^{-1} = A:W^{-1} \cr }$$

The differentials with respect to $W$ can be found in the cookbook:

$$\eqalign { df &= d\,{\rm tr}({\rm log}(W)) \cr &= W^{-T}:dW \cr dg &= A:dW^{-1} \cr &= -A:W^{-1}\,dW\,W^{-1} \cr &= -W^{-T}AW^{-T}:dW \cr }$$

Substitute $dW = dX\,X^T + X\,dX^T = 2\,{\rm sym}(dX\,X^T)$ into the differentials to obtain

$$\eqalign { df &= 2\,W^{-T}:{\rm sym}(dX\,X^T) \cr &= 2\,{\rm sym}(W^{-T}):dX\,X^T \cr &= 2\,{\rm sym}(W^{-1})X:dX \cr \cr dg &= -2\,W^{-T}AW^{-T}:{\rm sym}(dX\,X^T) \cr &= -2\,{\rm sym}(W^{-T}AW^{-T}):dX\,X^T \cr &= -2\,{\rm sym}(W^{-1}AW^{-1})X:dX \cr }$$

Finally, the derivatives are

$$\eqalign { \frac{\partial f}{\partial X} &= 2\,{\rm sym}(W^{-1})X \cr &= (W^{-1}+W^{-T})\,X \cr \frac{\partial g}{\partial X} &= -2\,{\rm sym}(W^{-1}AW^{-1})X \cr &= -(W^{-1}AW^{-1}+W^{-T}AW^{-T})X \cr }$$

I have used the fact that $A^T=A$ to simplify the results in some places. If $C$ is also symmetric, then $W^T=W$ and the derivatives simplify further to $\frac{\partial f}{\partial X} = 2\,W^{-1}X$ and $\frac{\partial g}{\partial X} = -2\,W^{-1}AW^{-1}X$.
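A quick way to gain confidence in these gradient formulas is a central finite-difference check. The sketch below is not part of the original answer; it assumes NumPy, a random nonsymmetric $C$ shifted by a multiple of the identity so that $W$ stays safely invertible, and hypothetical helper names (`f`, `g`, `grad_f`, `grad_g`, `numerical_grad`).

```python
import numpy as np

def f(X, C):
    # f(X) = log|det(X X^T + C)|; slogdet returns (sign, log|det|)
    _, logdet = np.linalg.slogdet(X @ X.T + C)
    return logdet

def g(X, C, a):
    # g(X) = a^T (X X^T + C)^{-1} a, computed with a linear solve
    W = X @ X.T + C
    return a @ np.linalg.solve(W, a)

def grad_f(X, C):
    # closed form: (W^{-1} + W^{-T}) X
    Winv = np.linalg.inv(X @ X.T + C)
    return (Winv + Winv.T) @ X

def grad_g(X, C, a):
    # closed form: -(W^{-1} A W^{-1} + W^{-T} A W^{-T}) X with A = a a^T
    Winv = np.linalg.inv(X @ X.T + C)
    A = np.outer(a, a)
    return -(Winv @ A @ Winv + Winv.T @ A @ Winv.T) @ X

def numerical_grad(fun, X, h=1e-6):
    # central differences, perturbing one entry of X at a time
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (fun(X + E) - fun(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
n, p = 4, 3
X = rng.standard_normal((n, p))
# nonsymmetric C, shifted by 2n*I so that W is well conditioned
C = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
a = rng.standard_normal(n)

err_f = np.max(np.abs(grad_f(X, C) - numerical_grad(lambda Z: f(Z, C), X)))
err_g = np.max(np.abs(grad_g(X, C, a) - numerical_grad(lambda Z: g(Z, C, a), X)))
print(err_f, err_g)
```

Both printed errors should be on the order of finite-difference round-off. For larger problems, replacing the explicit `np.linalg.inv` calls with solves against $W$ would be the more numerically careful choice.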

I have also made extensive use of Frobenius products, which you can replace with traces, $X:Y = {\rm tr}(X^TY)$, if you prefer.
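The Frobenius-product manipulations in the derivation can also be checked numerically. This NumPy sketch, with hypothetical helpers `frob` and `sym`, verifies the trace definition and the two rearrangements the answer relies on: $P:{\rm sym}(Q) = {\rm sym}(P):Q$ and $P:(dX\,X^T) = (PX):dX$.

```python
import numpy as np

def frob(P, Q):
    """Frobenius product P:Q, the sum of elementwise products."""
    return np.sum(P * Q)

def sym(M):
    """Symmetric part of a square matrix, (M + M^T) / 2."""
    return (M + M.T) / 2

rng = np.random.default_rng(1)
P = rng.standard_normal((4, 4))
Q = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))
dX = rng.standard_normal((4, 3))

# Definition: P:Q = tr(P^T Q)
check_trace = np.isclose(frob(P, Q), np.trace(P.T @ Q))
# Moving sym() across the product: P:sym(Q) = sym(P):Q
check_sym = np.isclose(frob(P, sym(Q)), frob(sym(P), Q))
# Moving a right factor across: P:(dX X^T) = (P X):dX
check_move = np.isclose(frob(P, dX @ X.T), frob(P @ X, dX))
print(check_trace, check_sym, check_move)
```

Since all three identities hold exactly, this prints `True True True` for any random inputs.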