How to calculate the derivative of this expression, which contains a matrix variable?


Assume that $\mathbf{X}\in\mathbb{R}^{n\times n}$ is symmetric, and $\mathbf{Y}=\mathbf{M}^\textsf{T}{\mathbf{X}}^{-1}\mathbf{M}$, where $\mathbf{M}\in\mathbb{R}^{n\times m}$ has no special structure. Please calculate the derivative $\frac{\partial(\ln|\det(\mathbf{Y})|)}{\partial\mathbf{X}}$.

My reference is "The Matrix Cookbook" by Kaare Brandt Petersen, available at http://matrixcookbook.com, but I still cannot figure out how to solve it.


There are 2 best solutions below


Denote by $V_n$ the vector space of symmetric $n\times n$ real matrices and by $U_n$ the (open) subset of $V_n$ consisting of invertible matrices. Now consider the following four functions: $$ f_1:\mathbb{R}\setminus \{0\} \to \mathbb{R}, \quad x\mapsto \ln|x|, $$ $$ f_2: V_m\to \mathbb{R}, \quad \mathbf{X}\mapsto \det(\mathbf{X}), $$ $$ f_3: V_n\to V_m, \quad \mathbf{X}\mapsto \mathbf{M}^\top \mathbf{X}\mathbf{M} $$ and $$ f_4: U_n\to V_n, \quad \mathbf{X}\mapsto \mathbf{X}^{-1}. $$ Then $\ln|\det(\mathbf{Y})|=(f_1\circ f_2\circ f_3\circ f_4)(\mathbf{X})$.

Now note that $f_1$ is differentiable, with derivative $$ f_1'(x)(h) = \frac{h}{x}. $$ The function $f_2$ is differentiable with derivative $$ f_2'(\mathbf{X})(\mathbf{H}) = \mathrm{tr}(\mathrm{adj}(\mathbf{X})\mathbf{H}) $$ (this is Jacobi's formula), where $\mathrm{tr}$ is the trace and $\mathrm{adj}(\mathbf{X})$ is the adjugate (classical adjoint) of $\mathbf{X}$. Also, $f_3$ is a linear map, so it is differentiable and $f_3'(\mathbf{X})=f_3$ for every $\mathbf{X}$. Finally, $f_4$ is differentiable with derivative $$ f_4'(\mathbf{X})(\mathbf{H}) = - \mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1}, $$ which follows from differentiating the identity $\mathbf{X}\mathbf{X}^{-1}=\mathbf{I}$.
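As a sanity check, the two less obvious derivative formulas (for $f_2$ and $f_4$) can be compared against central finite differences. This is only a sketch: the size `n`, the step `t`, the random seed, and the choice of a positive definite $\mathbf{X}$ (to guarantee invertibility) are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4          # assumed size, for illustration only
t = 1e-5       # finite-difference step (assumption)

# Symmetric positive definite X, so that X is certainly invertible.
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)

# Symmetric direction H (a tangent vector in V_n).
B = rng.standard_normal((n, n))
H = (B + B.T) / 2

Xinv = np.linalg.inv(X)

# f_2: derivative of det in direction H is tr(adj(X) H),
# with adj(X) = det(X) * X^{-1} for invertible X.
adjX = np.linalg.det(X) * Xinv
exact_det = np.trace(adjX @ H)
fd_det = (np.linalg.det(X + t * H) - np.linalg.det(X - t * H)) / (2 * t)

# f_4: derivative of the inverse in direction H is -X^{-1} H X^{-1}.
exact_inv = -Xinv @ H @ Xinv
fd_inv = (np.linalg.inv(X + t * H) - np.linalg.inv(X - t * H)) / (2 * t)

print("det derivative:", exact_det, "finite difference:", fd_det)
print("max error for inverse derivative:", np.max(np.abs(exact_inv - fd_inv)))
```

The two printed values for the determinant should agree to several digits, and the reported error for the inverse should be small compared with the entries of $\mathbf{X}^{-1}$.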

From this, you can obtain the desired derivative by using the chain rule.

ADDED: Here is how to finish the computation using the chain rule. Write $f = f_1\circ f_2\circ f_3\circ f_4$ and proceed step by step. First, since $f_3$ is linear, $$ (f_3\circ f_4)'(\mathbf{X})(\mathbf{H}) = f_3'(f_4(\mathbf{X}))\bigl(f_4'(\mathbf{X})(\mathbf{H})\bigr) = \mathbf{M}^\top(-\mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1})\mathbf{M} = -\mathbf{M}^\top\mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1}\mathbf{M}. $$ Next, $$ (f_2\circ f_3\circ f_4)'(\mathbf{X})(\mathbf{H}) = f_2'((f_3\circ f_4)(\mathbf{X}))\bigl((f_3\circ f_4)'(\mathbf{X})(\mathbf{H})\bigr) = -\mathrm{tr}(\mathrm{adj}(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1}\mathbf{M}). $$ Finally, for $f$ to be defined and differentiable we need $\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M}$ to be invertible, so we restrict $f$ to an open set where this holds. For such $\mathbf{X}$ we have $$ f'(\mathbf{X})(\mathbf{H}) = f_1'((f_2\circ f_3\circ f_4)(\mathbf{X}))\bigl((f_2\circ f_3\circ f_4)'(\mathbf{X})(\mathbf{H})\bigr) = -\frac{\mathrm{tr}(\mathrm{adj}(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1}\mathbf{M})}{\det(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})} = -\frac{\mathrm{tr}(\mathbf{X}^{-1}\mathbf{M}\mathrm{adj}(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{H})}{\det(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})}, $$ where the last equality uses the cyclic property of the trace. Since $\mathrm{adj}(\mathbf{Y})/\det(\mathbf{Y})=\mathbf{Y}^{-1}$, this means that $$ \frac{\partial f}{\partial \mathbf{X}} = -\frac{\mathbf{X}^{-1}\mathbf{M}\mathrm{adj}(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})\mathbf{M}^\top \mathbf{X}^{-1}}{\det(\mathbf{M}^\top \mathbf{X}^{-1}\mathbf{M})} = -\mathbf{X}^{-1}\mathbf{M}\mathbf{Y}^{-1}\mathbf{M}^\top \mathbf{X}^{-1}, $$ which is the desired derivative.
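The final formula can be checked numerically by comparing the directional derivative $\mathrm{tr}\bigl(\tfrac{\partial f}{\partial \mathbf{X}}\mathbf{H}\bigr)$ with a central finite difference of $f$ along a symmetric direction $\mathbf{H}$. The following is a minimal sketch, not part of the original answer. The names `f` and `grad_f`, the sizes `n = 4`, `m = 2`, the step `t`, and the positive definite test matrix are all assumptions chosen so that $\mathbf{X}$ and $\mathbf{Y}$ are invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2    # assumed sizes; m <= n so that Y can be invertible
t = 1e-5       # finite-difference step (assumption)

A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)          # symmetric positive definite test point
M = rng.standard_normal((n, m))      # no special structure

def f(X):
    """ln|det(M^T X^{-1} M)|, computed stably with slogdet."""
    Y = M.T @ np.linalg.inv(X) @ M
    _sign, logabsdet = np.linalg.slogdet(Y)
    return logabsdet

def grad_f(X):
    """Candidate gradient -X^{-1} M Y^{-1} M^T X^{-1}."""
    Xinv = np.linalg.inv(X)
    Y = M.T @ Xinv @ M
    return -Xinv @ M @ np.linalg.inv(Y) @ M.T @ Xinv

# Symmetric perturbation direction.
B = rng.standard_normal((n, n))
H = (B + B.T) / 2

fd = (f(X + t * H) - f(X - t * H)) / (2 * t)   # numerical directional derivative
exact = np.trace(grad_f(X) @ H)                # f'(X)(H) from the formula

print("formula:", exact, "finite difference:", fd)
```

A quick analytic cross-check: when $m=n$ and $\mathbf{M}$ is invertible, $\mathbf{Y}^{-1}=\mathbf{M}^{-1}\mathbf{X}\mathbf{M}^{-\top}$ and the formula collapses to $-\mathbf{X}^{-1}$, which agrees with $\ln|\det(\mathbf{Y})| = 2\ln|\det(\mathbf{M})| - \ln|\det(\mathbf{X})|$.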


$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\X{X^{-1}} \def\Y{Y^{-1}} $First, calculate the differentials of $\,\X{\rm\;and\;}Y$ $$\eqalign{ &d\X = -\X dX\; \X \\ &dY = M^T\c{d\X}M \;=\; -{M^T\X dX\;\X M} \\ }$$ Then use a variant of Jacobi's formula to handle the determinant $$\eqalign{ \phi &= \log(\det(Y)) \\ d\phi &= \Y:dY \\ &= -\Y:\LR{M^T\X dX\;\X M} \\ &= -\LR{\X M\Y M^T\X}:dX \\ \grad{\phi}{X} &= -{\X M\Y M^T\X} \\ }$$ where a colon denotes the Frobenius product, which has the following properties $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \frob{A}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ C:\LR{AB} &= \LR{CB^T}:A \;=\; \LR{A^TC}:B \\ }$$ In the current problem, $\,X{\rm\;and\;}Y\,$ are symmetric, which simplifies some of the calculations.

[All relevant formulas can be found on page 9 of the Matrix Cookbook.]
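The Frobenius-product identities used in the differential calculation above are easy to confirm numerically. The sketch below uses a hypothetical helper `frob_product` and arbitrary rectangular test matrices. The shapes and seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

def frob_product(A, B):
    """Frobenius product A:B, the sum of elementwise products."""
    return np.sum(A * B)

# Arbitrary compatible shapes: A is 3x4, B is 4x5, C is 3x5.
A = rng.standard_normal((3, 4))
A2 = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((3, 5))

# A:B = Tr(A^T B)
print(frob_product(A, A2), np.trace(A.T @ A2))

# A:A = ||A||_F^2
print(frob_product(A, A), np.linalg.norm(A, "fro") ** 2)

# C:(AB) = (C B^T):A = (A^T C):B
lhs = frob_product(C, A @ B)
mid = frob_product(C @ B.T, A)
rhs = frob_product(A.T @ C, B)
print(lhs, mid, rhs)
```

The last identity is the one that moves $M^T X^{-1}$ and $X^{-1} M$ across the colon to isolate $dX$ in the derivation.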