Differential of a matrix term wrt a matrix


Say we have the term $P = XCX^t-XB-B^tX^t+C$, where $X, C, B$ are all square $n\times n$ matrices. The matrix $C$ is symmetric, while $X$ and $B$ are not symmetric, and $X$ has zeros on its diagonal (so it has at most $n^2-n$ nonzero elements). $B$ and $C$ are known, and $X$ is the unknown. How can we compute the derivative of $PP^t$ with respect to the matrix $X$?

PS. This is part of an MLE derivation, and I want to solve for the matrix $X$.


There are 2 best solutions below

On BEST ANSWER

$ \def\LR#1{\left(#1\right)} \def\sym#1{\operatorname{sym}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} $ Define the function $$\eqalign{ \sym A = \tfrac 12\LR{A+A^T} \\ }$$ Then, using $C=C^T$, the differential of $P$ is $$\eqalign{ P &= XCX^T - XB - B^TX^T + C \\ dP &= 2\,\sym{dX\,\CLR{CX^T-B}} \\ }$$ and the differential of the product in question is $$\eqalign{ M &= PP^T \\ dM &= 2\,\sym{dP\,P^T} \\ &= 4\,\sym{\sym{dX\,\CLR{CX^T-B}}\,P^T} \\ }$$ If you want the directional derivative, just substitute $dX\to H$, where $H$ is the direction of interest.
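As a sanity check, the differential above can be compared against a central finite difference. The following is only a minimal sketch, assuming NumPy and randomly generated test matrices; the helper names (`sym`, `P_of`, `M_of`, `dM_formula`), the size `n`, and the step `eps` are illustrative choices, not part of the original answer.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix: (A + A^T) / 2."""
    return 0.5 * (A + A.T)

def P_of(X, B, C):
    """P = X C X^T - X B - B^T X^T + C."""
    return X @ C @ X.T - X @ B - B.T @ X.T + C

def M_of(X, B, C):
    """M = P P^T."""
    P = P_of(X, B, C)
    return P @ P.T

def dM_formula(X, B, C, dX):
    """dM = 4 sym( sym( dX (C X^T - B) ) P^T ), as derived above."""
    P = P_of(X, B, C)
    return 4.0 * sym(sym(dX @ (C @ X.T - B)) @ P.T)

# Random test data (assumed for illustration): C symmetric, X with zero diagonal.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
C = sym(rng.standard_normal((n, n)))
X = rng.standard_normal((n, n))
np.fill_diagonal(X, 0.0)

# Direction H, also with zero diagonal so that X + eps*H keeps the constraint.
H = rng.standard_normal((n, n))
np.fill_diagonal(H, 0.0)

# Central finite difference of M along H.
eps = 1e-5
fd = (M_of(X + eps * H, B, C) - M_of(X - eps * H, B, C)) / (2.0 * eps)

exact = dM_formula(X, B, C, H)
rel_err = np.max(np.abs(fd - exact)) / np.max(np.abs(exact))
print("relative error:", rel_err)
```

The relative error should be small (limited by the finite-difference step and rounding), which supports the closed form without proving it.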

If you want the gradient, then you have a problem. A matrix-by-matrix gradient is a fourth-order tensor and cannot be expressed using standard matrix notation.

However, since the componentwise gradient of a matrix with respect to itself is $$\eqalign{ \grad{X}{X_{ij}} &= E_{ij} \qquad \big\{{\rm single\,entry\;matrix}\big\}\qquad \\ }$$ the componentwise gradient of your product can be written as $$\eqalign{ \grad{(PP^T)}{X_{ij}} &= 4\,\sym{\sym{E_{ij}\CLR{CX^T-B}}\,P^T} \quad \\ }$$
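To make the componentwise formula concrete, here is a hedged sketch that assembles the fourth-order array $G_{klij} = \partial (PP^T)_{kl} / \partial X_{ij}$ one slice at a time and checks that contracting it with a direction $H$ reproduces the directional derivative. It assumes NumPy; the names `grad_component` and `G` are hypothetical, and skipping the diagonal entries $i=j$ is an assumption based on the question's statement that the diagonal of $X$ is fixed at zero.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def P_of(X, B, C):
    """P = X C X^T - X B - B^T X^T + C."""
    return X @ C @ X.T - X @ B - B.T @ X.T + C

def grad_component(X, B, C, i, j):
    """d(P P^T)/dX_ij = 4 sym( sym( E_ij (C X^T - B) ) P^T )."""
    n = X.shape[0]
    E = np.zeros((n, n))
    E[i, j] = 1.0  # single-entry matrix E_ij
    P = P_of(X, B, C)
    return 4.0 * sym(sym(E @ (C @ X.T - B)) @ P.T)

# Random test data (assumed for illustration).
rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
C = sym(rng.standard_normal((n, n)))
X = rng.standard_normal((n, n))
np.fill_diagonal(X, 0.0)

# G[k, l, i, j] holds d(P P^T)_{kl} / dX_{ij}.
G = np.zeros((n, n, n, n))
for i in range(n):
    for j in range(n):
        if i != j:  # assumption: diagonal entries of X are not free variables
            G[:, :, i, j] = grad_component(X, B, C, i, j)

# Contracting G with a zero-diagonal direction H gives the directional derivative.
H = rng.standard_normal((n, n))
np.fill_diagonal(H, 0.0)
contracted = np.einsum("klij,ij->kl", G, H)

P = P_of(X, B, C)
direct = 4.0 * sym(sym(H @ (C @ X.T - B)) @ P.T)
print(np.allclose(contracted, direct))
```

Storing the result as an `n x n x n x n` array is just one way to hold the tensor; for large `n` it may be preferable to evaluate the directional derivative directly rather than materialize `G`.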


@greg's answer is much more succinct, but if you want gore:

Let $\phi(X) = X C X^T -XB-B^TX^T+C$, then by examining $\phi(X+H)-\phi(X)$ and taking the linear part we get $D \phi(X) H = HCX^T + XCH^T -HB-B^TH^T$.

Similarly for $\eta(X) = X X^T$ we have $D \eta(X)H = H X^T+X H^T$.

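The chain rule gives $D (\eta \circ \phi) (X) H = D \eta(\phi(X)) D \phi(X)H = (D \phi(X)H) \phi(X)^T + \phi(X) (D \phi(X)H)^T$. Expanding, $D (\eta \circ \phi) (X) H = ( HCX^T + XCH^T -HB-B^TH^T) ( X C X^T -XB-B^TX^T+C)^T + (X C X^T -XB-B^TX^T+C) ( HCX^T + XCH^T -HB-B^TH^T)^T$. Since $C$ is symmetric, both $\phi(X)$ and $D \phi(X)H$ are symmetric, so the transposes can be dropped.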
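The chain-rule expression $(D \phi(X)H)\, \phi(X)^T + \phi(X)\, (D \phi(X)H)^T$ can be cross-checked against the compact $\operatorname{sym}$ form from the other answer. The sketch below assumes NumPy and random test matrices; the function names `phi`, `D_phi`, and `D_eta` simply mirror the symbols used here and are otherwise illustrative.

```python
import numpy as np

def phi(X, B, C):
    """phi(X) = X C X^T - X B - B^T X^T + C."""
    return X @ C @ X.T - X @ B - B.T @ X.T + C

def D_phi(X, B, C, H):
    """Linear part of phi(X + H) - phi(X)."""
    return H @ C @ X.T + X @ C @ H.T - H @ B - B.T @ H.T

def D_eta(Y, K):
    """Derivative of eta(Y) = Y Y^T at Y, applied to K."""
    return K @ Y.T + Y @ K.T

# Random test data (assumed for illustration): C symmetric, zero-diagonal X and H.
rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = 0.5 * (A + A.T)
X = rng.standard_normal((n, n))
np.fill_diagonal(X, 0.0)
H = rng.standard_normal((n, n))
np.fill_diagonal(H, 0.0)

# Chain rule: D(eta o phi)(X) H = D eta(phi(X)) [ D phi(X) H ].
K = D_phi(X, B, C, H)
chain = D_eta(phi(X, B, C), K)

# phi is quadratic in X, so a central difference matches D phi up to rounding.
eps = 1e-4
fd_phi = (phi(X + eps * H, B, C) - phi(X - eps * H, B, C)) / (2.0 * eps)
phi_ok = np.allclose(fd_phi, K, atol=1e-8)

# Compact form: 4 sym( sym( H (C X^T - B) ) phi(X)^T ), with sym(A) = (A + A^T)/2.
S = H @ (C @ X.T - B)
S = 0.5 * (S + S.T)
T = 2.0 * S @ phi(X, B, C).T
compact = 2.0 * (T + T.T) * 0.5

print(phi_ok, np.allclose(chain, compact), np.allclose(K, K.T))
```

The last check confirms that $D \phi(X)H$ is symmetric when $C$ is, which is why the two answers agree.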