Matrix-Matrix Derivative with Product Rule


I'm trying to find the derivative below, for a 3x3 matrix $X$: $$\frac{\partial Y}{\partial X}=\frac{\partial}{\partial X}\left(\left(\det{X}\right)^{-1/3}\left(\operatorname{tr}{X}\right)X^{-T}\right)$$

I know that $\frac{\partial \det{X}}{\partial X} = (\det{X})X^{-T}$ and that $\frac{\partial \operatorname{tr}{X}}{\partial X} = I$, but I'm unsure how to put everything together to get the fourth-order output tensor. My eventual goal is an expression for $\frac{\partial Y_{ij}}{\partial X_{kl}}$ that I can put into code, if that simplifies any steps.

I've tried to follow the steps from Magnus-Neudecker (although I'm teaching myself their method), and the relevant differentials seem to be: $$ d(\det{X}) = \det(X)\operatorname{tr}{\left(X^{-1}\,dX\right)}$$ $$ d(\operatorname{tr}{X}) = \operatorname{tr}{\left(I\,dX\right)}$$ $$ d(X^{-1}) = -X^{-1} (dX) X^{-1}$$ $$ \operatorname{vec}(dX^T) = K \operatorname{vec}(dX)$$ (where $K$ is the commutation matrix)
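These differential identities can be sanity-checked numerically before building anything on top of them. A minimal NumPy sketch (the test matrix and perturbation are my own choices):

```python
import numpy as np

X = np.array([[2.0, 0.3, 0.1],
              [0.2, 1.5, 0.4],
              [0.1, 0.2, 1.8]])            # well-conditioned test matrix
dX = 1e-6 * np.array([[0.5, -0.2, 0.8],
                      [0.1,  0.7, -0.3],
                      [-0.4, 0.2, 0.6]])   # small perturbation

Xinv = np.linalg.inv(X)

# d(det X) = det(X) tr(X^{-1} dX), to first order in dX
lhs_det = np.linalg.det(X + dX) - np.linalg.det(X)
rhs_det = np.linalg.det(X) * np.trace(Xinv @ dX)
assert abs(lhs_det - rhs_det) < 1e-9

# d(X^{-1}) = -X^{-1} dX X^{-1}, to first order in dX
lhs_inv = np.linalg.inv(X + dX) - Xinv
rhs_inv = -Xinv @ dX @ Xinv
assert np.allclose(lhs_inv, rhs_inv, atol=1e-9)
```

The residuals are second order in $\|dX\|$, so with a $10^{-6}$ perturbation they sit near $10^{-12}$.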

However, I'm unsure how to put everything together for the vectorized derivative (especially, how to take the $dX$ out from in between the two $X^{-1}$ for $d(X^{-1})$, and how to combine $d(X^{-1})$ and $d(X^T)$ into a single expression).

Any pointers (for the Magnus-Neudecker method or, honestly, any other method) would be much appreciated. Thank you!

1 Answer
$ \def\a{\alpha} \def\b{\beta} \def\g{\lambda} \def\d{\delta} \def\E{{\cal E}} \def\F{{\cal F}} \def\G{{\cal G}} \def\H{{\cal H}} \def\M{{\cal M}} \def\aa{\a^{-1/3}} \def\aaa{\a^{-4/3}} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\vecc#1{\op{vec}\LR{#1}} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\dX{\c{dX}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $For typing convenience, define the variables $$\eqalign{ C^T &= X^{-1} &\qiq dC = -C\cdot dX^T\cdot C \\ \a &= \det(X) &\qiq d\a = \a C:dX \\ \b &= \op{tr}(X) &\qiq d\b = I:dX \\ }$$ and single-dot $(\cdot)$, double-dot $(:)$ and the dyadic $(\star)$ products between matrices $(A,B,C)$ and tensors $(\H,\M)$ $$\eqalign{ \M &= A\cdot\H\cdot C &\qiq \M_{iklq} = \sum_{j=1}^m\sum_{p=1}^n A_{ij}\H_{jklp}C_{pq} \\ G &= A:\H &\qiq G_{kl} = \sum_{i=1}^m\sum_{j=1}^n A_{ij}\H_{ijkl} \\ \g &= A:C &\qiq \quad\g = \sum_{i=1}^m\sum_{j=1}^n A_{ij}C_{ij} \\ \H &= A\star B &\qiq \H_{ijkl} = A_{ij}B_{kl} \\ }$$ There are 3 special fourth-order tensors whose components (in terms of Kronecker delta symbols) are $$\eqalign{ \E_{ikjl} = \F_{iklj} = \G_{ijkl} = \d_{ij}\d_{kl} \\ }$$ which have the following useful properties $$\eqalign{ \E:A &= A:\E &= A \\ \F:A &= A:\F &= A^T \\ \G:A &= A:\G &= I\:\trace{A} \\ }$$ $$\eqalign{ A\cdot B\cdot C &= \LR{A\cdot\E\cdot C^T}:B \\ \grad{A}{A} &= \E, \qquad \grad{A^T}{A} = \F \\ }$$
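For readers who want to experiment, these products and the three special tensors translate directly into `np.einsum` contractions. A sketch with arbitrary test data (the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
H = rng.standard_normal((3, 3, 3, 3))        # a fourth-order tensor

# single-dot, double-dot and dyadic products as einsum contractions
M = np.einsum('ij,jklp,pq->iklq', A, H, B)   # M = A . H . B  (sandwich)
G = np.einsum('ij,ijkl->kl', A, H)           # G = A : H
lam = np.einsum('ij,ij->', A, B)             # scalar A : B
dyad = np.einsum('ij,kl->ijkl', A, B)        # A * B  (dyadic)

# the three special fourth-order tensors
d = np.eye(3)
E = np.einsum('ij,kl->ikjl', d, d)   # E_ikjl = delta_ij delta_kl
F = np.einsum('ij,kl->iklj', d, d)   # F_iklj = delta_ij delta_kl
Gt = np.einsum('ij,kl->ijkl', d, d)  # G_ijkl = delta_ij delta_kl

# their defining properties
assert np.allclose(np.einsum('ij,ijkl->kl', A, E), A)                 # A:E = A
assert np.allclose(np.einsum('ij,ijkl->kl', A, F), A.T)               # A:F = A^T
assert np.allclose(np.einsum('ij,ijkl->kl', A, Gt), np.trace(A) * d)  # A:G = tr(A) I
```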


Use the above notation to write the function and differentiate it $$\eqalign{ Y &= \b \aa C \\ dY &= \aa C\:d\b + \b C\;d\aa + \b\aa\:dC \\ &= \aa C\LR{I:\dX} - \b C\LR{\tfrac13\aaa\;\a C:\dX} - \b\aa\LR{C\cdot\dX^T\cdot C} \\ &= \aa\BR{\LR{C\star I} - \tfrac13\b\LR{C\star C} - \b\LR{C\cdot\E\cdot C^T}:\F}:\dX \\ \grad{Y}{X} &= \aa\BR{C\star\LR{I-\tfrac13\b C} - \LR{\b C\cdot\E\cdot C^T}:\F} \\ }$$ This is a tensor expression for the gradient.
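The tensor expression can be assembled and verified numerically with `np.einsum` (the test matrix and perturbation are my own choices; a matrix with positive determinant keeps the cube root real):

```python
import numpy as np

def Y(X):
    # Y = det(X)^{-1/3} tr(X) X^{-T}
    return np.linalg.det(X) ** (-1.0/3) * np.trace(X) * np.linalg.inv(X).T

X = np.array([[2.0, 0.3, 0.1],
              [0.2, 1.5, 0.4],
              [0.1, 0.2, 1.8]])           # det > 0
C = np.linalg.inv(X).T                    # C = X^{-T}
a, b = np.linalg.det(X), np.trace(X)      # alpha, beta
I3 = np.eye(3)

# fourth-order tensors: E:A = A, F:A = A^T
E = np.einsum('ij,kl->ikjl', I3, I3)
F = np.einsum('ij,kl->iklj', I3, I3)

# grad = a^{-1/3} [ C * (I - b C/3)  -  b (C . E . C^T) : F ]
CEC = np.einsum('ij,jklp,pq->iklq', C, E, C.T)
grad = a ** (-1.0/3) * (np.einsum('ij,kl->ijkl', C, I3 - b*C/3)
                        - b * np.einsum('ijmn,mnkl->ijkl', CEC, F))

# check: grad : dX matches the first-order change in Y
dX = 1e-6 * np.array([[0.5, -0.2, 0.8],
                      [0.1,  0.7, -0.3],
                      [-0.4, 0.2, 0.6]])
assert np.allclose(np.einsum('ijkl,kl->ij', grad, dX), Y(X + dX) - Y(X), atol=1e-9)
```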

It might be easier to calculate component-wise gradients using the single entry matrix $E_{ij}$ whose $(i,j)$ component equals one and all other components are zero. Not coincidentally, this is also the component-wise self-gradient of a matrix variable $$\eqalign{ \grad{X}{X_{ij}} &= E_{ij} = E_{ji}^T \\ }$$ Picking up at the expression for $dY$ $$\eqalign{ dY &= \aa\BR{C\LR{I:\dX} - \tfrac13\b C\LR{C:\dX} - \b\LR{C\cdot\dX^T\cdot C}} \\ \grad{Y}{X_{ij}} &= \aa\BR{C\LR{I:\c{E_{ij}}} - \tfrac13\b C\LR{C:\c{E_{ij}}} - \b\LR{C\cdot\c{E_{ji}}\cdot C}} \\ &= \aa\BR{\LR{\d_{ij}-\tfrac13\b C_{ij}}C -\b\LR{c_j\cdot c_i^T}} \\ }$$ where $c_j$ denotes the $j^{th}$ column of $C$, $\;c_i^T$ the $i^{th}$ row, and $C_{ij}$ the $(i,j)^{th}$ component.

Extracting the $(k,l)^{th}$ component of these matrix-valued gradients yields $$\eqalign{ \grad{Y_{kl}}{X_{ij}} &= \aa\BR{\LR{\d_{ij}-\tfrac13\b C_{ij}}C_{kl} -\b C_{kj}C_{il}} \\ }$$ which is the tensor gradient expressed in index notation.
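Since the questioner's stated goal is code, this index formula is only a few lines of NumPy. A sketch, verified component-by-component with central differences (the test matrix is my own choice, with positive determinant so the cube root is real):

```python
import numpy as np

def Y(X):
    # Y = det(X)^{-1/3} tr(X) X^{-T}
    return np.linalg.det(X) ** (-1.0/3) * np.trace(X) * np.linalg.inv(X).T

def dY_dX(X):
    """grad[k, l, i, j] = dY_kl / dX_ij, from the index formula above."""
    a, b = np.linalg.det(X), np.trace(X)
    C = np.linalg.inv(X).T
    d = np.eye(3)
    return a ** (-1.0/3) * (
        np.einsum('ij,kl->klij', d - b*C/3, C)   # (delta_ij - b C_ij/3) C_kl
        - b * np.einsum('kj,il->klij', C, C))    # -b C_kj C_il

X = np.array([[2.0, 0.3, 0.1],
              [0.2, 1.5, 0.4],
              [0.1, 0.2, 1.8]])
G = dY_dX(X)

# central-difference check of every component
h, num = 1e-6, np.zeros((3, 3, 3, 3))
for i in range(3):
    for j in range(3):
        dX = np.zeros((3, 3))
        dX[i, j] = h
        num[:, :, i, j] = (Y(X + dX) - Y(X - dX)) / (2 * h)
assert np.allclose(G, num, atol=1e-7)
```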

The least illuminating way to tackle the problem is to vectorize the expressions using Kronecker $(\otimes)$ products. This effectively flattens matrices into vectors and fourth-order tensors into matrices. $$\eqalign{ x &= \vecc{X},\quad y = \vecc{Y},\quad c = \vecc{C},\quad i = \vecc{I} \\ dY &= \aa\BR{C\LR{I:\dX} - \tfrac13\b C\LR{C:\dX} - \b\LR{C\cdot\dX^T\cdot C}} \\ dy &= \aa\BR{c\LR{i\cdot dx} - \tfrac13\b c\LR{c\cdot dx} - \b\LR{C^T\otimes C}K\cdot dx} \\ &= \aa\BR{c\cdot i^T - \tfrac13\b c\cdot c^T - \b\LR{C^T\otimes C}K}\cdot dx \\ \grad{y}{x} &= \aa\BR{c\cdot i^T-\tfrac13\b c\cdot c^T-\b\LR{C^T\otimes C}K} \\ }$$ where $K$ is the Commutation Matrix associated with the vectorization of transposed matrices.
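The vectorized Jacobian can be built with a hand-rolled commutation matrix; note that with column-major (Fortran-order) `vec`, `np.kron` matches the $\otimes$ convention above via $\vecc{AXB} = (B^T\otimes A)\vecc{X}$. A sketch (test matrix and perturbation are my own choices):

```python
import numpy as np

def Y(X):
    return np.linalg.det(X) ** (-1.0/3) * np.trace(X) * np.linalg.inv(X).T

def vec(A):
    return A.reshape(-1, order='F')   # column-major vectorization

n = 3
K = np.zeros((n*n, n*n))              # commutation matrix: K @ vec(A) = vec(A^T)
for i in range(n):
    for j in range(n):
        K[i + j*n, j + i*n] = 1.0

X = np.array([[2.0, 0.3, 0.1],
              [0.2, 1.5, 0.4],
              [0.1, 0.2, 1.8]])       # det > 0
a, b = np.linalg.det(X), np.trace(X)
C = np.linalg.inv(X).T
c, i_ = vec(C), vec(np.eye(n))

# dy/dx = a^{-1/3} [ c i^T - (b/3) c c^T - b (C^T kron C) K ]
J = a ** (-1.0/3) * (np.outer(c, i_) - (b/3) * np.outer(c, c)
                     - b * np.kron(C.T, C) @ K)

# first-order finite-difference check of the 9x9 Jacobian
dX = 1e-6 * np.array([[0.5, -0.2, 0.8],
                      [0.1,  0.7, -0.3],
                      [-0.4, 0.2, 0.6]])
assert np.allclose(J @ vec(dX), vec(Y(X + dX) - Y(X)), atol=1e-9)
```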