Derivative of a triple product matrix


I am trying to find the solution for the following derivative $$\frac{\partial \boldsymbol E\boldsymbol J\boldsymbol E^{T}}{\partial \boldsymbol E}$$

where $\boldsymbol E$ and $\boldsymbol J$ are both matrices. I searched for a solution, mainly involving the Kronecker product, and found that it can be obtained using a chain rule for matrix products. The problem is that I found two sources giving somewhat different versions of this rule, and I cannot manage to check whether they are equivalent. The first one, from here, gives

$$\frac{\partial (\boldsymbol A\boldsymbol F\boldsymbol )}{\partial \boldsymbol B}=\frac{\partial \boldsymbol A}{\partial \boldsymbol B}(\boldsymbol I\otimes \boldsymbol F)+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial \boldsymbol F}{\partial \boldsymbol B}$$

while the second one from here gives $$\frac{\partial (\boldsymbol A\boldsymbol B)}{\partial \boldsymbol x^{T}}=(\boldsymbol B^{T}\otimes \boldsymbol I)\frac{\partial \operatorname{vec}(\boldsymbol A)}{\partial \boldsymbol x^{T}}+(\boldsymbol I\otimes \boldsymbol A)\frac{\partial \operatorname{vec}(\boldsymbol B)}{\partial \boldsymbol x^{T}}$$

where I am assuming that $\boldsymbol x$ can be taken to be $\operatorname{vec}(\boldsymbol E)$. Can someone please help me understand whether these two are equivalent, and ultimately how my original derivative can be computed?

Thanks


On BEST ANSWER

$ \def\bbR#1{{\mathbb R}^{#1}} \def\d{\delta} \def\k{\sum_k} \def\l{\sum_l} \def\e{\varepsilon} \def\n{\nabla}\def\o{{\tt1}}\def\p{\partial} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\B{\Big}\def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\B(#1\B)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3}} \def\c#1{\color{red}{#1}} $The differential of a matrix is easy to work with, since it obeys all of the rules of matrix algebra. So let's start by calculating the differential of your function. $$\eqalign{ F &= EJE^T \\ dF &= dE\;JE^T + EJ\;dE^T \\ }$$ Vectorizing this expression yields
$$\eqalign{ f &= \vecc{F},\qquad e=\vecc{E} \\ df &= \LR{EJ^T\otimes I}\,de + \LR{I\otimes EJ}K\;de \\ \grad{f}{e} &= \LR{EJ^T\otimes I} + \LR{I\otimes EJ}K \\ }$$ where $K$ is the commutation matrix, i.e. the permutation matrix satisfying $K\,\vecc{A}=\vecc{A^T}$ for any matrix $A$ of the same order as $E$.
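For a quick numerical sanity check of this vectorized gradient (a sketch of my own, not part of the answer; the sizes $m,n$, the `vec` helper, and the use of numpy are assumptions), one can build $K$ explicitly and compare each column of the gradient matrix with the corresponding differential $S_{ij}JE^T + EJS_{ij}^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                       # assumed sizes: E is m x n, J is n x n
E = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))

def vec(M):
    # column-stacking vec, matching the convention vec(AXB) = (B^T kron A) vec(X)
    return M.reshape(-1, order="F")

# commutation matrix K with K @ vec(A) = vec(A^T) for A of shape (m, n)
K = np.zeros((m * n, m * n))
for i in range(m):
    for j in range(n):
        K[j + i * n, i + j * m] = 1

# gradient from the answer: df/de = (E J^T kron I) + (I kron E J) K
G = np.kron(E @ J.T, np.eye(m)) + np.kron(np.eye(m), E @ J) @ K

# column i + j*m of G is the response to perturbing E by the single-entry S_ij
for i in range(m):
    for j in range(n):
        S = np.zeros((m, n)); S[i, j] = 1.0
        dF = S @ J @ E.T + E @ J @ S.T
        assert np.allclose(G[:, i + j * m], vec(dF))
```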

Another approach to the problem is to use the self-gradient of a matrix, i.e. $$\eqalign{ \grad{E}{E_{ij}} = S_{ij} \\ }$$ where $S_{ij}$ is the matrix whose components are all zero, except for the $(i,j)^{th}$ component which is equal to one. This is sometimes called the single-entry matrix, and it can be used to write the component-wise gradient of the function as $$\eqalign{ \grad{F}{E_{ij}} &= S_{ij}\,JE^T + EJ\,S_{ij} \\ }$$ Yet another approach is to use Index Notation to write the self-gradient (which is a fourth-order tensor) in terms of Kronecker delta symbols as $$\eqalign{ \grad{E_{mn}}{E_{ij}} = \d_{im}\d_{jn} \\ }$$ Then calculate the gradient of the function (also a fourth-order tensor) as
$$\eqalign{ F_{mn} &= \k\l E_{mk}J_{kl}E_{ln}^T \\ \grad{F_{mn}}{E_{ij}} &= \k\l \BR{ \c{\d_{im}\d_{jk}}\;J_{kl}E_{nl} + E_{mk}J_{kl}\;\c{\d_{in}\d_{jl}} } \\ &= \l \d_{im}J_{jl}E_{ln}^T + \k E_{mk}J_{kj}\d_{in} \\ &= \d_{mi}\LR{JE^T}_{jn} + \LR{EJ}_{mj}\d_{in} \\ }$$ Once you are comfortable with the Einstein summation convention, you can drop the $\Sigma$ symbols to write the intermediate steps more concisely.
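The component-wise formula $\grad{F_{mn}}{E_{ij}} = \d_{mi}\LR{JE^T}_{jn} + \LR{EJ}_{mj}\d_{in}$ can likewise be checked by finite differences (again an illustrative sketch of my own; the step size `h` and matrix sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
E = rng.standard_normal((m, n))
J = rng.standard_normal((n, n))
h = 1e-6                         # forward-difference step (assumed)

def F(E):
    return E @ J @ E.T

JEt = J @ E.T    # (J E^T) has shape n x m
EJ = E @ J       # (E J)   has shape m x n
for i in range(m):
    for j in range(n):
        Ep = E.copy(); Ep[i, j] += h
        num = (F(Ep) - F(E)) / h        # numerical dF/dE_ij
        ana = np.zeros((m, m))
        ana[i, :] += JEt[j, :]          # delta_mi (J E^T)_{jn} term
        ana[:, i] += EJ[:, j]           # (E J)_{mj} delta_in term
        assert np.allclose(num, ana, atol=1e-4)
```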


We look in somewhat more detail at the OP's first identity and calculate \begin{align*} \frac{\partial\left(\boldsymbol {EJE}^{T}\right)}{\partial\boldsymbol {E}} \end{align*}

The matrix derivative used in the OP's first cited paper is based on W. J. Vetter's definition. Let $\boldsymbol{X}=(x_{ij})$ be a matrix of order $(r\times s)$ and let $\boldsymbol{Y}$ be a matrix of order $(p\times q)$. We define the derivative of $\boldsymbol{Y}$ with respect to $\boldsymbol{X}$ as the partitioned matrix \begin{align*} \frac{\partial\boldsymbol{Y}}{\partial\boldsymbol{X}}:= \begin{pmatrix} \frac{\partial \boldsymbol{Y}}{\partial x_{11}}&\frac{\partial \boldsymbol{Y}}{\partial x_{12}}&\cdots&\frac{\partial \boldsymbol{Y}}{\partial x_{1s}}\\ \frac{\partial \boldsymbol{Y}}{\partial x_{21}}&\frac{\partial \boldsymbol{Y}}{\partial x_{22}}&\cdots&\frac{\partial \boldsymbol{Y}}{\partial x_{2s}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial \boldsymbol{Y}}{\partial x_{r1}}&\frac{\partial \boldsymbol{Y}}{\partial x_{r2}}&\cdots&\frac{\partial \boldsymbol{Y}}{\partial x_{rs}}\\ \end{pmatrix} =\sum_{i=1}^{r}\sum_{j=1}^{s}\boldsymbol{E}_{ij}^{(r\times s)}\otimes\frac{\partial\boldsymbol{Y}}{\partial\,x_{ij}}\tag{1} \end{align*} The matrix $\boldsymbol{E}_{ij}^{(r\times s)}$ is called an elementary matrix. It has order $(r\times s)$, a $1$ at position $(i,j)$, and zeros at all other positions.

Let $\boldsymbol{X}$ be matrix of order $(r\times s)$, $\boldsymbol{Y}$ of order $(p\times q)$ and $\boldsymbol{Z}$ of order $(q\times u)$. We obtain \begin{align*} \color{blue}{\frac{\partial (\boldsymbol{Y}\boldsymbol{Z})}{\partial \boldsymbol{X}}} &=\sum_{i=1}^{r}\sum_{j=1}^{s}\boldsymbol{E}_{ij}^{(r\times s)}\otimes \frac{\partial\left(\boldsymbol{Y}\boldsymbol{Z}\right)}{\partial x_{ij}}\tag{2.1}\\ &=\sum_{i,j}\boldsymbol{E}_{ij}^{(r\times s)}\otimes \left(\frac{\partial\boldsymbol{Y}}{\partial x_{ij}}\boldsymbol{Z}+\boldsymbol{Y}\frac{\partial\boldsymbol{Z}}{\partial x_{ij}}\right)\tag{2.2}\\ &=\sum_{i,j}\boldsymbol{E}_{ij}^{(r\times s)}\otimes \left(\frac{\partial\boldsymbol{Y}}{\partial x_{ij}}\boldsymbol{Z}\right)+\sum_{i,j}\boldsymbol{E}_{ij}^{(r\times s)}\otimes \left(\boldsymbol{Y}\frac{\partial\boldsymbol{Z}}{\partial x_{ij}}\right)\tag{2.3}\\ &=\sum_{i,j}\left(\boldsymbol{E}_{ij}^{(r\times s)}\boldsymbol{I}_s\right)\otimes \left(\frac{\partial\boldsymbol{Y}}{\partial x_{ij}}\boldsymbol{Z}\right)\\ &\qquad+\sum_{i,j}\left( \boldsymbol{I}_r\boldsymbol{E}_{ij}^{(r\times s)}\right)\otimes \left(\boldsymbol{Y}\frac{\partial\boldsymbol{Z}}{\partial x_{ij}}\right)\tag{2.4}\\ &=\sum_{i,j}\left(\boldsymbol{E}_{ij}^{(r\times s)}\otimes\frac{\partial\boldsymbol{Y}}{\partial x_{ij}}\right) \left(\boldsymbol{I}_s\otimes\boldsymbol{Z}\right)\\ &\qquad+\sum_{i,j}\left( \boldsymbol{I}_r\otimes\boldsymbol{Y}\right) \left(\boldsymbol{E}_{ij}^{(r\times s)}\otimes\frac{\partial\boldsymbol{Z}}{\partial x_{ij}}\right)\tag{2.5}\\ &\,\,\color{blue}{=\frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}}\left(\boldsymbol{I}_s\otimes\boldsymbol{Z}\right) +\left( \boldsymbol{I}_r\otimes\boldsymbol{Y}\right)\frac{\partial \boldsymbol{Z}}{\partial \boldsymbol{X}}}\tag{2.6}\\ \end{align*}

Comment:

  • In (2.1) we use the representation (1).

  • In (2.2) we use the product rule of derivation.

  • In (2.3) we use the identity $(\boldsymbol{A}+ \boldsymbol{B})\otimes \boldsymbol{C} =(\boldsymbol{A}\otimes \boldsymbol{C})+(\boldsymbol{B}\otimes \boldsymbol{C})$.

  • In (2.4) we use the identity $\boldsymbol{I}_r\boldsymbol{A}^{(r\times s)}=\boldsymbol{A}^{(r\times s)}=\boldsymbol{A}^{(r\times s)}\boldsymbol{I}_s$.

  • In (2.5) we use the identity $(\boldsymbol{A}\otimes\boldsymbol{C})(\boldsymbol{B}\otimes\boldsymbol{D}) =(\boldsymbol{A}\boldsymbol{B})\otimes(\boldsymbol{C}\boldsymbol{D})$.

  • In (2.6) we use again the representation from (1).
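The product rule (2.6) can be verified numerically for concrete choices of $\boldsymbol{Y}(\boldsymbol{X})$ and $\boldsymbol{Z}(\boldsymbol{X})$. The sketch below is my own illustration (the linear maps `Y = A @ X` and `Z = X.T @ B` and all sizes are hypothetical choices); it builds the Vetter derivative (1) block by block via forward differences and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(2)
r, s, p, u = 2, 3, 4, 5            # assumed orders; here q = s
A = rng.standard_normal((p, r))    # Y(X) = A X     has order (p x s)
B = rng.standard_normal((r, u))    # Z(X) = X^T B   has order (s x u)
X = rng.standard_normal((r, s))

def vetter_grad(f, X, h=1e-6):
    # dY/dX = sum_ij E_ij kron dY/dx_ij (Vetter layout), via forward differences
    r, s = X.shape
    p, q = f(X).shape
    D = np.zeros((r * p, s * q))
    for i in range(r):
        for j in range(s):
            Xp = X.copy(); Xp[i, j] += h
            D[i * p:(i + 1) * p, j * q:(j + 1) * q] = (f(Xp) - f(X)) / h
    return D

Y = lambda M: A @ M        # order (p x s)
Z = lambda M: M.T @ B      # order (s x u)

# (2.6): d(YZ)/dX = dY/dX (I_s kron Z) + (I_r kron Y) dZ/dX
lhs = vetter_grad(lambda M: Y(M) @ Z(M), X)
rhs = (vetter_grad(Y, X) @ np.kron(np.eye(s), Z(X))
       + np.kron(np.eye(r), Y(X)) @ vetter_grad(Z, X))
assert np.allclose(lhs, rhs, atol=1e-4)
```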

Based on the identity (2.6) we consider a matrix $X$ of order $(r\times s)$, $Y$ of order $(s\times s)$ and obtain \begin{align*} \color{blue}{\frac{\partial (\boldsymbol{X}\boldsymbol{Y}\boldsymbol{X}^T)}{\partial \boldsymbol{X}}} &=\frac{\partial (\boldsymbol{X}\boldsymbol{Y})}{\partial \boldsymbol{X}} \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right) +\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\boldsymbol{Y}\right) \frac{\partial \boldsymbol{X}^T}{\partial \boldsymbol{X}}\\ &=\left(\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}\left(\boldsymbol{I}_s\otimes\boldsymbol{Y}\right) +\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\right)\frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}}\right) \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right)\\ &\qquad+\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\boldsymbol{Y}\right) \frac{\partial \boldsymbol{X}^T}{\partial \boldsymbol{X}}\\ &=\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}\left(\boldsymbol{I}_s\otimes\boldsymbol{Y}\right) \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right) +\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\right)\frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}} \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right)\\ &\qquad+\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\boldsymbol{Y}\right) \frac{\partial \boldsymbol{X}^T}{\partial \boldsymbol{X}}\\ &\,\,\color{blue}{=\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}\left(\boldsymbol{I}_s\otimes\boldsymbol{Y}\boldsymbol{X}^T\right) +\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\right)\frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}} \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right)}\\ &\qquad\color{blue}{+\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\boldsymbol{Y}\right) \frac{\partial \boldsymbol{X}^T}{\partial \boldsymbol{X}}}\tag{3} \end{align*}

Hint: This notation of matrix calculus, presented nicely in the OP's cited paper by J. W. Brewer, can also be found in Kronecker Products and Matrix Calculus with Applications by A. Graham.

We can simplify (3) somewhat by introducing the permutation matrix $\boldsymbol{U}=\boldsymbol{U}^{(r\times s)}$, which is of order $(rs\times rs)$ and has precisely one $1$ in each row and in each column, all other entries being zero.

We get a permutation matrix \begin{align*} \color{blue}{\frac{\partial \boldsymbol{X}^T}{\partial \boldsymbol{X}}} &=\sum_{i,j}E_{ij}^{(r\times s)}\otimes \frac{\partial \boldsymbol{X}^T}{\partial x_{ij}}\\ &=\sum_{i,j}E_{ij}^{(r\times s)}\otimes E_{ji}^{(s\times r)} \color{blue}{=: \boldsymbol{U}} \end{align*} and the related matrix \begin{align*} \color{blue}{\frac{\partial \boldsymbol{X}}{\partial \boldsymbol{X}}} &=\sum_{i,j}E_{ij}^{(r\times s)}\otimes \frac{\partial \boldsymbol{X}}{\partial x_{ij}}\\ &=\sum_{i,j}E_{ij}^{(r\times s)}\otimes E_{ij}^{(r\times s)} \color{blue}{=: \boldsymbol{\overline{U}}} \end{align*}

With $\boldsymbol{U},\boldsymbol{\overline{U}}$ expression (3) can be written as \begin{align*} \color{blue}{\frac{\partial (\boldsymbol{X}\boldsymbol{Y}\boldsymbol{X}^T)}{\partial \boldsymbol{X}}} &\,\,\color{blue}{=\boldsymbol{\overline{U}}\left(\boldsymbol{I}_s\otimes\boldsymbol{Y}\boldsymbol{X}^T\right) +\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\right)\frac{\partial \boldsymbol{Y}}{\partial \boldsymbol{X}} \left(\boldsymbol{I}_s\otimes\boldsymbol{X}^T\right)}\\ &\qquad\color{blue}{+\left(\boldsymbol{I}_r\otimes\boldsymbol{X}\boldsymbol{Y}\right) \boldsymbol{U}} \end{align*}
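As a final sanity check (an illustrative numpy sketch of my own, not part of the answer), one can construct $\boldsymbol{U}$ and $\boldsymbol{\overline{U}}$ from elementary matrices and verify the simplified formula for a constant $\boldsymbol{Y}$, in which case the middle term of (3) vanishes:

```python
import numpy as np

rng = np.random.default_rng(3)
r, s = 2, 3
X = rng.standard_normal((r, s))
Y = rng.standard_normal((s, s))   # constant, so dY/dX = 0

def Eij(i, j, rows, cols):
    # elementary matrix: 1 at (i, j), zeros elsewhere
    M = np.zeros((rows, cols)); M[i, j] = 1.0
    return M

# U = sum E_ij^{(r x s)} kron E_ji^{(s x r)},  Ubar = sum E_ij kron E_ij
U = sum(np.kron(Eij(i, j, r, s), Eij(j, i, s, r))
        for i in range(r) for j in range(s))
Ubar = sum(np.kron(Eij(i, j, r, s), Eij(i, j, r, s))
           for i in range(r) for j in range(s))

# left side: Vetter derivative of F = X Y X^T, built block by block from
# dF/dx_ij = E_ij Y X^T + X Y E_ji
lhs = sum(np.kron(Eij(i, j, r, s),
                  Eij(i, j, r, s) @ Y @ X.T + X @ Y @ Eij(j, i, s, r))
          for i in range(r) for j in range(s))

# right side: the simplified formula Ubar (I_s kron Y X^T) + (I_r kron X Y) U
rhs = Ubar @ np.kron(np.eye(s), Y @ X.T) + np.kron(np.eye(r), X @ Y) @ U
assert np.allclose(lhs, rhs)
```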