Derivative of a block matrix using Einstein notation

Let $Y = [A \quad XB \quad C]$, where $A,B,C,X$ are all matrices of compatible sizes. What is the derivative of $Y$ w.r.t. $X$?

The part that confuses me is that $\frac{\partial A}{\partial X}$ should be a rank-4 zero tensor. At the same time, $\frac{\partial X^{ij}B^{jk}} {\partial X^{lm}} = \frac{\partial X^{ij}}{\partial X^{lm}}B^{jk} = \delta_{il} \delta_{jm} B^{jk}$ should be a rank-3 tensor. It seems that the sizes of these tensors do not match.

Thank you in advance.


$ \def\bbR#1{{\mathbb R}^{#1}} \def\e{{\large\epsilon}} \def\ve{{\large\varepsilon}} \def\Eij{E_{ij}} \def\Xij{X_{ij}} \def\Ykl{Y_{k\ell}} \def\Bjl{B_{jp}} \def\Dki{\delta_{ki}} \def\smA{{\small A}} \def\smC{{\small C}} \def\smF{{\small F}} \def\smG{{\small G}} \def\smH{{\small H}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\qiq{\quad\implies\quad} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\mc#1{\left[\begin{array}{c|c}#1\end{array}\right]} $Given the matrix $$\eqalign{ Y &= \mc{ A & XB & C \\ } \\ }$$ The gradient of $Y$ with respect to a single scalar element of $X$ is $$\eqalign{ \grad Y\Xij &= \mc{ 0_\smA & \Eij\,B & 0_\smC \\ } \\ }$$ $\Eij$ is a matrix whose components are all $0$ except for the $(i,j)$ element which equals $1$.
$\Eij$ is also the component-wise self-gradient of $X$ $$\eqalign{ \Eij = \grad X\Xij \\ }$$

The notation $0_\smA$ denotes a zero matrix of the same size as $A$, i.e. $$\eqalign{ &0_\smA = A-A \\ &0_\smC = C-C \\ }$$

Assume that $Y\in\bbR{m\times n}$ and that the Cartesian basis vectors for the two dimensions are $\e_k\in\bbR{m}$ and $\ve_\ell\in\bbR{n}$. The component-wise gradients are then $$\eqalign{ \grad{\Ykl}{\Xij} &= \e_k^T \mc{ 0_\smA & \Eij\,B & 0_\smC \\ }\,\ve_\ell \\ }$$

Most of these scalar gradients evaluate to zero. Only column indices in the range $\LR{\,\ell_\smA < \ell < \ell_\smC}$ can give a non-zero result, taking $\ell_\smA$ to be the index of the last column of $A$ and $\ell_\smC$ the index of the first column of $C$ (the reading consistent with the shift $p=\ell-\ell_\smA$ below). In that range the gradients are given by $$\eqalign{ \grad{\Ykl}{\Xij} &= \Dki\,\Bjl\qquad \LR{\,p\,=\,\ell-\ell_\smA} \\ }$$

Because of the Kronecker delta, most of the terms in this formula also evaluate to zero: only those with $k=i$ survive.
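The formulas above can be checked numerically. The following is a minimal NumPy sketch, not part of the original answer: the helper name `block_gradient`, the matrix sizes, and the random test data are all assumptions made for illustration, and indices are 0-based, so the column shift becomes `l = nA + p` with `nA` the number of columns of $A$.

```python
import numpy as np

def block_gradient(A, B, C, X_shape, i, j):
    """dY/dX_ij for Y = [A | X B | C], using 0-based indices i, j (hypothetical helper)."""
    E = np.zeros(X_shape)
    E[i, j] = 1.0  # single-entry matrix E_ij
    return np.hstack([np.zeros_like(A), E @ B, np.zeros_like(C)])

rng = np.random.default_rng(0)
m, r, q = 3, 4, 2  # X is m x r, B is r x q (arbitrary illustrative sizes)
A = rng.standard_normal((m, 2))
B = rng.standard_normal((r, q))
C = rng.standard_normal((m, 5))
X = rng.standard_normal((m, r))

def Y(X):
    return np.hstack([A, X @ B, C])

i, j = 1, 2
G = block_gradient(A, B, C, X.shape, i, j)

# Y is linear in X, so a unit step in X_ij reproduces the derivative
# exactly, up to floating-point rounding.
E = np.zeros_like(X)
E[i, j] = 1.0
fd = Y(X + E) - Y(X)
max_err = np.abs(G - fd).max()

# Component formula: dY_kl/dX_ij = delta_ki * B_jp, with l = nA + p (0-based).
nA = A.shape[1]
formula = np.zeros_like(G)
for k in range(m):
    for p in range(q):
        formula[k, nA + p] = (k == i) * B[j, p]
formula_matches = np.allclose(G, formula)

print(max_err, formula_matches)
```

Stacking `G` over all pairs `(i, j)` would give an array with four indices `(k, l, i, j)`, whose $A$ and $C$ column blocks are identically zero.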