partial derivative of transpose matrix-matrix multiplication


I came across some problems related to partial derivatives, but I haven't learned this topic yet. I looked through many online resources but couldn't find answers to my questions. I really hope someone can help me.
Here is my problem: $Y=A^TB$, where $A$ and $B$ are two matrices. I want to know what $\frac{\partial Y}{\partial A}$ and $\frac{\partial Y}{\partial B}$ are.

$\def\p#1#2{\frac{\partial #1}{\partial #2}}$Let $\,(\alpha,\beta)\,$ be fourth-order tensors with components $$\eqalign{ \alpha_{ijk\ell} &= \delta_{ik}\,\delta_{j\ell} \\ \beta_{ijk\ell} &= \delta_{i\ell}\,\delta_{jk} \\ }$$ and properties with respect to the matrices $(F,G,H)$ $$\eqalign{ \alpha:H &= H:\alpha = H \\ \beta:F &= F:\beta = F^T \\ HFG &= H\alpha G^T:F \\ }$$ where a colon denotes a double-contraction product, i.e. $$\eqalign{ \left(\alpha:H\right)_{ij} &= \sum_k\sum_\ell\alpha_{ijk\ell}\,H_{k\ell} \\ \left(F:\beta\right)_{k\ell} &= \sum_i\sum_jF_{ij}\,\beta_{ijk\ell} \\ }$$ and juxtaposition implies a single-contraction product $$\eqalign{ \left(H\alpha\right)_{mjk\ell} &= \sum_i H_{mi}\,\alpha_{ijk\ell} \\ \left(\alpha G^T\right)_{ijkm} &= \sum_\ell\alpha_{ijk\ell}\,G^T_{\ell m} \\ }$$
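These identities can be verified numerically. The sketch below (assuming NumPy; the variable names `alpha` and `beta` simply mirror the definitions above) builds the two fourth-order tensors explicitly and checks each stated property with `einsum` contractions:

```python
import numpy as np

# Numerical sanity check of the tensor identities above.
n = 3
rng = np.random.default_rng(0)
F, G, H = (rng.standard_normal((n, n)) for _ in range(3))

Id = np.eye(n)
alpha = np.einsum('ik,jl->ijkl', Id, Id)   # alpha_ijkl = d_ik d_jl
beta  = np.einsum('il,jk->ijkl', Id, Id)   # beta_ijkl  = d_il d_jk

# alpha:H = H  and  F:beta = F^T  (double contractions)
assert np.allclose(np.einsum('ijkl,kl->ij', alpha, H), H)
assert np.allclose(np.einsum('ij,ijkl->kl', F, beta), F.T)

# HFG = (H alpha G^T) : F  (single contractions on the outer indices)
HaGT = np.einsum('mi,ijkl,ln->mjkn', H, alpha, G.T)
assert np.allclose(np.einsum('mjkn,kn->mj', HaGT, F), H @ F @ G)
```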

With these tensors, the posted question can be answered as follows $$\eqalign{ Y &= A^TB \\&= A^T\alpha:B \quad&\implies\quad\p{Y}{B} &= A^T\alpha \\ Y &= \alpha B^T:A^T \\ &= \alpha B^T:\beta:A \quad&\implies\quad\p{Y}{A} &= \alpha B^T:\beta \\ }$$ So the gradients in question are seen to be fourth-order tensors.
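In components, these gradients read $\left(\p{Y}{B}\right)_{ijk\ell}=A^T_{ik}\,\delta_{j\ell}$ and $\left(\p{Y}{A}\right)_{ijk\ell}=\delta_{i\ell}\,B_{kj}$, which a forward finite-difference loop can confirm. A minimal sketch (assuming NumPy; dimensions chosen arbitrarily):

```python
import numpy as np

# Finite-difference check of the fourth-order gradients of Y = A^T B:
#   (dY/dB)_ijkl = A^T_ik d_jl,   (dY/dA)_ijkl = d_il B_kj
rng = np.random.default_rng(1)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
Y = A.T @ B                                  # Y is n x p

dYdB = np.einsum('ik,jl->ijkl', A.T, np.eye(p))
dYdA = np.einsum('il,kj->ijkl', np.eye(n), B)

eps = 1e-6
numB = np.empty((n, p, m, p))
numA = np.empty((n, p, m, n))
for k in range(m):
    for l in range(p):
        Bp = B.copy(); Bp[k, l] += eps
        numB[:, :, k, l] = (A.T @ Bp - Y) / eps
for k in range(m):
    for l in range(n):
        Ap = A.copy(); Ap[k, l] += eps
        numA[:, :, k, l] = (Ap.T @ B - Y) / eps

assert np.allclose(dYdB, numB, atol=1e-5)
assert np.allclose(dYdA, numA, atol=1e-5)
```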

An approach which avoids higher-order tensors is to transform the relationship into a vector equation using Kronecker products. $$\eqalign{ {\rm vec}(Y) &= (I\otimes A^T)\;{\rm vec}(B) \quad&\implies\quad \p{{\,\rm vec}\,Y}{{\,\rm vec}\,B} = (I\otimes A^T) \\ &= (B^T\otimes I)K\;{\rm vec}(A) \quad&\implies\quad \p{{\,\rm vec}\,Y}{{\,\rm vec}\,A} = (B^T\otimes I)K \\ }$$ where $K$ is the commutation matrix associated with vectorization, i.e. ${\rm vec}(A^T)=K\,{\rm vec}(A)$.
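Both vectorized forms can be checked directly. A sketch (assuming NumPy; the commutation matrix $K$ is built entry by entry from ${\rm vec}(A^T)=K\,{\rm vec}(A)$, using column-major vectorization):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

vec = lambda M: M.reshape(-1, order='F')     # column-major vec()

# Commutation matrix K: vec(A^T) = K vec(A)
K = np.zeros((m * n, m * n))
for j in range(n):
    for i in range(m):
        K[j + i * n, i + j * m] = 1.0
assert np.allclose(vec(A.T), K @ vec(A))

Y = A.T @ B
# vec(Y) = (I_p kron A^T) vec(B) = (B^T kron I_n) K vec(A)
assert np.allclose(vec(Y), np.kron(np.eye(p), A.T) @ vec(B))
assert np.allclose(vec(Y), np.kron(B.T, np.eye(n)) @ K @ vec(A))
```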

Another approach is to use component-wise derivatives $$\eqalign{ \p{Y}{A_{ij}} &= E_{ij}^TB \qquad\quad \p{Y}{B_{ij}} &= A^TE_{ij} \\ }$$ where $E_{ij}$ is a matrix with all components equal to zero, except the $(i,j)$ component which equals one. And any matrix with independent components satisfies the identity $$\eqalign{ \p{G}{G_{k\ell}} &= E_{k\ell} \qquad\iff\qquad \p{G^T}{G_{k\ell}} &= E_{k\ell}^T \\ }$$ Finally, to bring things full circle $$\eqalign{ \p{G_{ij}}{G_{k\ell}} &= \alpha_{ijk\ell} \qquad\iff\qquad \p{G_{ij}^T}{G_{k\ell}} &= \beta_{ijk\ell} \\ }$$
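The component-wise formulas are easy to confirm with a single-entry perturbation. A short sketch (assuming NumPy; `E(i, j, shape)` is a hypothetical helper constructing the matrix $E_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

def E(i, j, shape):
    """Matrix of zeros with a single 1 in the (i, j) position."""
    M = np.zeros(shape); M[i, j] = 1.0
    return M

# dY/dA_ij = E_ij^T B   and   dY/dB_ij = A^T E_ij, via finite differences
eps = 1e-6
i, j = 2, 1
Ap = A.copy(); Ap[i, j] += eps
assert np.allclose((Ap.T @ B - A.T @ B) / eps,
                   E(i, j, A.shape).T @ B, atol=1e-5)
Bp = B.copy(); Bp[i, j] += eps
assert np.allclose((A.T @ Bp - A.T @ B) / eps,
                   A.T @ E(i, j, B.shape), atol=1e-5)
```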