What is the derivative of a matrix with respect to a matrix?


I want to calculate the derivative of the matrix product of two matrices that do not have the same dimensions.

$X = \begin{bmatrix}x_{11} & x_{12} & x_{13}\\x_{21} & x_{22} & x_{23}\\x_{31} & x_{32} & x_{33}\end{bmatrix}$

$y= \begin{bmatrix}y_{11} & y_{12}\\y_{21} & y_{22}\\y_{31} & y_{32}\end{bmatrix}$

The problem is that I can't figure out what it means to differentiate a matrix with respect to the individual elements of another matrix. I tried using summation notation to calculate the derivative of a single element of the resulting matrix.

$c_{i,j} = \sum_{k=1}^na_{i,k}\cdot b_{k,j}$

$\frac{\partial (X y)_{11}}{\partial X} = \begin{bmatrix}y_{11} & y_{21} & y_{31}\\ 0 & 0 & 0 \\ 0 & 0 & 0\end{bmatrix}$

The other partial derivatives are similar. What I want to know is:

$\frac{\partial Xy}{\partial X} = ?$

I can't figure out how to assemble this when the derivative of each element is itself a matrix, and the two input matrices are not even the same size.

2

There are 2 best solutions below

3
On BEST ANSWER

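Using the identity $\mathrm{vec}(ABC) = (C'\otimes A)\,\mathrm{vec}(B)$, the vectorized product is $$\mathrm{vec}(Xy) = \mathrm{vec}(IXy) = (y\otimes I)'\mathrm{vec}(X)$$ Taking the derivative with respect to $\mathrm{vec}(X)$ gives $y\otimes I$ (the transpose of the Jacobian $y'\otimes I$). This is consistent with the comment of Ben Grossmann, as it is the "vectorization" of said fourth-order tensor.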
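The vec identity can be checked numerically. The following is a minimal sketch, assuming NumPy, the $3\times 3$ and $3\times 2$ shapes from the question, and the column-stacking convention for $\mathrm{vec}$; the helper `vec` and the random test matrices are illustrative, not part of the original answer. It also builds the fourth-order tensor entry by entry, using $\partial (Xy)_{ij}/\partial X_{kl} = \delta_{ik}\,y_{lj}$, and confirms that flattening it gives the same Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
y = rng.standard_normal((3, 2))


def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")


# Jacobian of vec(Xy) with respect to vec(X): (y kron I)' = y' kron I, shape (6, 9).
jacobian = np.kron(y.T, np.eye(3))

# vec(Xy) = (y' kron I) vec(X)
identity_holds = np.allclose(vec(X @ y), jacobian @ vec(X))

# Fourth-order tensor T[i, j, k, l] = d(Xy)_{ij} / dX_{kl} = delta_{ik} * y_{lj}.
tensor = np.einsum("ik,lj->ijkl", np.eye(3), y)  # shape (3, 2, 3, 3)

# Flatten (i, j) into rows and (k, l) into columns, both column-major,
# so the layout matches vec(Xy) and vec(X).
flattened = tensor.reshape(6, 9, order="F")
tensor_matches = np.allclose(flattened, jacobian)

print(identity_holds, tensor_matches)
```

With a row-major `vec` the same idea works, but the Kronecker factors appear in the opposite order, so the convention has to be fixed before comparing results.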

0
On

Based on the comments, it sounds like you have a scalar function $(\phi)$ defined as follows $$\eqalign{ C_{ij} &= \sum_{k=1}^p X_{ik}Y_{kj} \\ \phi &= \sum_{i=1}^m \sum_{j=1}^n C_{ij} \;=\; \sum_{i=1}^m \sum_{j=1}^n \sum_{k=1}^p X_{ik}Y_{kj} \\ }$$ This can be written in matrix notation using an all-ones matrix $J$ the same size as $C$.
In this form, the gradient is easy to calculate, since $\phi$ is linear in $X$: $$\eqalign{ \phi &= J:XY \\ d\phi &= J:dX\,Y \;=\; JY^T:dX \\ \frac{\partial\phi}{\partial X} &= JY^T \\ }$$
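As a sanity check, the gradient $JY^T$ can be compared against central finite differences. This is a sketch under assumptions: NumPy, arbitrary sizes $m=3$, $p=3$, $n=2$, random test matrices, and a hypothetical helper `phi` that sums all entries of $XY$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 3, 3, 2
X = rng.standard_normal((m, p))
Y = rng.standard_normal((p, n))
J = np.ones((m, n))  # all-ones matrix, same size as C = XY


def phi(X_):
    """Scalar function: the sum of all entries of X_ @ Y."""
    return np.sum(X_ @ Y)


# Analytic gradient from the derivation: d(phi)/dX = J Y^T, shape (m, p).
analytic = J @ Y.T

# Numerical gradient: perturb one entry of X at a time.
eps = 1e-6
numeric = np.zeros_like(X)
for i in range(m):
    for k in range(p):
        E = np.zeros_like(X)
        E[i, k] = eps
        numeric[i, k] = (phi(X + E) - phi(X - E)) / (2 * eps)

gradient_matches = np.allclose(analytic, numeric, atol=1e-6)
print(gradient_matches)
```

Every row of $JY^T$ is the same: entry $(i,k)$ is the sum of row $k$ of $Y$, which is what differentiating the triple sum with respect to $X_{ik}$ gives directly.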


In the above, a colon denotes the Frobenius inner product, a convenient product notation for the trace function $$A:B = \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} = {\rm Tr}(A^TB)$$ The product can be applied to vectors by treating them as rectangular matrices (set $n=1$), in which case it is just the dot product. The terms in such a product can be rearranged in a number of ways, e.g. $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$ due to the cyclic and transpose-invariance properties of the underlying trace function.
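The rearrangement rules for the colon product are easy to verify numerically. The sketch below assumes NumPy and uses a hypothetical helper `colon` with arbitrarily chosen compatible shapes ($A$ is $4\times 3$, $B$ is $5\times 3$, $C$ is $5\times 4$).

```python
import numpy as np


def colon(A, B):
    """Colon product A:B = sum_ij A_ij * B_ij for equally shaped arrays."""
    return np.sum(A * B)


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 3))
C = rng.standard_normal((5, 4))
M = C @ A  # 5 x 3, same shape as B

checks = {
    "M:B = Tr(M^T B)": np.isclose(colon(M, B), np.trace(M.T @ B)),
    "M:B = B:M": np.isclose(colon(M, B), colon(B, M)),
    "M:B = B^T:M^T": np.isclose(colon(M, B), colon(B.T, M.T)),
    "CA:B = C:BA^T": np.isclose(colon(C @ A, B), colon(C, B @ A.T)),
    "CA:B = A:C^TB": np.isclose(colon(C @ A, B), colon(A, C.T @ B)),
}

for rule, ok in checks.items():
    print(rule, bool(ok))
```

The fourth rule is the one used in the gradient derivation: it is what moves $Y$ across the colon in $J:dX\,Y = JY^T:dX$.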