Matrix-by-matrix derivative

788 Views Asked by At

The definition of the matrix-by-matrix derivative is:

$$ \frac{\partial X_{kl}}{\partial X_{ij}}=\delta_{ik}\delta_{lj} $$

If the matrices are $n\times n$, then the resulting matrix will be $n^2 \times n^2$.


Is the following identity valid for the matrix-by-matrix derivative?

$$ \frac{\partial}{\partial A} AB = \frac{\partial A}{\partial A} B + A\frac{\partial B}{\partial A} $$


If so, I do not understand how we can multiply a $n^2 \times n^2$ matrix by a $n \times n$ matrix?

$$ \frac{\partial}{\partial A} AB = \underbrace{\frac{\partial A}{\partial A}}_{n^2\times n^2} \overbrace{B}^{n\times n} + \overbrace{A}^{n\times n} \underbrace{\frac{\partial B}{\partial A}}_{n^2 \times n^2} $$

1

There are 1 best solutions below

0
On

As suggested in the comments, computing the gradient of a matrix with respect to a matrix will result in a fourth-order tensor.

The product rule holds if you consider differentials. For example:

$$ \begin{align} F &= AB \\ dF &= dAB + A dB \end{align} $$

Now you may not want to work with fourth-order tensors, thus you can look for a "flattened", matricial represetation of the tensor. Suppose you want to the compute this representation for $\frac{\partial F}{\partial A}$.

You can proceed by vectorizing both sides:

$$ \begin{align} \rm{vec}(dF) &= \rm{vec}(dAB)\\ &= \rm{vec}(I~dAB) \end{align} $$ where I is the identity matrix.

And using the Kronecker-vec relation:

$$ \begin{align} \rm{vec}(dF) &= (B^T \otimes I) \rm{vec}(dA) \\ df &= (B^T \otimes I) ~da \end{align} $$

Thus:

\begin{align} \frac{\partial f}{\partial a} = B^T \otimes I \end{align}

Which is $n^2 \times n^2$ instead of $n\times n \times n \times n$.

If you do want to work with fourth-order tensors, without the vectorization trick, then you can proceed as in this answer.