What is the derivative of matrix multiplication, wrt another matrix?


I'm in a deep learning class, and I always seem to mess up derivative questions, because I put the matrices in the wrong order or transposed/not when they were supposed to be the other way around.

Here's one simple question I have. What is

$$\frac{ \partial (A B) }{ \partial X }$$

where $A \in \mathbb{R}^{M \times N}$, $B \in \mathbb{R}^{N \times P}$, and $X \in \mathbb{R}^{U \times V}$?

My class uses "denominator convention", which according to my notes means the answer should be a tensor with dimensions $U \times V \times P \times M$.

I'm aware of the "Matrix Cookbook", but that usually doesn't seem to contain what I need. If anyone can recommend a good book for learning this material, that would be great. My class doesn't talk about "contravariant, covariant" etc., so I'm not trying to learn differential geometry. I just want to know the matrix algebra equivalent of all of the calculus rules (given that these are matrices/tensors, not just real numbers).

Best answer:

Let $$C=A\star B$$ where $(A,B,C)$ are tensors (scalars, vectors, matrices, other) and $(\star)$ is any product (Matrix, Hadamard, Frobenius, Kronecker, Dyadic, other) which is compatible with the tensor dimensions.

The only rule that you should memorize is the product rule for differentials $$dC = dA\star B + A\star dB$$ where the order is important when the product is not commutative.
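As a quick sanity check of the product rule for the matrix product, here is a minimal sketch in plain Python (nested lists, standard library only). The helper names (`matmul`, `combine`, `random_matrix`, `product_rule_error`) and the sizes are illustrative assumptions, not anything from the question. It compares the exact change $(A+dA)(B+dB) - AB$ with the first-order expression $dA\,B + A\,dB$; the leftover is the second-order term $dA\,dB$, so it should shrink roughly like the square of the perturbation size.

```python
import random


def matmul(A, B):
    """Matrix product of two nested lists (rows of A times columns of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def combine(A, B, sign=1.0):
    """Entrywise A + sign * B for two matrices of equal shape."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def random_matrix(rows, cols, rng, scale=1.0):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[scale * rng.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]


def product_rule_error(eps, seed=0):
    """Largest entry of (A+dA)(B+dB) - AB - (dA B + A dB) for perturbations of size eps."""
    rng = random.Random(seed)
    M, N, P = 2, 3, 4  # illustrative sizes (assumption)
    A, B = random_matrix(M, N, rng), random_matrix(N, P, rng)
    dA = random_matrix(M, N, rng, scale=eps)
    dB = random_matrix(N, P, rng, scale=eps)

    # Exact change in the product.
    exact = combine(matmul(combine(A, dA), combine(B, dB)), matmul(A, B), sign=-1.0)
    # First-order change predicted by the product rule; note the order dA B, A dB.
    first_order = combine(matmul(dA, B), matmul(A, dB))

    residual = combine(exact, first_order, sign=-1.0)
    return max(abs(r) for row in residual for r in row)


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, product_rule_error(eps))
```

Writing the terms in the other order, e.g. $B\,dA$, would not even be shape-compatible here unless $P = M$, which is a handy way to catch ordering mistakes.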

The nice thing about the differential expression is that the quantities $(dA,dB,dC)$ have the same tensorial character as $(A,B,C)$ and no higher-order tensors are required.

For example if $(A)$ is a matrix and $(B,C)$ are vectors then $(dA)$ is a matrix and $(dB,dC)$ are vectors.

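Further, if the independent variable $(x)$ is a scalar, then the derivative has exactly the same form as the product rule above, i.e. $$\frac{dC}{dx} = \left(\frac{dA}{dx}\right)\star B + A\star\left(\frac{dB}{dx}\right)$$

Index notation is always an option, e.g. for the matrix product in the question $$\eqalign{ C_{ik} &= \sum_{j=1}^N A_{ij}\,B_{jk} \\ dC_{ik} &= \sum_{j=1}^N \Big(dA_{ij}\,B_{jk} + A_{ij}\,dB_{jk}\Big) \\ \frac{\partial C_{ik}}{\partial X_{pq}} &= \sum_{j=1}^N \left[\left(\frac{\partial A_{ij}}{\partial X_{pq}}\right)B_{jk} + A_{ij}\left(\frac{\partial B_{jk}}{\partial X_{pq}}\right)\right] }$$

Here $(p,q)$ range over the $U\times V$ entries of $X$ and $(i,k)$ over the $M\times P$ entries of $C$, so the last line lists every component of the fourth-order tensor being asked about. The layout convention only decides the order in which those four indices are stored.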
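To make the index formula concrete, consider the special case (an assumption for illustration, not stated in the question) where $A = X$ and $B$ does not depend on $X$. Then $\partial A_{ij}/\partial X_{pq} = \delta_{ip}\delta_{jq}$ and $\partial B_{jk}/\partial X_{pq} = 0$, so the sum collapses to $\partial C_{ik}/\partial X_{pq} = \delta_{ip}\,B_{qk}$. The sketch below, in plain Python with hypothetical helper names, stores that tensor as `T[p][q][k][i]`, which matches the $U \times V \times P \times M$ ordering mentioned in the question, and compares it with central finite differences.

```python
def matmul(A, B):
    """Matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def analytic_derivative(B, M):
    """T[p][q][k][i] = dC_ik / dX_pq = delta_ip * B_qk for C = X B, X of shape M x N."""
    N, P = len(B), len(B[0])
    return [[[[B[q][k] if i == p else 0.0 for i in range(M)]
              for k in range(P)]
             for q in range(N)]
            for p in range(M)]


def numeric_derivative(X, B, h=1e-6):
    """Same tensor from central differences, perturbing one entry X_pq at a time."""
    M, N, P = len(X), len(B), len(B[0])
    T = [[[[0.0] * M for _ in range(P)] for _ in range(N)] for _ in range(M)]
    for p in range(M):
        for q in range(N):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[p][q] += h
            X_minus[p][q] -= h
            C_plus, C_minus = matmul(X_plus, B), matmul(X_minus, B)
            for i in range(M):
                for k in range(P):
                    T[p][q][k][i] = (C_plus[i][k] - C_minus[i][k]) / (2 * h)
    return T


def shape(T):
    """Dimensions of a rectangular nested list."""
    dims = []
    while isinstance(T, list):
        dims.append(len(T))
        T = T[0]
    return tuple(dims)


def max_abs_diff(S, T):
    """Largest entrywise difference between two nested lists of equal shape."""
    if isinstance(S, list):
        return max(max_abs_diff(s, t) for s, t in zip(S, T))
    return abs(S - T)


# Illustrative data: X is U x V = 2 x 3 (so M = 2, N = 3), B is N x P = 3 x 4.
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 3.0], [2.0, -2.0, 1.0, 0.0]]

if __name__ == "__main__":
    T = analytic_derivative(B, M=len(X))
    print(shape(T))
    print(max_abs_diff(T, numeric_derivative(X, B)))
```

If your course stores the indices in a different order, only the nesting of the loops changes; the components themselves are the same.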