How to calculate a matrix-matrix derivative


I'm having trouble understanding how to differentiate a matrix-matrix product, and was wondering if anyone knows of any good textbooks and/or examples that could help me learn the details of this technique.

The function at hand is, $$ Y = AX + C $$

where X is a matrix of size (B,N), A is a matrix of size (N,H), C is a matrix of size (B,H), and Y is a matrix of size (B,H). Here B, N, and H are scalars giving the sizes of these matrices. I was using the website http://www.matrixcalculus.org/ to calculate the answer, but this doesn't help me understand the underlying principles, and its output isn't entirely clear to me as a beginner.

For example, the website states, $$ \displaystyle{\frac{\partial Y}{\partial A} \left( A\cdot X+C \right) = X^\top \otimes \mathbb{I}}$$ and, $$ \displaystyle{\frac{\partial Y}{\partial C} \left( A\cdot X+C \right) = \mathbb{I}\otimes \mathbb{I}} $$

Also, the result uses the identity matrix $\mathbb{I}$, but its dimensions aren't clear to me: does it have size (N,N) or (H,H)?

So, would $\frac{\partial Y}{\partial A}$ be of size (N,B) $\otimes \ \mathbb{I}$?

Ideally, I want to be able to calculate these values and understand how it's done!

Apologies for the poor wording of this question!

Thank you in advance!

Best answer

The first thing you need to learn is how to vectorize a matrix equation using the Kronecker product:

$$\operatorname{vec}(AXB) = (B^T\otimes A)\operatorname{vec}(X)$$

Applying this to your example equation yields

$$\eqalign{ \operatorname{vec}(Y) &= (X^T\otimes I)\operatorname{vec}(A) + (I^T\otimes I)\operatorname{vec}(C) \\ y &= (X^T\otimes I)\,a + (I\otimes I)\,c }$$

Now your calculations can be performed on ordinary matrix-vector equations, i.e.

$$\eqalign{ \frac{\partial y}{\partial a} &= (X^T\otimes I) \\ \frac{\partial y}{\partial c} &= (I\otimes I) }$$

The standard text for this is "Matrix Differential Calculus" by Magnus & Neudecker.
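If it helps to see the sizes concretely, here is a small numerical sketch of the vectorization above using NumPy. The sizes `m`, `n`, `p` are hypothetical and chosen only so that the product $AX$ is defined ($A$ is $m\times n$, $X$ is $n\times p$, and $C$, $Y$ are $m\times p$); they are not the sizes from the question. Under that assumption, writing $AX = I_m A X$ and applying the vec identity shows that the identity next to $X^T$ is $m\times m$, and $I\otimes I$ is $I_p\otimes I_m$, which is just the $mp\times mp$ identity. The sketch also assumes $\operatorname{vec}$ stacks columns, which corresponds to `order="F"` in NumPy.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

# Hypothetical sizes, chosen so that A @ X is defined: A is (m, n), X is (n, p).
m, n, p = 2, 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
C = rng.standard_normal((m, p))

Y = A @ X + C

# Jacobians read off from y = kron(X^T, I_m) a + kron(I_p, I_m) c
J_a = np.kron(X.T, np.eye(m))        # shape (m*p, m*n)
J_c = np.kron(np.eye(p), np.eye(m))  # shape (m*p, m*p), the identity matrix

# Check that the vectorized equation reproduces vec(Y).
identity_holds = np.allclose(vec(Y), J_a @ vec(A) + J_c @ vec(C))

# Finite-difference check of dy/da: perturb one entry of vec(A) at a time.
eps = 1e-6
J_fd = np.zeros((m * p, m * n))
for k in range(m * n):
    da = np.zeros(m * n)
    da[k] = eps
    A_pert = A + da.reshape((m, n), order="F")
    J_fd[:, k] = (vec(A_pert @ X + C) - vec(Y)) / eps
jacobian_matches = np.allclose(J_fd, J_a, atol=1e-5)

print(J_a.shape, J_c.shape, identity_holds, jacobian_matches)
```

Each column of `J_a` is the change in $\operatorname{vec}(Y)$ per unit change in one entry of $\operatorname{vec}(A)$, so the Jacobian has one row per entry of $Y$ and one column per entry of $A$.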