I'm having trouble understanding how to differentiate a matrix-matrix multiplication, and was wondering if anyone knows of any good textbooks and/or examples that could help me learn the details of this technique.
The function at hand is $$ Y = AX + C $$
where X is a matrix of size (B,N), A is a matrix of size (N,H), C is a matrix of size (B,H), and Y is a matrix of size (B,H); B, N, and H are scalars giving the dimensions of these matrices. I have been using the website http://www.matrixcalculus.org/ to calculate the answer, but it doesn't help me understand the underlying principles, and its output isn't entirely clear to me as a beginner.
For example, the website states, $$ \displaystyle{\frac{\partial Y}{\partial A} \left( A\cdot X+C \right) = X^\top \otimes \mathbb{I}}$$ and, $$ \displaystyle{\frac{\partial Y}{\partial C} \left( A\cdot X+C \right) = \mathbb{I}\otimes \mathbb{I}} $$
Also, the result uses the identity matrix $\mathbb{I}$, but its dimensions aren't clear to me: is it of size (N,N) or (H,H)?
So, would $\frac{\partial Y}{\partial A}$ be a matrix of size (N,B) $\otimes \ \mathbb{I}$?
Ideally, I want to be able to calculate these values and understand how it's done!
Apologies for the poor wording of this question!
Thank you in advance!
The first thing you need to learn is how to vectorize a matrix equation using the Kronecker product $$\eqalign{ \operatorname{vec}(AXB) &= (B^T\otimes A)\operatorname{vec}(X) \\ }$$
Applying this to your example equation, written as $Y = I\,A\,X + I\,C\,I$, yields $$\eqalign{ \operatorname{vec}(Y) &= (X^T\otimes I)\operatorname{vec}(A) + (I^T\otimes I)\operatorname{vec}(C) \\ y &= (X^T\otimes I)\,a + (I\otimes I)\,c \\ }$$ where $y=\operatorname{vec}(Y)$, $a=\operatorname{vec}(A)$ and $c=\operatorname{vec}(C)$.
Now your calculations can be performed on ordinary matrix-vector equations, i.e. $$\eqalign{ \frac{\partial y}{\partial a} &= (X^T\otimes I) \\ \frac{\partial y}{\partial c} &= (I\otimes I) \\ }$$
The standard text for this is "Matrix Differential Calculus with Applications in Statistics and Econometrics" by Magnus & Neudecker.
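If it helps to see the vectorization numerically, here is a minimal NumPy sketch that checks it. The sizes `m`, `n`, `p` and the helper `vec` are assumptions made for illustration only: they are chosen so that the product $AX$ is defined (A is `(m, n)`, X is `(n, p)`, C is `(m, p)`), and `vec` stacks columns, which is the convention the Kronecker identity relies on.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")


# Hypothetical sizes, picked only so that A @ X is defined.
m, n, p = 3, 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
C = rng.standard_normal((m, p))

Y = A @ X + C

# Jacobians read off from the vectorized equation y = (X^T ⊗ I) a + (I ⊗ I) c.
J_a = np.kron(X.T, np.eye(m))        # dy/da, shape (m*p, m*n)
J_c = np.kron(np.eye(p), np.eye(m))  # dy/dc, shape (m*p, m*p)

# Check 1: the vectorized equation reproduces vec(Y).
identity_holds = np.allclose(vec(Y), J_a @ vec(A) + J_c @ vec(C))

# Check 2: the map is linear in A, so a perturbation dA changes y by J_a @ vec(dA).
dA = rng.standard_normal((m, n))
Y_perturbed = (A + dA) @ X + C
jacobian_matches = np.allclose(vec(Y_perturbed) - vec(Y), J_a @ vec(dA))

print(J_a.shape, J_c.shape)
print(identity_holds, jacobian_matches)
```

Under these assumed sizes, the identity in $X^T\otimes I$ has as many rows as $A$ (it is `np.eye(m)`), and $I\otimes I$ is simply the identity with one row per entry of $Y$. Note that `order="F"` matters: NumPy's default row-major flattening would correspond to a different ordering of the Kronecker factors.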