If a matrix $Y$ is defined as the product of two matrices $X$ and $W$, i.e. $Y = XW$, how do we compute the gradients $\partial Y/\partial W$ and $\partial Y/\partial X$? The dimensions of $X$ are $m \times d$, those of $W$ are $d \times h$, and those of $Y$ are $m \times h$.
I have a basic knowledge of matrices and their multiplication, but no knowledge of matrix calculus. Any book recommendations on this subject would be appreciated. My current field of study is neural networks, and I need to learn matrix calculus to perform backpropagation.
I would write it this way: with $F(X, W) = X W$, $$DF(X, W)\cdot(H, K) = H W + X K,$$ where $DF$ is the differential of $F$. This notation is less ambiguous than partial derivatives. Note that for a continuous bilinear map, one always has $$DF(X, W)\cdot (H, K) = F(H, W) + F(X, K).$$ One can also use $dF$ or $F^\prime$ instead of $DF$. The partial derivative $\partial F/\partial X$ would be the linear map $H\mapsto H W$, and likewise $\partial F/\partial W$ would be $K\mapsto X K$.
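Since the differential $DF(X, W)\cdot(H, K) = HW + XK$ is a statement you can check numerically, here is a small NumPy sketch that compares it against a finite-difference approximation of the directional derivative along $(H, K)$ (the dimensions $m, d, h$ and the random matrices are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, h = 4, 3, 5

X = rng.standard_normal((m, d))
W = rng.standard_normal((d, h))
H = rng.standard_normal((m, d))  # perturbation direction for X
K = rng.standard_normal((d, h))  # perturbation direction for W

def F(X, W):
    return X @ W

# Differential predicted by the formula: DF(X, W)·(H, K) = H W + X K
DF = H @ W + X @ K

# Finite-difference approximation of the directional derivative
t = 1e-6
FD = (F(X + t * H, W + t * K) - F(X, W)) / t

# For a bilinear F the only discrepancy is the second-order term t·HK,
# so the difference shrinks linearly as t -> 0
print(np.max(np.abs(DF - FD)))
```

Because $F$ is bilinear, the expansion $F(X + tH, W + tK) = F(X, W) + t(HW + XK) + t^2\,HK$ is exact, so the printed error is on the order of $t$.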