I want to compute $\frac{\partial L}{\partial W^T}$, where $L \in \Bbb R$ and $W \in \Bbb R^{d \times n}$. Suppose that $Z = XW$ where $X \in \Bbb R^{m \times d}$, so that $Z \in \Bbb R^{m \times n}$. Then I have: $$ \frac{\partial L}{\partial W^T} = \frac{\partial L}{\partial Z^T} \frac{\partial Z}{\partial W^T}$$ Now, $\frac{\partial L}{\partial Z^T} \in \Bbb R^{m \times n}$, and computing $\frac{\partial Z}{\partial W^T}$ I get a matrix in which entry $(i,j)$ is itself a matrix, given by $\frac{\partial Z}{\partial W_{i,j}} \in \Bbb R^{m \times n}$.
Therefore, $\frac{\partial Z}{\partial W^T}$ is a $(d \times n)$ matrix in which each entry is a matrix of size $(m \times n)$. Pardon me in advance for butchering notation/terminology, but how would the product $\frac{\partial L}{\partial Z^T} \frac{\partial Z}{\partial W^T}$ work? I know that in the end I need a matrix and not a tensor. From searching online, this seems to be a tensor contraction, but I'm having a hard time finding a closed form for $\frac{\partial L}{\partial W^T}$, although I suspect it should be $X^T\frac{\partial L}{\partial Z^T}$.
Thank you.
According to the chain rule, we have
$$ \frac{\partial L}{\partial W^T} = \frac{\partial L}{\partial Z} \frac{\partial Z}{\partial W^T} = \frac{\partial L}{\partial Z^T} \frac{\partial Z^T}{\partial W^T} $$
where $Z = X W$ or equivalently $Z^T = W^T X^T$. Let us write it componentwise with the Einstein summation convention, using $Z_{ab} = X_{ak}W_{kb}$ or $Z_{ba} = W_{ka}X_{bk}$. Here, an orthonormal basis is considered. Thus,
$$\begin{aligned} \frac{\partial L}{\partial W_{ji}} &= \frac{\partial L}{\partial Z_{ba}} \frac{\partial Z_{ba}}{\partial W_{ji}} \\ &= \frac{\partial L}{\partial Z_{ba}} \frac{\partial W_{ka}}{\partial W_{ji}} X_{bk} \\ &= \frac{\partial L}{\partial Z_{ba}} \delta_{kj}\delta_{ai} X_{bk} \\ &= \frac{\partial L}{\partial Z_{bi}} X_{bj} \end{aligned}$$
where $\delta$ is the Kronecker symbol. These are the $ij$ components of
$$ \frac{\partial L}{\partial W^T} = \frac{\partial L}{\partial Z^T} X = \left(X^T\frac{\partial L}{\partial Z}\right)^T $$
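
As a sanity check, the componentwise result $\frac{\partial L}{\partial W_{ji}} = \frac{\partial L}{\partial Z_{bi}} X_{bj}$ can be compared against finite differences. The following is only a minimal sketch in plain Python. The loss $L = \frac12 \sum_{ab} Z_{ab}^2$ (for which $\frac{\partial L}{\partial Z} = Z$), the sizes $m, d, n$, and all helper names are assumptions made for illustration; they are not part of the question.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def loss(X, W):
    """Hypothetical test loss: L = 0.5 * sum_ab Z_ab^2, so dL/dZ = Z."""
    Z = matmul(X, W)
    return 0.5 * sum(z * z for row in Z for z in row)

def grad_wrt_W_transposed(X, dL_dZ):
    """Entry [i][j] is dL/dW_ji = sum_b dL/dZ_bi * X_bj (shape n x d)."""
    m, d, n = len(X), len(X[0]), len(dL_dZ[0])
    return [[sum(dL_dZ[b][i] * X[b][j] for b in range(m))
             for j in range(d)] for i in range(n)]

def finite_difference(X, W, eps=1e-6):
    """Central differences, laid out like dL/dW^T (entry [i][j] = dL/dW_ji)."""
    d, n = len(W), len(W[0])
    G = [[0.0] * d for _ in range(n)]
    for j in range(d):
        for i in range(n):
            old = W[j][i]
            W[j][i] = old + eps
            up = loss(X, W)
            W[j][i] = old - eps
            down = loss(X, W)
            W[j][i] = old  # restore the perturbed entry
            G[i][j] = (up - down) / (2 * eps)
    return G

random.seed(0)
m, d, n = 4, 3, 2  # assumed sizes: X is m x d, W is d x n, Z is m x n
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(d)]

dL_dZ = matmul(X, W)  # equals Z for this particular loss
analytic = grad_wrt_W_transposed(X, dL_dZ)
numeric = finite_difference(X, W)
max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(n) for j in range(d))
print("largest absolute difference:", max_err)
```

The sum over `b` in `grad_wrt_W_transposed` is the tensor contraction asked about: the four-index object $\frac{\partial Z_{ba}}{\partial W_{ji}}$ never has to be stored, because the Kronecker deltas collapse it to a single sum, leaving an ordinary $n \times d$ matrix.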