I would like to find the derivative of a vector with respect to a matrix. Let $y = W^Tx$, where $W$ is a matrix and $x, y$ are vectors. What is $\frac{\partial y}{\partial W} = \frac{\partial W^T x}{\partial W}$?
Follow-up: Define another function $z = a^T y = a^T W^Tx$, so that $z$ is a scalar. We know that $\frac{\partial z}{\partial W}$ is a matrix with the same shape as $W$. At the same time, by the chain rule
\begin{equation} \frac{\partial z}{\partial W} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial W} \end{equation} where $\frac{\partial z}{\partial y}$ is a $1\times N$ row vector ($N$ is the dimension of $y$). So it seems the dimensions on the left-hand and right-hand sides don't match. Any explanations?
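Using indices, we have $y_i =\sum_j (W^T)_{ij} x_j=\sum_j W_{ji}x_j$. So, using the Kronecker delta: $$ \frac{\partial y_i}{\partial W_{kl}} = \sum_j \delta_{jk}\delta_{il} x_j = \delta_{il} x_k.$$ This object carries three indices ($i$, $k$, $l$), so $\frac{\partial y}{\partial W}$ is a third-order tensor, not a matrix.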
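If it helps to see the formula $\frac{\partial y_i}{\partial W_{kl}} = \delta_{il} x_k$ checked numerically, here is a minimal sketch in plain Python. The sizes (a $3\times 2$ matrix), the sample values, and the helper names `matvec_transposed` and `jacobian_wrt_W` are my own assumptions for illustration. Since $y$ is linear in $W$, a unit perturbation of one entry gives the exact derivative, so integer arithmetic is enough.

```python
def matvec_transposed(W, x):
    """Return y = W^T x, where W is a list of M rows of length N and x has length M."""
    M, N = len(W), len(W[0])
    return [sum(W[j][i] * x[j] for j in range(M)) for i in range(N)]


def jacobian_wrt_W(W, x):
    """Return J with J[i][k][l] = d y_i / d W_kl.

    y is linear in W, so y(W + E_kl) - y(W) is the exact derivative,
    where E_kl has a single 1 in position (k, l).
    """
    M, N = len(W), len(W[0])
    y0 = matvec_transposed(W, x)
    J = [[[0] * N for _ in range(M)] for _ in range(N)]
    for k in range(M):
        for l in range(N):
            Wp = [row[:] for row in W]  # copy W, then perturb one entry
            Wp[k][l] += 1
            yp = matvec_transposed(Wp, x)
            for i in range(N):
                J[i][k][l] = yp[i] - y0[i]
    return J


# Hypothetical sample data: W is 3 x 2, so x has length 3 and y has length 2.
W = [[1, 2], [3, 4], [5, 6]]
x = [7, 8, 9]

J = jacobian_wrt_W(W, x)

# The closed form from the index calculation: delta_il * x_k.
expected = [[[x[k] if i == l else 0 for l in range(2)] for k in range(3)]
            for i in range(2)]

print(J == expected)
```

The nested list `J` has shape $N \times M \times N$ (here $2\times 3\times 2$), which makes the three-index nature of the derivative explicit.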
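For your second part, with $z= a^Ty=a^T W^T x=\sum_{kl} a_l W_{kl} x_k$, the chain rule sums over the index $i$ of $y$, which leaves a two-index object with the same shape as $W$: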
$$ a_l x_k = \frac{\partial z}{\partial W_{kl}} = \sum_i \frac{\partial z}{\partial y_i} \frac{\partial y_i}{\partial W_{kl}} = \sum_i a_i \delta_{il} x_k = a_l x_k .$$
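The same identity can be checked numerically. The following is only a sketch under assumed sample values; the helper names `z_value`, `direct_gradient`, and `chain_rule_gradient` are hypothetical. It compares the gradient obtained by perturbing $W$ directly with the one obtained by contracting $a_i$ against $\delta_{il} x_k$ over $i$.

```python
def z_value(W, a, x):
    """Return z = a^T W^T x = sum_{k,l} a_l W_kl x_k."""
    return sum(a[l] * W[k][l] * x[k]
               for k in range(len(x)) for l in range(len(a)))


def direct_gradient(W, a, x):
    """Return G[k][l] = dz/dW_kl by unit perturbation (exact, since z is linear in W)."""
    z0 = z_value(W, a, x)
    G = []
    for k in range(len(x)):
        row = []
        for l in range(len(a)):
            Wp = [r[:] for r in W]  # copy W, then perturb one entry
            Wp[k][l] += 1
            row.append(z_value(Wp, a, x) - z0)
        G.append(row)
    return G


def chain_rule_gradient(a, x):
    """Contract dz/dy_i = a_i with dy_i/dW_kl = delta_il * x_k over the index i."""
    M, N = len(x), len(a)
    return [[sum(a[i] * (x[k] if i == l else 0) for i in range(N))
             for l in range(N)]
            for k in range(M)]


# Hypothetical sample data: W is 3 x 2, x has length 3, a has length 2.
W = [[1, 2], [3, 4], [5, 6]]
x = [7, 8, 9]
a = [2, -1]

direct = direct_gradient(W, a, x)
chained = chain_rule_gradient(a, x)
outer = [[xk * al for al in a] for xk in x]  # the matrix with entries x_k a_l

print(direct == chained == outer)
```

Both routes give the matrix with entries $x_k a_l$, i.e. the outer product $x a^T$, which has the same shape as $W$.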
Hope it helps.