Derivative of a vector with respect to a matrix


I have the equations describing the feedforward pass of an MLP: $$ \mathbf{g}_j = \mathbf{W}_j \mathbf{h}_{j-1} + \mathbf{b}_j $$ $$ \mathbf{h}_j = \sigma_j (\mathbf{g}_j) $$ In order to use gradient descent we then need the partial derivative of the cost with respect to the weight matrix $\mathbf{W}_j$: $$ C = \frac{1}{2} \|\mathbf{h}_L - \mathbf{y}\|^2 $$ $$ \frac{\partial C}{\partial \mathbf{W}_j} = \frac{\partial C}{\partial \mathbf{g}_j} \frac{\partial \mathbf{g}_j}{\partial \mathbf{W}_j} = \frac{\partial C}{\partial \mathbf{g}_j} \, \mathbf{h}^T_{j-1} $$ where $\mathbf{y}, \mathbf{h}_j, \mathbf{g}_j, \mathbf{b}_j \in \mathbb{R}^n$, $\mathbf{W}_j \in \mathbb{R}^{n \times n}$, and $j \in [1, L]$.

And I'm struggling to figure out why $\mathbf{h}_{j-1}$ is transposed, and, more generally, how differentiation of a vector with respect to a matrix is carried out.
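For what it's worth, the outer-product form $\frac{\partial C}{\partial \mathbf{g}_j} \mathbf{h}^T_{j-1}$ can at least be checked numerically. Below is a minimal NumPy sketch (my own illustration, not from any particular textbook) for a single layer with a sigmoid activation: it computes the analytic gradient as an outer product and compares it against a central finite-difference gradient. The variable names (`h_prev`, `delta`, etc.) are assumptions for the sketch.

```python
import numpy as np

# Single layer: g = W @ h_prev + b, h = sigmoid(g), C = 0.5 * ||h - y||^2.
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
b = rng.standard_normal(n)
h_prev = rng.standard_normal(n)
y = rng.standard_normal(n)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cost(W):
    h = sigmoid(W @ h_prev + b)
    return 0.5 * np.sum((h - y) ** 2)

# Analytic gradient: delta = dC/dg = (h - y) * sigma'(g),
# using sigma'(g) = sigma(g) * (1 - sigma(g)); then
# dC/dW is the outer product delta @ h_prev.T.
g = W @ h_prev + b
h = sigmoid(g)
delta = (h - y) * h * (1 - h)
grad_analytic = np.outer(delta, h_prev)   # shape (n, n), same as W

# Central finite-difference gradient, entry by entry, for comparison.
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        grad_fd[i, j] = (cost(Wp) - cost(Wm)) / (2 * eps)

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True
```

Note that entry $(i, k)$ of the outer product is $\frac{\partial C}{\partial g_{j,i}} h_{j-1,k}$, which is exactly what the entry-wise finite difference recovers: $W_{ik}$ only affects $g_i$, and it does so with coefficient $h_{j-1,k}$.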