How does one take the derivative of the following?
$$ o_t = \mathbf{W}\cdot h^T $$
So, I need:
$$ \frac{\partial{o_t}}{\partial{\mathbf{W}}} $$
Here $o_t = w_0h_0 + w_1h_1 + \cdots$ in the simplest case, but $o_t$ could be a vector in the more general case ($\mathbf{W}$ will only ever be a vector or a 2D matrix, and $h \in \mathbb{R}^m$).
I was under the impression that, in this case, it was just:
$$ 1 \otimes h \quad \text{for } o_t \in \mathbb{R}, $$ $$ \mathbf{1} \otimes h \quad \text{for } o_t, \mathbf{1} \in \mathbb{R}^n, $$ where $\mathbf{1}$ is the vector of ones.
However, while the sign of the result is correct, its magnitude doesn't match my estimate. Note that $h$ does not depend on $\mathbf{W}$, and $\mathbf{W}$ is just a matrix of scalar weights.
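The question does not say how the estimate was obtained. One common way to produce such an estimate is a central finite difference that perturbs one weight at a time. The sketch below is plain Python and assumes $\mathbf W$ is stored as a list of rows (a single row for the scalar case) and $h$ as a list of floats; the function names `forward` and `fd_jacobian` and the step size `eps` are hypothetical choices for illustration, not taken from the question.

```python
# Hypothetical helpers: a central finite-difference estimate of d(o_t)/dW
# for o_t = W . h^T. W is a list of rows (n x m); a single row is the scalar case.


def forward(W, h):
    """Compute o_t = W . h^T as a list with one entry per row of W."""
    return [sum(w_j * h_j for w_j, h_j in zip(row, h)) for row in W]


def fd_jacobian(W, h, eps=1e-6):
    """Estimate J[k][i][j] = d(o_t)_k / dW_ij with central differences."""
    n, m = len(W), len(W[0])
    J = [[[0.0] * m for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Copy the rows so the caller's W is left unchanged,
            # then perturb only the single entry W_ij in each direction.
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            o_plus = forward(W_plus, h)
            o_minus = forward(W_minus, h)
            for k in range(n):
                J[k][i][j] = (o_plus[k] - o_minus[k]) / (2 * eps)
    return J


if __name__ == "__main__":
    h = [0.5, -2.0, 3.0]
    # Scalar case: W has one row, so o_t has a single entry.
    grad = fd_jacobian([[1.0, 0.3, -0.7]], h)[0][0]
    print(grad)  # should be close to h
    # Matrix case: output k only responds to weights in row k.
    J = fd_jacobian([[1.0, 0.3, -0.7], [0.2, -1.1, 0.4]], h)
    print(J[0][0], J[0][1])  # close to h, and exactly zeros
```

Because $o_t$ is linear in $\mathbf W$, the central difference is exact up to floating-point rounding, so for a single-row $\mathbf W$ the estimate should reproduce $h$ entry by entry.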
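The derivative of a scalar with respect to a vector is a vector. In your case, with $\mathbf u_i$ denoting the $i^\text{th}$ standard basis vector, the $i^\text{th}$ component of the derivative can be found from $$\frac{o_t(\mathbf W+\delta W_i\mathbf u_i)-o_t(\mathbf W)}{\delta W_i} =\frac{1}{\delta W_i} \left((\mathbf W+\delta W_i\mathbf u_i)\cdot h^T-\mathbf W\cdot h^T\right)=\mathbf u_i\cdot h^T=h_i.$$ Because $o_t$ is linear in $\mathbf W$, this difference quotient equals $h_i$ for every $\delta W_i \neq 0$, so no limit is needed. Therefore, the derivative is simply $h$.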
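The same component-by-component argument extends to the general case raised in the question. Assuming $\mathbf W \in \mathbb R^{n\times m}$ and $h$ is a row vector, so that $o_t = \mathbf W\cdot h^T \in \mathbb R^n$ with $(o_t)_k = \sum_j W_{kj}h_j$, each output depends only on its own row of weights:
$$\frac{\partial (o_t)_k}{\partial W_{ij}} = \delta_{ki}\,h_j,$$
so the full derivative is a third-order array rather than a matrix. Summing over $k$ gives $\frac{\partial}{\partial W_{ij}}\sum_k (o_t)_k = h_j$, an $n\times m$ matrix whose every row is $h$; this is exactly the outer product of a vector of ones with $h$ proposed in the question, so that expression is the derivative of $\sum_k (o_t)_k$ rather than of $o_t$ itself.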