I want to compute $\frac{\partial \hat{y}}{\partial \mathbf{V}}$, where $\hat{y}$ is a scalar and $\mathbf{V} \in \mathbb{R}^{n \times n}$. How do I calculate this gradient given the following definitions?
$$ \begin{align} \hat{y} &= \mathbf{W}\mathbf{h_t^T}, \quad \mathbf{W}, \mathbf{h_t} \in \mathbb{R}^{n} \\ \mathbf{h_t} &= \mathscr{O} \odot \theta, \quad \mathscr{O}, \theta \in \mathbb{R}^{n} \\ \mathscr{O} &= \sigma(\mathbf{V}\mathbf{h_{t-1}^T}) \end{align} $$
Here, $\sigma$ is the expit/sigmoid function applied element-wise to a vector.
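One way to sanity-check any candidate answer is a finite-difference comparison. As a sketch, suppose all vectors are treated as columns and that $\theta$ and $\mathbf{h_{t-1}}$ do not depend on $\mathbf{V}$ (a single time step, no backpropagation through time; both are assumptions, not something stated in the setup). Writing $\mathbf{z} = \mathbf{V}\mathbf{h_{t-1}^T}$, the chain rule would then give $\frac{\partial \hat{y}}{\partial V_{ij}} = W_i \, \theta_i \, \sigma(z_i)\,(1 - \sigma(z_i)) \, h_{t-1,j}$, which is the outer product of $\mathbf{W} \odot \theta \odot \sigma'(\mathbf{z})$ with $\mathbf{h_{t-1}}$. The plain-Python snippet below compares that expression against central differences; the function names (`forward`, `grad_V`, `numerical_grad_V`) and the random test values are hypothetical and chosen only for illustration.

```python
import math
import random

def sigmoid(x):
    # Scalar expit; applied element-wise below.
    return 1.0 / (1.0 + math.exp(-x))

def matvec(V, h):
    # z_i = sum_j V_ij * h_j
    return [sum(v_ij * h_j for v_ij, h_j in zip(row, h)) for row in V]

def forward(V, W, theta, h_prev):
    # y_hat = W . (sigmoid(V h_prev) * theta), with * element-wise.
    z = matvec(V, h_prev)
    h_t = [sigmoid(z_i) * th_i for z_i, th_i in zip(z, theta)]
    return sum(w_i * h_i for w_i, h_i in zip(W, h_t))

def grad_V(V, W, theta, h_prev):
    # Assumed closed form: outer(W * theta * s * (1 - s), h_prev),
    # valid only if theta and h_prev are independent of V.
    z = matvec(V, h_prev)
    delta = [w_i * th_i * sigmoid(z_i) * (1.0 - sigmoid(z_i))
             for w_i, th_i, z_i in zip(W, theta, z)]
    return [[d_i * h_j for h_j in h_prev] for d_i in delta]

def numerical_grad_V(V, W, theta, h_prev, eps=1e-6):
    # Central differences, perturbing one entry of V at a time.
    G = [[0.0] * len(V[0]) for _ in range(len(V))]
    for i in range(len(V)):
        for j in range(len(V[0])):
            original = V[i][j]
            V[i][j] = original + eps
            y_plus = forward(V, W, theta, h_prev)
            V[i][j] = original - eps
            y_minus = forward(V, W, theta, h_prev)
            V[i][j] = original  # restore the entry
            G[i][j] = (y_plus - y_minus) / (2.0 * eps)
    return G

random.seed(0)
n = 4
V = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
W = [random.gauss(0.0, 1.0) for _ in range(n)]
theta = [random.gauss(0.0, 1.0) for _ in range(n)]
h_prev = [random.gauss(0.0, 1.0) for _ in range(n)]

analytic = grad_V(V, W, theta, h_prev)
numeric = numerical_grad_V(V, W, theta, h_prev)
max_abs_diff = max(abs(a - b)
                   for row_a, row_b in zip(analytic, numeric)
                   for a, b in zip(row_a, row_b))
print("max |analytic - numeric| =", max_abs_diff)
```

If $\theta$ or $\mathbf{h_{t-1}}$ does depend on $\mathbf{V}$ (for example through earlier time steps), the closed form above is incomplete, and the finite-difference check would only agree once `forward` models that dependence too.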