In the context of neural nets, I'm trying to differentiate the objective function with respect to the weight matrices. I'm stuck at the following point:
Let $N$ and $D$ be two distinct positive integers, and let $A\in\mathcal M_{N, D}(\mathbb R)$, $B\in\mathcal M_{D, N}(\mathbb R)$ be two matrices. We'll also consider $\omega_A$ and $\omega_B$, two column vectors of size $N$. At some point in the process, I must differentiate the following quantity $q$ with respect to $B$:
$$ q = \omega_A^T\cdot A\cdot B\cdot\omega_B $$
I know that $q$ is a scalar and that $\frac{\partial q}{\partial B}$ should be some sort of matrix. But I can't just apply the classic rules of scalar differentiation and write $\frac{\partial q}{\partial B}=\omega_A^T\cdot A\cdot\omega_B$, because the dimensions don't match. How should I proceed?
I've thought of rearranging the terms so that the dimensions match, for instance:
$$ \frac{\partial q}{\partial B}=\left(\omega_B\cdot\omega_A^T\cdot A\right)^T $$
but I don't "see" why this should be correct.
Besides, I don't know if this is of relevance here, but the two $\omega$ are "one-hot" vectors, meaning they consist of a single $1$ and $0$s everywhere else. This means $q$ can also be understood as the scalar product between a row of $A$ (selected by $\omega_A$) and a column of $B$ (selected by $\omega_B$). In this case, only one column of the matrix $\frac{\partial q}{\partial B}$ would be non-zero, because $\omega_B\cdot\omega_A^T$ is a "one-hot" matrix.
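As a sanity check on this one-hot observation, here is a small NumPy sketch (the sizes and the selected indices are arbitrary, just for illustration): it confirms that $q$ picks out a single entry of $A\cdot B$, and that the guessed gradient $\left(\omega_B\cdot\omega_A^T\cdot A\right)^T$ has exactly one non-zero column.

```python
import numpy as np

# Arbitrary small sizes for illustration.
N, D = 3, 5
A = np.arange(N * D, dtype=float).reshape(N, D)
B = np.arange(D * N, dtype=float).reshape(D, N)

# One-hot column vectors: omega_A selects row 1 of A,
# omega_B selects column 2 of B.
wA = np.zeros((N, 1)); wA[1] = 1.0
wB = np.zeros((N, 1)); wB[2] = 1.0

# q = omega_A^T A B omega_B is the (1, 2) entry of A B.
q = float(wA.T @ A @ B @ wB)
assert q == (A @ B)[1, 2]

# The guessed gradient (omega_B omega_A^T A)^T, which equals
# A^T omega_A omega_B^T, is D x N with a single non-zero column.
grad = (wB @ wA.T @ A).T
nonzero_cols = np.nonzero(grad.any(axis=0))[0]
assert list(nonzero_cols) == [2]
```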
If you reshape the matrix $B$ as a large vector of dimension $D\cdot N$, you can consider $q$ as a scalar function of $D\cdot N$ variables, so the gradient of such a function will be a vector of dimension $D\cdot N$. In turn, this vector can be reshaped back into a $D \times N$ matrix that has in position $j,k$ the partial derivative $\frac{\partial q}{\partial B_{jk}}$.
So let us work in coordinates, always bearing in mind the formula of the matrix product. To avoid cluttering the subindices, I am renaming the vectors so the scalar function is:
$$q=v^\top \, A\, B\, w$$
or, component-wise:
$$q_{11}=\sum_i\sum_j\sum_k v_{1i}^\top \, A_{ij}\, B_{jk}\, w_{k1}$$
Note that I am writing even the trivial subindices, to highlight the matching components in the matrix product formula. Now, take the partial derivative:
$$\frac{\partial q}{\partial B_{jk}}=\sum_i v_{1i}^\top \, A_{ij}\, w_{k1}$$
And that's it... but I guess you want a "nice" matrix formula, so let us rearrange the terms, looking for the $jk$ component of some matrix:
$$\frac{\partial q}{\partial B_{jk}}=\sum_i v_{1i}^\top \, A_{ij}\, w_{k1}=\sum_i A_{ji}^\top \, v_{i1} \, w_{1k}^\top = \left(A^\top\, v\, w^\top \right)_{jk}$$
Therefore, with a slight abuse of notation, you can write the gradient as:
$$\frac{\partial q}{\partial B}=A^\top\, v\, w^\top $$
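If you want to convince yourself numerically, the closed form can be checked against a finite-difference approximation of $\frac{\partial q}{\partial B_{jk}}$ (the sizes and the random seed below are arbitrary):

```python
import numpy as np

# Arbitrary small sizes and random data for the check.
rng = np.random.default_rng(0)
N, D = 3, 5
A = rng.normal(size=(N, D))
B = rng.normal(size=(D, N))
v = rng.normal(size=(N, 1))  # plays the role of omega_A
w = rng.normal(size=(N, 1))  # plays the role of omega_B

def q(B):
    """The scalar q = v^T A B w."""
    return float(v.T @ A @ B @ w)

# Finite-difference gradient, entry by entry: perturb B_{jk}.
eps = 1e-6
grad_fd = np.zeros_like(B)
for j in range(D):
    for k in range(N):
        Bp = B.copy()
        Bp[j, k] += eps
        grad_fd[j, k] = (q(Bp) - q(B)) / eps

# The closed form derived above: dq/dB = A^T v w^T.
grad_closed = A.T @ v @ w.T
assert np.allclose(grad_fd, grad_closed, atol=1e-4)
```

Since $q$ is linear in each entry $B_{jk}$, the finite difference is exact up to floating-point error, so the two matrices agree to high precision.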
EDIT: So your intuition was correct, and rearranging the terms to match dimensions led you to the right answer... although not everything that matches dimensions is automatically correct :)