Chain rule for derivative with respect to a matrix


Suppose $\ell$ is a scalar ($1 \times 1$), $x$ is an $m \times 1$ vector, and $W$ is an $n \times p$ matrix, with $\ell$ depending on $W$ through $x$. I want to use the chain rule to take the derivative

$$\frac{\partial \ell}{\partial W} = \frac{\partial \ell}{\partial x}\times\frac{\partial x}{\partial W}$$

If I'm using the "Numerator layout" here, then $\frac{\partial \ell}{\partial x}$ is $1 \times m$ and $\frac{\partial \ell}{\partial W}$ is $p \times n$.

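How can I reconcile the dimensions of these derivatives? In particular, what shape should $\frac{\partial x}{\partial W}$ have so that the product on the right-hand side comes out as $p \times n$?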
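
For reference, here is the chain rule written out entry by entry. This is a sketch, assuming only that each component $x_k$ is a differentiable function of the entries $W_{ij}$. The index names $i$, $j$, $k$ are introduced here for illustration:

$$\frac{\partial \ell}{\partial W_{ij}} = \sum_{k=1}^{m} \frac{\partial \ell}{\partial x_k}\,\frac{\partial x_k}{\partial W_{ij}}, \qquad i = 1,\dots,n,\quad j = 1,\dots,p.$$

In this form $\frac{\partial x_k}{\partial W_{ij}}$ carries three indices $(k, i, j)$, so it has $m \cdot n \cdot p$ entries and the "product" is a sum over $k$ rather than an ordinary matrix product. In numerator layout, the $(j, i)$ entry of the $p \times n$ matrix $\frac{\partial \ell}{\partial W}$ would be $\frac{\partial \ell}{\partial W_{ij}}$. One common way to stay within matrices is to stack the entries of $W$ into an $np \times 1$ vector $\operatorname{vec}(W)$, which gives

$$\underbrace{\frac{\partial \ell}{\partial \operatorname{vec}(W)}}_{1 \times np} = \underbrace{\frac{\partial \ell}{\partial x}}_{1 \times m}\;\underbrace{\frac{\partial x}{\partial \operatorname{vec}(W)}}_{m \times np},$$

after which the $1 \times np$ result can be reshaped into $p \times n$.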