While going through some notes on matrices, I stumbled upon the following identity:
$$\nabla_A \mbox{tr}(AB) = B^T$$
where $A$ and $B$ are square matrices of the same size. The trace operator returns a real number. How does the derivative of a scalar result in a matrix?
Let $A,B$ be matrices such that their product $AB$ is a square matrix.
Another way to write the trace is to use the Frobenius inner product (:) notation, $X:Y = {\rm tr}(X^TY) = \sum_{i,j} X_{ij}Y_{ij}$, so that $${\rm tr}(AB) = A^T:B$$ Because of the cyclic property of the trace, ${\rm tr}(AB) = {\rm tr}(BA)$, this can also be written as $B^T:A$.
So the differential of the function is simply $$d\,{\rm tr}(AB) = A^T:dB + B^T:dA$$ Note that each term on the RHS is an inner product of two matrices, yielding a scalar result.
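In components, the same computation can be sketched as follows (the indices $i,j$ are introduced here for illustration and are not part of the notation above): $${\rm tr}(AB) = \sum_{i,j} A_{ij}B_{ji} \quad\Longrightarrow\quad \frac{\partial\,{\rm tr}(AB)}{\partial A_{ij}} = B_{ji} = (B^T)_{ij}$$ Collecting these partial derivatives into an array of the same shape as $A$ reproduces the $B^T:dA$ term, which is how a scalar function ends up with a matrix-valued gradient.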
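The gradient with respect to a matrix variable $X$ is the matrix $G$, of the same shape as $X$, that satisfies $df = G:dX$. Depending on which variable you take to be the independent variable (holding the other fixed), the gradient is therefore either $$\nabla_B\,{\rm tr}(AB) = A^T$$ or $$\nabla_A\,{\rm tr}(AB) = B^T$$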
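As a sanity check, here is a minimal numerical sketch of these two gradients using NumPy and central finite differences. The helper `numerical_gradient`, the step size `eps`, and the shapes 3×4 and 4×3 are assumptions chosen for illustration; nothing here comes from the original notes.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the matrix of partials d f / d X[i, j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb a single entry of X
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # A is 3x4 and B is 4x3, so AB is square
B = rng.standard_normal((4, 3))

# Gradient with respect to A (B held fixed) and with respect to B (A held fixed).
grad_A = numerical_gradient(lambda X: np.trace(X @ B), A)
grad_B = numerical_gradient(lambda X: np.trace(A @ X), B)

print(np.allclose(grad_A, B.T, atol=1e-6))
print(np.allclose(grad_B, A.T, atol=1e-6))
```

Each estimated gradient has the same shape as the variable it is taken with respect to, which is the point of the question: the scalar ${\rm tr}(AB)$ has one partial derivative per entry of the matrix variable.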