My matrix calculus is entirely self-taught (and the only formal univariate calculus that I had was in high school), so I'm proud to have made it as far as I have. But I am stumped. Here is the problem: $$ y = \mathbf{a^T \Psi b}\\ \mathbf{\Psi} = \left[\begin{array}{cccc} \mathbf{B^TB} \\ & \mathbf{B^TB}\\ && \mathbf{B^TB}\\ \end{array} \right] $$ What is $ \partial y/\partial \mathbf{B}??? $
Assume $\mathbf{a}$ and $\mathbf{b}$ are $P \times 1$ such that $y$ is a scalar. $\mathbf{B}$ is $m\times m$.
Some context: I want to find the gradients for the elements of $\mathbf{B}$, rather than the actual covariance matrices that $\mathbf{B}$ comprises, because this result will be used in an unconstrained optimizer, and I want all real values to be admissible.
I know that a general $\partial y/\partial x$ where $y = \mathbf{X^TX}$ is equal to $2\mathbf{X^T}\partial{\mathbf{X}}/\partial x$. If the elements of $\mathbf{X}$ are just constants, then $\partial{\mathbf{X}}/\partial x$ is just $\mathbf{X}$ with a one somewhere, corresponding to the position of $x$ for whom the derivative is being taken.
Does this imply that the answer is simply $$ \partial y/\partial\mathbf{B} = \mathbf{a}^T\left[\begin{array}{cccc} \mathbf{2B^TI} \\ & \mathbf{2B^TI}\\ && \mathbf{2B^TI}\\ \end{array} \right]\mathbf{b} $$ Seems too easy...
The gradient that you seek is $$\frac{\partial y}{\partial B} = \sum_k BG_k\,{\rm tr}(F_k)$$ where the matrices $(F_k,G_k)$ come from the decomposition $$(ab^T+ba^T) = \sum_k F_k\otimes G_k$$ and the matrices $F_k$ are shaped like $I_3$, and $G_k$ are like $B^TB$.
Here's how the gradient was derived.
First let's denote the trace/Frobenius product with a colon, i.e. $$A:B={\rm tr}(A^TB)$$ Write the objective function using the Frobenius product and find its differential. $$\eqalign{ y &= ab^T:(I_3\otimes B^TB) \cr \cr dy &= ab^T:(I_3\otimes dB^T\,B+I_3\otimes B^T\,dB) \cr &= ab^T:(I_3\otimes dB^T\,B)+ab^T:(I_3\otimes B^T\,dB) \cr &= ba^T:(I_3\otimes B^T\,dB)+ab^T:(I_3\otimes B^T\,dB) \cr &= (ab^T+ba^T):(I_3\otimes B^T\,dB) \cr &= \Big(\sum_k F_k\otimes G_k\Big):(I_3\otimes B^T\,dB) \cr &= \Big(\sum_k (F_k:I_3)\,(G_k:B^T\,dB)\Big) \cr &= \sum_k {\rm tr}(F_k)\,BG_k :dB \cr \frac{\partial y}{\partial B} &= \sum_k {\rm tr}(F_k)\,BG_k \cr }$$ For more information on the Kronecker decomposition, search for the classic paper by van Loan & Pitsianis, or for Pitsianis' PhD thesis (which contains Matlab code).
Update
The Pitsianis decomposition can be avoided by using the vec operation.Let $$\eqalign{ &a = {\rm vec}({\cal A}) \quad&\implies\quad &{\cal A} = {\rm unvec}(a) \\ &b = {\rm vec}({\cal B}) \quad&\implies\quad &{\cal B} = {\rm unvec}(b) \\ }$$ and write the objective function as $$\eqalign{ y &= a:(I_3\otimes B^TB)b \\ &= a:{\rm vec}(B^TB\;{\cal B}\;I_3) \\ &= {\cal A}:B^TB\;{\cal B} \\ &= {\cal A}{\cal B}^T : B^TB \\ }$$ Finally, calculate the gradient. $$\eqalign{ dy &= {\cal A}{\cal B}^T : (B^TdB + dB^TB) \\ &= ({\cal A}{\cal B}^T+{\cal B}{\cal A}^T) : B^TdB \\ &= B\,({\cal A}{\cal B}^T+{\cal B}{\cal A}^T) : dB \\ \frac{\partial y}{\partial B} &= B\,({\cal A}{\cal B}^T+{\cal B}{\cal A}^T) \\ }$$