Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^{n}$ and $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{n\times n}$. I am not able to understand why \begin{equation*} \frac{\partial}{\partial \mathbf{B}}(\mathbf{y} - \mathbf{B}^{\top}\mathbf{x})^{\top}\mathbf{A}(\mathbf{y} - \mathbf{B}^{\top}\mathbf{x}) = 4 \mathbf{x}\mathbf{y}^{\top}\mathbf{A} - 4 \mathbf{x}\mathbf{x}^{\top}\mathbf{B}\mathbf{A}. \end{equation*} Can you help me understand the steps of the derivation?
Maybe you need that \begin{equation*} (\mathbf{y} - \mathbf{B}^{\top}\mathbf{x})^{\top}\mathbf{A}(\mathbf{y} - \mathbf{B}^{\top}\mathbf{x}) = \text{trace}(\mathbf{A}(\mathbf{y} - \mathbf{B}^{\top}\mathbf{x})(\mathbf{y} - \mathbf{B}^{\top}\mathbf{x})^{\top}). \end{equation*} I tried some calculations with Einstein notation, but I was not able to reach the result. If you can also point me to some theory that explains how to compute these kinds of quantities, I would be thankful.
For convenience, use the following product notation for the trace $$A:B = {\rm Tr}(A^TB)$$ Define the vector $$w = (B^Tx-y) \implies dw^T = x^TdB$$ Write the cost function in terms of the new variable. Then find its differential and gradient. $$\eqalign{ \phi &= w^TAw \cr&= A:ww^T \cr d\phi &= A:(w\,dw^T+dw\,w^T) \cr &= w^T(A+A^T):dw^T \cr &= w^T(A+A^T):x^TdB \cr &= xw^T(A+A^T):dB \cr \frac{\partial\phi}{\partial B} &= xw^T(A+A^T) \cr &= x(x^TB-y^T)(A+A^T) \cr }$$ If $A=A^T$, then this simplifies to $$\eqalign{ \frac{\partial\phi}{\partial B} &= 2x(x^TB-y^T)A \cr &= 2xx^TBA - 2xy^TA \cr }$$ The result stated in the question is $(-2)$ times this expression, so to make the two match, put a factor of $(-2)$ in front of $\phi$.
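As a sanity check on the gradient formula above, here is a minimal NumPy sketch that compares $xw^T(A+A^T)$ against central finite differences on a random instance. The function names (`phi`, `grad_phi`, `numerical_grad`), the dimension, the seed, and the step size are all illustrative choices, not part of the original derivation.

```python
import numpy as np

def phi(B, x, y, A):
    """Cost function phi = w^T A w with w = B^T x - y."""
    w = B.T @ x - y
    return w @ A @ w

def grad_phi(B, x, y, A):
    """Closed-form gradient x w^T (A + A^T) from the derivation."""
    w = B.T @ x - y
    return np.outer(x, w) @ (A + A.T)

def numerical_grad(B, x, y, A, eps=1e-6):
    """Central finite differences, perturbing one entry of B at a time."""
    G = np.zeros_like(B)
    for i in range(B.shape[0]):
        for j in range(B.shape[1]):
            E = np.zeros_like(B)
            E[i, j] = eps
            G[i, j] = (phi(B + E, x, y, A) - phi(B - E, x, y, A)) / (2 * eps)
    return G

# Random test instance (assumed sizes and seed, for illustration only).
rng = np.random.default_rng(0)
n = 4
x, y = rng.standard_normal(n), rng.standard_normal(n)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# General (non-symmetric) A: compare closed form with finite differences.
max_err = np.max(np.abs(grad_phi(B, x, y, A) - numerical_grad(B, x, y, A)))
print("max abs error:", max_err)

# Symmetric A: the gradient should reduce to 2 x x^T B A - 2 x y^T A.
A_sym = A + A.T
simplified = 2 * np.outer(x, x) @ B @ A_sym - 2 * np.outer(x, y) @ A_sym
sym_ok = np.allclose(grad_phi(B, x, y, A_sym), simplified)
print("symmetric case matches:", sym_ok)
```

Because $\phi$ is quadratic in $B$, central differences are exact up to rounding, so the reported error should be tiny for any reasonable step size.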
NB: The cyclic property of the trace allows colon products to be rearranged in many equivalent ways, e.g. $$\eqalign{ A:BC &= BC:A \cr &= A^T:(BC)^T \cr &= B^TA:C \cr &= AC^T:B \cr &= \ldots \cr }$$
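The rearrangement rules listed above can also be checked numerically. The following is a small NumPy sketch on random square matrices; the helper name `colon` and the test sizes are assumptions made for illustration.

```python
import numpy as np

def colon(P, Q):
    """Colon (Frobenius) product P:Q = Tr(P^T Q)."""
    return np.trace(P.T @ Q)

# Random square matrices (assumed size and seed, for illustration only).
rng = np.random.default_rng(1)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

ref = colon(A, B @ C)  # A:BC
variants = {
    "BC:A": colon(B @ C, A),
    "A^T:(BC)^T": colon(A.T, (B @ C).T),
    "B^T A:C": colon(B.T @ A, C),
    "A C^T:B": colon(A @ C.T, B),
}
for name, value in variants.items():
    print(name, np.isclose(value, ref))
```

Each variant should agree with `A:BC` up to floating-point rounding, which is the cyclic property of the trace in action.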