I'm trying to take the gradient of the following function w.r.t. $A$:
$$ f(A) = \|AC_YA^T - C_R\|_F^2 $$
I tried the following:
$$ f(A) = trace((AC_YA^T - C_R)^T(AC_YA^T - C_R)) = trace(AC_Y^TA^TAC_YA^T) - trace(A C_Y^T A^T C_R) - trace(C_R^T AC_Y A^T) + trace(C_R^T C_R) $$
Using the matrix differentiation identities found on Wikipedia, I have no issues differentiating the second, third, and obviously fourth terms w.r.t. $A$.
The only issue is with the first term, which is $$ trace(AC_Y^TA^TAC_YA^T) $$ By the cyclic property of the trace, I can rewrite this term as follows:
$$ g(A) = trace(C_Y^T \Psi C_Y \Psi ^T) $$ where $$\Psi = A^T A$$ Keep in mind that $C_Y$ and $C_R$ are covariance matrices, so they are symmetric.
But how do I take the derivative of $g(A)$?
I tried applying the chain rule, but wasn't sure how to use it. It looked as if I would end up with a tensor. I'm sure the answer should be a matrix, though, since the main function $f(A)$ is just a scalar.
Let $M=(AC_YA^T-C_R)$, then write the function in terms of the Frobenius product (:) and take its differential $$\eqalign{ f &= M:M \cr df &= 2\,M:dM \cr &= 2\,M:(dA\,C_YA^T + AC_Y\,dA^T) \cr &= 2\,M:dA\,C_YA^T \,+\, 2\,M:AC_Y\,dA^T \cr &= 2\,M:dA\,C_YA^T \,+\, 2\,M^T:dA\,C_Y^TA^T \cr &= 2\,MAC_Y^T:dA \,+\, 2\,M^TAC_Y:dA \cr &= 2\,(MAC_Y^T + M^TAC_Y):dA \cr }$$ Since $\,df=(\frac{\partial f}{\partial A}):dA$, the derivative must be $$\eqalign{ \frac{\partial f}{\partial A} &= 2\,(MAC_Y^T + M^TAC_Y) \cr &= 2\,\Big((AC_YA^T-C_R)AC_Y^T + (AC_YA^T-C_R)^TAC_Y\Big) \cr &= 2\,\big(AC_YA^TAC_Y^T-C_RAC_Y^T + AC_Y^TA^TAC_Y-C_R^TAC_Y\big) \cr }$$ If you like, you can replace the Frobenius product with the trace, since $\,A\!:\!B={\rm tr}(A^TB)$. However, I prefer the Frobenius product because it has nice algebraic properties.
It is commutative, distributive, transpose-invariant, and obeys the product rule under differentiation $$\eqalign{ X:Y &= Y:X \cr X:(Y+Z) &= X:Y + X:Z \cr X:Y &= X^T:Y^T \cr d\,(X:Y) &= dX:Y + X:dY \cr }$$ It also has simple mixed product rules for the Kronecker and Hadamard products (the right-hand side of the Kronecker rule is an ordinary product of two scalars) $$\eqalign{ (A\otimes B):(X\otimes Y) &= (A:X)\,(B:Y) \cr (A\circ B):C &= A:(B\circ C) \cr }$$ And useful mixed product rules for the matrix product $$\eqalign{ AX:Y &= X:A^TY \cr XB:Y &= X:YB^T \cr }$$
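If you want to sanity-check the result numerically, the following is a minimal sketch that compares the closed-form gradient $2\,(MAC_Y^T + M^TAC_Y)$ against central finite differences. It is only an illustration: the helper names (`matmul`, `grad_f`, `numeric_grad`, and so on), the matrix sizes, the step `h`, and the random test matrices are all assumptions made for the example, and it sticks to the Python standard library so that it runs without NumPy.

```python
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def residual(A, C_Y, C_R):
    """M = A C_Y A^T - C_R."""
    return sub(matmul(matmul(A, C_Y), transpose(A)), C_R)


def f(A, C_Y, C_R):
    """f(A) = ||A C_Y A^T - C_R||_F^2 (sum of squared entries of M)."""
    return sum(v * v for row in residual(A, C_Y, C_R) for v in row)


def grad_f(A, C_Y, C_R):
    """Closed form from the derivation: 2 (M A C_Y^T + M^T A C_Y)."""
    M = residual(A, C_Y, C_R)
    left = matmul(matmul(M, A), transpose(C_Y))
    right = matmul(matmul(transpose(M), A), C_Y)
    return [[2 * (l + r) for l, r in zip(rl, rr)] for rl, rr in zip(left, right)]


def numeric_grad(A, C_Y, C_R, h=1e-6):
    """Central finite differences, perturbing one entry of A at a time."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            G[i][j] = (f(A_plus, C_Y, C_R) - f(A_minus, C_Y, C_R)) / (2 * h)
    return G


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
m, n = 3, 4  # assumed sizes: A is m x n, C_Y is n x n, C_R is m x m

# General case: C_Y and C_R are deliberately NOT symmetric here.
A, C_Y, C_R = rand_matrix(m, n), rand_matrix(n, n), rand_matrix(m, m)
err_general = max_abs_diff(grad_f(A, C_Y, C_R), numeric_grad(A, C_Y, C_R))
print("general case, max abs error:", err_general)

# Symmetric case: build S_Y = B B^T and S_R = D D^T, then compare the
# general formula with the collapsed form 4 M A S_Y.
B, D = rand_matrix(n, n), rand_matrix(m, m)
S_Y, S_R = matmul(B, transpose(B)), matmul(D, transpose(D))
M = residual(A, S_Y, S_R)
simplified = [[4 * v for v in row] for row in matmul(matmul(M, A), S_Y)]
err_symmetric = max_abs_diff(grad_f(A, S_Y, S_R), simplified)
print("symmetric case, max abs error:", err_symmetric)
```

Both reported errors should be tiny; the first is limited by finite-difference round-off. The second check uses the fact stated in the question that $C_Y$ and $C_R$ are symmetric: then $M^T=M$ and $C_Y^T=C_Y$, the two terms of the gradient coincide, and the result collapses to $4\,(AC_YA^T-C_R)\,AC_Y$.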