$\newcommand{\tr}{\operatorname{tr}}$ I want to prove the following expression with simple matrix operations.
$$\frac{\partial}{\partial{W}} \tr((Y-XW)(Y-XW)^T)=2X^T(XW-Y)$$
My attempt so far, where the $YY^T$ term drops out because it does not depend on $W$:
\begin{align} & \frac{\partial}{\partial{W}} \tr((Y-XW)(Y-XW)^T) \\[10pt] = {} & \frac{\partial}{\partial{W}} \tr(YY^T - YW^TX^T - XWY^T + XW(XW)^T) \\[10pt] = {} & \frac{\partial}{\partial{W}} \tr( -YW^TX^T - XWY^T + XW(XW)^T) \\[10pt] = {} & -2X^TY +\frac{\partial}{\partial{W}} \tr(XW(XW)^T) \end{align}
Now I need to calculate $ \dfrac{\partial}{\partial{W}} \tr(XW(XW)^T)$.
Can anyone help me calculate this derivative?
Thanks.
=========
Based on the suggestion to write the expression in Einstein notation, I found the following.
Using $W^T_{kl}=W_{lk}$ and $X^T_{li}=X_{il}$, and differentiating with respect to a single fixed entry $W_{pq}$:
\begin{align} & \dfrac{\partial}{\partial{W_{pq}}} \tr(XW(XW)^T) \\[10pt] = {} & \dfrac{\partial}{\partial{W_{pq}}} \sum_i\sum_j\sum_k\sum_l X_{ij}W_{jk}W_{lk}X_{il} \\[10pt] = {} & \sum_i\sum_l X_{ip}W_{lq}X_{il} + \sum_i\sum_j X_{ij}W_{jq}X_{ip} \\[10pt] = {} & \sum_i\sum_l (X^T)_{pi}X_{il}W_{lq} + \sum_i\sum_j (X^T)_{pi}X_{ij}W_{jq} \\[10pt] = {} & 2\,(X^TXW)_{pq} \end{align}
Since this holds for every entry $(p,q)$, $\dfrac{\partial}{\partial{W}} \tr(XW(XW)^T)=2X^TXW$.
Please let me know if something is wrong with this. Thanks.
Write everything in Einstein notation and the result follows directly. Alternatively, exploit the flexibility of the differential:
$$ d_{W}\operatorname{tr}((Y-XW)(Y-XW)^T)=\operatorname{tr}(d_{W}[(Y-XW)(Y-XW)^T]) $$
$$ =\operatorname{tr}\left([d_{W}(Y-XW)][(Y-XW)^T]+[(Y-XW)][d_{W}(Y-XW)^T]\right) $$
$$ =\operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T+(Y-XW)(-X\,d_{W}W)^T\right) $$
Since $\operatorname{tr}(A^T)=\operatorname{tr}(A)$, the second term may be replaced by its transpose inside the trace:
$$ =\operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T+(-X\,d_{W}W)(Y-XW)^T\right) $$
$$ =2\operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T\right) $$
Moving the minus sign into the second factor and using the cyclic property of the trace, then transposing once more:
$$ =2\operatorname{tr}\left((XW-Y)^T(X\,d_{W}W)\right) $$
$$ =2\operatorname{tr}\left(d_{W}W^T\, X^T(XW-Y)\right) $$
so:
$$ d_{W}\operatorname{tr}((Y-XW)(Y-XW)^T)=2\operatorname{tr}\left(d_{W}W^T\, X^T(XW-Y)\right)=\# $$
Using index notation:
$$ \#=2\sum_i\left[d_{W}W^T\, X^T(XW-Y)\right]_{i,i} $$
$$ =\sum_i\sum_q\left[d_{W}W^T\right]_{i,q}\left[2X^T(XW-Y)\right]_{q,i} $$
$$ =\sum_i\sum_q\left[d_{W}W\right]_{q,i}\left[2X^T(XW-Y)\right]_{q,i} $$
so, reading off the coefficient of each $\left[d_{W}W\right]_{q,i}$:
$$ \frac{d}{dW_{q,i}}\operatorname{tr}((Y-XW)(Y-XW)^T)=\left[2X^T(XW-Y)\right]_{q,i} $$
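As a sanity check on the formula $2X^T(XW-Y)$, here is a minimal numerical sketch that compares it against central finite differences. It is not part of the derivation. The matrix sizes, the random seed, the step `h`, and all helper names (`matmul`, `loss`, `analytic_grad`, `numeric_grad`, `random_matrix`) are assumptions chosen for illustration, and only the Python standard library is used.

```python
import random

def matmul(A, B):
    # Plain nested-list matrix product A @ B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    # Entrywise difference A - B.
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def loss(X, W, Y):
    # tr((Y - XW)(Y - XW)^T) equals the sum of squared entries of Y - XW.
    R = sub(Y, matmul(X, W))
    return sum(r * r for row in R for r in row)

def analytic_grad(X, W, Y):
    # The claimed gradient: 2 X^T (XW - Y).
    R = sub(matmul(X, W), Y)
    return [[2 * g for g in row] for row in matmul(transpose(X), R)]

def numeric_grad(X, W, Y, h=1e-6):
    # Central finite difference in each entry W[q][i] (h is an assumed step size).
    G = [[0.0] * len(W[0]) for _ in W]
    for q in range(len(W)):
        for i in range(len(W[0])):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[q][i] += h
            Wm[q][i] -= h
            G[q][i] = (loss(X, Wp, Y) - loss(X, Wm, Y)) / (2 * h)
    return G

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Assumed illustrative shapes: X is 5x3, W is 3x2, Y is 5x2.
rng = random.Random(0)
X = random_matrix(5, 3, rng)
W = random_matrix(3, 2, rng)
Y = random_matrix(5, 2, rng)

# Largest entrywise gap between the analytic and the numerical gradient.
max_err = max(
    abs(a - n)
    for ra, rn in zip(analytic_grad(X, W, Y), numeric_grad(X, W, Y))
    for a, n in zip(ra, rn)
)
print(max_err)
```

Because the objective is quadratic in $W$, the central difference has no truncation error, so the printed gap should be at the level of floating-point roundoff.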