How to prove $\frac{\partial}{\partial{W}} \operatorname{trace}((Y-XW)(Y-XW)^T)=2X^T(XW-Y)$?


$\newcommand{\tr}{\operatorname{tr}}$ I want to prove the following expression with simple matrix operations.

\begin{align} \frac{\partial}{\partial{W}} \tr((Y-XW)(Y-XW)^T)=2X^T(XW-Y) \end{align}

Here is what I have so far (the $YY^T$ term drops out because it does not depend on $W$):

\begin{align}
& \frac{\partial}{\partial{W}} \tr((Y-XW)(Y-XW)^T) \\[10pt]
= {} & \frac{\partial}{\partial{W}} \tr(YY^T - YW^TX^T - XWY^T + XW(XW)^T) \\[10pt]
= {} & \frac{\partial}{\partial{W}} \tr( -YW^TX^T - XWY^T + XW(XW)^T) \\[10pt]
= {} & -2X^TY +\frac{\partial}{\partial{W}} \tr(XW(XW)^T)
\end{align}

Now I need to calculate $ \dfrac{\partial}{\partial{W}} \tr(XW(XW)^T)$.

Can anyone help me calculate this derivative?

Thanks.

=========

Based on the suggestion to write the expression in Einstein notation, I found:

\begin{align}
& \dfrac{\partial}{\partial{W_{pq}}} \tr(XW(XW)^T) \\[10pt]
= {} & \dfrac{\partial}{\partial{W_{pq}}} \sum_i\sum_j\sum_k\sum_l X_{ij}W_{jk}W^T_{kl}X^T_{li} \\[10pt]
= {} & \dfrac{\partial}{\partial{W_{pq}}} \sum_i\sum_j\sum_k\sum_l X_{ij}W_{jk}W_{lk}X_{il} \\[10pt]
= {} & \sum_i\sum_l X_{ip}W_{lq}X_{il} + \sum_i\sum_j X_{ij}W_{jq}X_{ip} \\[10pt]
= {} & \sum_i X^T_{pi}(XW)_{iq} + \sum_i X^T_{pi}(XW)_{iq} \\[10pt]
= {} & 2(X^TXW)_{pq}
\end{align}

The first sum in the fourth line comes from differentiating the factor $W_{jk}$ (which sets $j=p$, $k=q$) and the second from differentiating $W_{lk}$ (which sets $l=p$, $k=q$). Collecting all entries gives $\dfrac{\partial}{\partial{W}} \tr(XW(XW)^T) = 2X^TXW$.

Please let me know if something is wrong with this. Thanks.


There are 3 best solutions below


Write everything in Einstein notation and the result follows directly. Alternatively, exploit the flexibility of the differential. Since $X$ and $Y$ do not depend on $W$, we have $d_{W}(Y-XW)=-X\,d_{W}W$, and therefore

$$\begin{aligned}
d_{W}\operatorname{tr}((Y-XW)(Y-XW)^T) &= \operatorname{tr}(d_{W}[(Y-XW)(Y-XW)^T]) \\
&= \operatorname{tr}\left([d_{W}(Y-XW)][(Y-XW)^T]+[(Y-XW)][d_{W}(Y-XW)^T]\right) \\
&= \operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T+(Y-XW)(-X\,d_{W}W)^T\right) \\
&= \operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T+(-X\,d_{W}W)(Y-XW)^T\right) \\
&= 2\operatorname{tr}\left((-X\,d_{W}W)(Y-XW)^T\right) \\
&= 2\operatorname{tr}\left((XW-Y)^T(X\,d_{W}W)\right) \\
&= 2\operatorname{tr}\left(d_{W}W^T\, X^T(XW-Y)\right)
\end{aligned}$$

The fourth line uses $\operatorname{tr}(A^T)=\operatorname{tr}(A)$ on the second term, the sixth uses the cyclic property of the trace, and the last again uses $\operatorname{tr}(A^T)=\operatorname{tr}(A)$. In index notation:

$$\begin{aligned}
2\operatorname{tr}\left(d_{W}W^T\, X^T(XW-Y)\right) &= 2\sum_i\left[d_{W}W^T\, X^T(XW-Y)\right]_{i,i} \\
&= \sum_i\sum_q\left[d_{W}W^T\right]_{i,q}\left[2X^T(XW-Y)\right]_{q,i} \\
&= \sum_i\sum_q\left[d_{W}W\right]_{q,i}\left[2X^T(XW-Y)\right]_{q,i}
\end{aligned}$$

so:

$$ \frac{\partial}{\partial W_{q,i}}\operatorname{tr}((Y-XW)(Y-XW)^T)=\left[2X^T(XW-Y)\right]_{q,i} $$
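
A quick numerical sanity check of the result can be reassuring. The sketch below is not part of the derivation: it assumes NumPy, uses arbitrary random shapes ($5\times 3$, $3\times 2$, $5\times 2$), and the helper names `loss`, `analytic_grad` and `numeric_grad` are made up for illustration. It compares $2X^T(XW-Y)$ with central finite differences taken one entry $W_{q,i}$ at a time.

```python
import numpy as np

def loss(W, X, Y):
    """tr((Y - XW)(Y - XW)^T), i.e. the squared Frobenius norm of Y - XW."""
    R = Y - X @ W
    return np.trace(R @ R.T)

def analytic_grad(W, X, Y):
    """Gradient from the derivation above: 2 X^T (XW - Y)."""
    return 2.0 * X.T @ (X @ W - Y)

def numeric_grad(W, X, Y, eps=1e-6):
    """Central finite differences, perturbing one entry W[q, i] at a time."""
    G = np.zeros_like(W)
    for q in range(W.shape[0]):
        for i in range(W.shape[1]):
            E = np.zeros_like(W)
            E[q, i] = eps
            G[q, i] = (loss(W + E, X, Y) - loss(W - E, X, Y)) / (2 * eps)
    return G

# Arbitrary test data (shapes are an assumption, any compatible shapes work).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
W = rng.standard_normal((3, 2))
Y = rng.standard_normal((5, 2))

max_err = np.max(np.abs(analytic_grad(W, X, Y) - numeric_grad(W, X, Y)))
print("largest entrywise discrepancy:", max_err)
```

Because the function is quadratic in $W$, central differences have no truncation error, so the discrepancy should be at the level of floating-point rounding.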


I have never formally studied matrix calculus, so I am using the notation from the Wikipedia article on matrix calculus.

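$$\begin{aligned}
d(\operatorname{tr}(XW(XW)^T)) &= \operatorname{tr}(d(XWW^TX^T)) \\
&= \operatorname{tr}(d(XW)\,W^TX^T+XW\,d(W^TX^T)) \\
&= \operatorname{tr}(X(dW)W^TX^T)+\operatorname{tr}(XW(dW^T)X^T) \\
&= \operatorname{tr}(X(dW)W^TX^T)+\operatorname{tr}((dW)^TX^TXW) \\
&= \operatorname{tr}(W^TX^TX(dW))+\operatorname{tr}((X^TXW)^TdW) \\
&= \operatorname{tr}(2(X^TXW)^TdW)
\end{aligned}$$

$$\therefore \frac{\partial}{\partial W} \operatorname{tr}(XW(XW)^T)=2X^TXW$$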
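
As a rough check of both the index form of this trace and the differential, here is a small NumPy sketch. The shapes, the random data, the perturbation `dW` and the helper `g` are assumptions made for illustration. It evaluates $\operatorname{tr}(XW(XW)^T)$ as the quadruple sum $\sum_{ijkl} X_{ij}W_{jk}W_{lk}X_{il}$ via `np.einsum`, and compares the predicted first-order change $\operatorname{tr}(2(X^TXW)^TdW)$ with a central difference along `dW`.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
dW = rng.standard_normal((3, 2))  # arbitrary perturbation direction

def g(W):
    """tr(XW (XW)^T)."""
    return np.trace(X @ W @ W.T @ X.T)

# The same quantity as an explicit sum over i, j, k, l.
g_index = np.einsum("ij,jk,lk,il->", X, W, W, X)

# First-order change predicted by tr(2 (X^T X W)^T dW).
G = 2.0 * X.T @ X @ W
predicted = np.trace(G.T @ dW)

# Central difference along dW (no truncation error, since g is quadratic in W).
t = 1e-5
observed = (g(W + t * dW) - g(W - t * dW)) / (2 * t)

print("trace vs. index sum:", g(W), g_index)
print("predicted vs. observed change:", predicted, observed)
```

The two printed pairs should agree up to floating-point rounding.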


The Frobenius product is a convenient way to denote the trace: $\,\,A:BC={\rm tr}(A^TBC)$.
Rules for rearranging terms in a Frobenius product follow directly from properties of the trace.
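
For reference, the rules needed here can be written out as follows. This is a sketch for generic matrices $A$, $B$, $C$ of compatible sizes (these symbols are placeholders, not quantities from the question), and each identity follows from $\operatorname{tr}(M^T)=\operatorname{tr}(M)$ together with the cyclic property of the trace:

$$\eqalign{ A:B &= B:A = A^T:B^T \cr A:BC &= B^TA:C = AC^T:B \cr }$$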

Let $Z=(XW-Y)$, then find the differential and gradient of the function as $$\eqalign{ f &= Z:Z \cr df &= 2Z:dZ = 2Z:X\,dW = 2X^TZ:dW \cr \frac{\partial f}{\partial W} &= 2X^TZ = 2X^T(XW-Y) \cr }$$
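
The one rearrangement used here, $Z:X\,dW = X^TZ:dW$, can be checked numerically. This is a minimal sketch assuming NumPy; the helper `frob`, the shapes and the random data are illustrative assumptions. It computes the Frobenius product as a sum of elementwise products and compares it with the trace definition.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = tr(A^T B), computed as a sum of elementwise products."""
    return np.sum(A * B)

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
W = rng.standard_normal((3, 2))
Y = rng.standard_normal((5, 2))
dW = rng.standard_normal((3, 2))  # arbitrary perturbation of W

Z = X @ W - Y

# Definition: Z : (X dW) equals tr(Z^T X dW).
lhs_def = frob(Z, X @ dW)
rhs_def = np.trace(Z.T @ X @ dW)

# Rearrangement used in the derivation: Z : X dW = X^T Z : dW.
moved = frob(X.T @ Z, dW)

print(lhs_def, rhs_def, moved)
```

All three printed values should coincide up to rounding, which is what lets the factor $X$ move across the colon as $X^T$.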