I recently started studying matrix calculus and in some notes on matrix factorization, I came across this:
Click here to see the screenshot
(Properties (9) and (12) in the screenshot are properties 101 and 102 of the Matrix Cookbook (see link below), but I don't think they are relevant to the part I'm stuck on.)
I can't understand why the derivative of
$\operatorname{tr}[WHH^TW^T]$ with respect to $W$ is $2WHH^T$.
I am working with the theory and properties from The Matrix Cookbook and I can't figure it out.
The properties listed on page 13 of the cookbook seem to be the relevant ones, but they don't give the desired result, so I am probably missing something important.
Thanks a lot to everyone who took time to read it,
Have a nice day and stay safe!
$\def\p#1#2{\frac{\partial #1}{\partial #2}}$It is convenient to replace the trace function with a product notation (the Frobenius inner product), i.e. $$\eqalign{ A:B &= {\rm Tr}(AB^T) \;=\; \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij} \\ A:A &= \big\|A\big\|^2_F \\ }$$ When $A$ and $B$ are vectors, the colon product reduces to the ordinary dot product.
The properties of the underlying trace allow the terms in such a product to be rearranged in a number of equivalent ways, e.g. $$\eqalign{ A:B &= B:A = B^T:A^T \\ CA:B &= C:BA^T = A:C^TB \\ }$$ ${\bf NB}\!:\;$ The matrices on either side of the colon must have the same dimensions.
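The rearrangement rules above can be spot-checked numerically. The following is only a sketch, assuming NumPy is available; the helper name `colon`, the random test matrices, and their shapes are illustrative choices, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def colon(A, B):
    """Frobenius inner product A:B = Tr(A B^T) = sum_ij A_ij * B_ij (hypothetical helper)."""
    return float(np.sum(A * B))

# Shapes chosen so that CA and B have the same dimensions (m x n).
m, k, n = 3, 4, 5
C = rng.standard_normal((m, k))
A = rng.standard_normal((k, n))
B = rng.standard_normal((m, n))

lhs = colon(C @ A, B)                        # CA : B
mid = colon(C, B @ A.T)                      # C : B A^T
rhs = colon(A, C.T @ B)                      # A : C^T B
trace_form = float(np.trace(C @ A @ B.T))    # Tr(CA B^T)

identities_hold = bool(np.allclose([mid, rhs, trace_form], lhs))
print(identities_hold)
```

Note that each call to `colon` receives two matrices of equal shape, which is exactly the dimension condition stated in the NB.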
This product simplifies the calculation of gradients.
For example, the gradient of a linear function is $$\eqalign{ \lambda &= {\rm Tr}(LY^T) \\&= L:Y \\ d\lambda &= L:dY \\ \p{\lambda}{Y} &= L \\ }$$ while that of a quadratic function is $$\eqalign{ \phi &= {\rm Tr}(YQY^T) \\&= YQ:Y \\ d\phi &= dY\,Q:Y + YQ:dY \\ &= YQ^T:dY + YQ:dY \\ &= Y\left(Q+Q^T\right):dY \\ \p{\phi}{Y} &= Y\left(Q+Q^T\right) \\ }$$ To apply this to the function in question, set $\,Q=HH^T,\,Y=W$. Since $HH^T$ is symmetric, $Q+Q^T=2HH^T$, and therefore $$\eqalign{ d\left(WHH^T:W\right) &= 2WHH^T:dW \\ \p{\left(WHH^T:W\right)}{W} &= 2WHH^T \\ }$$ For the function in the linked image, setting $\,Q=I,\,Y=(WH-X)\,$ and noting that $dY=dW\,H$ (because $H$ and $X$ are held constant) yields $$\eqalign{ d\left(Y:Y\right) &= Y(2I):dY = 2Y:dY = 2Y:dW\,H = 2YH^T:dW \\ \p{(Y:Y)}{W} &= 2YH^T \;=\; 2(WH-X)H^T \\ }$$
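As a sanity check, both gradient formulas can be compared against central finite differences. This is a minimal sketch, assuming NumPy; the function names `phi`, `loss`, and `numerical_gradient`, as well as the matrix sizes and step size `eps`, are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 4, 3, 6
W = rng.standard_normal((m, k))
H = rng.standard_normal((k, n))
X = rng.standard_normal((m, n))

def phi(W):
    # tr[W H H^T W^T], the function from the question
    return np.trace(W @ H @ H.T @ W.T)

def loss(W):
    # (WH - X):(WH - X), i.e. the squared Frobenius norm of the residual
    R = W @ H - X
    return np.sum(R * R)

def numerical_gradient(f, W, eps=1e-6):
    # Central difference applied to one entry of W at a time
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

grad_phi_ok = bool(np.allclose(numerical_gradient(phi, W), 2 * W @ H @ H.T, atol=1e-5))
grad_loss_ok = bool(np.allclose(numerical_gradient(loss, W), 2 * (W @ H - X) @ H.T, atol=1e-5))
print(grad_phi_ok, grad_loss_ok)
```

Because both functions are quadratic in $W$, the central difference is exact up to floating-point rounding, so a loose tolerance is enough to confirm the closed-form gradients.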