For a machine learning assignment I have the following loss function:
$$L(D, W) = \frac{1}{2}||DW-X||^2_F$$ Where $D, W$ and $X$ are matrices.
In one part of the assignment I need to calculate the gradient of the Loss-function once with respect to $W$ and once with respect to $D$.
What I have so far is that one can rewrite $||A||^2_F$ as $tr(A^TA) = tr(AA^T)$, which (ignoring the factor $\frac{1}{2}$ for now) gives me $$tr\Big((DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T\Big)$$
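To convince myself the expansion is right, here is a quick numerical sanity check in NumPy (shapes and seed are arbitrary, the names just mirror the question):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((4, 3))   # arbitrary small shapes for the check
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

# left-hand side: ||DW - X||_F^2
frob_sq = np.linalg.norm(D @ W - X, "fro") ** 2

# right-hand side: the expanded trace form from above
expanded = np.trace(D @ W @ W.T @ D.T - D @ W @ X.T - X @ W.T @ D.T + X @ X.T)

assert np.isclose(frob_sq, expanded)
```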
I know that you get the gradient by differentiating element-wise with respect to the elements of the respective matrix. However, I'm a bit stuck on solving this analytically. I think the best way would be to further simplify the expression inside the trace above and then compute the gradient element-wise, but I'm not really certain how to go about that.
Maybe someone can point me in the right direction?
OK, with the help of Shogun's comment and the linked question I think I have found a solution. Maybe someone can confirm that I didn't make any mistake (as this is rather unfamiliar territory for me):
Assuming my expansion of the residual multiplied with its transpose is correct, i.e. $$(DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T,$$ we can go on like this to solve with respect to $W$: $$\nabla_Wtr\Big((DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T\Big) = \\ \nabla_Wtr(DWW^TD^T) - \nabla_Wtr(DWX^T) - \nabla_Wtr(XW^TD^T) + \nabla_Wtr(XX^T)$$
Solving each part (using the methods shown here):
$$\nabla_Wtr(DWW^TD^T) = 2D^TDW$$ $$\nabla_Wtr(DWX^T) = D^TX$$ $$\nabla_Wtr(XW^TD^T) = D^TX$$ $$\nabla_Wtr(XX^T) = 0$$
So altogether the gradient of the trace expression is $$\nabla_Wtr\Big(DWW^TD^T - DWX^T - XW^TD^T + XX^T\Big) = 2(D^TDW - D^TX),$$ and reinstating the factor $\frac{1}{2}$ from the loss function gives $$\nabla_WL(D, W) = D^TDW - D^TX = D^T(DW - X)$$
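As a double-check, the closed form $D^T(DW - X)$ can be compared against element-wise central finite differences of the loss (a verification sketch in NumPy, shapes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

def loss(Wm):
    # L(D, W) = (1/2) ||D W - X||_F^2
    return 0.5 * np.linalg.norm(D @ Wm - X, "fro") ** 2

# analytic gradient, including the 1/2 factor
analytic = D.T @ (D @ W - X)

# central finite differences, one element of W at a time
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```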
The gradient with respect to $D$ can be derived in similar fashion.
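For completeness: carrying out the analogous steps for $D$ should give $\nabla_DL = (DW - X)W^T$ (my own derivation, not confirmed above), which the same finite-difference sketch can verify:

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

def loss(Dm):
    # L(D, W) = (1/2) ||D W - X||_F^2, now as a function of D
    return 0.5 * np.linalg.norm(Dm @ W - X, "fro") ** 2

# conjectured closed form for the gradient with respect to D
analytic = (D @ W - X) @ W.T

# central finite differences, one element of D at a time
eps = 1e-6
numeric = np.zeros_like(D)
for i in range(D.shape[0]):
    for j in range(D.shape[1]):
        E = np.zeros_like(D)
        E[i, j] = eps
        numeric[i, j] = (loss(D + E) - loss(D - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```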