Gradient of a matrix expression with respect to one of its matrices?


For a machine learning assignment I have the following loss function:

$$L(D, W) = \frac{1}{2}||DW-X||^2_F$$ Where $D, W$ and $X$ are matrices.

In one part of the assignment I need to calculate the gradient of the loss function once with respect to $W$ and once with respect to $D$.

What I have so far is that one can rewrite $||A||^2_F$ as $tr(A^TA) = tr(AA^T)$, which, with $A = DW - X$ (dropping the factor $\frac{1}{2}$ for now), gives $$tr\Big((DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T\Big)$$
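As a numerical sanity check of the identity $||A||^2_F = tr(A^TA) = tr(AA^T)$ and of the expansion above, here is a small NumPy sketch (the matrix shapes are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((4, 3))  # arbitrary shapes for illustration
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

A = D @ W - X
fro_sq = np.linalg.norm(A, 'fro') ** 2

# ||A||_F^2 equals tr(A^T A) and tr(A A^T)
print(np.isclose(fro_sq, np.trace(A.T @ A)))  # True
print(np.isclose(fro_sq, np.trace(A @ A.T)))  # True

# ... and equals the expanded trace expression
expanded = np.trace(D @ W @ W.T @ D.T - D @ W @ X.T - X @ W.T @ D.T + X @ X.T)
print(np.isclose(fro_sq, expanded))           # True
```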

I know that the gradient is obtained by differentiating element-wise with respect to the entries of the respective matrix. However, I'm a bit stuck on solving this analytically. I think the best way would be to simplify the expression inside the trace further and then compute the gradient element-wise, but I'm not really certain how to go about that.

Maybe someone can point me in the right direction?

BEST ANSWER

OK, with the help of Shogun's comment and the linked question I think I have found a solution. Maybe someone can confirm that I didn't make any mistakes (this is rather unfamiliar territory for me):

Assuming my expansion of the squared norm (the product of $DW-X$ with its transpose) is correct, $$(DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T$$ we can use the linearity of the trace to split the gradient with respect to $W$: $$\nabla_Wtr\Big((DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T\Big) = \\ \nabla_Wtr(DWW^TD^T) - \nabla_Wtr(DWX^T) - \nabla_Wtr(XW^TD^T) + \nabla_Wtr(XX^T)$$

Solving each part (using the methods shown here):

$$\nabla_Wtr(DWW^TD^T) = 2D^TDW$$ $$\nabla_Wtr(DWX^T) = D^TX$$ $$\nabla_Wtr(XW^TD^T) = D^TX$$ $$\nabla_Wtr(XX^T) = 0$$
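Each of these four trace-gradient identities can be verified numerically with central finite differences; below is a NumPy sketch (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

def num_grad(f, W, eps=1e-6):
    """Central finite-difference gradient of scalar f at W."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        g[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return g

# the trace terms that depend on W, paired with their claimed gradients
# (tr(XX^T) is constant in W, so its gradient is trivially zero)
checks = [
    (lambda W: np.trace(D @ W @ W.T @ D.T), 2 * D.T @ D @ W),
    (lambda W: np.trace(D @ W @ X.T),       D.T @ X),
    (lambda W: np.trace(X @ W.T @ D.T),     D.T @ X),
]
ok = [np.allclose(num_grad(f, W), g, atol=1e-5) for f, g in checks]
print(ok)  # [True, True, True]
```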

So altogether we get $$\nabla_Wtr\Big((DW)(W^TD^T)-DWX^T-XW^TD^T+XX^T\Big) = 2(D^TDW - D^TX)$$ and, restoring the factor $\frac{1}{2}$ from the loss, $$\nabla_W L(D, W) = D^TDW - D^TX = D^T(DW - X)$$
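The assembled result can also be checked against central finite differences of the full loss (a NumPy sketch with arbitrary shapes; note that the factor $\frac{1}{2}$ in $L$ halves the gradient of the expanded trace, leaving $D^T(DW-X)$):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

def loss(W):
    """L(D, W) = (1/2) ||DW - X||_F^2 as a function of W."""
    return 0.5 * np.linalg.norm(D @ W - X, 'fro') ** 2

analytic = D.T @ (D @ W - X)  # the derived gradient w.r.t. W

# central finite differences, entry by entry
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```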

The gradient with respect to $D$ can be derived in a similar fashion.
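Carrying out the analogous trace manipulations for $D$ gives $\nabla_D L(D, W) = (DW - X)W^T$, which can likewise be checked numerically (NumPy sketch, arbitrary shapes):

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
X = rng.standard_normal((4, 5))

def loss(D):
    """L(D, W) = (1/2) ||DW - X||_F^2 as a function of D."""
    return 0.5 * np.linalg.norm(D @ W - X, 'fro') ** 2

analytic = (D @ W - X) @ W.T  # gradient of L w.r.t. D

# central finite differences, entry by entry
eps = 1e-6
numeric = np.zeros_like(D)
for i in range(D.shape[0]):
    for j in range(D.shape[1]):
        E = np.zeros_like(D)
        E[i, j] = eps
        numeric[i, j] = (loss(D + E) - loss(D - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```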