Gradient of least-squares loss function


Given an $m \times n$ matrix $\bf X$ and an $m \times p$ matrix $\bf Y$, define the loss function of the $n \times p$ matrix $\bf R$ as

$$ \operatorname{Loss} ({\bf R}) := \| \mathbf{X} \mathbf{R} - \mathbf{Y} \|_F^2 $$

where the square of the Frobenius norm of an $n \times m$ matrix $\mathbf{A}$ is defined as

$$ \| \mathbf{A} \|_F^2 = \sum_{i=1}^n \sum_{j=1}^m a_{ij}^2 $$

I have to compute the gradient $\nabla_{{\bf R}} \operatorname{Loss}$. My source says:

$$ \nabla_{{\bf R}} \operatorname{Loss} ({\bf R}) = \dfrac2m \mathbf{X}^T (\mathbf{X} \mathbf{R} - \mathbf{Y} ) $$

but I am not sure how to obtain this result.

Best answer:

Use a colon to denote the trace/Frobenius product

$$A:B \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; {\rm Tr}(A^TB) \;=\; {\rm Tr}(AB^T)$$

Then the gradient is straightforward to calculate:

$$\eqalign{
W &= XR-Y \\
dW &= X\,dR \\
{\rm Loss} &= \big\|W\big\|^2_F \;=\; W:W \\
d\,{\rm Loss} &= 2W:dW \;=\; 2W:X\,dR \;=\; 2X^TW\color{red}{:dR} \\
\frac{\partial\,{\rm Loss}}{\color{red}{\partial R}} &= 2X^TW \;=\; 2X^T(XR-Y) \\
}$$

The step $W:X\,dR = X^TW:dR$ uses the cyclic property of the trace: $A:BC = {\rm Tr}(A^TBC) = {\rm Tr}\big((B^TA)^TC\big) = B^TA:C$.

So the answer from your source is off by a factor of $\frac1m$. That factor appears when the loss is defined as a mean rather than a sum, i.e. $\frac1m\|XR-Y\|_F^2$; the gradient of the loss as defined in the question has no $\frac1m$.
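As a sanity check on $\nabla_R\,{\rm Loss} = 2X^T(XR-Y)$, here is a quick finite-difference comparison in NumPy. The shapes follow the question ($X$ is $m \times n$, $Y$ is $m \times p$, $R$ is $n \times p$); the particular sizes and random seed are arbitrary.

```python
import numpy as np

# Numerically verify grad = 2 X^T (X R - Y) for Loss(R) = ||X R - Y||_F^2
# by comparing against central finite differences, one entry of R at a time.

rng = np.random.default_rng(0)
m, n, p = 5, 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, p))
R = rng.standard_normal((n, p))

def loss(R):
    return np.linalg.norm(X @ R - Y, 'fro') ** 2

grad_analytic = 2 * X.T @ (X @ R - Y)

eps = 1e-6
grad_numeric = np.zeros_like(R)
for i in range(n):
    for j in range(p):
        E = np.zeros_like(R)
        E[i, j] = eps
        # Central difference in the (i, j) coordinate direction
        grad_numeric[i, j] = (loss(R + E) - loss(R - E)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))
```

The printed maximum entrywise discrepancy should be tiny (limited only by floating-point and the step size), with no $\frac1m$ factor anywhere.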