The Scalar-to-Matrix Derivative of $\frac 1 2 \| V \sigma \left( WX \right) - Y \|_F^2$ w.r.t. $V$


I'd appreciate your help in confirming the following calculation (or pointing out the bugs) of $\frac {\partial L} {\partial V}$, where

$$ L := \frac 1 2 \| V \sigma \left( WX \right) - Y \|_F^2 $$

and $ X \in \mathbb{R}^{d \times n}, W \in \mathbb{R}^{k \times d}, V \in \mathbb{R}^{m \times k}, Y \in \mathbb{R}^{m \times n} $, with $\sigma \left( u \right) := \max \left( 0, u \right)$ applied element-wise.


I've tried to follow the answer in this post, and ended up with the expression below.

$$ \frac {\partial L} {\partial V} = \big( V \sigma(WX)-Y \big) \sigma(X^T W^T) $$
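
As a sanity check on the expression above, here is a sketch of the differential argument. The shorthand $A := \sigma(WX)$ and $R := VA - Y$ is introduced here only for brevity, and the sketch assumes $W$ and $X$ are held fixed, so that $A$ does not depend on $V$.

$$ dL = \langle R, \, dV \, A \rangle = \operatorname{tr}\left( R^T \, dV \, A \right) = \operatorname{tr}\left( A R^T \, dV \right) = \langle R A^T, \, dV \rangle \quad \Longrightarrow \quad \frac {\partial L} {\partial V} = R A^T = \big( V \sigma(WX) - Y \big) \sigma(WX)^T $$

Since $\sigma$ acts element-wise, it commutes with transposition, so $\sigma(WX)^T = \sigma\big( (WX)^T \big) = \sigma(X^T W^T)$. The shapes also agree: $R \in \mathbb{R}^{m \times n}$ and $A^T \in \mathbb{R}^{n \times k}$, giving an $m \times k$ result, the same shape as $V$.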

Please confirm that it is mathematically correct.
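
One way to test the formula numerically is a finite-difference check. The sketch below uses only the Python standard library. The helper names (`matmul`, `relu`, `analytic_grad`, `numeric_grad`, and so on), the small dimensions, and the random seed are all arbitrary choices made for illustration. It compares the closed-form expression against central differences of $L$ taken entry by entry in $V$.

```python
import random

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def relu(A):
    # sigma(u) = max(0, u), applied element-wise.
    return [[max(0.0, a) for a in row] for row in A]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def residual(V, W, X, Y):
    # V sigma(WX) - Y
    P = matmul(V, relu(matmul(W, X)))
    return [[p - y for p, y in zip(prow, yrow)] for prow, yrow in zip(P, Y)]

def loss(V, W, X, Y):
    # L = 1/2 * squared Frobenius norm of the residual.
    return 0.5 * sum(r * r for row in residual(V, W, X, Y) for r in row)

def analytic_grad(V, W, X, Y):
    # Candidate formula: (V sigma(WX) - Y) sigma(X^T W^T)
    return matmul(residual(V, W, X, Y), relu(matmul(transpose(X), transpose(W))))

def numeric_grad(V, W, X, Y, eps=1e-5):
    # Central differences, perturbing one entry of V at a time.
    G = [[0.0] * len(V[0]) for _ in V]
    for i in range(len(V)):
        for j in range(len(V[0])):
            old = V[i][j]
            V[i][j] = old + eps
            up = loss(V, W, X, Y)
            V[i][j] = old - eps
            down = loss(V, W, X, Y)
            V[i][j] = old  # restore the entry
            G[i][j] = (up - down) / (2 * eps)
    return G

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Arbitrary small sizes, chosen only for the check.
rng = random.Random(0)
d, n, k, m = 3, 5, 4, 2
X = rand_matrix(d, n, rng)
W = rand_matrix(k, d, rng)
V = rand_matrix(m, k, rng)
Y = rand_matrix(m, n, rng)

diff = max_abs_diff(analytic_grad(V, W, X, Y), numeric_grad(V, W, X, Y))
print("max |analytic - numeric| =", diff)
```

Because $L$ is quadratic in $V$, central differences are exact up to floating-point rounding, so the printed difference should be tiny if the formula holds. A check like this supports the expression but does not replace a proof.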


Thanks.