I'd appreciate your help in explaining the steps of calculating $\frac {\partial L} {\partial W}$, where
$$ L := \frac 1 2 \| V \sigma \left( Wx \right) - y \|_2^2 $$
and $ x \in \mathbb{R}^{(d \times 1)}, W \in \mathbb{R}^{(k \times d)}, V \in \mathbb{R}^{(m \times k)}, y \in \mathbb{R}^{(m \times 1)} $, where $\sigma \left( u \right) := \max \left( 0, u \right)$ (ReLU) is applied element-wise.
In the scalar case, the derivative of the ReLU function is the Heaviside step function (everywhere except at $x=0$, where ReLU is not differentiable and a convention such as $\theta(0)=0$ is adopted), i.e. $$\frac{d\sigma(x)}{dx} = \theta(x) \quad\implies\quad d\sigma = \theta\,dx$$ When applied element-wise to a vector $z$, these functions produce vectors $$\eqalign{ s &= \sigma(z), \qquad h = \theta(z) \\ }$$ whose differentials are related through the element-wise/Hadamard product $$\eqalign{ ds &= h\odot dz = H\,dz \quad\implies\quad \frac{\partial s}{\partial z} = H \\ }$$ where $\odot$ denotes the Hadamard product and $H={\rm Diag}(h)$.
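The relation $ds = h\odot dz = H\,dz$ can be sanity-checked numerically. The following is a minimal Python/NumPy sketch (the choice of NumPy is an assumption; the answer itself uses no code). It builds $H={\rm Diag}(\theta(z))$ with the convention $\theta(0)=0$ and compares it with a central-difference Jacobian of the element-wise ReLU. The test point deliberately avoids $z_i=0$, and `fd_jacobian` is a hypothetical helper written only for this check.

```python
import numpy as np

def relu(z):
    # sigma(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def heaviside(z):
    # theta(z), using the convention theta(0) = 0
    return np.heaviside(z, 0.0)

def fd_jacobian(f, z, eps=1e-6):
    # Central-difference Jacobian: column j approximates d f / d z_j
    J = np.zeros((f(z).size, z.size))
    for j in range(z.size):
        e = np.zeros(z.size)
        e[j] = eps
        J[:, j] = (f(z + e) - f(z - e)) / (2 * eps)
    return J

z = np.array([-1.5, 0.3, 2.0, -0.2])   # no entry at the kink z_i = 0
dz = np.array([0.1, -0.4, 0.7, 0.5])
h = heaviside(z)
H = np.diag(h)                          # H = Diag(h)

print(np.allclose(H @ dz, h * dz))           # H dz equals h ⊙ dz
print(np.allclose(H, fd_jacobian(relu, z)))  # H matches the numerical Jacobian of sigma
```

Multiplying by `h` element-wise and multiplying by the diagonal matrix `H` give the same vector, which is why the Hadamard and ${\rm Diag}$ notations can be used interchangeably.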
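For your specific problem let $\,z = Wx,\;s=\sigma(z),\;p=Vs-y,\,$ so that $\,dz=dW\,x,\;ds=H\,dz,\;dp=V\,ds.\;$ Then $$\eqalign{ L &= \tfrac 12\;p:p \\ dL &= p:dp \\ &= p:V\,ds \\ &= p:VH\,dW\,x \\ &= HV^Tpx^T:dW \\ &= {\rm Diag}\big(\theta(Wx)\big)\,V^T\big(V\sigma(Wx)-y\big)x^T:dW \\ \frac{\partial L}{\partial W} &= {\rm Diag}\big(\theta(Wx)\big)\,V^T\big(V\sigma(Wx)-y\big)x^T \\ &= \Big(\theta(Wx)\odot V^T\big(V\sigma(Wx)-y\big)\Big)x^T \\ }$$ The step from $p:VH\,dW\,x$ to $HV^Tpx^T:dW$ uses the cyclic property of the trace together with $H^T=H$.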
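To check the final formula, here is a minimal NumPy sketch, assuming arbitrary small sizes $d=4,\ k=5,\ m=3$ and random data (none of these come from the question; the function names are illustrative only). It compares the closed-form gradient ${\rm Diag}\big(\theta(Wx)\big)V^T\big(V\sigma(Wx)-y\big)x^T$, the equivalent mask form $\big(\theta(Wx)\odot V^T(V\sigma(Wx)-y)\big)x^T$ (which follows from ${\rm Diag}(h)\,v = h\odot v$), and a central finite-difference approximation. Because $L$ is piecewise quadratic in $W$, central differences are essentially exact as long as no entry of $Wx$ lies within about `eps` of zero.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 4, 5, 3                      # arbitrary sizes for illustration
x = rng.standard_normal((d, 1))
W = rng.standard_normal((k, d))
V = rng.standard_normal((m, k))
y = rng.standard_normal((m, 1))

def loss(W):
    # L = 1/2 || V sigma(W x) - y ||_2^2
    p = V @ np.maximum(0.0, W @ x) - y
    return 0.5 * float(np.sum(p ** 2))

def grad_closed_form(W):
    # dL/dW = Diag(theta(Wx)) V^T (V sigma(Wx) - y) x^T
    z = W @ x
    H = np.diag((z > 0).astype(float).ravel())
    p = V @ np.maximum(0.0, z) - y
    return H @ V.T @ p @ x.T

def grad_hadamard(W):
    # Same gradient, with Diag(theta(Wx)) applied as an element-wise mask
    z = W @ x
    p = V @ np.maximum(0.0, z) - y
    return ((z > 0) * (V.T @ p)) @ x.T

def grad_finite_diff(W, eps=1e-6):
    # Central differences, perturbing one entry of W at a time
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)
    return G

G = grad_closed_form(W)
print(G.shape)                                         # (k, d), same shape as W
print(np.allclose(G, grad_hadamard(W)))
print(np.allclose(G, grad_finite_diff(W), atol=1e-6))
```

In practice the mask form is the cheaper one, since it never builds the $k\times k$ matrix $H$.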
In the above, a colon denotes the trace/Frobenius product $$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ }$$ which is commutative and can be interchanged with the Hadamard product: $$\eqalign{ A:B &= B:A \\ A:(B\odot C) &= (A\odot B):C \\ }$$ The product rule for differentials $$\eqalign{ d(A\star B) = dA\star B + A\star dB \\ }$$ is quite general, since $(\star)$ can denote any bilinear product (Kronecker, Frobenius, Hadamard, dyadic, tensor, etc.) and $(A,B)$ can be any two matrices (or scalars, vectors, tensors) whose dimensions are compatible with the underlying product.
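The colon-product identities can be spot-checked numerically. This NumPy sketch uses random matrices and an illustrative helper `frob` (not part of any library). It also verifies the rearrangement $p:VH\,dW\,x = HV^Tpx^T:dW$ used in the derivation, which relies on $H$ being diagonal and therefore symmetric.

```python
import numpy as np

def frob(A, B):
    # A : B = Tr(A^T B), the Frobenius (trace) product
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 4)) for _ in range(3))

# A:B equals the sum of element-wise products
same_as_sum = np.isclose(frob(A, B), np.sum(A * B))
# A:B = B:A
commutes = np.isclose(frob(A, B), frob(B, A))
# A:(B ⊙ C) = (A ⊙ B):C, with ⊙ as NumPy's element-wise *
hadamard_swap = np.isclose(frob(A, B * C), frob(A * B, C))

# The rearrangement p:(V H dW x) = (H V^T p x^T):dW from the derivation
m, k, d = 3, 5, 4
p = rng.standard_normal((m, 1))
V = rng.standard_normal((m, k))
H = np.diag((rng.standard_normal(k) > 0).astype(float))  # diagonal 0/1 mask, so H = H^T
dW = rng.standard_normal((k, d))
x = rng.standard_normal((d, 1))
rearranged = np.isclose(frob(p, V @ H @ dW @ x), frob(H @ V.T @ p @ x.T, dW))

print(same_as_sum, commutes, hadamard_swap, rearranged)
```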
If the product is commutative, this rule reduces to the familiar scalar form, e.g. $$\eqalign{ d\big(A\star A\big) \;=\; \big(dA\star A + A\star dA\big) \;=\; 2A\star dA \\ }$$ which is how $dL = d\big(\tfrac12\,p:p\big) = p:dp$ was obtained above.
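To see why commutativity matters, this NumPy sketch (random $3\times3$ matrices, illustrative only) expands $(A+dA)\star(A+dA)-A\star A$ for the Frobenius, Hadamard, and ordinary matrix products. For the two commutative products the change equals $2A\star dA$ plus the second-order term $dA\star dA$; for the matrix product it equals $dA\,A+A\,dA+dA\,dA$, and $2A\,dA$ is generally wrong.

```python
import numpy as np

def frob(A, B):
    # Frobenius product A : B = sum_ij A_ij B_ij
    return float(np.sum(A * B))

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
dA = 1e-3 * rng.standard_normal((3, 3))  # small perturbation

# Frobenius product (commutative): change = 2 A:dA + dA:dA
frob_change = frob(A + dA, A + dA) - frob(A, A)
frob_first_order = 2 * frob(A, dA)

# Hadamard product (commutative): change = 2 A⊙dA + dA⊙dA
had_change = (A + dA) * (A + dA) - A * A
had_first_order = 2 * A * dA

# Ordinary matrix product (not commutative): change = dA A + A dA + dA dA
mat_change = (A + dA) @ (A + dA) - A @ A
mat_first_order = dA @ A + A @ dA
mat_wrong = 2 * A @ dA                   # the "scalar-style" formula, wrong here

print(np.isclose(frob_change, frob_first_order + frob(dA, dA)))
print(np.allclose(had_change, had_first_order + dA * dA))
print(np.allclose(mat_change, mat_first_order + dA @ dA))
# The scalar-style formula misses by a first-order amount, not a second-order one
print(np.abs(mat_change - mat_wrong).max() > np.abs(mat_change - mat_first_order).max())
```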