I'm fitting a model to some data, and am trying to take the following derivative:
$$\frac{\partial}{\partial V}\|U \phi(VX)-Y\|_F^2$$
where $\phi$ is a differentiable function applied entry-wise.
From the matrix cookbook, I've (doubtfully) gotten to $\operatorname{Tr}((2U^T(U \phi(VX)-Y))^T\frac{\partial}{\partial V}\phi(VX))$, but am unsure how to proceed further.
Define the matrices $$\eqalign{ Z &= VX,\quad F=\phi(Z),\quad G=\phi'(Z) \\ }$$ where $\phi'$ is the derivative of $\phi$ and is also applied element-wise.
Use these to calculate the gradient of the cost function. Because $\phi$ acts element-wise, its differential is $dF = G\odot dZ$, and because $X$ is held constant, $dZ = dV\,X$.
$$\eqalign{ {\cal L} &= \|UF-Y\|_F^2 \\&= (UF-Y):(UF-Y) \\ d{\cal L} &= 2(UF-Y):U\,dF \\ &= 2U^T(UF-Y):dF \\ &= 2U^T(UF-Y):G\odot dZ \\ &= 2G\odot(U^TUF-U^TY):dZ \\ &= 2G\odot(U^TUF-U^TY):dV\,X \\ &= 2\Big(G\odot(U^TUF-U^TY)\Big)X^T:dV \\ \frac{\partial{\cal L}}{\partial V} &= 2\Big(G\odot(U^TUF-U^TY)\Big)X^T \\ }$$
where the symbol $\odot$ denotes the elementwise/Hadamard product, and the symbol $:$ denotes the trace/Frobenius product, i.e.
$$A:B = {\rm Tr}(A^TB)$$
The Hadamard and Frobenius products are each commutative, and a Hadamard factor can be moved from one side of a Frobenius product to the other.
$$\eqalign{ A\odot B &= B\odot A \\ A : B &= B : A \\ (A\odot B):C &= A:(B\odot C) \\ }$$
Further, the cyclic property of the trace allows terms in a Frobenius product to be rearranged, e.g.
$$\eqalign{ A:BC &= AC^T:B \;=\; CA^T:B^T \;=\; {\rm etc.} \\ }$$
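A result like this is easy to sanity-check numerically. The following is a minimal sketch, not part of the derivation itself: it assumes NumPy, picks $\phi=\tanh$ purely for illustration, uses arbitrary small matrix sizes, and compares the closed-form gradient with a central finite-difference approximation. The function names `loss`, `grad_V`, and `numerical_grad_V` are made up for this example.

```python
import numpy as np

def loss(U, V, X, Y, phi=np.tanh):
    # L = ||U phi(VX) - Y||_F^2
    return np.sum((U @ phi(V @ X) - Y) ** 2)

def grad_V(U, V, X, Y, phi=np.tanh, dphi=lambda z: 1.0 - np.tanh(z) ** 2):
    # Closed form: 2 * (G ⊙ (U^T (U F - Y))) X^T, with F = phi(Z), G = phi'(Z)
    # dphi defaults to the derivative of tanh (an illustrative choice of phi)
    Z = V @ X
    F = phi(Z)
    G = dphi(Z)
    return 2.0 * (G * (U.T @ (U @ F - Y))) @ X.T

def numerical_grad_V(U, V, X, Y, eps=1e-6):
    # Central finite differences, one entry of V at a time
    g = np.zeros_like(V)
    for i in range(V.shape[0]):
        for j in range(V.shape[1]):
            E = np.zeros_like(V)
            E[i, j] = eps
            g[i, j] = (loss(U, V + E, X, Y) - loss(U, V - E, X, Y)) / (2 * eps)
    return g

# Arbitrary sizes for the check: U is m×k, V is k×n, X is n×p, Y is m×p
rng = np.random.default_rng(0)
m, k, n, p = 4, 3, 5, 6
U = rng.standard_normal((m, k))
V = rng.standard_normal((k, n))
X = rng.standard_normal((n, p))
Y = rng.standard_normal((m, p))

analytic = grad_V(U, V, X, Y)
numeric = numerical_grad_V(U, V, X, Y)
max_abs_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_err)
```

If the formula is right, the printed difference should be tiny compared with the size of the gradient entries. To test a different $\phi$, pass a matching `phi` to `loss` (inside `numerical_grad_V`) and a matching `phi`/`dphi` pair to `grad_V`.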