Is there a general formula for $\nabla_X Y(X) = \nabla_X \big(U(X)V(X)\big)$? I guess it is something like $\left(\frac{\partial U}{\partial X}\right)^T V(X) + \frac{\partial V}{\partial X} U(X)^T$ (denominator layout). However, when I try to derive it with the chain rule, I get:
$\begin{align*} \frac{\partial Y}{\partial X} &= \frac{\partial U}{\partial X} \frac{\partial Y}{\partial U} + \frac{\partial V}{\partial X} \frac{\partial Y}{\partial V} \\ &= \frac{\partial U}{\partial X} V(X) + \frac{\partial V}{\partial X} U(X)^T \end{align*}$
which obviously doesn't even have compatible dimensions for matrix multiplication. What is the proper way to differentiate it (in denominator layout)?
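A matrix-by-matrix gradient is a fourth-order tensor. The standard way to handle such calculations within ordinary matrix calculus is to vectorize everything. (This is also why the chain-rule attempt fails: $\partial Y/\partial U$ is itself a fourth-order tensor, not the matrix $V$.)

$$\eqalign{ &u={\rm vec}(U),\quad v={\rm vec}(V),\quad x={\rm vec}(X),\quad y={\rm vec}(Y) \cr &{\rm vec}(AYB) = (B^T\otimes A)\,y \cr }$$

Start by calculating the differential of $Y$, then substitute the known gradients of $U$ and $V$.

$$\eqalign{ Y &= UV \cr dY &= U\,dV + dU\,V \cr &= U\,dV\,I_v + I_u\,dU\,V \cr dy &= (I_v^T\otimes U)\,dv + (V^T\otimes I_u)\,du \cr \frac{\partial y}{\partial x} &= \Big(I_v\otimes U\Big)\,\frac{\partial v}{\partial x} + \Big(V^T\otimes I_u\Big)\,\frac{\partial u}{\partial x} \cr }$$

where the $I_k$ are identity matrices of appropriate dimension: $I_u$ has as many rows as $U$, and $I_v$ has as many columns as $V$ (so $I_v^T = I_v$ in the last step).

These are Jacobians in numerator layout, i.e. $dy = \frac{\partial y}{\partial x}\,dx$. The denominator-layout gradient asked for in the question is the transpose:
$$\left(\frac{\partial y}{\partial x}\right)^T = \left(\frac{\partial v}{\partial x}\right)^T\Big(I_v\otimes U^T\Big) + \left(\frac{\partial u}{\partial x}\right)^T\Big(V\otimes I_u\Big)$$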
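As a sanity check, here is a minimal NumPy sketch that compares the vectorized product rule above against central finite differences. It assumes column-major vectorization (NumPy's `order="F"`), which is the convention under which ${\rm vec}(AYB) = (B^T\otimes A)\,{\rm vec}(Y)$ holds. The test functions $U(X) = A\sin(X)$ and $V(X) = X^T B$, the matrix sizes, and the helper names `vec`, `unvec`, `jacobian` are hypothetical choices made only for illustration.

```python
import numpy as np

def vec(M):
    # Column-major (Fortran-order) vectorization, matching vec(AYB) = (B^T kron A) vec(Y)
    return M.reshape(-1, order="F")

def unvec(m, shape):
    # Inverse of vec for a matrix of the given shape
    return m.reshape(shape, order="F")

def jacobian(f, X, h=1e-6):
    # Numerator-layout Jacobian d vec(f(X)) / d vec(X) by central differences
    x = vec(X)
    cols = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        fp = vec(f(unvec(x + e, X.shape)))
        fm = vec(f(unvec(x - e, X.shape)))
        cols.append((fp - fm) / (2 * h))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
m, p, q, r = 3, 4, 2, 5
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, r))
X = rng.standard_normal((p, q))

# Hypothetical test functions: U is m x q, V is q x r, so Y = U V is m x r
U = lambda X: A @ np.sin(X)
V = lambda X: X.T @ B
Y = lambda X: U(X) @ V(X)

Ju = jacobian(U, X)          # d vec(U) / d vec(X), shape (m*q, p*q)
Jv = jacobian(V, X)          # d vec(V) / d vec(X), shape (q*r, p*q)
Uval, Vval = U(X), V(X)
I_v = np.eye(Vval.shape[1])  # identity sized by the columns of V
I_u = np.eye(Uval.shape[0])  # identity sized by the rows of U

# Product rule in numerator layout: (I_v kron U) dv/dx + (V^T kron I_u) du/dx
Jy_formula = np.kron(I_v, Uval) @ Jv + np.kron(Vval.T, I_u) @ Ju
Jy_numeric = jacobian(Y, X)
err = np.max(np.abs(Jy_formula - Jy_numeric))

# Denominator layout is the transpose: (dv/dx)^T (I_v kron U^T) + (du/dx)^T (V kron I_u)
Gy = Jv.T @ np.kron(I_v, Uval.T) + Ju.T @ np.kron(Vval, I_u)
print(err)
```

The printed `err` should be on the order of finite-difference noise. Transposing the Jacobian, as in `Gy`, gives the denominator-layout gradient asked for in the question, with shape $(pq)\times(mr)$.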