Let $S\in\mathbb{R}^{L\times N}$ and $A\in\mathbb{R}^{M\times N}$ be given (but otherwise arbitrary). I'd like to solve the following problem: \begin{align} \min_{X\in\mathbb{R}^{L\times M},\,Y\in\mathbb{R}^{M\times N}} \frac{1}{2}\|X(A\odot Y)-S\|_F^2, \end{align} where $\odot$ is the Hadamard product.
I've proceeded by defining $Z=X(A\odot Y)-S$ and rewriting the objective as $f=\frac{1}{2}Z:Z$, where $:$ denotes the Frobenius inner product. Differentiating, we have \begin{align} \text{d}f&=Z:\text{d}Z\\ &=Z:\text{d}[X(A\odot Y)-S]\\ &=Z:[X\text{d}(A\odot Y)+\text{d}X(A\odot Y)-\underbrace{\text{d}S}_{=0}]\\ &=Z:[X(A\odot \text{d}Y+\underbrace{\text{d}A}_{=0}\odot Y)+\text{d}X(A\odot Y)]\\ &=Z:[X(A\odot \text{d}Y)+\text{d}X(A\odot Y)]\\ &=[X(A\odot Y)-S]:[X(A\odot \text{d}Y)+\text{d}X(A\odot Y)]\\ &=[X(A\odot Y)-S]:[X(A\odot \text{d}Y)]+[X(A\odot Y)-S]:[\text{d}X(A\odot Y)]\\ &=[X(A\odot Y)]:[X(A\odot \text{d}Y)]+[X(A\odot Y)]:[\text{d}X(A\odot Y)]\\ &\qquad-S:[X(A\odot \text{d}Y)]-S:[\text{d}X(A\odot Y)] \end{align}
This is as far as I have been able to get. I know that I need to find the gradients of $f$ with respect to $X$ and $Y$ and then look for a pair $(X^*,Y^*)$ that satisfies the two first-order conditions. I'm also aware that such a pair may not be unique (e.g., for any $c\in\mathbb{R}\setminus\{0\}$, the pair $(cX^*,c^{-1}Y^*)$ is also a solution), but I'm willing to overlook this for now, since I first want to characterize what the solution set looks like. I've searched around but haven't been able to figure out whether expressions of the form $X(A\odot Y)$ simplify nicely, so I'm unsure how to proceed from here.
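One way to finish the calculation (a sketch, using the standard Frobenius-product identities $P:(QR)=(PR^{T}):Q=(Q^{T}P):R$ and $P:(Q\odot R)=(Q\odot P):R$): collecting the $\text{d}X$ and $\text{d}Y$ terms above gives \begin{align} \text{d}f=\big[(X(A\odot Y)-S)(A\odot Y)^{T}\big]:\text{d}X+\big[A\odot\big(X^{T}(X(A\odot Y)-S)\big)\big]:\text{d}Y, \end{align} so $\nabla_X f=(X(A\odot Y)-S)(A\odot Y)^{T}$ and $\nabla_Y f=A\odot\big(X^{T}(X(A\odot Y)-S)\big)$. Setting both to zero yields the two first-order conditions discussed in the answer below.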
@A.P., your two equalities are correct. The first one, $(X(A\circ Y)-S)(A^T\circ Y^T)=0$, is clear, but the second one is less so: the rule for rearranging a Hadamard–Frobenius combination is not at all obvious. (I write $\circ$ for the Hadamard product.) We can reason as follows:
For every $K\in M_{M,N}$, the derivative of $f$ with respect to $Y$ acts as $K\mapsto \operatorname{tr}\big((X(A\circ Y)-S)(A^T\circ K^T)X^T\big)$, and at a critical point this must vanish for all $K$; equivalently, $\operatorname{tr}\big(X^T(X(A\circ Y)-S)(A^T\circ K^T)\big)=0$.
Let $U=\{(i,j)\mid a_{ij}=0\}$ and write $B=X^T(X(A\circ Y)-S)\in M_{M,N}$. Since $\operatorname{tr}\big(B(A^T\circ K^T)\big)=\sum_{i,j}a_{ij}k_{ij}b_{ij}$ and the entries $k_{ij}$ are free as $K$ varies, the condition holds for every $K$ if and only if $a_{ij}b_{ij}=0$ for all $(i,j)$; that is, $b_{ij}=0$ whenever $(i,j)\notin U$, while $b_{ij}$ is unconstrained when $(i,j)\in U$. This is equivalent to $A\circ\big(X^T(X(A\circ Y)-S)\big)=0$.
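To make the two conditions concrete, here is a minimal NumPy sketch (small random matrices; the symbols $S,A,X,Y$ and the dimensions $L,M,N$ are those of the question) that checks the two gradient expressions against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, N = 5, 4, 6
S = rng.standard_normal((L, N))
A = rng.standard_normal((M, N))
X = rng.standard_normal((L, M))
Y = rng.standard_normal((M, N))

def f(X, Y):
    return 0.5 * np.linalg.norm(X @ (A * Y) - S, "fro") ** 2

Z = X @ (A * Y) - S
grad_X = Z @ (A * Y).T      # candidate gradient w.r.t. X
grad_Y = A * (X.T @ Z)      # candidate gradient w.r.t. Y

# Central finite differences, entry by entry.
eps = 1e-6
num_X = np.zeros_like(X)
for i in range(L):
    for j in range(M):
        E = np.zeros_like(X); E[i, j] = eps
        num_X[i, j] = (f(X + E, Y) - f(X - E, Y)) / (2 * eps)

num_Y = np.zeros_like(Y)
for i in range(M):
    for j in range(N):
        E = np.zeros_like(Y); E[i, j] = eps
        num_Y[i, j] = (f(X, Y + E) - f(X, Y - E)) / (2 * eps)

print(np.max(np.abs(grad_X - num_X)))   # small, up to finite-difference error
print(np.max(np.abs(grad_Y - num_Y)))
```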
Note that if $A$ has no zero entries, then finding $Y$ is equivalent to finding $A\circ Y$; so we put $T=A\circ Y$ and the problem reduces to finding $\inf_{X,T}\|XT-S\|_F^2$. This is equivalent to seeking the best approximation of $S$ by a matrix of rank at most $\min(L,M,N)$, and the standard method is to compute the SVD of $S$. Thus I am not sure that the gradient method is the best one here.
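A minimal NumPy sketch of that reduction (random data for illustration; $A$ is shifted so it has no zero entries, and the residual is compared with the discarded singular values of $S$):

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, N = 5, 3, 6
S = rng.standard_normal((L, N))
A = rng.standard_normal((M, N)) + 5.0          # bounded away from zero

U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
r = min(L, M, N)                               # maximal rank of X @ T
X = U[:, :r] * sigma[:r]                       # L x r
if r < M:                                      # pad with zero columns if needed
    X = np.hstack([X, np.zeros((L, M - r))])
T = np.vstack([Vt[:r, :], np.zeros((M - r, N))])   # M x N, rank <= r
Y = T / A                                      # recover Y from T = A ∘ Y

print(np.linalg.norm(X @ (A * Y) - S, "fro"))  # best rank-r residual
print(np.sqrt(np.sum(sigma[r:] ** 2)))         # Eckart–Young: same value
```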
When $A$ has zero entries, the problem reduces to finding $\inf_{X,T}\|XT-S\|_F^2$ under the constraint that $t_{ij}=0$ whenever $(i,j)\in U$.
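For the general case, here is a minimal sketch of the gradient method on $(X,Y)$ using the gradients derived above (step size and iteration count are arbitrary illustrative choices, not tuned values); in this parametrization the constraint $t_{ij}=0$ for $(i,j)\in U$ is enforced automatically, since $T=A\circ Y$ vanishes wherever $A$ does:

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, N = 5, 3, 6
S = rng.standard_normal((L, N))
A = rng.standard_normal((M, N))
A[rng.random((M, N)) < 0.3] = 0.0      # some zero entries, so U is nonempty

X = 0.1 * rng.standard_normal((L, M))
Y = 0.1 * rng.standard_normal((M, N))
step = 1e-2
for _ in range(5000):
    Z = X @ (A * Y) - S
    X -= step * (Z @ (A * Y).T)        # grad_X = Z (A ∘ Y)^T
    Y -= step * (A * (X.T @ Z))        # grad_Y = A ∘ (X^T Z)

print(0.5 * np.linalg.norm(X @ (A * Y) - S, "fro") ** 2)
# T = A * Y satisfies t_ij = 0 on U = {(i, j) : a_ij = 0} by construction.
```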