I have a rather peculiar constrained quadratic programming problem where I try to fit a left-stochastic matrix, and I am not sure how to solve it properly. Consider matrices $S\in\mathbb R^{D\times N}$, $C\in \mathbb R^{D\times K'}$, $G \in [0, 1]^{K\times N}$. I am interested in solving the following QP: $$\operatorname{argmin}_{\substack{X\in [0, 1]^{K'\times K}\\\sum_{k'}X_{k'k}=1\text{ for all }k}} \left\Vert S - CXG\right\Vert_F^2$$ where both $K$ and $K'$ might be relatively big (up to $10000$). Ideally, if the system is underdetermined, I'd like the solution of minimal Frobenius norm. But that is an added bonus (I think my system will almost never be underdetermined) that we can gladly ignore if it helps to solve the problem. I tried standard constrained optimization algorithms like SLSQP (scipy's implementation), which unfortunately leads to poor runtime as $K$ and $K'$ grow (in my case even $K=K'=50$ seems not to terminate in reasonable time) due to the immense number of variables, $K'\cdot K$, and the additional constraints.
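For concreteness, my SLSQP attempt looks roughly like the following sketch (the sizes here are tiny toy values for illustration; flattening $X$ into a vector of $K'\cdot K$ entries is exactly what blows up):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D, N, K, Kp = 6, 8, 3, 4  # toy sizes; the real problem has K, K' up to 10000
S = rng.standard_normal((D, N))
C = rng.standard_normal((D, Kp))
G = rng.random((K, N))

def objective(x):
    X = x.reshape(Kp, K)
    R = S - C @ X @ G
    return np.sum(R * R)  # squared Frobenius norm of the residual

# left-stochasticity: each column of X must sum to 1
cons = {"type": "eq", "fun": lambda x: x.reshape(Kp, K).sum(axis=0) - 1.0}

x0 = np.full(Kp * K, 1.0 / Kp)  # start at the uniform left-stochastic matrix
res = minimize(objective, x0, method="SLSQP",
               bounds=[(0.0, 1.0)] * (Kp * K),
               constraints=[cons])
X = res.x.reshape(Kp, K)
```

This works for toy sizes, but the $K'\cdot K$ flattened variables plus $K$ equality constraints make it hopeless at the scales I need.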
I tried to approach the problem directly (instead of iteratively) using linear algebra:
If we drop the condition $X_{k'k} \in [0, 1]$, then the problem should be solvable via the gradient equation $$(C^\top C)\, X\, (GG^\top) = C^\top SG^\top$$ together with $$1_{1\times K'}X = 1_{1\times K},$$ which overall should be(?) just a linear system $LXR=B$. Then we obtain a good solution as $X':=L^+BR^+$, where $(-)^+$ denotes the Moore–Penrose pseudoinverse. However, I can't seem to formalize this properly and find sensible $L$, $R$ and $B$, because $X$ is multiplied from both sides. I could instead try to solve $$(C^\top C)\, X = C^\top SG^\top (GG^\top)^+,$$ so that the overall system becomes $$\begin{pmatrix} C^\top C \\ 1_{1\times K'} \end{pmatrix} X = \begin{pmatrix} C^\top S G^\top (GG^\top)^+ \\ 1_{1\times K} \end{pmatrix}, \quad\text{i.e.}\quad LX=B.$$ This is then solvable in the same way as $X' := L^+B$, but I am not sure whether this initial step is justified and gives a sensible answer.
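In code, this stacked least-squares attempt would look like the sketch below (again with hypothetical toy sizes). Note that $X' = L^+B$ only solves $LX=B$ in the least-squares sense, so the appended sum-constraint row is merely *encouraged*, not enforced exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K, Kp = 6, 8, 3, 4  # hypothetical toy sizes
S = rng.standard_normal((D, N))
C = rng.standard_normal((D, Kp))
G = rng.random((K, N))

# stack the normal-equation block on top of the column-sum constraint row
L = np.vstack([C.T @ C, np.ones((1, Kp))])
B = np.vstack([C.T @ S @ G.T @ np.linalg.pinv(G @ G.T), np.ones((1, K))])

X = np.linalg.pinv(L) @ B  # least-squares solution of L X = B
```

By construction this $X$ satisfies the normal equations $L^\top(LX - B) = 0$ of the stacked system, but nothing keeps its entries in $[0,1]$.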
All well and good, but even if this works, I don't see a good way of incorporating the boundedness $X_{k'k} \in [0, 1]$ (or just $X_{k'k} \ge 0$, which suffices thanks to the sum condition) into the linear algebra.
Due to the scale of the problem, I also tried to implement (projected) gradient descent directly. The gradient $\nabla_X = 2\,C^\top(CXG - S)G^\top$ is easily calculated, and a sensible "projection" is given by clipping the entries of $X-\eta\, \nabla_X$ to $[0,1]$ and rescaling each column to sum to one. But I get convergence problems and bouncing around, even with a small learning rate. Either I am miscalculating, or the "projection" is suboptimal. I fail to see why this mapping does not work, however, and I am not sure how to find a better one.
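For reference, here is a sketch of what I understand projected gradient descent to look like if one swaps my clip-and-rescale heuristic for the exact Euclidean projection onto the simplex (the sort-based algorithm of Duchi et al. 2008; clip-and-rescale is *not* the same map), with the step size taken from the gradient's Lipschitz constant $2\Vert C\Vert_2^2\Vert G\Vert_2^2$. Sizes are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K, Kp = 6, 8, 3, 4  # hypothetical toy sizes
S = rng.standard_normal((D, N))
C = rng.standard_normal((D, Kp))
G = rng.random((K, N))

def project_simplex_columns(V):
    # exact Euclidean projection of each column onto {x >= 0, sum(x) = 1}
    # (sort-based algorithm, cf. Duchi et al. 2008)
    n = V.shape[0]
    U = -np.sort(-V, axis=0)                 # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, n + 1)[:, None]
    rho = np.max(np.where(U - css / idx > 0, idx, 0), axis=0)  # support size
    theta = css[rho - 1, np.arange(V.shape[1])] / rho
    return np.maximum(V - theta, 0.0)

# step size 1/L with L = 2 * ||C||_2^2 * ||G||_2^2 (spectral norms)
eta = 1.0 / (2.0 * np.linalg.norm(C, 2) ** 2 * np.linalg.norm(G, 2) ** 2)

X = np.full((Kp, K), 1.0 / Kp)  # start at the uniform left-stochastic matrix
for _ in range(500):
    grad = 2.0 * C.T @ (C @ X @ G - S) @ G.T
    X = project_simplex_columns(X - eta * grad)
```

With this step size, each iterate stays exactly left-stochastic and the objective should decrease monotonically.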