How could one solve the following least-squares problem with Frobenius Norm and diagonal matrix constraint?
$$\hat{S} := \arg \min_{S} \left\| Y - XUSV^T \right\|_{F}^{2}$$
where the $S$ is a diagonal matrix and $U,V$ are column-orthogonal matrix. Is there any fast algorithm?
The Problem
Stating the problem in more general form:
$$ \arg \min_{S} f \left( S \right) = \arg \min_{S} \frac{1}{2} \left\| A S {B}^{T} - C \right\|_{F}^{2} $$
The derivative is given by:
$$ \frac{d}{d S} \frac{1}{2} \left\| A S {B}^{T} - C \right\|_{F}^{2} = A^{T} \left( A S {B}^{T} - C \right) B $$
Solution to General Form
The derivative vanishes at:
$$ \hat{S} = \left( {A}^{T} A \right)^{-1} {A}^{T} C B \left( {B}^{T} B \right)^{-1} $$
Solution with Diagonal Matrix
The set of diagonal matrices $ \mathcal{D} = \left\{ D \in \mathbb{R}^{m \times n} \mid D = \operatorname{diag} \left( D \right) \right\} $ is a convex set (Easy to prove by definition as any linear combination of diagonal matrices is diagonal).
Moreover, the projection of a given matrix $ Y \in \mathbb{R}^{m \times n} $ is easy:
$$ X = \operatorname{Proj}_{\mathcal{D}} \left( Y \right) = \operatorname{diag} \left( Y \right) $$
Namely, just zeroing all off diagonal elements of $ Y $.
Hence one could solve the above problem by Project Gradient Descent by projecting the solution of the iteration onto the set of diagonal matrices.
The Algorithms will be:
$$ \begin{align*} {S}^{k + 1} & = {S}^{k} - \alpha A^{T} \left( A {S}^{k} {B}^{T} - C \right) B \\ {S}^{k + 2} & = \operatorname{Proj}_{\mathcal{D}} \left( {S}^{k + 1} \right)\\ \end{align*} $$
The code:
Solution with Diagonal Structure
The problem can be written as:
$$ \arg \min_{s} f \left( s \right) = \arg \min_{s} \frac{1}{2} \left\| A \operatorname{diag} \left( s \right) {B}^{T} - C \right\|_{F}^{2} = \arg \min_{s} \frac{1}{2} \left\| \sum_{i} {s}_{i} {a}_{i} {b}_{i}^{T} - C \right\|_{F}^{2} $$
Where $ {a}_{i} $ and $ {b}_{i} $ are the $ i $ -th column of $ A $ and $ B $ respectively. The term $ {s}_{i} $ is the $ i $ -th element of the vector $ s $.
The derivative is given by:
$$ \frac{d}{d {s}_{j}} f \left( s \right) = {a}_{j}^{T} \left( \sum_{i} {s}_{i} {a}_{i} {b}_{i}^{T} - C \right) {b}_{j} $$
Note to Readers: If you know how vectorize this structure, namely write the derivative where the output is a vector of the same size as $ s $ please add it.
By vanishing it or using Gradient Descent one could find the optimal solution.
The code:
Remark
The direct solution can be achieved by:
$$ {s}_{j} = \frac{ {a}_{j}^{T} C {b}_{j} - {a}_{j}^{T} \left( \sum_{i \neq j} {s}_{i} {a}_{i} {b}_{i}^{T} - C \right) {b}_{j} }{ { \left\| {a}_{j} \right\| }_{2}^{2} { \left\| {b}_{j} \right\| }_{2}^{2} } $$
Summary
Both methods works and converge to the optimal value (Validated against CVX) as the problem above are Convex.
The full MATLAB code with CVX validation is available in my StackExchnage Mathematics Q2421545 GitHub Repository.