Problem Description
My aim is to find the SVD of the minimizer of $F:\mathbb{R}^{m\times n}\to\mathbb{R}$, $$ U_*\Sigma_* V^\top_* = A_*\qquad \text{where} \qquad A_* = \arg\min_A F(A). $$ I could, of course, just minimize $F(\cdot)$ and then compute the SVD of the minimizer. However, I am interested in finding the optimal SVD factors directly. Is this possible? For instance, one way to go about it would be this:
Initialize $A\in\mathbb{R}^{m\times n}$ and compute its SVD decomposition $A = U\Sigma V^\top$.
For $i = 1, \ldots, N$ iterations do:
- Keep $\Sigma V^\top$ fixed and optimize $F(\cdot)$ with respect to $U$ over the Stiefel manifold $\mathcal{V}_r(\mathbb{R}^m) = \{U\in\mathbb{R}^{m\times r}\,:\,U^\top U = I\}$ of matrices with orthonormal columns.
- Keep $U\Sigma$ fixed and optimize $F(\cdot)$ with respect to $V$ over the Stiefel manifold $\mathcal{V}_r(\mathbb{R}^n) = \{V\in\mathbb{R}^{n\times r}\,:\, V^\top V = I\}$.
- Keep $U$ and $V$ fixed and optimize $F(\cdot)$ with respect to $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$.
Is it possible to construct a "coordinate-descent"-type algorithm that would do something like this? Of course, the algorithm above may well fail, since the individually updated $U$, $\Sigma$ and $V$ need not combine into a good matrix $A = U\Sigma V^\top$.
Resources
Optimization over orthogonal matrices is generally doable; for instance, one can use the method in this paper.
Here is a possible approach.
Since you're looking for a "descent"-type algorithm, I'll assume that $F$ is a differentiable function. Let $\Phi(X)$ denote the denominator layout derivative of $F$ (so the $i,j$ entry of $\Phi$ is the derivative of $F$ with respect to the $i,j$ entry of $X$).
The gradient (i.e. denominator layout derivative) of $F$ with respect to $U$ is given by the product $\Phi(U \Sigma V^T) V \Sigma^T$. To update $U$, you could apply the method in the paper you linked.
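As a concrete sketch of such a feasible update on the Stiefel manifold, here is a Cayley-transform step of the kind used in this literature (whether this matches the linked paper exactly is an assumption on my part):

```python
import numpy as np

def cayley_step(X, G, tau):
    """One feasible step on the Stiefel manifold {X : X^T X = I}.

    Moves along the curve Y(tau) = (I + tau/2 A)^{-1} (I - tau/2 A) X,
    where A = G X^T - X G^T is skew-symmetric; the Cayley transform of a
    skew-symmetric matrix is orthogonal, so Y(tau)^T Y(tau) = I for all tau."""
    A = G @ X.T - X @ G.T                     # skew-symmetric
    I = np.eye(X.shape[0])
    return np.linalg.solve(I + 0.5 * tau * A, (I - 0.5 * tau * A) @ X)

# Example: a random point on the Stiefel manifold and a random gradient.
rng = np.random.default_rng(2)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # orthonormal columns
Y = cayley_step(X, rng.standard_normal((6, 2)), 0.1)
```

For large $m$, the $m\times m$ solve can be reduced to a $2r\times 2r$ one via the Woodbury identity, since $A$ has rank at most $2r$.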
The gradient of $F$ with respect to $\Sigma$ is given by $U^T\Phi(U \Sigma V^T) V$ (expanding $F(U(\Sigma + d\Sigma)V^T)$ as below yields the term $\operatorname{tr}((U^T \Phi V)^T \, d\Sigma)$). Update the diagonal entries of $\Sigma$ using the diagonal entries of this gradient.
The gradient of $F$ with respect to $V$ is given by $\Phi^T(U\Sigma V^T)U \Sigma $. Again, use the method in the paper you linked.
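Putting the three updates together, here is a minimal numerical sketch of the resulting coordinate-descent loop. Everything concrete in it is illustrative: $F(A) = \|A - B\|_F^2$ is a stand-in objective (so $\Phi(X) = 2(X - B)$), and the Stiefel steps use a simple polar-decomposition retraction rather than the method from the linked paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 3

# Illustrative objective: F(A) = ||A - B||_F^2, so Phi(X) = 2 (X - B).
B = rng.standard_normal((m, n))
B /= np.linalg.norm(B, 2)                  # normalize the spectral norm
F = lambda A: np.sum((A - B) ** 2)
Phi = lambda X: 2.0 * (X - B)

def retract(M):
    """Map M back to the Stiefel manifold via its polar factor."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt

# Initialize A at random and take its rank-r SVD.
U, s, Vt = np.linalg.svd(0.1 * rng.standard_normal((m, n)), full_matrices=False)
U, S, V = U[:, :r], np.diag(s[:r]), Vt[:r].T

eta = 0.05
f0 = F(U @ S @ V.T)
for _ in range(500):
    X = U @ S @ V.T
    U = retract(U - eta * Phi(X) @ V @ S.T)            # gradient step in U
    X = U @ S @ V.T
    S = np.diag(np.diag(S - eta * U.T @ Phi(X) @ V))   # keep Sigma diagonal
    X = U @ S @ V.T
    V = retract(V - eta * Phi(X).T @ U @ S)            # gradient step in V
f1 = F(U @ S @ V.T)
```

For this particular $F$, the minimizer is $B$ itself and the loop drives $U\Sigma V^\top$ toward the best rank-$r$ approximation of $B$ (Eckart–Young), so the objective should fall toward the energy in the discarded singular values.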
The derivative of $F$ with respect to $U$:
I find it easiest to first obtain these derivatives in differential form. With that said, we have $$ \begin{align} F((U + dU) \Sigma V^T) = F(U \Sigma V^T) + \frac{\partial F}{\partial X}(U\Sigma V^T)(dU \,\Sigma V^T) + o(\|dU\|). \end{align} $$ Here, $\frac{\partial F}{\partial X}$ is a function that takes an $m \times n$ matrix as its input and produces a linear map $\Bbb R^{m\times n} \to \Bbb R$. Notably, $\Phi = \Phi(X)$ is defined such that $$ \frac{\partial F}{\partial X}(X)(H) = \operatorname{tr}(\Phi^TH). $$ With that, we can rewrite the above as $$ \begin{align} F((U + dU) \Sigma V^T) &= F(U \Sigma V^T) + \operatorname{tr}(\Phi^T \cdot dU \,\Sigma V^T) + o(\|dU\|) \\ &= F(U \Sigma V^T) + \operatorname{tr}(\Sigma V^T \Phi^T \cdot dU) + o(\|dU\|) \\ & = F(U \Sigma V^T) + \operatorname{tr}((\Phi V \Sigma^T)^T \cdot dU) + o(\|dU\|). \end{align} $$ With that, we can conclude that the denominator layout derivative of $F$ with respect to $U$ is $\Phi V \Sigma^T$.
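As a sanity check, the formula $\Phi V \Sigma^T$ can be compared against central finite differences; note that it holds for arbitrary $U$, $\Sigma$, $V$, with no orthogonality required. The quadratic objective below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2

# Illustrative smooth objective: F(A) = ||A - B||_F^2, so Phi(X) = 2 (X - B).
B = rng.standard_normal((m, n))
F = lambda A: np.sum((A - B) ** 2)
Phi = lambda X: 2.0 * (X - B)

# Arbitrary factors: the derivative formula needs no orthogonality.
U = rng.standard_normal((m, r))
S = np.diag(rng.uniform(0.5, 2.0, size=r))
V = rng.standard_normal((n, r))

G = Phi(U @ S @ V.T) @ V @ S.T     # claimed derivative of F w.r.t. U

# Entry-by-entry central finite differences for comparison.
eps = 1e-6
G_fd = np.zeros_like(U)
for i in range(m):
    for j in range(r):
        E = np.zeros_like(U)
        E[i, j] = eps
        G_fd[i, j] = (F((U + E) @ S @ V.T) - F((U - E) @ S @ V.T)) / (2 * eps)
```

The two matrices should agree up to finite-difference error.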