Solve a least-squares minimization from an overdetermined system with an orthonormality constraint


I would like to find the rectangular matrix $X \in \mathbb{R}^{n \times k}$ that solves the following minimization problem:

$$ \mathop{\text{minimize }}_{X \in \mathbb{R}^{n \times k}} \left\| A X - B \right\|_F^2 \quad \text{ subject to } X^T X = I_k $$

where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times k}$ are given. This appears to be a form of the orthogonal Procrustes problem, but I'm getting tripped up because in my case $X$ is not square, with $n \gg k$ and $m > n$.

Optimistically, I'm looking for a solution that would involve a singular value decomposition of a small $k \times k$ matrix, but I'm not seeing it. I'm especially interested in the case when $$A = \left(\begin{array}{c} D_1 \\ D_2\end{array}\right) \in \mathbb{R}^{2n \times n}$$ and $D_1,D_2$ are full-rank diagonal matrices; that is, a solution involving $D_1^{-1}$ and $D_2^{-1}$ would be acceptable. The closest I've come (using a thin SVD of $Y$) is:

$$ Y = (A^TA)^{-1}(A^T B) \\ Y = U \Sigma V^T \\ X = UV^T $$

Clearly $X^T X = I_k$, but

  1. I haven't convinced myself that this is the minimizer,
  2. it involves inverting a potentially huge $n \times n$ matrix (perhaps unavoidable, and not so bad in the stacked-diagonal case above, where $(A^TA)^{-1} = (D_1^2 + D_2^2)^{-1}$), and
  3. it involves an SVD of a large rectangular $n \times k$ matrix.
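For concreteness, here is that construction as a small NumPy sketch (variable names are my own; I use a linear solve rather than forming $(A^TA)^{-1}$ explicitly):

```python
import numpy as np

def proposed_x(A, B):
    """Candidate X = U V^T from the thin SVD of Y = (A^T A)^{-1} A^T B."""
    # Unconstrained least-squares solution (solve instead of invert).
    Y = np.linalg.solve(A.T @ A, A.T @ B)
    # Thin SVD of the n-by-k matrix Y.
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

# Sanity check on random data: X^T X should be I_k up to rounding.
rng = np.random.default_rng(0)
m, n, k = 50, 20, 3
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, k))
X = proposed_x(A, B)
print(np.allclose(X.T @ X, np.eye(k)))  # True
```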

Is this correct and as good as it gets? Or, is there a more efficient solution?


There are 3 answers below.

Answer 1

Your proposed solution is not correct. Let's consider the simplest case: $m=n$, $k=1$, and $A$ is invertible. Then our problem is $$\min_{x\in\mathbb R^n} \|Ax-b\|^2\quad\text{s.t.}\quad \|x\|^2=1.$$ The set $\{x:\|x\|^2=1\}$ is the unit sphere, so the transformed set $\{Ax:\|x\|^2=1\}$ is an ellipsoid, and we want to find the point $Ax$ on this ellipsoid closest to $b\in\mathbb R^n$.

Your proposed solution reduces to $y = A^{-1}b$ and $x = y/\|y\|$. Then $Ax = b/\|A^{-1}b\|$, that is, your proposed closest point is obtained by simply scaling $b$ to lie on the ellipsoid. It should be clear that in general this is not the closest point to $b$.
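A quick numerical check makes this concrete (a sketch with matrices of my own choosing, comparing the proposed point against a brute-force search over the unit circle):

```python
import numpy as np

# Ellipse {Ax : ||x|| = 1} with A = diag(3, 1); target point b.
A = np.diag([3.0, 1.0])
b = np.array([1.0, 2.0])

# Proposed solution: scale b onto the ellipse via x = A^{-1}b / ||A^{-1}b||.
y = np.linalg.solve(A, b)
x_prop = y / np.linalg.norm(y)

# Brute-force the true minimizer over x = (cos t, sin t).
t = np.linspace(0.0, 2.0 * np.pi, 200000)
xs = np.stack([np.cos(t), np.sin(t)])
x_best = xs[:, np.linalg.norm(A @ xs - b[:, None], axis=0).argmin()]

print(np.linalg.norm(A @ x_prop - b))  # ~1.133 (proposed point)
print(np.linalg.norm(A @ x_best - b))  # ~1.051 (strictly better)
```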

Sorry, I don't have a good answer for how to find the correct solution.

Answer 2

We have the following optimization problem in the tall matrix $\mathrm X \in \mathbb R^{n \times k}$:

$$\begin{array}{ll} \text{minimize} & \| \mathrm A \mathrm X - \mathrm B \|_{\text{F}}^2\\ \text{subject to} & \mathrm X^\top \mathrm X = \mathrm I_k\end{array}$$

where the tall matrices $\mathrm A \in \mathbb R^{m \times n}$ and $\mathrm B \in \mathbb R^{m \times k}$ are given. Introducing a matrix of Lagrange multipliers $\Lambda \in \mathbb R^{k \times k}$, let the Lagrangian be

$$\mathcal L (\mathrm X, \Lambda) := \frac 12 \| \mathrm A \mathrm X - \mathrm B \|_{\text{F}}^2 + \frac 12 \langle \Lambda , \mathrm X^\top \mathrm X - \mathrm I_k \rangle$$

Taking the partial derivatives and finding where they vanish, we obtain two matrix equations

$$\begin{array}{rl} \mathrm A^\top \mathrm A \, \mathrm X + \mathrm X \left(\dfrac{\Lambda + \Lambda^\top}{2}\right) &= \mathrm A^\top \mathrm B\\ \mathrm X^\top \mathrm X &= \mathrm I_k \end{array}$$

Left-multiplying both sides of the first matrix equation by $\mathrm X^\top$ and using $\mathrm X^\top \mathrm X = \mathrm I_k$, we obtain

$$\mathrm X^\top \mathrm A^\top \mathrm A \, \mathrm X + \left(\dfrac{\Lambda + \Lambda^\top}{2}\right) = \mathrm X^\top \mathrm A^\top \mathrm B$$

Using an argument similar to the one used by Peter Schönemann in his 1966 paper on the orthogonal Procrustes problem, note that the left-hand side of the matrix equation above is the sum of two symmetric matrices. Thus, the right-hand side must also be symmetric, which produces the following linear matrix equation

$$\mathrm X^\top \mathrm A^\top \mathrm B = \mathrm B^\top \mathrm A \, \mathrm X$$

If $\rm X$ were square and orthogonal, then we could use Schönemann's approach to solve the linear matrix equation above. Unfortunately, $\rm X$ is tall and only semi-orthogonal. We have $k^2$ linear equations in $n k$ unknowns. Hence, we have at least $n k - k^2 = k (n-k)$ degrees of freedom.

To summarize, we have $k^2$ linear equations and $k^2$ quadratic equations (not all independent, by symmetry) in the $n k$ entries of $\rm X$:

$$\begin{array}{rl} \mathrm X^\top \mathrm A^\top \mathrm B - \mathrm B^\top \mathrm A \, \mathrm X &= \mathrm O_k\\ \mathrm X^\top \mathrm X &= \mathrm I_k\end{array}$$
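A small NumPy helper (my own sketch) lets one test numerically whether a candidate $\rm X$ satisfies this system:

```python
import numpy as np

def kkt_residuals(A, B, X):
    """Residuals of the two necessary conditions for a candidate X."""
    M = X.T @ (A.T @ B)
    sym_res = np.linalg.norm(M - M.T)                        # X^T A^T B = B^T A X
    orth_res = np.linalg.norm(X.T @ X - np.eye(X.shape[1]))  # X^T X = I_k
    return sym_res, orth_res
```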

Unfortunately, it is not obvious to me how to solve these equations.

Answer 3

Just to add another remark: finding a solution to your problem is equivalent to finding matrices $U_{n\times k}$ and $V_{k\times k}$, and a diagonal matrix $D$, such that $U^TU = I$, $V^TV = I$, and \begin{equation} A^TB = UDV^T + A^TAUV^T, \tag{1} \end{equation} in which case $X = UV^T.$

Interestingly, this is also true when $n=k$. In this case, if $$A^TB = \mathcal{U}\Sigma\mathcal{V}^T$$ is the singular value decomposition of $A^TB$, then $D$ is given by the spectral decomposition $$\Sigma - \mathcal{U}^TA^TA\mathcal{U} = RDR^T,$$ and $U = \mathcal{U}R, V = \mathcal{V}R.$
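For example, a NumPy sketch of this square-case construction (verification code of my own, on random data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = k = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k))

# SVD of A^T B, then spectral decomposition of Sigma - U^T A^T A U.
Uc, s, Vct = np.linalg.svd(A.T @ B)                 # A^T B = Uc diag(s) Vct
w, R = np.linalg.eigh(np.diag(s) - Uc.T @ A.T @ A @ Uc)
U, V, D = Uc @ R, Vct.T @ R, np.diag(w)

# Verify equation (1): A^T B = U D V^T + A^T A U V^T.
print(np.allclose(A.T @ B, U @ D @ V.T + A.T @ A @ U @ V.T))  # True
```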

This solution does not work when $n>k$, but equation (1) above looks an awful lot like some kind of generalized singular value decomposition, so perhaps there is hope...

As a practical stop-gap, I'd be tempted to try the fixed-point iteration $$A^TB - A^TAU_i V_i^T = U_{i+1}D_{i+1}V_{i+1}^T,$$ where the right-hand side is a thin SVD of the left-hand side and each iterate is $X_{i+1} = U_{i+1}V_{i+1}^T$.
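In code, that iteration might look like the following sketch (my own interpretation, with no convergence guarantee):

```python
import numpy as np

def fixed_point_x(A, B, iters=100):
    """Iterate A^T B - A^T A U_i V_i^T = U_{i+1} D_{i+1} V_{i+1}^T via thin SVDs."""
    AtB, AtA = A.T @ B, A.T @ A
    # Initialize U, V from the thin SVD of A^T B.
    U, _, Vt = np.linalg.svd(AtB, full_matrices=False)
    for _ in range(iters):
        U, _, Vt = np.linalg.svd(AtB - AtA @ (U @ Vt), full_matrices=False)
    return U @ Vt  # candidate X = U V^T; X^T X = I_k by construction
```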