Why does SVD solve $\underset{U,V}{\min}\| A - UV^T\|_F^2$

212 Views Asked by At

I read here the following:

You can solve the quadratic problem below through Singular Value Decomposition (SVD) of the matrix $A$.

\begin{align} \underset{U,V}{\min} \| A - UV^T\|_F^2 \end{align}

My question is why? Why exactly does the definition or computation of SVD of the matrix $A$ provide a solution to the problem above?

2

There are 2 best solutions below

0
On

This problem can be reframed as "What is the matrix $B$ with rank $d$ that minimizes the Frobenius norm distance to $A$." The answer to this problem is given by the Eckart-Young-Mirsky theorem as $$ B = U_k\Sigma_kV_k^*, $$ where $A = U\Sigma V$ is the SVD of $A$ and $k$ subscript denotes truncation at the first $k$ columns of $U$ and $V$ and the $k\times k$ principal submatrix of $\Sigma$. This solution is unique iff $\sigma_k>\sigma_{k+1}$. A proof of this result and the very similar result for the 2-norm can be found on this Wikipedia page: https://en.wikipedia.org/wiki/Low-rank_approximation

2
On

The problem is similar to PCA.

Consider the objective function $$f(U,V) = \lVert A - UV^T\rVert_F^2,$$ and let $U$ be fixed with $U^TU = I$, then $V^T = U^TA$ minimizes $f$ by the projection theorem (here we project the columns $a_i$ of $A$ onto the space spanned by the columns of $U$). Substituting this for $V$ in $f$, then:

$f(U,A^TU) = \lVert A - UU^TA\rVert_F^2 = \mbox{Tr}[(A - UU^TA)^T(A - UU^TA)] = \lVert A \rVert_F^2 - \lVert U^TA\rVert_F^2.$

Thus, we formulate the following equivalent problem: $$ \max_U \lVert U^TA\rVert_F^2 = \mbox{Tr}(U^TAA^TU),\qquad\text{subject to } U^TU = I.$$

The Lagrangian of this problem is: $$ L(U,\Lambda) = \mbox{Tr}(U^T AA^T U) + \mbox{Tr}[\Lambda(I - U^TU)], $$ and the maximum is achieved at $U$ whenever: $$ \nabla_U L(U,\Lambda) = 2AA^TU - 2U\Lambda = 2AV - 2U\Lambda = 0, $$ where $\Lambda$ is symmetric (and even diagonal, leading to the decomposition: $$ A = U \Lambda V^T, $$ because $V^TV = I$, and is solved efficiently using the SVD.

EDIT: Improved my answer, because it had many mistakes. If someone finds more, let me know. I'm terrible at matrix calc.