Incomplete proof of Eckart-Young theorem


Let $$A=U\Sigma V^\top=\sum_i \sigma_i u_i v_i^\top$$ be the SVD of a real matrix $A$ of rank $r$. We want to show that the matrix $X_k$ of rank $k < r$ that minimises $\lVert A - X_k\rVert_F$ is the truncated sum $$A_k=\sum_{i=1}^k \sigma_i u_i v_i^\top .$$
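As a quick numerical sanity check (not part of the proof), one can compare the truncated SVD against random rank-$k$ matrices with NumPy; the names below are illustrative, and Eckart–Young also tells us the minimal error equals $\sqrt{\sum_{i>k}\sigma_i^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2

A = rng.standard_normal((m, n))          # generic A has full rank min(m, n)
U, s, Vt = np.linalg.svd(A)

# Truncated SVD: keep the k largest singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
best_err = np.linalg.norm(A - A_k, "fro")

# The minimal error equals sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
assert np.isclose(best_err, np.sqrt(np.sum(s[k:] ** 2)))

# No other rank-k matrix should do better.
for _ in range(1000):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k
    assert np.linalg.norm(A - B, "fro") >= best_err
```

Random sampling is of course not a proof, but it makes the claimed optimality concrete.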

The proof found on Wikipedia (also here) is as follows:

Since $\lVert A-X_k\rVert_F = \lVert U\Sigma V^\top - X_k\rVert_F = \lVert\Sigma - U^\top X_k V \rVert_F$, denote $N = U^\top X_k V$, an $m \times n$ matrix of rank $k$. A direct calculation gives \begin{equation} \lVert\Sigma-N\rVert_F^2 = \sum_{i,j} |\Sigma_{ij} - N_{ij}|^2 = \sum_{i=1}^r |\sigma_i-N_{ii}|^2+\sum_{i>r}|N_{ii}|^2+\sum_{i\neq j} |N_{ij}|^2, \end{equation} which is minimal when all the off-diagonal terms of $N$ are zero, as are all diagonal terms with $i > r$. Obviously, the minimum of the remaining terms is attained when $N_{ii} = \sigma_i$ for $i = 1,2,\ldots,k$ and all other $N_{ii}$ are zero.
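The three-term decomposition above is a pure algebraic identity (since $\Sigma$ is diagonal with $\sigma_i = 0$ for $i > r$), which can be verified numerically; the construction below is illustrative, with a rank-deficient $A$ so that the $i > r$ term is non-empty:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, k = 6, 5, 3, 2

A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r
U, s, Vt = np.linalg.svd(A)
s[r:] = 0.0                      # zero out numerical noise beyond rank r
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)

X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # arbitrary rank-k X_k
N = U.T @ X @ Vt.T                                             # N = U^T X_k V

lhs = np.linalg.norm(Sigma - N, "fro") ** 2

# The three groups of terms from the displayed equation.
diag_top = sum((s[i] - N[i, i]) ** 2 for i in range(r))            # i <= r
diag_rest = sum(N[i, i] ** 2 for i in range(r, min(m, n)))         # i > r
off_diag = sum(N[i, j] ** 2 for i in range(m) for j in range(n) if i != j)

assert np.isclose(lhs, diag_top + diag_rest + off_diag)
```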

My understanding of this is that, geometrically, $\lVert A - X_k\rVert_F^2$ is the sum of the squared distances between the columns of $\Sigma$, which form a set of orthogonal vectors, and the corresponding columns of $N$. In addition, we know that $n-r$ of the columns of $\Sigma$ are zero, and that the columns of $N$ span a subspace of dimension at most $k$.

What I don't see is the second part of the proof, namely why it is "obvious" that $N$ must be chosen to be diagonal; in other words, why the columns of $N$ must point in the same directions as the columns of $\Sigma$. Intuitively, it does seem that the optimal $N$ must be diagonal, but it's not obvious to me. I'd appreciate it if somebody could clarify this point.

BEST ANSWER

There are three terms on the right-hand side, each involving a different set of entries of the matrix $N$, and each a sum of squares. Since the right-hand side is separable, you can minimize each of the three terms independently.

Is it clear to you that

$\min \sum_{i=1}^{r} | \sigma_{i}-N_{i,i} |^2$

is achieved with $N_{i,i}=\sigma_{i}, i=1, 2, \ldots, r$?

Is it clear to you that

$\min \sum_{i>r} | N_{i,i} |^{2}$

is achieved by setting $N_{i,i}=0, i=r+1, \ldots$?

Is it clear to you that

$\min \sum_{i \neq j} | N_{i,j} |^{2}$

is achieved by setting $N_{i,j}=0, i \neq j$?
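Combining the three observations with the rank-$k$ constraint forces $N_{ii} = \sigma_i$ only for the $k$ largest singular values, and mapping back via $X_k = U N V^\top$ recovers the truncated SVD. A short NumPy sketch of this assembly (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 5, 2

A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A)

# Minimizer from the three observations plus the rank-k constraint:
# N is diagonal with N_ii = sigma_i for i <= k, and zero elsewhere.
N = np.zeros((m, n))
np.fill_diagonal(N[:k, :k], s[:k])

# Mapping back, X_k = U N V^T is exactly the truncated SVD A_k.
X_k = U @ N @ Vt
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.allclose(X_k, A_k)

# And the attained minimum is the tail of the singular values.
assert np.isclose(np.linalg.norm(A - X_k, "fro"),
                  np.sqrt(np.sum(s[k:] ** 2)))
```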