Let $$A=U\Sigma V^\top=\sum_i \sigma_i u_i v_i^\top ,$$ be the SVD of a real matrix $A$ of rank $r$. We want to show that the matrix $X_k$ of rank $k < r$ that minimises $\lVert A - X_k\rVert_F$ is $$A^k=\sum_i^k \sigma_i u_i v_i^\top .$$
The proof that can be found on the Wikipedia (also here) is as follows:
Since $||A-X_k||_F = ||U\Sigma V^\intercal - X_k||_F = ||\Sigma - U^\intercal X_k V ||_F$, denoting $N = U^\intercal X_k V$, an $m \times n$ matrix of rank $k$, a direct calculation gives \begin{equation} ||\Sigma-N||_F^2 = \sum_{i,j} |\Sigma_{i,j} - N_{i,j}|^2 = \sum_{i=1}^r |\sigma_i-N_{ii}|^2+\sum_{i>r}|N_{ii}|^2+\sum_{i\neq j} |N_{i,j}|^2 \end{equation} which is minimal when all the non diagonal terms of $N$ equal to zero, and so are all diagonal terms with $i > r$. Obviously, the minimum of the terms left is attained when $N_{ii} = \sigma_i$ for $i = 1,2,\cdots,k$ and all other $N_{ii}$ are zero.
My understanding of this is that geometrically $\lVert A - X_k\rVert_F$ is the sum of the distances between a set of orthogonal vectors which form the columns of $\Sigma$ and another set of vectors which form the columns of N. In addition, we know that $n-r$ vectors in the first set are zero, and $n-k$ vectors in the second set must be linearly dependent.
What I don't see is the second part of the proof, namely that it's "obvious" that $N$ must be chosen to be diagonal, in other words that the columns of $N$ must point in the same directions as the columns of $\Sigma$. Intuitively, it does seem that the optimal $N$ must be diagonal, but it's not obvious to me. I'd appreciate if somebody could clarify this point?
There are three terms on the right hand side, each involving different elements of the $N$ matrix, and each a sum of squares. Since the right hand side is separable, you can minimize each of the three terms separately.
Is it clear to you that
$\min \sum_{i=1}^{r} | \sigma_{i}-N_{i,i} |^2$
is achieved with $N_{i,i}=\sigma_{i}, i=1, 2, \ldots, r$?
Is it clear to you that
$\min \sum_{i>r} | N_{i,i} |^{2}$
is achieved by setting $N_{i,i}=0, i=r+1, \ldots$?
Is it clear to you that
$\min \sum_{i \neq j} | N_{i,j} |^{2}$
is achieved by setting $N_{i,j}=0, i \neq j$?