I'm trying to read the proof for existence of the Singular Value Decomposition from the Eckart-Young (1936) paper.
On page 215, the authors mention: "if $u$ is any orthogonal matrix and the independent variables that determine it are given any infinitesimal increments, the resulting increment of $u$ is $$\delta u = u s$$
where $s$ is a skew-symmetric matrix whose elements are infinitesimal, but otherwise arbitrary."
I'm unable to understand this statement or to find the reference mentioned in the paper. The closest explanation I've found is on Wikipedia, which is still confusing to me.
Is there a simpler explanation for this?
I had looked at this paper a few years back, and here's the interpretation of the proof that I have in my notes. Perhaps this will help.
We begin with a lemma: there exist orthogonal matrices $U$ and $V$ such that $U^TAV$ and $U^TBV$ are both diagonal if and only if $AB^T$ and $B^TA$ are both symmetric.
Proof of Lemma: $\implies$ is straightforward: if $A = UD_1V^T$ and $B = UD_2V^T$ with $D_1, D_2$ diagonal, then $AB^T = UD_1D_2^TU^T$ and $B^TA = VD_2^TD_1V^T$ are symmetric.
For $\Longleftarrow$, it helps to note that if $AB^T$ and $B^TA$ are symmetric, then the symmetric block-matrices $$ \hat A = \pmatrix{0&A\\A^T&0}, \quad \hat B = \pmatrix{0&B\\B^T&0} $$ commute. We can then use an orthogonal matrix that simultaneously diagonalizes $\hat A$ and $\hat B$ to construct the required $U$ and $V$.
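This commuting claim is easy to convince yourself of numerically. Here is a sketch with numpy (my own check, not from the paper); building $A$ and $B$ from shared singular vectors is just one convenient way to make $AB^T$ and $B^TA$ symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build A and B sharing singular vectors, so that A B^T and B^T A
# come out symmetric (the sizes and diagonals are illustrative choices).
m, n = 4, 3
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
D1 = np.zeros((m, n)); np.fill_diagonal(D1, [3.0, 2.0, 1.0])
D2 = np.zeros((m, n)); np.fill_diagonal(D2, [0.5, 4.0, 2.5])
A, B = U @ D1 @ V.T, U @ D2 @ V.T

assert np.allclose(A @ B.T, (A @ B.T).T)   # A B^T symmetric
assert np.allclose(B.T @ A, (B.T @ A).T)   # B^T A symmetric

# The symmetric block embeddings then commute.
def hat(M):
    p, q = M.shape
    return np.block([[np.zeros((p, p)), M], [M.T, np.zeros((q, q))]])

A_hat, B_hat = hat(A), hat(B)
print(np.allclose(A_hat @ B_hat, B_hat @ A_hat))  # True
```

The block products are $\hat A \hat B = \pmatrix{AB^T & 0\\0 & A^TB}$ and $\hat B \hat A = \pmatrix{BA^T & 0\\0 & B^TA}$, so the two commute exactly when $AB^T$ and $B^TA$ are symmetric.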
Now, let $\tilde B$ be the minimizer, i.e. suppose that $\tilde B$ is the matrix of rank at most $r$ that minimizes the Frobenius norm $\|A - B\|$. Let $\tilde B = \tilde U \tilde \Sigma \tilde V^T$ be its singular value decomposition. Consider any skew-symmetric matrix $S$ (i.e. $S^T = -S$). Let $U:[-1,1] \to O(m)$ be such that $U(0) = \tilde U$ and $U'(0) = S \tilde U$. Denote $B(t) = U(t) \tilde \Sigma \tilde V^T$.
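This curve $U(t)$ is exactly the statement quoted in the question, with the skew increment written on the other side. A numerical sketch (mine, not the paper's) shows why it exists: the first-order update $(I + tS)\tilde U$ of an orthogonal matrix by a skew-symmetric $S$ fails to be orthogonal only at order $t^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

# An orthogonal matrix (a stand-in for \tilde U) and a skew-symmetric S.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
M = rng.standard_normal((m, m))
S = M - M.T                      # skew-symmetric: S^T = -S

def orth_defect(t):
    """How far (I + t S) Q is from being orthogonal."""
    U_t = (np.eye(m) + t * S) @ Q
    return np.linalg.norm(U_t.T @ U_t - np.eye(m))

# The defect shrinks like t^2: halving t divides it by ~4, so the
# increment S Q preserves orthogonality to first order.
d1, d2 = orth_defect(1e-3), orth_defect(5e-4)
print(d1 / d2)  # ≈ 4
```

Indeed, $Q^T(I+tS)^T(I+tS)Q = I + tQ^T(S + S^T)Q + t^2Q^TS^TSQ$, and the first-order term cancels precisely because $S$ is skew.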
Because $\tilde B$ is a minimizer, we must have \begin{align*} 0 & = \left. \frac{d}{dt}\|A - B(t)\|^2 \right|_{t = 0} = \left. \frac{d}{dt}\operatorname{tr}[(A - B(t))^T(A - B(t))] \right|_{t = 0} \\ & = -2\operatorname{tr}(A^T U'(0) \tilde\Sigma \tilde V^T) = -2\operatorname{tr}(A^T S\tilde B) = -2 \operatorname{tr}([\tilde BA^T]S) \end{align*} (the other term, $\operatorname{tr}(\tilde B^T S \tilde B) = \operatorname{tr}(\tilde B \tilde B^T S)$, vanishes because $\tilde B \tilde B^T$ is symmetric and $S$ is skew). Since this holds for every skew-symmetric $S$, $\tilde BA^T$ lies in the orthogonal complement of the skew-symmetric matrices under the trace inner product, so it is symmetric.
By a similar argument (using instead a $V(t)$), we find that $\tilde B^TA$ must also be symmetric. So, we may apply the lemma to find $U$ and $V$ such that both $U^TAV$ and $U^T\tilde BV$ are diagonal.
Having reduced to this case, we find that the nearest rank-$r$ diagonal matrix $\tilde \Sigma$ to the diagonal matrix $\Sigma = U^TAV$ is found by setting $\tilde \Sigma_{ii} = \Sigma_{ii}$ for $1 \leq i \leq r$ and $\tilde \Sigma_{ii} = 0$ otherwise (ordering the diagonal entries by decreasing magnitude), which is precisely the desired result. $\square$
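Not part of the argument, but the conclusion is easy to test numerically (a sketch with numpy on random data; the names are mine): the rank-$r$ truncation of the SVD attains the error $\sqrt{\smash[b]{\sum_{i>r}\sigma_i^2}}$, and random rank-$r$ competitors never beat it.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 5, 2
A = rng.standard_normal((m, n))

# Rank-r truncation of the SVD: keep the r largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B_tilde = (U[:, :r] * s[:r]) @ Vt[:r, :]

best = np.linalg.norm(A - B_tilde)
# The theorem predicts the minimum is sqrt(sum of discarded sigma_i^2).
assert np.isclose(best**2, np.sum(s[r:]**2))

# Random rank-r competitors should never do better.
for _ in range(1000):
    B = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    assert np.linalg.norm(A - B) >= best - 1e-9
print("truncated SVD wins")
```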
A proof via Weyl's inequalities:
Weyl's inequality for singular values states that $\sigma_{i+j-1}(X + Y) \leq \sigma_i(X) + \sigma_j(Y)$. Setting $X = A-B, Y = B, j = r+1$, we find that $$ \sigma_{i+r}(A) \leq \sigma _{i}(A - B) + \sigma_{r+1}(B). $$ However, since $B$ has rank at most $r$, we have $\sigma_{r+1}(B) = 0$, so that $\sigma_{i}(A - B) \geq \sigma_{i+r}(A)$. Thus, \begin{align*} \|A - B\|^2 &= \sum_{i=1}^n \sigma_i^2(A - B) \geq \sum_{i=1}^{n-r} \sigma_i^2(A - B) \\ & \geq \sum_{i=r+1}^{n} \sigma_i^2(A) \end{align*}
Equality is attained when we set $B = \tilde B$, the truncation of the SVD of $A$ to its $r$ largest singular values. Thus, $\tilde B$ is indeed a minimizer. $\square$
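The Weyl bound itself can also be checked numerically (my own sketch with numpy on random data): for a rank-$r$ matrix $B$, every $\sigma_i(A - B)$ dominates the corresponding shifted singular value $\sigma_{i+r}(A)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 6, 2
A = rng.standard_normal((n, n))
# A random rank-r matrix B, so sigma_{r+1}(B) = 0.
B = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

sA = np.linalg.svd(A, compute_uv=False)        # sigma_1(A) >= sigma_2(A) >= ...
sAB = np.linalg.svd(A - B, compute_uv=False)

# sigma_i(A - B) >= sigma_{i+r}(A) for i = 1, ..., n - r
for i in range(n - r):
    assert sAB[i] >= sA[i + r] - 1e-9
print("Weyl bound holds")
```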