Understanding the proof of the Full SVD from the Economy SVD


$\textbf{The Economy SVD}$:

Let $r:=\operatorname{rank}(A)\leq \min(m,n)$ where $A\in\mathbb{R}^{m\times n}$. We know that the number of nonzero singular values of $A$, i.e. the cardinality of the set $\Sigma(A):=\{\sigma_{i}\mid \sigma_{i}>0\}$, is $r$. Considering $AA^{T}$, we know that it is symmetric (and positive semidefinite), and thus by the spectral theorem we may write: $$ AA^{T}=QDQ^{T} $$ Here $D$, a diagonal matrix, houses the eigenvalues of $AA^{T}$, exactly $r$ of which are nonzero since $\operatorname{rank}(A)=\operatorname{rank}(AA^{T})$. Now let $D_{r}=\Sigma_{r}^{2}$ be the $r\times r$ diagonal matrix obtained by deleting the zero diagonal entries, so that it contains only the $r$ nonzero eigenvalues, and let $Q_{r}$ collect the corresponding $r$ orthonormal eigenvector columns (discarding the eigenvectors belonging to the zero eigenvalues). Then $AA^{T}Q_{r}=Q_{r}\Sigma_{r}^{2}$, and right-multiplying by $\Sigma_{r}^{-1}$ gives: $$ AA^{T}Q_{r}\Sigma_{r}^{-1}=Q_{r}\Sigma_{r} $$
Now letting $U_{r}:=Q_{r}$, $S_{r}:=\Sigma_{r}$, and $V_{r}:=A^{T}Q_{r}\Sigma_{r}^{-1}=A^{T}U_{r}S_{r}^{-1}$, we may write: $$ AV_{r}=U_{r}S_{r} $$ where $U_{r}^{T}U_{r}=I_{r}$, and it can also be shown that $V_{r}^{T}V_{r}=I_{r}$. Moreover, the columns of $V_{r}$ span the row space of $A$, so $V_{r}V_{r}^{T}$ is the orthogonal projector onto that row space and $AV_{r}V_{r}^{T}=A$; right-multiplying the equation above by $V_{r}^{T}$ therefore lets us rewrite it as: $$ A=U_{r}S_{r}V_{r}^{T} $$
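To make the construction concrete, here is a minimal NumPy sketch of the economy SVD built exactly this way, from the eigendecomposition of $AA^{T}$. The test matrix $A$ and the tolerance used to decide which eigenvalues count as "nonzero" are my own illustrative choices, not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 3
# A rank-r test matrix: a product of random m x r and r x n factors.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Spectral decomposition of the symmetric PSD matrix A A^T = Q D Q^T.
evals, Q = np.linalg.eigh(A @ A.T)        # eigenvalues in ascending order
order = np.argsort(evals)[::-1]           # reorder descending
evals, Q = evals[order], Q[:, order]

# Keep only the r nonzero eigenpairs (numerically: above a tolerance).
keep = evals > 1e-10 * evals[0]
Q_r = Q[:, keep]                          # m x r
S_r = np.diag(np.sqrt(evals[keep]))       # Sigma_r (singular values), r x r

U_r = Q_r
V_r = A.T @ U_r @ np.linalg.inv(S_r)      # V_r := A^T U_r S_r^{-1}

# Checks: both factors have orthonormal columns, and A is recovered.
k = Q_r.shape[1]                          # should equal r
assert np.allclose(U_r.T @ U_r, np.eye(k))
assert np.allclose(V_r.T @ V_r, np.eye(k))
assert np.allclose(A, U_r @ S_r @ V_r.T)
```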


Request: I hope someone can explain the beginning of this proof, as I don't understand it. The proof below essentially derives the full-scale SVD from the economy SVD.


$\textbf{Theorem (Full Scale SVD)}$: Consider the matrix $A\in\mathbb{R}^{m\times n}$ with $\operatorname{rank}(A)=r\leq p=\min(m,n)$. We claim that there exist two orthogonal matrices $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ and a (rectangular) diagonal matrix $S\in\mathbb{R}^{m\times n}$ such that: $$ AV=US\iff A=USV^{T}\iff S=U^{T}AV $$ where $S$ has the singular values of $A$ on its diagonal, $\operatorname{diag}(S):=\begin{bmatrix}\sigma_{1}&\sigma_{2}&\cdots&\sigma_{p}\end{bmatrix}$. Furthermore, $\sigma_{1},\sigma_{2},\cdots,\sigma_{r}>0$ and the remaining $p-r$ singular values $\sigma_{r+1}=\sigma_{r+2}=\cdots=\sigma_{p}=0$. The columns of $U$ and the columns of $V$ are called the left-singular vectors and right-singular vectors of $A$, respectively.
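As a sanity check on the statement (not part of the proof), NumPy's built-in full SVD produces exactly the shapes and zero pattern the theorem claims; the rank-$r$ test matrix below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
p = min(m, n)

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # U: m x m, Vt: n x n
S = np.zeros((m, n))                             # rectangular diagonal S
S[:p, :p] = np.diag(s)

assert U.shape == (m, m) and Vt.shape == (n, n)
assert np.all(s[:r] > 1e-10) and np.allclose(s[r:], 0)  # p - r zeros
assert np.allclose(A, U @ S @ Vt)                # A = U S V^T
assert np.allclose(U.T @ A @ Vt.T, S)            # S = U^T A V
```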


$\textbf{Proof}$. Construct $V:=[V_{r}\;\;V_{n-r}]$ and $U:=[U_{r}\;\;U_{m-r}]$ by appending orthonormal columns, so that the columns of $V$ form an orthonormal basis for $\mathbb{R}^{n}$ and the columns of $U$ form an orthonormal basis for $\mathbb{R}^{m}$.

Recall that $Av_i \in \operatorname{Im}(A) = \{y \in \mathbb{R}^m \mid y = Ax \text{ for some } x \in \mathbb{R}^n\}$ with $Av_i = \sigma_i u_i$ for $i = 1,\dots,r$ and $\sigma_i > 0$. Moreover, the vectors $u_i = \frac 1{\sigma_i } Av_i$, $i = 1,\dots, r$, form an orthonormal basis for $\operatorname{Im}(A)$. Thus, $v_{r+1},\dots,v_n \in \ker(A)$, i.e. $AV_{n-r} = 0$, since otherwise $\operatorname{rank}(A) > r$ (proof by contradiction).
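The completion step can also be carried out numerically. The sketch below is my own illustration (it starts from NumPy's thin SVD for brevity): it pads the economy factors to full orthogonal matrices by QR-orthogonalizing against random columns, then checks that $AV_{n-r}=0$ and that the full factorization reproduces $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Economy factors (taken here from NumPy's thin SVD for brevity).
U_thin, s, Vt_thin = np.linalg.svd(A, full_matrices=False)
U_r, S_r, V_r = U_thin[:, :r], np.diag(s[:r]), Vt_thin[:r, :].T

def complete_basis(W):
    """Extend W (k x r, orthonormal columns) to an orthogonal k x k matrix.
    QR on [W | random] yields k - r extra orthonormal columns orthogonal
    to the span of W; W itself is reused to avoid QR sign flips."""
    k, r = W.shape
    M = np.hstack([W, rng.standard_normal((k, k - r))])
    Q, _ = np.linalg.qr(M)
    return np.hstack([W, Q[:, r:]])

U = complete_basis(U_r)            # m x m orthogonal: [U_r  U_{m-r}]
V = complete_basis(V_r)            # n x n orthogonal: [V_r  V_{n-r}]
S = np.zeros((m, n))
S[:r, :r] = S_r                    # pad S_r with zero blocks

assert np.allclose(U.T @ U, np.eye(m)) and np.allclose(V.T @ V, np.eye(n))
assert np.allclose(A @ V[:, r:], 0)   # AV_{n-r} = 0: trailing v_i in ker(A)
assert np.allclose(A, U @ S @ V.T)    # the full SVD
```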

The proof then goes on...

$\textbf{Best Answer}$

Thus, $v_{r+1},\dots,v_n \in \ker(A)$, i.e. $AV_{n-r} = 0$, since otherwise $\operatorname{rank}(A) > r$ (proof by contradiction).

Let's fill in the details of this proof. Suppose that for some $k > r$, $Av_k \neq 0$. If $Av_k$ is linearly independent from $Av_1,\dots,Av_r$, then we are forced to conclude that $\operatorname{Im}(A)$ has dimension greater than $r$, which is what is referenced by the parenthetical remark.

However, we still need to rule out the possibility that $Av_k$ is non-zero but the set $\{Av_1,\dots,Av_r,Av_k\}$ is linearly dependent. It is not clear how the author of the proof intended for the reader to deduce this, but here is one approach. We begin by supposing that $Av_k = c_1 Av_1 + \cdots + c_r Av_r$ and then concluding that we much have $c_i = 0$ for all $i = 1,\dots,r$. As the author notes, the vectors $u_i = \frac 1{\sigma_i} Av_i$ form an orthonormal basis for the image of $A$. It follows that for each $i = 1,\dots,r,$ $$ Av_k = c_1 Av_1 + \cdots + c_r Av_r \implies\\ Av_k = c_1 \sigma_1 u_1 + \cdots + c_r \sigma_r u_r \implies \\ \langle Av_k, u_i \rangle = c_i \sigma_i \implies\\ c_i = \frac{\langle Av_k, u_i\rangle}{\sigma_i} = \frac{\langle v_k, A^Tu_i\rangle}{\sigma_i} = \frac{\langle v_k, A^TA v_i\rangle}{\sigma_i^2} = \langle v_k,v_i\rangle. $$ However, the fact that $v_1,\dots,v_n$ are orthonormal implies that $\langle v_k,v_i\rangle = 0$. So indeed: if $Av_k$ is a linear combination of $Av_1,\dots,Av_r$, then it must hold that $Av_k = 0$.