The SVD theorem says that we can represent an arbitrary matrix $M$ of size $m \times n$ in the following form:
$$ M = U \Sigma V^\dagger $$
Now, I've encountered two versions of this theorem (with different sizes of the matrices):
1)
- $U$ is an $m \times m$ unitary matrix,
- $\Sigma$ is a diagonal $m \times n$ matrix with non-negative real numbers on the diagonal,
- $V$ is an $n \times n$ unitary matrix, and $V^\dagger$ is the conjugate transpose of $V$.
2)
- $U$ is an $m \times p$ matrix such that $U^\dagger U = \mathbf{I}$ ($U$ consists of $p$ orthonormal columns, which can be read as a set of orthonormal vectors),
- $\Sigma$ is a diagonal $p \times p$ matrix, where $p = \min(m,n)$ (the smaller of the two original dimensions),
- $V^\dagger$ is a $p \times n$ matrix such that $V^\dagger V = \mathbf{I}$ ($V^\dagger$ consists of $p$ orthonormal rows, which again can be read as a set of orthonormal vectors).
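For concreteness, the two conventions can be compared numerically; a minimal sketch using NumPy, whose `np.linalg.svd` switches between them via the `full_matrices` flag:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
p = min(m, n)
M = rng.standard_normal((m, n))

# Version 1 ("full" SVD): U is m x m, V^T is n x n,
# and Sigma must be padded with zeros to m x n.
U, s, Vh = np.linalg.svd(M, full_matrices=True)
Sigma = np.zeros((m, n))
Sigma[:p, :p] = np.diag(s)
assert np.allclose(M, U @ Sigma @ Vh)

# Version 2 ("thin"/reduced SVD): U is m x p, V^T is p x n,
# and Sigma is the square p x p diagonal matrix diag(s).
U2, s2, Vh2 = np.linalg.svd(M, full_matrices=False)
print(U2.shape, Vh2.shape)  # (5, 3) and (3, 3)
assert np.allclose(M, U2 @ np.diag(s2) @ Vh2)

# U2 has orthonormal columns: U2^T U2 = I_p
assert np.allclose(U2.T @ U2, np.eye(p))
```

Both factorizations reconstruct the same $M$; the second simply drops the columns of $U$ (or rows of $V^\dagger$) that would only ever multiply zeros in $\Sigma$.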
In the first case everything is fine, because these matrix sizes are exactly what is needed to reproduce the size of the original matrix. My question is about the second version: where does this $p = \min(m,n)$ rule come from? The only thing that came to my mind is that singular values are connected to eigenvalues in the following way:
$$ \sigma_i^2 = \lambda_i, $$
where $\sigma_i$ are the singular values and $\lambda_i$ the eigenvalues. Because of this, by removing some of the singular values, would we also discard some of the eigenvalues of the original matrix? If so, is the number of eigenvalues of a rectangular matrix (let's denote it $N_\lambda$) bounded by $1 \leq N_\lambda \leq \min(m,n)$? (I know that for a square matrix of size $n \times n$ we have $1 \leq N_\lambda \leq n$.) Can this be seen as obtaining the smaller number ($\min(m, n)$) of eigenvectors (of length $\max(m,n)$), rather than the other way around ($\max(m,n)$ eigenvectors of length $\min(m,n)$)? I would appreciate some clarification on this topic.
You can view the SVD as follows:
$$ A = U \Sigma V^{T} = \sum_{i=1}^{r} \sigma_{i} u_{i}v_{i}^{T} $$
i.e. the SVD is the sum of $r$ rank-one matrices, with $\sigma_{i}$ being the weights. Note that the $i$-th term equals $U \Sigma_{i} V^{T}$, where
$$ \Sigma_{i} = \textrm{diag}(0,\cdots, 0, \sigma_{i}, 0,\cdots ,0). $$
If the matrix is $m \times n$ and has full rank, then $\textrm{Rank}(A) = p = \min(m,n)$. But often it doesn't: it has some rank $r$ which is less than $p$. That is, you could write
$$ A = U \Sigma V^{T} = \sum_{i=1}^{p} \sigma_{i} u_{i} v_{i}^{T} $$
and note that $\sigma_{i} = 0$ for $i > r$.
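This rank-one expansion is easy to verify directly; a sketch with a deliberately rank-deficient matrix (the product of a $6 \times 2$ and a $2 \times 4$ Gaussian matrix, which has rank $2$ with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2

# Build a matrix of rank r < p = min(m, n)
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Trailing singular values vanish (up to floating-point noise)
assert np.all(s[r:] < 1e-10)

# The first r rank-one terms already reconstruct A exactly
A_rec = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(r))
assert np.allclose(A, A_rec)
```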
Also, the columns of $U$ are called the left singular vectors; they are eigenvectors of $AA^{T}$. The columns of $V$ are the right singular vectors, which are eigenvectors of $A^{T}A$. $A^{T}A$ has the same nonzero eigenvalues as $AA^{T}$, and their square roots are the singular values.
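These relations can be confirmed numerically; a sketch using `np.linalg.eigh` (appropriate since $A^{T}A$ and $AA^{T}$ are symmetric):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

_, s, _ = np.linalg.svd(A)  # 3 singular values, descending

# Eigenvalues of A^T A (3 x 3): exactly the squared singular values
evals_small = np.linalg.eigh(A.T @ A)[0][::-1]  # reorder to descending
assert np.allclose(evals_small, s**2)

# Eigenvalues of A A^T (5 x 5): the same nonzero values, padded with zeros
evals_big = np.linalg.eigh(A @ A.T)[0][::-1]
assert np.allclose(evals_big[:3], s**2)
assert np.allclose(evals_big[3:], 0)
```

The zero-padding in the spectrum of $AA^{T}$ is exactly the source of the $p = \min(m,n)$ count in the question: only $\min(m,n)$ eigenvalues can be nonzero, so only that many singular values exist.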