Is orthonormality of the basis vectors mandatory for SVD?


I have been trying to understand the Singular Value Decomposition (SVD) intuitively, and broadly speaking I tend to think of it as a generalisation of the Eigenvalue Decomposition (EVD): it turns a linear map into a simple scaling, provided the appropriate changes of basis are performed, but this time between spaces of different dimensions, say from $\mathbb{R}^m$ to $\mathbb{R}^n$.

Mathematically speaking, for an $n \times m$ matrix $\mathbf A$ we want to find the $v_i$'s and $u_i$'s such that:

$$A v_i=\sigma_i u_i$$

where: $$v_i \in \{v_1, v_2, \dots, v_m\}$$ $$u_i \in \{u_1, u_2, \dots, u_n\}$$

However, I still do not see why $\{v_1, v_2, \dots, v_m\}$ and $\{u_1, u_2, \dots, u_n\}$ must be orthonormal bases, as is assumed by all the resources I have found, which jump right into finding these orthonormal bases by computing the eigenvectors of $\mathbf A^\top \mathbf A$.

With that said, are orthonormal bases a mandatory requirement for the SVD? If so, what would go wrong if you just picked two arbitrary bases that are not necessarily orthonormal?


There are 2 answers below.

BEST ANSWER

This is a great question. Nothing will go wrong if you insist on working with arbitrary (not necessarily orthonormal) bases; it just won't lead to the SVD but to a different, more trivial, yet still sometimes useful, decomposition.

Let's assume we start with an $n \times m$ real matrix $A$ and think of it as defining a linear map $T_A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$ given by $T_A(v) = Av$. You can ask two different questions:

  1. How can we pick a basis $\alpha = (v_1, \dots, v_m)$ for $\mathbb{R}^m$ and a basis $\beta = (w_1, \dots, w_n)$ for $\mathbb{R}^n$ so that, with respect to those bases, the map $T_A$ is represented by the simplest possible matrix? It turns out that the answer is quite simple. If the rank of $A$ is $k$, you can pick $\alpha$ and $\beta$ so that $Av_i = w_i$ for $1 \leq i \leq k$ and $Av_i = 0$ for $k < i \leq m$. With respect to those bases, the map $T_A$ is represented by a rectangular diagonal matrix with $k$ ones on the diagonal and all other entries zero. In this case, there are no "singular values" - only ones and zeros.

    How do you do this in practice? You can always perform a finite number of row and column operations on $A$ until it becomes a rectangular diagonal matrix with $k$ ones on the diagonal and all other entries zero. Since performing row/column operations corresponds to multiplying $A$ on the left/right by elementary matrices, this gives you a decomposition of the form $P^{-1}AQ = D$ where $P, Q$ are invertible matrices and $D$ is the "diagonal" matrix. Multiplying by $P$, you get the equation $AQ = PD$. The columns of $Q$ form the basis $\alpha$ while the columns of $P$ form the basis $\beta$. Multiplying by $Q^{-1}$, you get the decomposition $A = PDQ^{-1}$, which resembles the SVD, except that here the matrices $P$ and $Q$ are not necessarily orthogonal (because we did not insist on orthonormal bases) and the entries on the diagonal of $D$ are only $1$'s and $0$'s.

    Finally, it is instructive to think about the case where $n = m$ and the matrix $A$ has full rank. Then we can pick any basis $\alpha = (v_1, \dots, v_n)$ for $\mathbb{R}^n$, take the basis $\beta = (w_1, \dots, w_n)$ to be $w_i := Av_i$, and then trivially $Av_i = w_i$. If we take $\alpha$ to be the standard basis, we get the "trivial" decomposition $A = A \cdot I \cdot I^{-1}$ (that is, $P = A$, $D = I$, $Q = I$).
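The decomposition $A = PDQ^{-1}$ from item 1 can be sketched in a few lines of NumPy. As a shortcut, this sketch (with a hypothetical $2 \times 3$ matrix) builds the invertible factors $P$ and $Q$ from an SVD rather than from explicit row/column operations, but the point stands: $D$ contains only ones and zeros, and $P$ is invertible without being orthogonal.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # a 2x3 example, rank 2
n, m = A.shape

U, s, Vt = np.linalg.svd(A)         # shortcut to obtain suitable bases
k = int(np.sum(s > 1e-12))          # numerical rank

# Absorb the singular values into P by scaling the first k columns of U,
# so the "diagonal" factor D carries only ones and zeros.
scale = np.ones(n)
scale[:k] = s[:k]
P = U * scale                       # invertible, but no longer orthogonal
Q = Vt.T                            # any invertible Q would do

D = np.zeros((n, m))
D[:k, :k] = np.eye(k)

# A = P D Q^{-1}, with D a rectangular "identity-like" matrix.
assert np.allclose(P @ D @ np.linalg.inv(Q), A)
```

Row/column elimination (as the answer describes) would produce different, equally valid $P$ and $Q$; the decomposition is far from unique.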

  2. How can we pick an orthonormal basis $\alpha = (v_1, \dots, v_m)$ for $\mathbb{R}^m$ and an orthonormal basis $\beta = (w_1, \dots, w_n)$ for $\mathbb{R}^n$ so that, with respect to those bases, the map $T_A$ is represented by the simplest possible matrix? This is a different question, and it is the one answered by the SVD. It turns out that you can always find bases $\alpha, \beta$ such that $Av_i = \sigma_i w_i$, and if the rank of the matrix is $k$ you can arrange that $\sigma_i > 0$ for $1 \leq i \leq k$ and $\sigma_i = 0$ for $k < i \leq m$ - but now you can't make all the $\sigma_i$'s equal to one! These are the singular values of $A$, and they provide interesting geometric information about the linear map $T_A$. For example, $T_A$ maps an $m$-dimensional unit cube with edges along the $v_i$'s to a $k$-dimensional parallelotope of ($k$-dimensional) volume $\sigma_1 \cdots \sigma_k$. If you stack the vectors $v_i$ as columns of a matrix $Q$ and the vectors $w_i$ as columns of a matrix $P$, you get the decomposition $A = P \Sigma Q^{-1} = P \Sigma Q^T$ where now $P$ and $Q$ are orthogonal matrices, not just invertible ones, and $\Sigma$ is a rectangular diagonal matrix with non-zero entries $\sigma_1, \dots, \sigma_k$ on the diagonal.

    Finally, let us think about the case where $n = m$ and the matrix $A$ has full rank. We can try to take $\alpha = (v_1, \dots, v_n)$ to be the standard basis $v_i = e_i$, which is orthonormal. Then $(Ae_1, \dots, Ae_n)$ will be a basis, but it won't necessarily be an orthogonal one, so we can't just take $w_i = Av_i$.
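The claims above are easy to check numerically. Below is a minimal NumPy sketch (with a hypothetical $2 \times 2$ matrix) verifying that the SVD factors are orthogonal, that the singular values multiply to the volume-scaling factor $|\det A|$, and that the images of the standard basis vectors need not be orthogonal.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.]])   # full-rank 2x2 example

U, s, Vt = np.linalg.svd(A)

# P (= U) and Q (= V) are orthogonal, not merely invertible:
# their transposes are their inverses.
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(Vt @ Vt.T, np.eye(2))

# The unit square is mapped to a region of area sigma_1 * sigma_2 = |det A|.
assert np.isclose(s.prod(), abs(np.linalg.det(A)))

# The images of the standard basis vectors are the columns of A,
# and here they are not orthogonal -- so w_i = A e_i fails to give
# an orthonormal basis.
print(A[:, 0] @ A[:, 1])   # 6.0, not 0
```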

ANSWER

There is no reason why they have to be orthonormal bases other than that this is the way the SVD is defined! If you drop the requirement of orthonormal basis vectors in the $U, V$ of the decomposition $A = U\Sigma V^T$, you can use pretty much any decomposition you like, as long as $\Sigma$ is diagonal.

What is very important, though, are the properties we gain from orthonormality. For instance, we can immediately read off the rank of a matrix $A$ just by counting the nonzero singular values $\sigma_i$. On that subject, the SVD is also very convenient for constructing low-rank approximations, for finding pseudoinverses, and for solving least-squares problems, and the list goes on. It is probably one of the most used matrix decompositions, and it is not difficult to find examples.
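A couple of these properties can be demonstrated directly. The sketch below (hypothetical $3 \times 2$ matrix, using NumPy) counts the rank from the singular values and rebuilds the Moore-Penrose pseudoinverse from the SVD factors.

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])   # 3x2, rank 2

U, s, Vt = np.linalg.svd(A)

# Rank = number of nonzero singular values (up to a tolerance).
rank = int(np.sum(s > 1e-12))
assert rank == np.linalg.matrix_rank(A)

# Pseudoinverse from the SVD: invert the nonzero singular values
# and transpose the rectangular diagonal factor.
S_pinv = np.zeros((A.shape[1], A.shape[0]))
S_pinv[:rank, :rank] = np.diag(1.0 / s[:rank])
A_pinv = Vt.T @ S_pinv @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))
```

The same factors give the best rank-$r$ approximation (keep only the $r$ largest singular values) and the least-squares solution $x = A^{+}b$.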