Singular Value Decomposition of a big rectangular matrix

1k Views Asked by At

I have a huge rectangular $m \times n$ matrix (m = 72000, n = 130) and I need to calculate the SVD of this matrix. Obviously the regular SVD will fail. Since it needs LOTS of memory. I have used the econ version of SVD, but I think that the approximation is not what I desired. My teacher told me that by computing the QR decomposition of my matrix and then applying SVD on it, I would get better results. But I don't know how does the theory works. I know QR decomposition and I know SVD, But I don't know how to combine those to get a better result.

4

There are 4 best solutions below

0
On BEST ANSWER

Lets call your big $m\times n$ matrix $A$. Then for the composition $QR=A$, you have a $m\times m$ matrix $Q$ and a $m\times n$ matrix $R$. This seems to be way to big to store, but:

  1. $R$ has the form $\begin{pmatrix}\tilde R \\ 0\end{pmatrix}$ where $\tilde R$ is an (upper triangular) $n\times n$ matrix (which is small in your example).
  2. $Q$ is huge, but does not need to be stored explicitly. $O(nm)$ storage is enough (typically the householder-reflections are stored, and the matrix $Q$ is never explicitly computed)

Then as a second step, you only need to compute the SVD of $\tilde R$, which is rather small.

4
On

This is too long for a comment. I just computed the SVD of a $72000 \times 130$ matrix in 48 seconds on my PC using the Matlab SVD function. Admittedly, my PC has 512 gigabytes of RAM.

>> mtrx = randn(72000,130);

>> whos mtrx
  Name          Size                Bytes  Class     Attributes

  mtrx      72000x130            74880000  double              

>> tic; [u,s,v] = svd(mtrx); toc
Elapsed time is 47.659299 seconds.
>> whos u
  Name          Size                     Bytes  Class     Attributes

  u         72000x72000            41472000000  double 

Edit: Thanks to @mathreadler's comment I realized I should use the "econ" option when calling Matlab's SVD function. With that option, the SVD is computed in .48 seconds on my computer. So it's 100 times faster and uses much less memory.

>> mtrx = randn(72000,130);
>> tic; [u,s,v] = svd(mtrx,'econ'); toc
Elapsed time is 0.483609 seconds.
>> whos u
  Name          Size                Bytes  Class     Attributes

  u         72000x130            74880000  double              

>> whos v
  Name        Size              Bytes  Class     Attributes

  v         130x130            135200  double 
2
On

The econ version of SVD (at least MatLab) will only cut the basis-vectors, that will not be used since they do not lie in the image of the matrix.

However, if you form a (econ) QR-Decomposition, you could use the upper part of $R$ (the non-zero-part) and do a SVD on that. On my amchine, that is considerably faster, than directly using SVD.

0
On

You can use the power method to approximate an eigenvector / eigenvalue and then factor them away one at a time.

Starting with random vectors either 130 long from the right or 72000 from the left:

  1. $v_{n+1} = Av_n$
  2. Check if the variance of pointwise division $v_{n+1} ./ v_{n}$ is smaller than some threshold, if so quit the loop, store the quotient as singular value and (normalized) $v_n$ as singular vector.
  3. Normalize and iterate

If you only need the singular values and not the vectors you can use the svds function in Matlab/Octave.

In GNU Octave on my 4 year old budget machine:

 tic;S = svds(rand(72000,130),130);toc

Gives the 130 singular values and takes 34.5 seconds.