Advantage of singular value decomposition of $X$ over eigenvalue analysis of $XX^T$


Suppose I have a real $n \times N$ matrix $X$ where $n$ is a small number like $<20$ and $N$ is a big number like $>100000$. For example, $X$ might be a time series of $n$ measured quantities of a physical system, recorded at $N$ sample points.

Then I might do a singular value decomposition on the data $$X=U\Sigma V^T$$ where $\Sigma$ is diagonal with positive entries and the columns of $U$ and $V$ are orthonormal.

Or, alternatively, I might do an eigenvalue analysis of the covariance matrix (not normalized by the sample size $N$ here, for simplicity): $$XX^T=U\Lambda U^T$$ where $\Lambda$ is diagonal with non-negative entries and $U$ is orthogonal. If $X$ has full rank ($=n$), we simply have $$\Lambda=\Sigma^2$$ so both methods are basically equivalent. The only thing I miss with the eigenvalue analysis is the matrix $V$ acting on the "sample space", whereas I get all the necessary information about the "physical space" from $U$.
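This equivalence, and the algebraic identity $V^T=\Sigma^{-1}U^TX$ for recovering $V$, can be checked numerically; a minimal NumPy sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 1000                       # toy sizes (made up)
X = rng.standard_normal((n, N))

# Thin SVD: X = U @ diag(sigma) @ Vt, with U (n x n), Vt (n x N)
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Eigen-decomposition of the (unnormalized) covariance matrix X X^T
lam, W = np.linalg.eigh(X @ X.T)
lam = lam[::-1]                      # eigh sorts ascending; flip to match the SVD order

# Full rank => Lambda = Sigma^2
assert np.allclose(lam, sigma**2)

# V, which the eigenvalue route drops, can be recovered via V^T = Sigma^{-1} U^T X
Vt_rec = (U.T @ X) / sigma[:, None]
assert np.allclose(Vt_rec, Vt)
```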

To my understanding of the sources, one should avoid the eigenvalue analysis for two reasons:

  1. since $XX^T$ has the squared condition number of $X$, the solution will be more sensitive to numerical round-off
  2. first forming $XX^T$ requires more operations (I don't really understand this, see below)
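
Regarding 1), a small numerical experiment makes the round-off sensitivity visible (a NumPy sketch; the singular values are made up so that the square of the smallest one, $10^{-16}$, sits at the level of double-precision round-off):

```python
import numpy as np

# Build X with known singular values spanning a wide range
rng = np.random.default_rng(1)
n, N = 4, 1000
sigma_true = np.array([1.0, 1e-3, 1e-6, 1e-8])
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((N, n)))
X = U @ np.diag(sigma_true) @ V.T

# Route 1: singular values directly from X
s_svd = np.linalg.svd(X, compute_uv=False)

# Route 2: square roots of the eigenvalues of X X^T
lam = np.linalg.eigvalsh(X @ X.T)[::-1]
s_eig = np.sqrt(np.abs(lam))         # abs() guards against tiny negative round-off

print(s_svd)  # all four values recovered with small relative error
print(s_eig)  # the 1e-8 value is swamped by round-off near sqrt(machine eps)
```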

While I just accept 1), I don't understand how an eigenvalue analysis of, say for my example figures above, a $20\times 20$ matrix (which resulted from $400$ inner products of vectors with $100000$ components each) can be a more expensive computation than a singular value decomposition of a $20\times 100000$ matrix. I imagine that eigenvalue decomposition and singular value decomposition are each very expensive operations (if I understand Wikipedia correctly, their cost grows with the third power of the larger dimension), not least because they likely involve logical branches, causing cache misses on a computer. The inner products involved in $XX^T$, on the contrary, are trivial sequential operations.

So the benefit of the singular value decomposition (not having to form the matrix product $XX^T$) should be far outweighed by the fact that a $20\times 100000$ matrix is much bigger than a $20\times 20$ matrix, at least to my intuitive feel.
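For what it's worth, both routes can be timed directly at the sizes from my example (an illustrative NumPy sketch; note that `full_matrices=False` is needed to avoid building a $100000\times 100000$ matrix $V$):

```python
import time
import numpy as np

rng = np.random.default_rng(2)
n, N = 20, 100_000                   # the sizes from the question
X = rng.standard_normal((n, N))

t0 = time.perf_counter()
# full_matrices=False keeps V at N x n; otherwise an N x N matrix is built
U, s, Vt = np.linalg.svd(X, full_matrices=False)
t_svd = time.perf_counter() - t0

t0 = time.perf_counter()
C = X @ X.T                          # n x n Gram matrix: already touches all n*N entries
lam, W = np.linalg.eigh(C)
t_eig = time.perf_counter() - t0

print(f"thin SVD: {t_svd:.4f} s,  X X^T + eigh: {t_eig:.4f} s")
```

Note that forming $XX^T$ is itself $O(n^2 N)$ work, so neither route avoids going through all $nN$ matrix entries.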

Is my intuition wrong for the example above? How could the advantage of SVD be justified intuitively?

What information would I draw from the matrix $V$, which the eigenvalue analysis drops, in my case of a time series $X$? Is it a rotation in ... time (whatever that means)?