I was wondering if I could get a mathematical description of the relationship between the singular value decomposition (SVD) and the principal component analysis (PCA).
To be more specific, there are some points that I don't understand very well, at least from a mathematical point of view.
What are the principal components (PCs), bearing in mind that we are using the SVD to compute the PCA?
Which part of the SVD becomes the PCs?
What is the relationship between the orthogonal matrices and the diagonal matrix from the SVD on the one hand, and the scores and loadings of the PCA on the other?
I have read that the principal components can describe big data sets with very few loading vectors. How is this (mathematically) possible? In what way can these few principal components tell me something about the variance of a big data set, and what does the SVD have to do with this process?
I have tried to be very specific in posing my questions; if something is still not clear, I apologize.
Thanks in advance for your help!
PS: I have done my homework and looked very closely at: What is the intuitive relationship between SVD and PCA? but I could not find the answers I am looking for. I believe mine are more related to the mathematical concepts than the practical ones. Anyway, if you believe this question is unnecessary (a duplicate) I will remove it.
Suppose we have a bunch of large vectors $x_1,\ldots,x_N$ stored as the columns of a matrix $X$. It would be nice if we could somehow find a small number of vectors $u_1,\ldots,u_s$ such that each vector $x_i$ is (to a good approximation) equal to a linear combination of the vectors $u_1,\ldots, u_s$. This would allow us to describe each of the (very large) vectors $x_i$ using just a small number of coefficients.
So we want to find vectors $u_1,\ldots, u_s$ such that for each $x_i$ we have \begin{equation} x_i \approx c_{i,1} u_1 + c_{i,2} u_2 + \cdots + c_{i,s} u_s \end{equation} for some coefficients $c_{i,1},\ldots, c_{i,s}$.
These $N$ equations ($i$ goes from $1$ to $N$) can be combined into one single matrix equation: \begin{equation} X \approx U C \end{equation} for some matrix $C$. (Here the columns of $U$ are $u_1,\ldots, u_s$.)
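To make the stacked matrix form concrete, here is a small illustration (assuming NumPy; the names $U$, $C$, $X$ follow the text): if every column $x_i$ is exactly a combination of $u_1,\ldots,u_s$, then $X = UC$, and column $i$ of $C$ holds the coefficients $c_{i,1},\ldots,c_{i,s}$.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((1000, 3))   # s = 3 basis vectors of length 1000
C = rng.standard_normal((3, 8))      # coefficients for N = 8 data vectors
X = U @ C                            # each column of X lies in span(U)

# Each (large, length-1000) vector x_i is reproduced from just 3 coefficients.
x_0 = X[:, 0]
print(np.allclose(x_0, U @ C[:, 0]))  # True
```

So each length-1000 vector is described by only 3 numbers once $U$ is fixed; the interesting case is when the equality is only approximate.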
Note that the rank of $UC$ is less than or equal to $s$. So $UC$ is a low rank approximation of $X$.
Here is the key fact: the SVD gives us an optimal low rank approximation of $X$ (optimal in both the spectral and Frobenius norms, by the Eckart–Young theorem)! That is one of the basic facts about the SVD, and it is why the SVD can be used for image compression.
If the SVD of $X$ is expressed as \begin{equation} X = \sum_{i=1}^N \sigma_i u_i v_i^T, \end{equation} where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_N$, then an optimal approximation of $X$ of rank less than or equal to $s$ is obtained by keeping only the first $s$ terms: \begin{align} X &\approx \sum_{i=1}^s \sigma_i u_i v_i^T \\ &= U \Sigma V^T \\ &= U C \end{align} where $U$ is the matrix with columns $u_1,\ldots, u_s$, $\Sigma$ is the $s \times s$ diagonal matrix with entries $\sigma_1,\ldots,\sigma_s$, $V$ is the matrix with columns $v_1,\ldots,v_s$, and $C = \Sigma V^T$.
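This truncation can be sketched numerically (assuming NumPy; the names $X$, $U$, $C$, $s$ match the formulas above). The spectral-norm error of the rank-$s$ approximation comes out to exactly $\sigma_{s+1}$, which is the optimality guarantee mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))   # 20 data vectors of length 50, as columns

# Full (thin) SVD: X = U_full @ diag(sigma) @ Vt_full
U_full, sigma, Vt_full = np.linalg.svd(X, full_matrices=False)

s = 5
U = U_full[:, :s]                        # first s left singular vectors
C = np.diag(sigma[:s]) @ Vt_full[:s, :]  # C = Sigma V^T
X_s = U @ C                              # rank-s approximation of X

# The spectral-norm error equals the first discarded singular value.
err = np.linalg.norm(X - X_s, 2)
print(np.isclose(err, sigma[s]))  # True
```

No rank-$s$ matrix can do better than this error, which is why truncating the SVD is the right way to find $U$.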
Thus, the SVD finds an optimal $U$ for us.
PCA takes as input vectors $x_1,\ldots,x_N$ as well as a small positive integer $s$. PCA demeans the vectors and stores them in the columns of a matrix $X$, then simply computes the SVD $X = U \Sigma V^T$ and returns the first $s$ columns of $U$ as output.