For PCA using the eigenvectors of the covariance matrix, what is the meaning of the eigenvalues?


When doing a PCA using the eigenvectors associated with the largest eigenvalues, what do the values of the eigenvalues mean?

Example:

The two eigenvectors associated with the two largest eigenvalues of my dataset are:

1 - [ 6.62257875e-01 -1.63390189e-01  7.31243512e-01 -1.13386505e-04 -9.65364160e-05  1.02781966e-03]

2 - [ 3.31219165e-01 -8.11563370e-01 -4.81309165e-01  4.26282496e-04 3.70709031e-05  2.55801611e-04]

How can I associate these values with the large dispersion of the data in a plot?

There are 3 answers below.

BEST ANSWER

I assume that your data matrix $X$ is $n\times 6$, so that each row represents a single data point, and that the eigenvalues/eigenvectors you are referring to are those of the sample covariance matrix $\frac{1}{n}X^{T}X$ (with the columns of $X$ centered).

For $i=1,2$, let $\lambda_i,v_i$ denote the eigenvalue/eigenvector pairs with $\lambda_1 \geq \lambda_2$; note that your vectors are unit vectors. For each $i$, $\lambda_i$ is the variance of the $v_i$ component of the data points. That is, $\lambda_i$ is the variance of the dot product $x \cdot v_i$ (taken over the rows $x$ of $X$). So if $\bar x$ denotes the average of the rows (the centroid of the data points), then $\bar x \pm \sqrt{\lambda_i}\, v_i$ gives an error bar of one standard deviation along the $v_i$ direction.
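A minimal numerical sketch of this claim, assuming NumPy and a synthetic $n \times 6$ data matrix in place of your data (the variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))  # synthetic n x 6 data matrix
Xc = X - X.mean(axis=0)                                   # center each column

# Eigendecomposition of the sample covariance matrix (1/n) Xc^T Xc
cov = Xc.T @ Xc / len(Xc)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
lam, v = eigvals[-1], eigvecs[:, -1]          # largest eigenvalue and its unit eigenvector

# lam should equal the variance of the v-component of the data points
proj_var = np.var(Xc @ v)                     # variance of the dot products x . v
print(np.isclose(lam, proj_var))              # True
```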

ANSWER

I wrote a blog post explaining PCA in more detail than my answer below. It has some pictures that you might also find helpful.

Suppose you are given points $x_{1},\ldots,x_{N}$ in $p$-dimensional (Euclidean) space. Let $\mathbf{x}\sim\operatorname{Unif}(x_{1},\ldots,x_{N})$ be a random vector that takes on one of these points with uniform probability. By definition, the expectation $\mathbb{E}\mathbf{x}$ and variance $\operatorname{Var}(\mathbf{x})$ of this vector are simply the sample mean and sample variance of your points $x_{1},\ldots,x_{N}$. Let's proceed assuming $\mathbb{E}\mathbf{x}=0$ (if this is not true, you can always work with $\mathbf{x}^{\prime}=\mathbf{x}-\mathbb{E}\mathbf{x}$ instead).

The first principal component is defined to be a unit direction $v_{1}$ that maximizes the sample variance of your points: $$ v_{1}=\operatorname{argmax}_{\Vert v\Vert=1}\operatorname{Var}(\mathbf{x}\cdot v). $$ Now, let $X$ be an $N\times p$ matrix $$ X=\begin{pmatrix}x_{1}^{\intercal}\\ \vdots\\ x_{N}^{\intercal} \end{pmatrix} $$ whose rows are your points. Note that $$ \operatorname{Var}(\mathbf{x}\cdot v)=\frac{1}{N}\sum_{i=1}^{N}(x_{i}\cdot v)^{2}=\frac{1}{N}\left\Vert Xv\right\Vert ^{2}=\frac{1}{N}\left(Xv\right)^{\intercal}\left(Xv\right)=\frac{1}{N}v^{\intercal}X^{\intercal}Xv. $$ It follows that the first principal component satisfies $$ v_{1}=\operatorname{argmax}_{\left\Vert v\right\Vert =1}v^{\intercal}X^{\intercal}Xv. $$ We want to apply the method of Lagrange multipliers to solve the above. As such, we define $$ \mathcal{L}(v;\lambda)=v^{\intercal}X^{\intercal}Xv-\lambda\left(\left\Vert v\right\Vert^2 -1\right). $$ The gradient of $\mathcal{L}$ with respect to $v$ is $$ [\nabla_{v}\mathcal{L}](v;\lambda)=2X^{\intercal}Xv-2\lambda v. $$ Setting the gradient to zero, it follows that $v_{1}$ must satisfy $X^{\intercal}Xv_{1}=\lambda v_{1}$. In other words, $v_{1}$ is an eigenvector of $X^{\intercal}X$.
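As a quick numerical sanity check on this characterization (a sketch only, assuming NumPy and a synthetic centered data matrix), the top eigenvector of $X^{\intercal}X$ should achieve at least as much projected variance as any other unit direction:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 1000, 4
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))   # synthetic data, rows are points
X -= X.mean(axis=0)                                      # enforce E[x] = 0

# Candidate first principal component: top eigenvector of X^T X
_, eigvecs = np.linalg.eigh(X.T @ X)
v1 = eigvecs[:, -1]

def proj_var(v):
    """Sample variance of the projections x_i . v (X is centered)."""
    return np.mean((X @ v) ** 2)

# No random unit direction should beat v1
directions = rng.normal(size=(10000, p))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
print(proj_var(v1) >= max(proj_var(v) for v in directions))   # True
```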

To determine which eigenvector (and consequently, what the eigenvalue $\lambda$ means), we plug $v_{1}$ back into the expression for the sample variance. Using the fact that $X^{\intercal}Xv_{1}=\lambda v_{1}$ and $v_{1}^{\intercal}v_{1}=1$, $$ \operatorname{Var}(\mathbf{x}\cdot v_{1})=\frac{1}{N}v_{1}^{\intercal}X^{\intercal}Xv_{1}=\frac{\lambda}{N}v_{1}^{\intercal}v_{1}=\frac{\lambda}{N}. $$ Since $v_{1}$ maximizes the sample variance, it follows that $v_{1}$ is an eigenvector associated with the largest eigenvalue $\lambda_{1}$ of the matrix $X^{\intercal}X$ (all eigenvalues of $X^{\intercal}X$ are nonnegative since $X^{\intercal}X$ is positive semidefinite). Moreover, the above tells us that $\lambda_{1}/N$ is the variance explained by the first principal component.

In general, letting $\lambda_{k}$ denote the $k$-th largest eigenvalue of $X^{\intercal}X$, $$ \boxed{\frac{\lambda_{k}}{N}\text{ is the variance explained by the }k\text{-th principal component.}} $$ It's important to point out that, in practice, people usually talk about $\sigma_{k}=\sqrt{\lambda_{k}}$ instead of $\lambda_{k}$ directly. Due to the connection between PCA and SVD, $\sigma_k$ is called the $k$-th singular value of $X$.
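The boxed identity and the SVD connection are easy to check numerically; below is a small sketch (NumPy assumed, data synthetic and centered):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 2000, 5
X = rng.normal(size=(N, p)) @ rng.normal(size=(p, p))   # synthetic data
X -= X.mean(axis=0)                                      # center the points

# Eigenpairs of X^T X, sorted by decreasing eigenvalue
eigvals, V = np.linalg.eigh(X.T @ X)
eigvals, V = eigvals[::-1], V[:, ::-1]

# lambda_k / N is the variance of the projection onto the k-th principal component
explained = np.var(X @ V, axis=0)                        # variance along each component
print(np.allclose(eigvals / N, explained))               # True

# sigma_k = sqrt(lambda_k) are the singular values of X
print(np.allclose(np.sqrt(eigvals), np.linalg.svd(X, compute_uv=False)))  # True
```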

ANSWER

This matrix based derivation might provide a useful perspective. Consider the following decomposition of the covariance matrix $\Sigma$:

$$ E\left[\mathbf{x}\mathbf{x}^T\right]=\Sigma=Q\Lambda Q^{T} $$

where $E$ is the expectation operator, $\mathbf{x}$ is the data vector (assumed to have zero mean, so that $E\left[\mathbf{x}\mathbf{x}^T\right]$ is indeed the covariance matrix), $Q$ is the matrix of orthonormal eigenvectors of $\Sigma$, and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues.

Note for later that $Q^{T}\Sigma Q=\Lambda$.

Now, the projections of the data onto the eigenvectors are given by $Q^{T}\mathbf{x}$, so we can ask: what is the covariance of these transformed data? We obtain it from

$$ \begin{aligned} E\left[Q^{T}\mathbf{x}\left(Q^{T}\mathbf{x}\right)^{T}\right] &= E\left[Q^{T}\mathbf{x}\mathbf{x}^{T}Q\right] \\ &= Q^{T}E\left[\mathbf{x}\mathbf{x}^{T}\right]Q \\ &= Q^{T}\Sigma Q=\Lambda. \end{aligned} $$

Therefore, the eigenvalues of the original covariance matrix (i.e. the entries of the diagonal matrix $\Lambda$) are the variances of the projected data along the eigenvectors. I hope this helps.
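A short numerical check of this conclusion (a sketch assuming NumPy, with a synthetic zero-mean sample standing in for the expectation):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 5000, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # synthetic data, rows are x^T
X -= X.mean(axis=0)                                      # zero mean, so E[x x^T] is the covariance

Sigma = X.T @ X / n                                      # sample covariance matrix
eigvals, Q = np.linalg.eigh(Sigma)                       # Sigma = Q diag(eigvals) Q^T

# Covariance of the projected data Q^T x: should equal diag(eigvals)
Y = X @ Q                                                # each row is (Q^T x_i)^T
proj_cov = Y.T @ Y / n
print(np.allclose(proj_cov, np.diag(eigvals)))           # True
```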