Analytical solution to the first PCA direction


It is known that the first PCA direction for a dataset of $n$ points is the unit vector that maximizes the variance of the points after projecting them onto it. I wonder whether there is an analytical solution for computing this vector, instead of using the power method and the like? Or are there fast algorithms to compute it?
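For reference, the power method mentioned above takes only a few lines. A minimal sketch in NumPy (the toy data, seed, and fixed iteration count are illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # toy data: n = 200 points in 3 dimensions
X -= X.mean(axis=0)                  # center the data
C = X.T @ X / (len(X) - 1)           # sample covariance matrix

# Power iteration: repeatedly apply C and renormalize.
v = rng.normal(size=3)
v /= np.linalg.norm(v)
for _ in range(500):
    v = C @ v
    v /= np.linalg.norm(v)

# Sanity check against a full eigendecomposition (agreement up to sign).
w, V = np.linalg.eigh(C)
top = V[:, -1]
```

Convergence is geometric with rate $\lambda_2/\lambda_1$, so a fixed iteration count like the one above is only a sketch; in practice one iterates until successive vectors stop changing.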


I believe that a general closed-form (analytic) solution does not exist for data in 5 or more dimensions, and does exist for data in fewer than 5 dimensions.

Proof for data with 5 dimensions or more

Given any $\lambda = (\lambda_1, \dots, \lambda_5)$, we can construct a $5 \times 5$ symmetric matrix $P_{\lambda}$ with eigenvalues $\lambda$ (for instance $P_{\lambda} = Q \,\mathrm{diag}(\lambda)\, Q^T$ for any orthogonal $Q$), whose characteristic polynomial is, up to sign,

$p(t) = (t - \lambda_1)(t - \lambda_2)(t - \lambda_3)(t - \lambda_4)(t - \lambda_5)$

Suppose we had an analytical formula for the first principal component $v$ of data whose covariance matrix is $P_{\lambda}$. Then $\lambda_1 = v^T P_{\lambda} v$ would also be given analytically, i.e. we could express a root of $p(t) = 0$ in closed form for arbitrary $\lambda$. But by the Abel–Ruffini theorem, there is no general solution in radicals for the roots of polynomials of degree 5 or higher.
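The construction of a symmetric matrix with prescribed eigenvalues used in the proof is easy to check numerically; a minimal sketch (obtaining a random orthogonal $Q$ via QR is one choice among many):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([5.0, 4.0, 3.0, 2.0, 1.0])      # prescribed eigenvalues

# Random orthogonal matrix Q from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))

# P is symmetric with eigenvalues exactly lam.
P = Q @ np.diag(lam) @ Q.T
```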

For data with fewer than 5 dimensions

Let's say you have 4-dimensional data with a $4 \times 4$ covariance matrix $A$. There is a (very complicated) closed-form formula for the roots of 4th-degree polynomials. You can analytically compute all the roots of the characteristic polynomial, order them $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4$, find a corresponding eigenvector by solving the $4 \times 4$ linear system $(A - \lambda_1 I)v = 0$, and normalize the vector $v$. There you go: your first principal component.
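To make the low-dimensional case concrete, here is the fully explicit closed form in 2 dimensions, where the characteristic polynomial is just a quadratic (the function name is mine, and the `b == 0` branches handle the already-diagonal case):

```python
import numpy as np

def first_pc_2d(C):
    """Closed-form top eigenpair of a 2x2 symmetric matrix [[a, b], [b, c]]."""
    a, b, c = C[0, 0], C[0, 1], C[1, 1]
    # Larger root of t^2 - (a + c) t + (ac - b^2) = 0 via the quadratic formula.
    lam1 = (a + c + np.sqrt((a - c) ** 2 + 4.0 * b * b)) / 2.0
    if b != 0:
        v = np.array([b, lam1 - a])   # solves (C - lam1 I) v = 0
    elif a >= c:
        v = np.array([1.0, 0.0])      # C is diagonal; pick the larger entry
    else:
        v = np.array([0.0, 1.0])
    return lam1, v / np.linalg.norm(v)

C = np.array([[2.0, 1.0], [1.0, 3.0]])
lam1, v = first_pc_2d(C)
```

In 3 dimensions the same recipe works with Cardano's cubic formula, and in 4 with the quartic formula, though the expressions grow rapidly.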