Is PCA at odds with stability?

For simplicity, let's consider a 2D case only.

I am measuring the coordinates of, say, 1000 data points lying on a regular 2D Cartesian plane. The recorded $(x,y)$ values are placed as columns of a matrix $X$ having dimensions $2 \times 1000$.

If I now compute the covariance matrix $XX^\top$, it will generally have nonzero off-diagonal entries. Then, as popularised in approaches like PCA, we choose a new basis: one basis vector along the direction of maximum variance and the second orthogonal to it. The eigenvalue for the first eigenvector will be large (perhaps much larger) compared to the second eigenvalue.

Let the new system matrix using the new set of eigenvectors be $Y$ instead of $X$.

Now the condition number is the ratio of the largest to the smallest eigenvalue. Thus I expect that $X$ will have a better condition number than $Y$.
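A quick numerical check of this expectation can be sketched as follows. This is only an illustration under assumptions not in the question: the data are synthetic (the `mixing` matrix and the random seed are made up), and $Y$ is taken to be $V^\top X$, where the columns of $V$ are the eigenvectors of $XX^\top$. Under those assumptions, a pure rotation into the eigenvector basis leaves the eigenvalues, and therefore the ratio of largest to smallest, unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 correlated 2D points stored as columns (2 x 1000).
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])
X = mixing @ rng.standard_normal((2, 1000))
X = X - X.mean(axis=1, keepdims=True)   # center each coordinate

C_x = X @ X.T                           # 2 x 2 matrix XX^T of the raw data
eigvals, V = np.linalg.eigh(C_x)        # ascending eigenvalues, orthonormal eigenvectors

Y = V.T @ X                             # coordinates in the eigenvector basis
C_y = Y @ Y.T                           # diagonal up to rounding error

cond_x = np.linalg.cond(C_x)            # 2-norm condition number
cond_y = np.linalg.cond(C_y)
ratio = eigvals[-1] / eigvals[0]        # largest / smallest eigenvalue

print(cond_x, cond_y, ratio)            # all three agree up to rounding
```

In this sketch the off-diagonal entries of `C_y` vanish, but the condition number does not change, because `C_y` and `C_x` share the same eigenvalues.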

Can I thus conclude that, for solving equations, $X$ should be preferred over $Y$, which was obtained using an eigenvalue decomposition as in PCA (though here we are not doing any dimensionality reduction)?

There is 1 best solution below

Not sure that this is a full answer, but it may help:

Once you use the PCs to construct the design matrix, instead of the original values $X$, your new axes are orthogonal to each other and thus the condition number should be $1$, which gives the most stable solution of a linear model. The condition number of $X'X$ for the original design matrix (given standardized data) is the ratio between the variance of the "most important" PC and that of the "least important" PC. But once you use the PCs to construct the design matrix, its Gram matrix becomes just the identity matrix.

Assume that your model is $y=\beta_1x_1 + \beta_2x_2 + \epsilon$, where both $x_1$ and $x_2$ are standardized. The matrix that matters is $X'X$, whose off-diagonal elements $(X'X)_{12}=(X'X)_{21}$ are $\sum_{i=1}^n x_{1i}x_{2i}$. For non-orthogonal $x_1$ and $x_2$ this term is not zero. Hence, the condition number of $X'X$ is larger than $1$. Performing PCA on the covariance matrix $\frac{1}{n}X'X$ will rotate your data to the new axes that are the PCs. Now instead of $x_1$ and $x_2$ you have $PC_1$ and $PC_2$, which are orthogonal and have length $1$. Stack the PCs in columns such that $Z=[PC_1, PC_2]$. Then $Z'Z=I$. Its condition number is $1$, thus the second estimated model is more stable.
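The argument above can be checked numerically with a minimal sketch. The data, seed, and variable names here are hypothetical, and observations are stored as rows ($n \times 2$) to match the $X'X$ notation. Note the assumption that does the work: the PC score columns are rescaled to unit length. Without that rescaling the scores are still orthogonal, but their Gram matrix keeps the eigenvalues of $X'X$ and so has the same condition number.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical correlated predictors x1 and x2 (rows are observations).
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + 0.3 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize both columns

gram_x = X.T @ X                              # X'X, nonzero off-diagonal entries
eigvals, V = np.linalg.eigh(gram_x / n)       # PCA on the covariance matrix

scores = X @ V                                # PC scores: orthogonal columns
Z = scores / np.linalg.norm(scores, axis=0)   # assumption: rescale each PC to length 1

gram_z = Z.T @ Z                              # identity up to rounding error
cond_x = np.linalg.cond(gram_x)
cond_z = np.linalg.cond(gram_z)
cond_scores = np.linalg.cond(scores.T @ scores)

print(cond_x)        # larger than 1 for correlated predictors
print(cond_z)        # approximately 1
print(cond_scores)   # matches cond_x: rotation alone does not change it
```

So, in this sketch, the improvement in conditioning comes from normalizing the PCs to unit length, not from the rotation itself.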