Negative values in PCA


I have been trying to understand what negative values mean in PCA and what actions or considerations to take when I encounter them. The following figure is the scores plot obtained from about 6000 variables, for samples belonging to four different groups.

[Figure: PCA scores plot of the four groups]

As can be seen, group 1 clusters nicely in quadrant 2, while the other groups are much more scattered. For example, group 3 has both positive and negative scores along PC2, and one of its samples sits almost diametrically opposite, in quadrant 3. My questions: Should this sample be removed? Could it be the result of mislabeling in the original grouping?


Answer


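PCA reduces dimensionality in the space of the variables: it finds new orthogonal variables (the principal components, of which your plot shows two), each a linear combination of the original ones, ordered so that the first few capture as much of the original variance as possible. These components can then help you find clusters of observations in the reduced space (in your case, the plane) or, as you want, interpret the groups. As for the negative values themselves: PCA centers the data first, so a score only tells you whether a sample lies above or below the mean along that component, and the sign of each PC is arbitrary (flipping a component flips the sign of every score on it without changing the distances between samples). A negative score therefore has no special meaning on its own.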
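A minimal sketch of this in Python with NumPy, assuming plain (unscaled) PCA computed by an SVD of the column-centered data matrix. The function names `pca_scores` and `pairwise_distances` and the random toy data are illustrative only, not the asker's pipeline. It checks that the scores are centered, that the component directions are orthonormal, and that flipping the sign of PC2 leaves every pairwise sample distance unchanged.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD. X is samples x variables; returns scores, loadings, variance ratios."""
    # Center each variable so that scores measure deviations from the mean sample.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are orthonormal PC directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T            # variables x components
    scores = Xc @ loadings                    # samples x components
    ratio = S**2 / np.sum(S**2)               # share of total variance per PC
    return scores, loadings, ratio[:n_components]

def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

# Hypothetical toy data: 20 samples, 50 variables instead of ~6000.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
scores, loadings, ratio = pca_scores(X)

print(np.allclose(scores.mean(axis=0), 0))               # True: scores are centered
print(np.allclose(loadings.T @ loadings, np.eye(2)))     # True: orthonormal PCs
print(ratio)                                             # PC1 share >= PC2 share

# Flipping PC2 is an equally valid PCA solution: every PC2 score changes sign,
# but the arrangement of the samples in the plane is the same up to a mirror image.
flipped = scores * np.array([1.0, -1.0])
print(np.allclose(pairwise_distances(scores), pairwise_distances(flipped)))  # True
```

With real data you would plot `scores[:, 0]` against `scores[:, 1]` colored by group. Whether to standardize the variables first (divide each by its standard deviation) is a separate choice that can change the plot.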

However, whether and how to interpret the scores on the PCs depends on how you interpret the PCs themselves, for instance in terms of what the original variables mean and how much each one contributes to each PC. So I do not think you should remove this sample just by looking at your plot; that decision depends on the meaning you give to the PCs and also on what your groups represent.
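One way to relate a sample's position back to the original variables is to inspect the loadings and break its score into per-variable terms: since each score is a weighted sum, `score[i, k] = sum_j Xc[i, j] * loadings[j, k]`, the largest terms show which variables push the sample to where it sits. A minimal NumPy sketch on random toy data; the helpers `top_loadings` and `score_contributions`, the `var_j` names, and the choice of sample 3 as the suspicious one are all hypothetical.

```python
import numpy as np

def top_loadings(loadings, names, k=5):
    """For each PC, the k variables with the largest absolute loading."""
    result = {}
    for pc in range(loadings.shape[1]):
        order = np.argsort(-np.abs(loadings[:, pc]))[:k]
        result[f"PC{pc + 1}"] = [(names[j], float(loadings[j, pc])) for j in order]
    return result

def score_contributions(Xc, loadings, sample, pc):
    """Per-variable terms Xc[sample, j] * loadings[j, pc]; they sum to the score."""
    return Xc[sample] * loadings[:, pc]

# Hypothetical toy data and variable names.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
names = [f"var_{j}" for j in range(X.shape[1])]

# Plain PCA: center, SVD, keep two components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2].T
scores = Xc @ loadings

print(top_loadings(loadings, names, k=3))   # which variables define PC1 and PC2

# Break down the PC2 score of the (hypothetical) suspicious sample, index 3.
contrib = score_contributions(Xc, loadings, sample=3, pc=1)
print(np.isclose(contrib.sum(), scores[3, 1]))                # True: terms add up to the score
print([names[j] for j in np.argsort(-np.abs(contrib))[:5]])   # main drivers of that score
```

Whether those driving variables make sense for the sample's labeled group is a domain question, and it is more informative than the sample's position in the plot alone.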

Anyway, I suggest asking this kind of question on Cross Validated.