Interpretation of PCA

I am wondering whether there is a practical interpretation of a principal component analysis. Suppose you have a data matrix $X\in\mathbb{R}^{N\times p}$ and you perform a principal component analysis, which yields certain directions $v_1,\dots,v_q$, $q<p$, in $\mathbb{R}^p$ that explain most of the variance in the data. Is there an interpretation of these principal components in terms of the original components, i.e. the variables $x_1,\dots,x_p$ that constitute the model? Think e.g. of the $x_i$ being certain "variables" of a human body, such as weight, blood pressure, etc., that should be used to predict expected lifetime. If one now performs a PCA as described above, one finds that certain linear combinations of the columns of $X$ (i.e. of the variables) capture most of the variance. If one wants to reduce the model (i.e. reduce $p$), which variables does one exclude, given the information from the PCA?
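
For concreteness, here is a minimal sketch of what I mean, using scikit-learn on made-up data with hypothetical variable names; the rows of `pca.components_` are exactly the directions $v_k$ expressed in terms of the original variables:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: N = 200 "patients", p = 4 body variables (names are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
variables = ["weight", "blood_pressure", "height", "heart_rate"]

pca = PCA(n_components=2)  # q = 2 < p = 4
pca.fit(X)

# Each row of pca.components_ is a direction v_k in R^p whose entries are
# the weights of the original variables in that principal component.
for k, v in enumerate(pca.components_):
    combo = " + ".join(f"({w:+.2f})*{name}" for w, name in zip(v, variables))
    print(f"PC{k+1} = {combo}   explains {pca.explained_variance_ratio_[k]:.0%} of the variance")
```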

I assume that by "reducing the model" you mean omitting some of the original variables from the predictive model entirely (such that, for instance, you wouldn't even need to collect those measurements in the future).

For this kind of goal, ordinary PCA is inappropriate, because each new variable is, as you said, a linear combination of the old variables; in general every old variable enters every component with a non-zero weight, so nothing is actually dropped.

If you want to stay within the realm of principal component analysis, then you should look at Sparse PCA. Sparse PCA essentially finds an approximation to the true PCA under the constraint that the loadings must be sparse, i.e. that most entries of each principal component are exactly zero.
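
As a minimal sketch, assuming scikit-learn's `SparsePCA` implementation and random toy data (the penalty `alpha` is an arbitrary choice here):

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # toy data: 200 samples, 6 predictors

# alpha is the L1 penalty on the loadings: larger alpha -> sparser components.
spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
spca.fit(X)

# Unlike ordinary PCA, many entries of the loading matrix are exactly zero,
# so each component is a combination of only a few original variables.
print(spca.components_)
```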

However, there are two disadvantages here:

  • PCA and its variants are methods of unsupervised learning. When they perform dimensionality reduction, they consider the covariance structure of the predictors, but they completely disregard the utility of the predictor variables for predicting the response variable (here, expected lifetime).
  • Moreover, sparsity of the principal components doesn't necessarily map that well to variable reduction. Even if one of the original variables is dropped from many or almost all of the linear combinations constituting the new variables, you would still have it "in the model" so long as it obtained a non-zero weight in a single new predictor variable. (More precisely, if you set up your data matrix such that the rows are samples and the columns are predictors, and you find the SVD of your data matrix as $X=U\Sigma V^T$, then you would need to retain the $i$th variable in your model so long as the $i$th row of $V$ attains a non-zero value in any of the retained columns; a sketch of this check appears after this list.)

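Here is a sketch of that bookkeeping with `numpy.linalg.svd` on toy data. (With dense SVD loadings, as here, essentially no row of $V$ is exactly zero; in practice you would substitute sparse loadings, e.g. from Sparse PCA.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # rows = samples, columns = predictors
X = X - X.mean(axis=0)         # center the data, as in PCA

# X = U diag(s) Vt, so the principal directions are the rows of Vt
# (equivalently the columns of V); row i of V holds variable i's loadings.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

q = 2                    # number of retained components
loadings = V[:, :q]      # variable-by-component loading matrix

# Variable i can be dropped only if its entire row of retained loadings is
# exactly zero -- with dense PCA loadings this essentially never happens.
droppable = np.all(loadings == 0, axis=1)
print("droppable variables:", np.flatnonzero(droppable))
```
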
Thus, it sounds like what you really want is the lasso. In the realm of predictive modeling, the lasso fits a linear model under an $L_1$ penalty that drives some coefficients to exactly zero, and thereby finds a subset of predictor variables that are useful for predicting the response variable.
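
A minimal sketch of lasso-based selection, assuming scikit-learn's `Lasso` on synthetic data (in practice the penalty `alpha` would be chosen by cross-validation, e.g. with `LassoCV`):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # 6 candidate predictors
# Synthetic response that truly depends on only two of them.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The L1 penalty drives unhelpful coefficients to exactly zero,
# so the surviving variables form the reduced model.
selected = np.flatnonzero(lasso.coef_ != 0)
print("coefficients:", np.round(lasso.coef_, 2))
print("selected variables:", selected)
```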