PCA algorithm on low coverage features

90 Views Asked by Bumbble Comm At 11 May 2026 - 4:56

Suppose I am using PCA on a traditional user-item matrix (one row per user, one column per feature), and I want to use PCA to reduce the feature dimension and use the compacted features for a two-class classification problem.

Suppose some features have very low coverage (very few users have the feature; for all other users it is None), but such features have strong predictive power for classification (e.g. as measured by mutual information). Will the PCA algorithm ignore such features because their coverage is very low? Thanks.

regards, Lin


There are 1 best solutions below

Bumbble Comm On 04 Oct 2016 - 3:27

PCA will still give strong weights to these features, so long as they account for a substantial share of the variance in your data.

To be more concrete, PCA is equivalent to the eigenvalue decomposition of the covariance matrix $C_{xx} = \frac{1}{n-1} X^T X$, where $X$ is the mean-centered data matrix with one row per user and $n$ is the number of users.

If your data vary strongly along certain directions, PCA picks those directions out: it gives you an orthogonal coordinate system in which the axes are ordered by how much of the variance in your data they explain. Your eigenvector matrix provides the new basis.
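To make the variance point concrete for a low-coverage feature, here is a small NumPy sketch. It rests on assumptions that are not in the question: the data are synthetic, the missing (None) entries are encoded as 0, and the low-coverage feature is a 0/1 indicator set for roughly 1% of users. Under those assumptions the sparse column has a much smaller variance than a dense unit-variance column, so the leading principal direction of the raw covariance matrix barely loads on it. Scaling each column to unit variance first changes that picture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # number of users (synthetic)

# Assumption: a dense feature present for every user, variance around 1
dense = rng.normal(size=n)
# Assumption: a low-coverage indicator, 1 for about 1% of users, 0 (None) elsewhere
sparse = (rng.random(n) < 0.01).astype(float)
X = np.column_stack([dense, sparse])

# Per-feature sample variances: the sparse column's is far smaller
variances = X.var(axis=0, ddof=1)

# PCA on the raw covariance matrix; eigh returns eigenvalues in ascending order,
# so the last column is the direction of largest variance
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
first_pc = eigvecs[:, -1]

# Same decomposition after scaling each column to zero mean and unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, eigvecs_std = np.linalg.eigh(np.cov(Z, rowvar=False))
first_pc_std = eigvecs_std[:, -1]

print("variances:", variances)
print("first PC, raw data:", first_pc)
print("first PC, standardized data:", first_pc_std)
```

PCA never sees the class labels, so whether a rare feature survives the projection depends on its variance relative to the other features and on how you scale them, not on its mutual information with the target.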

In MATLAB:

[V, D] = eig(cov(X))

should do the trick, where the columns of V are your new coordinate axes (the principal components) and the diagonal of D holds the variance along each of them.

I'm sure NumPy or SciPy has some equivalent functionality that I'm not aware of.
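For the NumPy side, a minimal sketch of the same computation could look like this. The helper name `pca_eig` and the random test matrix are made up for illustration. `np.cov(X, rowvar=False)` treats rows as observations, like MATLAB's `cov(X)`, and `np.linalg.eigh` is the eigensolver for symmetric matrices.

```python
import numpy as np

def pca_eig(X):
    """Eigendecomposition of the sample covariance of X (rows = observations).

    Hypothetical helper: returns eigenvalues in descending order and the
    matching eigenvectors as columns.
    """
    C = np.cov(X, rowvar=False)            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues for symmetric C
    order = np.argsort(eigvals)[::-1]      # put the largest-variance direction first
    return eigvals[order], eigvecs[:, order]

# Synthetic data: 200 observations, 3 features with different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])

eigvals, eigvecs = pca_eig(X)

# Project the centered data onto the new basis
scores = (X - X.mean(axis=0)) @ eigvecs
```

In the new basis the columns of `scores` are uncorrelated and their variances equal `eigvals`. To reduce dimension, keep only the first few columns.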
