Computing covariance matrix in PCA

I am implementing PCA in MATLAB and need to compute the covariance matrix. I am using MATLAB's 'cov' command, but it is very slow. Is there a faster way to compute the covariance matrix?

There is 1 best solution below

I just ran

X=rand(8545,2); cov(X)

in MATLAB and got an answer almost instantly, so I'm not sure why yours would be slower. (My computer is fairly pedestrian: a 2012 MacBook Air with a 1.7 GHz Intel Core i5 and 4 GB of RAM, running a fairly old version of MATLAB, R2012a.)
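
One quick way to check whether 'cov' itself is the bottleneck is to time the call and look at the shape of its input. The sketch below assumes data of the same shape as the example above. The transposed case in the final comment is only a guess at one possible cause of slowness, not something the question confirms.

```matlab
% Time cov on data shaped like the example above.
% cov treats rows as observations and columns as variables,
% so an n-by-d input produces a d-by-d covariance matrix.
X = rand(8545, 2);

tic;
C = cov(X);
elapsed = toc;

fprintf('cov on a %d-by-%d matrix took %.4f s\n', size(X, 1), size(X, 2), elapsed);
disp(size(C));   % d-by-d, here 2-by-2

% Hypothetical comparison: the same data transposed is read as
% 2 observations of 8545 variables, giving an 8545-by-8545 result.
% Ct = cov(X');  % much larger and slower; uncomment to compare
```

If the timing is still small on your data, the slowdown is more likely elsewhere in the PCA code than in 'cov'.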

But in general, here are a few things you could try if you do run into scaling issues with this type of problem:

  • Try writing your code in Julia instead of MATLAB. Julia is a relatively new technical computing language with high-level syntax similar to MATLAB or R, but it tends to be substantially faster, particularly for loop-heavy code.
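  • If the language you're writing in has parallel processing capabilities, and if your dataset has very many samples, try dividing the rows of the dataset into $C$ chunks $X_1, \dots, X_C$, and then use the outer-product representation of matrix multiplication to expand: $$X^TX = \sum_{c=1}^C X_c^TX_c$$ Each term can be computed independently and the results summed. For this to yield the covariance, $X$ must be mean-centered first and the sum divided by $n-1$, where $n$ is the number of samples.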

  • See Petros Drineas et al.'s work on approximate matrix multiplication (http://epubs.siam.org/doi/pdf/10.1137/S0097539704442684). The paper shows how Monte Carlo importance sampling can be used to approximate very large matrix products.

  • If the ultimate goal is to obtain a PCA on a very large dataset, you can typically get a very good approximation with Halko et al.'s randomized PCA (http://epubs.siam.org/doi/pdf/10.1137/100804139).
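
The chunked sum from the list above can be sketched in MATLAB as follows. This is a minimal illustration rather than a tuned implementation: the data is random, and the names n, d, numChunks and edges are made up for the example. Because 'cov' subtracts the column means and divides by n-1, the sketch does the same in two passes over the chunks. It uses bsxfun so that it also runs on older releases such as R2012a.

```matlab
% Chunked covariance: a sketch of summing per-chunk products Xc' * Xc.
% All names here (n, d, numChunks, edges) are illustrative.
rng(0);                          % reproducible example data
n = 10000; d = 20;
X = rand(n, d);

numChunks = 4;
edges = round(linspace(0, n, numChunks + 1));   % chunk boundaries (row indices)

% Pass 1: column means, accumulated chunk by chunk.
colSum = zeros(1, d);
for c = 1:numChunks
    rows = edges(c) + 1 : edges(c + 1);
    colSum = colSum + sum(X(rows, :), 1);
end
mu = colSum / n;

% Pass 2: accumulate the centered products. Each iteration is independent,
% so this loop could become a parfor if the Parallel Computing Toolbox
% is available.
S = zeros(d, d);
for c = 1:numChunks
    rows = edges(c) + 1 : edges(c + 1);
    Xc = bsxfun(@minus, X(rows, :), mu);   % center the chunk
    S = S + Xc' * Xc;
end
C = S / (n - 1);                 % same normalization as cov

% Sanity check against the built-in.
maxDiff = max(max(abs(C - cov(X))));
fprintf('max abs difference from cov(X): %g\n', maxDiff);
```

With everything already in memory, this is unlikely to beat a single call to 'cov'. The chunked form is mainly useful when the chunks are read from disk one at a time or processed on separate workers.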