Is it possible to compute Pearson correlation coefficient in parallel?


Let $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$ be two vectors of the same length $n$.

Is it possible to compute the Pearson correlation coefficient between $X$ and $Y$ in parallel?

More precisely, is it possible to compute $\rho = \operatorname{corr}(X,Y)$ by computing $\rho_1 = \operatorname{corr}(X_1,Y)$ and $\rho_2 = \operatorname{corr}(X_2,Y)$ separately where $X_1 = \{x_1, ..., x_{n/2}\}$ and $X_2 = \{x_{n/2 + 1}, ..., x_{n}\}$?


To compute the Pearson correlation, you need the variances of the two vectors and their covariance. The correlation itself cannot be recovered from $\rho_1$ and $\rho_2$ alone, but the variances and the covariance can be computed in parallel, provided that $X$ and $Y$ are split at the same indices.

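To see this, consider the variance, which is defined as $\frac{1}{n} \sum_i x_i^2 - \left(\frac{1}{n}\sum_i x_i\right)^2$. It only involves the sums $\sum_i x_i$ and $\sum_i x_i^2$, and each sum can be computed on separate chunks whose partial results are then added together. This is the parallel pattern known as a reduction. One implementation on CUDA can be found here, or the well-known MapReduce model can be used. The covariance additionally needs only $\sum_i x_i y_i$, so it splits in exactly the same way.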
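Putting the pieces together, the correlation depends only on $n$ and five sums, each of which splits over any partition of the indices (as long as $X$ and $Y$ are split at the same positions):

$$\rho = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{\sqrt{\left(n\sum_i x_i^2 - \left(\sum_i x_i\right)^2\right)\left(n\sum_i y_i^2 - \left(\sum_i y_i\right)^2\right)}}$$

Below is a minimal Python sketch of that reduction, assuming plain lists as input. The function names (`partial_sums`, `combine`, `pearson_from_sums`, `parallel_pearson`) are hypothetical, and a thread pool merely stands in for whatever parallel backend (processes, a cluster, a CUDA kernel) would be used in practice; CPython threads will not give a real speedup for this pure-Python arithmetic.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_sums(xs, ys):
    """Map step: the sufficient statistics of one chunk.
    xs and ys must cover the same index range of X and Y."""
    return (
        len(xs),                             # n_k
        sum(xs),                             # sum of x_i
        sum(ys),                             # sum of y_i
        sum(x * x for x in xs),              # sum of x_i^2
        sum(y * y for y in ys),              # sum of y_i^2
        sum(x * y for x, y in zip(xs, ys)),  # sum of x_i * y_i
    )


def combine(a, b):
    """Reduce step: partial sums from two chunks are merged by addition."""
    return tuple(p + q for p, q in zip(a, b))


def pearson_from_sums(s):
    """Evaluate rho from the combined sums (the common 1/n^2 factors cancel)."""
    n, sx, sy, sxx, syy, sxy = s
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)


def parallel_pearson(x, y, n_chunks=2):
    """Split X and Y at the same indices, reduce each chunk in a worker,
    then combine. Threads are used here only to illustrate the map step."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two observations")
    size = math.ceil(len(x) / n_chunks)
    bounds = [(i, min(i + size, len(x))) for i in range(0, len(x), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(lambda b: partial_sums(x[b[0]:b[1]], y[b[0]:b[1]]), bounds))
    return pearson_from_sums(reduce(combine, parts))
```

For example, `parallel_pearson([1, 2, 3, 4, 5, 6], [2, 4, 5, 4, 5, 7])` gives about 0.878, matching a direct computation over the whole vectors. One caveat on the design: the raw-sum formula can lose precision through cancellation when the values are large relative to their spread. A more stable variant has each chunk return its count, means, and centered sums of squares and cross-products, which are then merged with the pairwise (parallel) update formulas for variance and covariance.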