Merging $n$ matrices of size $n\times n$ into one


Assume that a dataset $D$ has been (horizontally) divided into $n$ subsets, and that a covariance matrix has been computed for each subset, giving $C_1, C_2, \dots, C_n$.

How can I merge $C_1, C_2, \dots, C_n$ into a single matrix $C_m$ such that $C_m = C(D)$, where $C(D)$ is the covariance matrix of $D$?

I already tried the method given on Wikipedia, but when I compare the resulting $C_m$ with $C(D)$, they are completely different.


There is 1 solution below


Expanding a bit on the linked page above: for $n$ sets of data $S_i$, with $\mu_i$ denoting the sample mean of $S_i$, you can compute $$ C_i = \frac{1}{|S_i|-1}\sum_{s\in S_i}(s-\mu_i)(s-\mu_i)^T $$ and then use the following estimator $$ C = \frac{1}{N}\sum_{i=1}^{n} |S_i|\,C_i $$ where $N = \sum_{i=1}^{n} |S_i|$.
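As a rough illustration, the two formulas can be sketched in NumPy. This assumes each subset is stored as an array whose rows are samples and whose columns are variables. The function names `subset_covariance` and `combine`, and the random test data, are made up for this example.

```python
import numpy as np

def subset_covariance(S):
    """C_i = 1/(|S_i| - 1) * sum over s in S_i of (s - mu_i)(s - mu_i)^T."""
    S = np.asarray(S, dtype=float)      # assumed layout: rows are samples
    centered = S - S.mean(axis=0)       # subtract the subset mean mu_i
    return centered.T @ centered / (len(S) - 1)

def combine(subsets):
    """C = 1/N * sum_i |S_i| * C_i, with N the total number of samples."""
    sizes = [len(S) for S in subsets]
    N = sum(sizes)
    return sum(n * subset_covariance(S) for n, S in zip(sizes, subsets)) / N

# Hypothetical data: 600 samples of 3 variables, split row-wise into 4 subsets.
rng = np.random.default_rng(0)
D = rng.normal(size=(600, 3))
subsets = np.array_split(D, 4)
C = combine(subsets)
```

Each subset needs at least two samples, since $C_i$ divides by $|S_i| - 1$.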

Assuming the subsets all come from the same distribution, this is an unbiased estimator. Note, however, that it is still an estimate with its own variance, so expect it to deviate somewhat from the more direct "full" estimate computed from all of $D$.
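To get a feel for the size of that deviation, the sketch below compares a size-weighted combination of per-subset covariances with the covariance computed directly from the full data. It also includes an exact recombination, which is not part of the answer above. That step is the standard within-group plus between-group scatter decomposition, and it needs the subset means $\mu_i$ as well as the $C_i$. All function names and the random data are made up for this example, and rows are assumed to be samples.

```python
import numpy as np

def size_weighted(covs, sizes):
    """Size-weighted average of the subset covariances: (1/N) * sum_i n_i * C_i."""
    sizes = np.asarray(sizes, dtype=float)
    return sum(n * c for n, c in zip(sizes, covs)) / sizes.sum()

def exact_merge(covs, means, sizes):
    """Rebuild the full-data sample covariance from per-subset summaries.

    Not from the answer above: uses the within/between scatter decomposition
    and therefore also requires the subset means.
    """
    sizes = np.asarray(sizes, dtype=float)
    means = np.asarray(means, dtype=float)
    total = sizes.sum()
    grand_mean = (sizes[:, None] * means).sum(axis=0) / total
    # Within-subset scatter: sum_i (n_i - 1) * C_i
    within = sum((n - 1) * c for n, c in zip(sizes, covs))
    # Between-subset scatter: sum_i n_i * (mu_i - mu)(mu_i - mu)^T
    centered = means - grand_mean
    between = (sizes[:, None] * centered).T @ centered
    return (within + between) / (total - 1)

# Hypothetical data: 500 correlated samples of 3 variables, split into 5 subsets.
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))
parts = np.array_split(data, 5)
covs = [np.cov(p, rowvar=False) for p in parts]
means = [p.mean(axis=0) for p in parts]
sizes = [len(p) for p in parts]

direct = np.cov(data, rowvar=False)
gap_weighted = np.abs(size_weighted(covs, sizes) - direct).max()
gap_exact = np.abs(exact_merge(covs, means, sizes) - direct).max()
print(f"size-weighted gap: {gap_weighted:.3e}, exact-merge gap: {gap_exact:.3e}")
```

The size-weighted combination ignores how far the subset means lie from the overall mean, which is one reason it differs from the direct estimate. The exact merge should agree with it up to floating-point rounding.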