Mean and Variance of subset of a data set


I have a data set of position measurements of an object. However, the data set is split into subsets of equal size. I want to find the mean and variance of the whole data set with access only to the subsets, not to the whole data set at once. How would I go about doing this? Is it correct that the mean of the subset means equals the mean of the whole set? What about the variance?


Best answer

Suppose you have $k$ groups of observations on a variable $x$ (say) where the $i$th group consists of $n_i$ observations, $i=1,\ldots,k$. Let the $j$th observation in the $i$th group be $x_{ij}$ for $i=1,\ldots,k$ and $j=1,\ldots,n_i$.

The $i$th group mean is defined as $$\overline x_i=\frac1{n_i}\sum_{j=1}^{n_i} x_{ij}\quad,\,i=1,\ldots,k $$

Then the pooled mean or combined mean is given by $$\overline{\overline x}=\frac{\sum_{i=1}^k n_i\overline x_i }{\sum_{i=1}^k n_i} $$

This is a weighted average of the group means, where the $i$th group mean gets weight $n_i$, the number of observations in that group.
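
A minimal Python sketch of this weighted average, assuming only the group sizes $n_i$ and group means $\overline x_i$ are available; the function name `combined_mean` and the sample numbers are illustrative.

```python
def combined_mean(sizes, means):
    """Pooled mean: group means weighted by their group sizes n_i."""
    total = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total

# Groups [1, 2, 3] and [10, 20] have sizes 3, 2 and means 2.0, 15.0.
print(combined_mean([3, 2], [2.0, 15.0]))  # 7.2, same as (1+2+3+10+20)/5
```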

The $i$th group variance, measured about the group's own mean, is defined as

$$s_i^2=\frac1{n_i}\sum_{j=1}^{n_i}\left(x_{ij}-\overline x_i\right)^2\quad,\,i=1,\ldots,k $$

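Writing $x_{ij}-\overline{\overline x}=(x_{ij}-\overline x_i)+(\overline x_i-\overline{\overline x})$ and noting that the cross term vanishes because $\sum_{j=1}^{n_i}(x_{ij}-\overline x_i)=0$, the pooled variance based on all observations from all groups is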

$$s^2=\frac{\sum_{i=1}^k\sum_{j=1}^{n_i}\left(x_{ij}-\overline{\overline x}\right)^2}{\sum_{i=1}^k n_i}=\frac{\sum_{i=1}^k n_is_i^2}{\sum_{i=1}^k n_i}+\frac{\sum_{i=1}^k n_i\left(\overline x_i-\overline{\overline x}\right)^2}{\sum_{i=1}^k n_i}$$
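
A minimal Python sketch of this decomposition, assuming each group is summarized by $(n_i, \overline x_i, s_i^2)$ with $s_i^2$ taken about the group's own mean (the $1/n_i$ convention, which is what `statistics.pvariance` computes). The helper `pooled_mean_variance` and the data are illustrative; the last lines compare against a direct computation on the concatenated data.

```python
from statistics import fmean, pvariance

def pooled_mean_variance(groups):
    """Combine (n_i, mean_i, var_i) summaries into the overall mean and variance.

    var_i must be the 1/n_i variance about the group's own mean.
    """
    total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total
    within = sum(n * v for n, _, v in groups) / total                        # first term
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / total   # second term
    return grand_mean, within + between

# Illustrative groups of unequal size.
data = [[1.0, 2.0, 3.0], [10.0, 20.0], [4.0, 4.5, 5.0, 7.5]]
summaries = [(len(g), fmean(g), pvariance(g)) for g in data]
mean, var = pooled_mean_variance(summaries)

everything = [x for g in data for x in g]
print(mean, fmean(everything))
print(var, pvariance(everything))
```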

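You are concerned with the case $n_1=n_2=\cdots=n_k$. Then the weights cancel and

$$\overline{\overline x}=\frac1k\sum_{i=1}^k\overline x_i\quad,\qquad s^2=\frac1k\sum_{i=1}^k s_i^2+\frac1k\sum_{i=1}^k\left(\overline x_i-\overline{\overline x}\right)^2$$

So the mean of the subset means is indeed the mean of the whole set, but the variance of the whole set is not just the average of the subset variances: it also includes the spread of the subset means around the overall mean.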
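
A small worked check in Python with made-up numbers: two equal-size subsets whose means are far apart. The mean of the subset means matches the overall mean, while averaging the subset variances alone misses the spread between the subsets. Variances use the $1/n$ convention (`statistics.pvariance`).

```python
from statistics import fmean, pvariance

subsets = [[0.0, 2.0], [10.0, 12.0]]        # two equal-size subsets (illustrative)
means = [fmean(s) for s in subsets]         # [1.0, 11.0]
variances = [pvariance(s) for s in subsets] # [1.0, 1.0]

mean_of_means = fmean(means)                # equals the overall mean
avg_within = fmean(variances)               # average subset variance
between = pvariance(means)                  # spread of the subset means
overall = [x for s in subsets for x in s]

print(mean_of_means, fmean(overall))             # 6.0 6.0
print(avg_within + between, pvariance(overall))  # 26.0 26.0
```

Here `avg_within` is only 1.0, so taking the mean of the variances by itself would badly underestimate the true variance of 26.0.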

Another answer

If ${\cal D} = \{ {\cal D}_1, {\cal D}_2, \ldots, {\cal D}_n \}$, where each subset contains the same number of points, then the mean of ${\cal D}$ is simply the mean of the individual means:

$${\cal E}[{\cal D}] = \frac{1}{n} \sum\limits_{i=1}^n {\cal E}[{\cal D}_i]$$

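The variance, however, is not the sum of the individual variances. The variance sum law, ${\rm Var}[X+Y]={\rm Var}[X]+{\rm Var}[Y]$ for independent $X$ and $Y$, concerns sums of random variables, not the pooling of data sets. For equally sized subsets (variances taken with the $1/N$ convention), the variance of ${\cal D}$ is the mean of the individual variances plus the variance of the individual means:

$${\rm Var}[{\cal D}] = \frac{1}{n} \sum\limits_{i=1}^n {\rm Var}[{\cal D}_i] + \frac{1}{n} \sum\limits_{i=1}^n \left({\cal E}[{\cal D}_i]-{\cal E}[{\cal D}]\right)^2$$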
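
A minimal Python sketch, assuming the subsets can only be read one at a time: it keeps a running (count, mean, sum of squared deviations) summary and merges each subset into it with the pairwise update often attributed to Chan, Golub and LeVeque. That update is the within-plus-between-group decomposition applied to two groups. The helpers `merge` and `summarize` and the sample values are illustrative; the final lines check the result against `statistics.pvariance` on the concatenated data.

```python
from statistics import fmean, pvariance

def summarize(subset):
    """(count, mean, m2) for one subset; m2 = sum of squared deviations from its mean."""
    m = fmean(subset)
    return len(subset), m, sum((x - m) ** 2 for x in subset)

def merge(a, b):
    """Combine two (count, mean, m2) summaries into one (pairwise update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n   # between-group correction
    return n, mean, m2

# Subsets arrive one at a time; only the running summary is kept in memory.
subsets = [[3.1, 2.9, 3.4], [2.5, 3.0, 3.3], [4.0, 3.8, 3.6]]
running = (0, 0.0, 0.0)   # empty summary; merging a non-empty subset into it is safe
for s in subsets:
    running = merge(running, summarize(s))

n, mean, m2 = running
variance = m2 / n          # 1/N convention
everything = [x for s in subsets for x in s]
print(mean, fmean(everything))
print(variance, pvariance(everything))
```

Dividing `m2` by `n - 1` instead would give the sample (unbiased) variance, if that convention is the one wanted.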