Estimate the variance of a population given only the means of independent subpopulations

Given a set of independent samples $x_{i,j}$, where $i$ ranges from 1 to $m$ and $j$ ranges from 1 to $n$, it is easy to estimate the variance of the underlying distribution using a formula like \begin{equation} \text{Var}(x) \approx \frac{1}{mn}\sum_{i,j} x_{i,j}^2 - \Big(\frac{1}{mn}\sum_{i,j} x_{i,j}\Big)^2. \end{equation} Unfortunately, I need to estimate this variance without directly measuring the $x_{i,j}$. Instead, I only have access to the set of "subpopulation means" \begin{equation} y_i = \frac1n\sum_{j=1}^nx_{i,j}. \end{equation} It's easy to see, using arguments based on the law of large numbers or Jensen's inequality, that $\text{Var}(y)$ underestimates $\text{Var}(x)$. So how can I get an unbiased estimate of the variance of the $x$'s given only the $y$'s?

We can easily compute the variance of $y_i$. Since the $x_{ij}$ are independent and share a common variance, $$\operatorname{Var}(y_i) = \operatorname{Var}\Big(\frac{1}{n}\sum_{j=1}^n x_{ij}\Big) = \frac{1}{n^2} \sum_{j=1}^n\operatorname{Var}(x_{ij})= \frac{1}{n}\operatorname{Var}(x_{ij}).$$ Thus if $\hat{y}$ is any unbiased estimator of $\operatorname{Var}(y)$, then $n\cdot \hat{y}$ is an unbiased estimator of $\operatorname{Var}(x)$.
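As a concrete illustration, one natural choice for $\hat{y}$ is the usual sample variance of the $m$ means with denominator $m-1$. This gives $\widehat{\operatorname{Var}}(x) = \frac{n}{m-1}\sum_{i=1}^m (y_i - \bar{y})^2$. The following is a minimal Python sketch under stated assumptions. The helper names `estimate_var_x` and `simulate` are hypothetical, and the simulation assumes normally distributed $x_{ij}$ purely for demonstration. The estimator itself does not depend on normality.

```python
import random
import statistics


def estimate_var_x(y, n):
    """Estimate Var(x) from subpopulation means y_i, each the mean of n iid x's."""
    # statistics.variance divides by (m - 1), so it is unbiased for Var(y);
    # multiplying by n rescales it to an unbiased estimate of Var(x).
    return n * statistics.variance(y)


def simulate(m, n, sigma, seed=0):
    """Hypothetical test data: m means, each over n iid N(0, sigma^2) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
            for _ in range(m)]


if __name__ == "__main__":
    y = simulate(m=20000, n=10, sigma=2.0)
    # True Var(x) is sigma^2 = 4.0; the estimate should land near it.
    print(estimate_var_x(y, n=10))
```

Note that at least two subpopulation means are needed ($m \ge 2$), since the $m-1$ denominator is otherwise undefined.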