Variance of mean of set v. sum of variances of means of arbitrary partition of set


I'm interested in the relation between the variance of the mean of a set of real numbers, $A$, and the sum of the variances of the means of the sets in an arbitrary partition of $A$ into smaller sets.

Let's call the variance of the mean of the original set

$$ V_A = \frac{ \frac{1}{n} \sum^n_{i=1} (A_i - \bar{A})^2 }{n} $$

and the weighted sum of the variances of the means of some arbitrary partitioning of the set

$$ V_P = \left( \frac{n}{n_x}\right)^2 \frac{ \frac{1}{n_x} \sum^{n_x}_{i=1} (A_i - \bar{A}_x)^2 } {n_x} + \left( \frac{n}{n_y}\right)^2 \frac{ \frac{1}{n_y} \sum^{n_y}_{i=1} (A_i - \bar{A}_y)^2 } {n_y} + \cdots + \left( \frac{n}{n_k}\right)^2 \frac{ \frac{1}{n_k} \sum^{n_k}_{i=1} (A_i - \bar{A}_k)^2 } {n_k} $$

where $n = n_x + n_y + ... + n_k$. (In case this notation isn't clear, I provide a short demonstration below in R.)

Obviously, $V_P$ is minimized when the partition consists of $n$ singletons (the number of sets equals $n$), since every within-set variance is then zero and $V_P = 0$. Thus, $V_P$ can be arbitrarily smaller than $V_A$. By contrast, I haven't found many cases where $V_A < V_P$, and I'm wondering whether the difference between $V_A$ and $V_P$ can be bounded when $V_A < V_P$.
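The singleton case can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names `pop_var`, `var_of_mean`, and `v_p` are hypothetical, the data are simply the six values used in the R example further down, and the weights $(n / n_x)^2$ are taken from the displayed formula for $V_P$ as written.

```r
# Population variance (denominator n), matching the 1/n inside the formulas.
pop_var <- function(x) mean((x - mean(x))^2)

# Variance of the mean of a set: population variance divided by the set size.
var_of_mean <- function(x) pop_var(x) / length(x)

# Weighted sum of the variances of the subset means.
# `parts` is a list of numeric vectors forming a partition of the full set;
# the weights (n / n_x)^2 follow the displayed formula for V_P (an assumption).
v_p <- function(parts) {
  n <- sum(lengths(parts))
  sum(sapply(parts, function(p) (n / length(p))^2 * var_of_mean(p)))
}

A <- c(0, 20, 11, 9, 11, 9)  # illustrative data only
singletons <- as.list(A)      # partition into n one-element sets

v_a <- var_of_mean(A)         # variance of the mean of the whole set
v_p_single <- v_p(singletons) # every one-element set has zero variance

print(c(V_A = v_a, V_P = v_p_single))
```

Because each one-element set has zero variance, `v_p_single` is zero whatever weights are used, while `v_a` stays positive as long as the values in `A` are not all equal.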

Here's a simple example.

t <- 1

A <- rep(c(0, 20), t)
B <- rep(c(11, 9), t)
C <- rep(c(11, 9), t)

n <- length(A)

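# Calculate population variances (denominator n), not sample variances.
# R's var() divides by n - 1, so rescale by (n - 1) / n.
var_A <- var(A) * ((n - 1) / n)
var_B <- var(B) * ((n - 1) / n)
var_C <- var(C) * ((n - 1) / n)
var_ABC <- var(c(A, B, C)) * ((3 * n - 1) / (3 * n))

# V_A: variance of the mean of the pooled set
var_ABC / (3 * n)
# V_P: weighted sum of the variances of the means of A, B, and C
(1/n^3) * (var_A / n) + (1/n^3) * (var_B / n) + (1/n^3) * (var_C / n)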

In this case, for $t = 1$, $V_A < V_P$ but for $t > 1$, $V_P < V_A$.
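To see how the comparison changes with $t$, the example can be wrapped in a function and evaluated for several values. This is only a sketch under stated assumptions: the function name `compare` and the helper `pop_var` are hypothetical, variances are population variances (denominator $n$), and the weights $1/n^3$ are copied from the example rather than derived.

```r
# Population variance (denominator n).
pop_var <- function(x) mean((x - mean(x))^2)

# Recompute V_A and V_P from the example for a given number of repetitions.
compare <- function(reps) {
  A <- rep(c(0, 20), reps)
  B <- rep(c(11, 9), reps)
  C <- rep(c(11, 9), reps)
  n <- length(A)

  # V_A: variance of the mean of the pooled set of 3 * n values
  v_a <- pop_var(c(A, B, C)) / (3 * n)

  # V_P: weighted sum of the variances of the three subset means,
  # using the 1 / n^3 weights from the example (an assumption)
  v_p <- (1 / n^3) * (pop_var(A) / n + pop_var(B) / n + pop_var(C) / n)

  c(t = reps, V_A = v_a, V_P = v_p)
}

# One row per value of t, with columns t, V_A, V_P
results <- do.call(rbind, lapply(1:4, compare))
print(results)
```

With these assumptions, the first row should show $V_A < V_P$ and the later rows $V_P < V_A$, in line with the statement above.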

I suppose this is not a novel question, but I haven't found an answer, so I would be grateful if someone could point me in the right direction. If there is a bound, how do we establish it? If not, perhaps an example showing that no bound exists.