Insightful alternative proof that the average sample variance over all subsets of a fixed size equals the overall sample variance (definition inside)

Let $A = \{a_1,\dots,a_n\} \subset \Bbb R$ be a set of $n$ numbers. We define its sample variance as $$\frac {1}{n-1}\sum \limits_{i=1}^n (a_i-\bar a)^2,$$ where $\bar a = \frac {a_1+\cdots+a_n}n$ is the average of $A$. It is a surprising fact that for every $2 \leq k \leq n$, the average of the sample variances of all subsets of $A$ of size $k$ equals the sample variance of $A$.

In formulas, writing $[n] = \{1,\dots,n\}$ and letting $S$ range over the index sets of size $k$, the claim reads $$\frac 1 {\binom n k} \sum_{\substack{S\subseteq[n]\\ |S|=k}} \frac{1}{k-1} \sum_{i\in S} \left(a_i-\frac 1k \sum_{j\in S}a_j\right)^2 = \frac 1{n-1} \sum_{i=1}^n \left(a_i-\frac{a_1+\cdots+a_n}n \right)^2.$$
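As a sanity check (not part of the original post), the identity can be tested numerically. The sketch below assumes Python's standard `statistics.variance`, which divides by $n-1$ exactly as in the definition above, and uses `Fraction` so the comparison is exact rather than up to rounding. The data values and the helper name `average_subset_variance` are arbitrary choices for illustration. Subsets are taken by position, which matches summing over index sets $S$ of size $k$.

```python
from fractions import Fraction
from itertools import combinations
from statistics import variance


def average_subset_variance(values, k):
    """Average of the sample variances (divisor k - 1) over all k-element subsets, taken by position."""
    subsets = list(combinations(values, k))
    return sum(variance(s) for s in subsets) / len(subsets)


# Arbitrary illustrative data; Fraction keeps every computation exact.
data = [Fraction(x) for x in (3, -1, 4, 1, 5, 9, 2, 6)]
overall = variance(data)

# The claim: for every 2 <= k <= n the average equals the overall sample variance.
for k in range(2, len(data) + 1):
    assert average_subset_variance(data, k) == overall

print(overall)  # prints 543/56
```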

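I verified by brute force that these awful sums are indeed equal. Letting $S$ range over the $k$-element subsets of $[n]$, the only tricks are the counting identities $\sum\limits_{|S|=k}\;\sum\limits_{i \neq j \in S}x_ix_j = \binom{n-2}{k-2}\sum_{i \neq j \in [n]}x_ix_j$ and $\sum\limits_{|S|=k}\;\sum\limits_{i \in S}x_i^2 = \binom{n-1}{k-1}(x_1^2+\cdots+x_n^2)$, which hold because each ordered pair $i \neq j$ lies in $\binom{n-2}{k-2}$ of the subsets and each index $i$ lies in $\binom{n-1}{k-1}$ of them.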
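One way to organize that brute-force computation (a sketch, not necessarily the original bookkeeping): for an index set $S \subseteq [n]$ with $|S| = k$, expanding the square gives $$\frac{1}{k-1}\sum_{i\in S}\left(a_i-\frac1k\sum_{j\in S}a_j\right)^2=\frac1k\sum_{i\in S}a_i^2-\frac{1}{k(k-1)}\sum_{i\neq j\in S}a_ia_j.$$ Averaging over all $\binom nk$ such $S$, applying the two counting identities with $x_i = a_i$, and using $\binom{n-1}{k-1}\big/\binom nk = \frac kn$ and $\binom{n-2}{k-2}\big/\binom nk = \frac{k(k-1)}{n(n-1)}$, the left-hand side of the claim becomes $$\frac1n\sum_{i=1}^n a_i^2-\frac{1}{n(n-1)}\sum_{i\neq j\in[n]}a_ia_j,$$ which does not depend on $k$. The case $k=n$ of the first display shows that this is exactly the sample variance of $A$.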

The statement is nice enough that there ought to be better proofs that avoid these arithmetic miracles, perhaps using the statistical interpretation of each side. However, I did not find any mention of this equality online. Any ideas?

Answer

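I found a satisfying proof. The sample variance can be rewritten as the average of $\frac 12 (a_i-a_j)^2$ over all $\binom n2$ unordered pairs $\{i,j\}$: $$\frac 1{n-1}\sum_{i=1}^n (a_i-\bar a)^2 = \frac 1{\binom n2}\sum_{i<j}\frac12(a_i-a_j)^2.$$ With this representation, the claim becomes the simple assertion that the average of these pair-averages over all subsets of size $k$ is the overall pair-average. This holds because every subset of size $k$ contains the same number $\binom k2$ of pairs, and by symmetry every pair lies in the same number $\binom{n-2}{k-2}$ of subsets, so each pair receives the same total weight.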
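The counting argument can also be checked mechanically. The following is a small Python sketch, not part of the original answer: `pairwise_variance` is a hypothetical helper name, the data are arbitrary, and the standard library's `statistics.variance` is assumed as the reference $n-1$ sample variance. It confirms the pairwise formula, that each pair of positions lies in exactly $\binom{n-2}{k-2}$ of the $k$-subsets, and that the average of the per-subset pair-averages equals the overall one.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
from statistics import variance


def pairwise_variance(values):
    """Mean of (x - y)^2 / 2 over all unordered pairs of positions."""
    pairs = list(combinations(values, 2))
    return sum((x - y) ** 2 / 2 for x, y in pairs) / len(pairs)


data = [Fraction(x) for x in (2, 7, 1, 8, 5)]  # arbitrary illustrative data
n = len(data)

# The pairwise representation agrees with the usual n - 1 sample variance.
assert pairwise_variance(data) == variance(data)

for k in range(2, n + 1):
    # Count how many k-subsets contain each pair of positions i < j.
    counts = {}
    for subset in combinations(range(n), k):
        for pair in combinations(subset, 2):
            counts[pair] = counts.get(pair, 0) + 1
    # Every pair is covered, and each one equally often: comb(n - 2, k - 2) times.
    assert len(counts) == comb(n, 2)
    assert set(counts.values()) == {comb(n - 2, k - 2)}

    # Hence the average of the per-subset pair-averages is the overall pair-average.
    subset_avgs = [pairwise_variance([data[i] for i in s]) for s in combinations(range(n), k)]
    assert sum(subset_avgs) / len(subset_avgs) == pairwise_variance(data)
```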