Why is the residual variance / pooled sample variance divided by n-k in ANOVA?

I was looking for a proof like the one for the sample variance, which shows that the expected value of the sample variance with $n-1$ in the denominator equals the parameter. I'm not even sure what the pooled sample variance / residual variance is trying to estimate: $$ E\left[\frac{1}{n-k}\sum_{i=1}^{n}(y_i-\bar{y}_{g(i)})^2\right] = E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = ?$$ Here $n$ is the number of observations, $k$ the number of groups, $n_j$ the number of observations in group $j$, and $s_j^2$ the sample variance of group $j$. $g(i)$ assign

Is it population variance? I think no, because it's about groups.

My attempt was: $$ E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)E[s_j^2] $$ Yet I'm not sure what the expectation of each group's sample variance is. Thanks

Best answer

First note that $n=\sum_{j=1}^{k}n_j$. Secondly, for each group $j$ we consider the following linear model:

$$X_{ji}=\mu_j+\epsilon_{ji}, i=1,\dots,n_j$$

where the $\epsilon_{ji}$ are independent and follow $\mathcal N (0,\sigma^2)$, with the same variance $\sigma^2$ in every group.

Hence, the sample variance $S^2_j$ of the observations $X_{ji}, i=1,\dots,n_j$ from group $j$ is an unbiased estimator of $\sigma^2$, i.e., $\mathbb E[S_j^2]=\sigma^2$. Finally, we have

$$\mathbb E[\text{MSE}]=\mathbb E\left[\frac{SSE}{n-k}\right]= \mathbb E \left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)S_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1) \mathbb E[S_j^2]\\=\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)\sigma^2 =\sigma^2 \frac{1}{n-k} \left ( \sum_{j=1}^{k}n_j -k \right )=\sigma^2,$$

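which means that $\text{MSE}$ is also an unbiased estimator of $\sigma^2$ (it is preferable because it has a smaller variance than any single $S^2_j$). Now you can see why $n-k$ is used here: estimating the $k$ group means uses up $k$ degrees of freedom, leaving $\sum_{j=1}^{k}(n_j-1)=n-k$.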
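If it helps, the result can also be checked numerically. The following is only a rough Monte Carlo sketch, not part of the proof: the group sizes, group means, $\sigma$, number of replications, and the function name `simulate_pooled_variance` are all arbitrary choices made for illustration. It simulates the model $X_{ji}=\mu_j+\epsilon_{ji}$ many times and averages $SSE/(n-k)$, $SSE/n$, and the sample variance of the first group alone.

```python
import random
import statistics


def simulate_pooled_variance(group_sizes=(4, 6, 10), means=(0.0, 5.0, -2.0),
                             sigma=2.0, reps=20000, seed=1):
    """Monte Carlo check of E[SSE / (n - k)] = sigma^2 (illustrative values only)."""
    rng = random.Random(seed)
    k = len(group_sizes)
    n = sum(group_sizes)
    mse_vals, sse_over_n_vals, s1_vals = [], [], []
    for _ in range(reps):
        sse = 0.0
        group_vars = []
        for n_j, mu_j in zip(group_sizes, means):
            # Draw X_ji = mu_j + eps_ji with eps_ji ~ N(0, sigma^2)
            group = [rng.gauss(mu_j, sigma) for _ in range(n_j)]
            s2_j = statistics.variance(group)  # divides by n_j - 1
            group_vars.append(s2_j)
            sse += (n_j - 1) * s2_j            # within-group sum of squares
        mse_vals.append(sse / (n - k))         # pooled variance, divisor n - k
        sse_over_n_vals.append(sse / n)        # same numerator, divisor n
        s1_vals.append(group_vars[0])          # variance of the first group only
    return {
        "mean_mse": statistics.fmean(mse_vals),
        "mean_sse_over_n": statistics.fmean(sse_over_n_vals),
        "var_mse": statistics.pvariance(mse_vals),
        "var_s1": statistics.pvariance(s1_vals),
    }


result = simulate_pooled_variance()
print(result)
```

With these assumed settings $\sigma^2=4$, so `mean_mse` should land near 4, while `mean_sse_over_n` should land near $4\cdot\frac{17}{20}=3.4$, which shows the downward bias of dividing by $n$. Comparing `var_mse` with `var_s1` illustrates the remark that the pooled estimator varies less than a single group's $S_j^2$.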