I was looking for a proof like the one for the sample variance, where it is shown that the expected value of the sample variance with $n-1$ in the denominator equals the population parameter. I'm not even sure what the pooled sample variance / residual variance tries to estimate: $$ E\left[\frac{1}{n-k}\sum_{i=1}^{n}(y_i-\bar{y}_{g(i)})^2\right] = E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = \,?$$ Here $n$ is the number of observations, $k$ the number of groups, $n_j$ the number of observations in group $j$, $s_j^2$ the sample variance of group $j$, and $g(i)$ the group that observation $i$ belongs to.
Is it the population variance? I think not, because the deviations are taken within groups.
My attempt was: $$ E\left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)s_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)E[s_j^2], $$ yet I'm not sure what the expectation of each group's variance is. Thanks
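As a quick sanity check of the identity inside the first expectation, here is a small numerical sketch (hypothetical group sizes and means, chosen only for illustration) verifying that summing squared deviations from each observation's group mean equals $\sum_j (n_j-1)s_j^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 3 groups with unequal sizes and different means
sizes = [4, 6, 5]
group_means = [0.0, 1.0, -1.0]
groups = [rng.normal(loc=mu, scale=2.0, size=m) for mu, m in zip(group_means, sizes)]

# Left-hand side: sum over all observations of (y_i - ybar_{g(i)})^2
sse_obs = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Right-hand side: sum over groups of (n_j - 1) * s_j^2, with s_j^2 the
# unbiased (ddof=1) sample variance of group j
sse_grp = sum((len(g) - 1) * g.var(ddof=1) for g in groups)

# The two ways of writing SSE agree up to floating-point error
assert np.isclose(sse_obs, sse_grp)
```

So the two expressions in the question are the same random variable written two ways, and the question reduces to finding $E[s_j^2]$.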
First note that $n=\sum_{j=1}^{k}n_j$. Secondly, for each group $j$ we consider the following linear model:
$$X_{ji}=\mu_j+\epsilon_{ji}, i=1,\dots,n_j$$
where the $\epsilon_{ji}$ are independent and follow $\mathcal N (0,\sigma^2)$.
Hence, the sample variance $S^2_j$ of the observations $X_{ji}$, $i=1,\dots,n_j$, from group $j$ is an unbiased estimator of $\sigma^2$, i.e., $\mathbb E[S_j^2]=\sigma^2$. Finally, we have
$$\mathbb E[\text{MSE}]=\mathbb E\left[\frac{SSE}{n-k}\right]= \mathbb E \left[\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)S_j^2\right] = \frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1) \mathbb E[S_j^2]\\=\frac{1}{n-k}\sum_{j=1}^{k}(n_j - 1)\sigma^2 =\sigma^2 \frac{1}{n-k} \left ( \sum_{j=1}^{k}n_j -k \right )=\sigma^2,$$
which means that $\text{MSE}$ is also an unbiased estimator of $\sigma^2$ (and a better one, since it has lower variance than each individual $S^2_j$). Now you can see why $n-k$ appears in the denominator.
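The unbiasedness can also be checked by Monte Carlo simulation. The sketch below (with arbitrary, hypothetical values for $\sigma^2$, the group means $\mu_j$, and the sizes $n_j$) repeatedly draws data from the model above, computes $\text{MSE}=SSE/(n-k)$, and averages; the average should land close to the true $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(42)

sigma2 = 4.0                # true error variance (assumed for the demo)
means = [0.0, 2.0, -1.0]    # hypothetical group means mu_j
sizes = [5, 8, 7]           # hypothetical group sizes n_j
n, k = sum(sizes), len(sizes)

mses = []
for _ in range(20000):
    sse = 0.0
    for mu, nj in zip(means, sizes):
        # Draw group j from N(mu_j, sigma^2) and accumulate its within-group SSE
        x = rng.normal(mu, np.sqrt(sigma2), size=nj)
        sse += ((x - x.mean()) ** 2).sum()
    mses.append(sse / (n - k))  # MSE with the n - k denominator

# Monte Carlo estimate of E[MSE]; should be close to sigma2 = 4.0
print(np.mean(mses))
```

Replacing `n - k` with, say, `n` in the denominator would make the average come out systematically below $\sigma^2$, which is exactly the bias the $n-k$ correction removes.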