Let $x_1,\ldots,x_n$ be distinct regressor variables. For each $x_i$, there are $n_i$ observations $Y_{i1},\ldots,Y_{in_i}$ such that $$Y_{ij}=\alpha x_i+\beta+\epsilon_{ij},$$ where $\epsilon_{ij}\sim N(0,\sigma^2)$ are independent. Let $$\bar{Y}_i\mathrel{\mathop:}=\frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}$$ and $$N\mathrel{\mathop:}=\sum_{i=1}^nn_{i}.$$ Let $\hat{\alpha}$ and $\hat{\beta}$ be the least squares estimators of $\alpha$ and $\beta$, respectively, based on the values of $(x_i,Y_{ij})$, and $$\hat{Y}_i=\hat{\alpha}x_i+\hat{\beta}.$$ It is easy to see that the sum of squares due to error splits into a "pure error" part and a "lack of fit" part: $$\sum_{i=1}^n\sum_{j=1}^{n_i}(Y_{ij}-\hat{Y}_i)^2=\sum_{i=1}^n\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_i)^2+\sum_{i=1}^nn_i(\bar{Y}_i-\hat{Y}_i)^2$$
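The decomposition above can be checked numerically. The following is a small NumPy sketch (the particular $x_i$, $n_i$, and parameter values are arbitrary choices for illustration):

```python
# Numerical check that SSE = SSE(pure) + SSE(LOF) for simulated data.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 5.0])    # distinct regressor values x_i (arbitrary)
n_i = np.array([3, 4, 2, 5])          # replicates n_i at each x_i (arbitrary)
alpha, beta, sigma = 2.0, -1.0, 0.5   # true model parameters (arbitrary)

# Individual observations: x_i repeated n_i times, plus N(0, sigma^2) noise
X = np.repeat(x, n_i)
Y = alpha * X + beta + rng.normal(0.0, sigma, size=X.size)

# Least squares fit; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(X, Y, 1)
Yhat = b0 + b1 * x                    # fitted value at each distinct x_i

# Group means \bar{Y}_i
groups = np.split(Y, np.cumsum(n_i)[:-1])
Ybar = np.array([g.mean() for g in groups])

sse      = sum(((g - (b0 + b1 * xi))**2).sum() for g, xi in zip(groups, x))
sse_pure = sum(((g - m)**2).sum() for g, m in zip(groups, Ybar))
sse_lof  = (n_i * (Ybar - Yhat)**2).sum()

print(np.isclose(sse, sse_pure + sse_lof))
```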
This Wikipedia article says, without proof, the following:
- The sum of squares due to pure error, divided by $\sigma^2$, has a chi-squared distribution with $N-n$ degrees of freedom: $$\frac{\sum_{i=1}^n\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_i)^2}{\sigma^2}\sim\chi^2(N-n).$$
- The sum of squares due to lack of fit, divided by $\sigma^2$, has a chi-squared distribution with $n-2$ degrees of freedom: $$\frac{\sum_{i=1}^nn_i(\bar{Y}_i-\hat{Y}_i)^2}{\sigma^2}\sim\chi^2(n-2).$$
- The two sums of squares are independent.
I can see the first part, but how can I prove the second and third parts?
The second part follows from the first and third parts: the total sum of squares due to error, divided by $\sigma^2$, has a $\chi^2(N-2)$ distribution, and since it is the sum of two independent terms, one of which is $\chi^2(N-n)$, the other must be $\chi^2((N-2)-(N-n))=\chi^2(n-2)$ (compare moment generating functions). So it suffices to prove the third part, that $$SSE\,\mbox{(pure)} = \sum_{i=1}^n\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_i)^2$$ is independent of $$SSE\,\mbox{(LOF)} = \sum_{i=1}^nn_i(\bar{Y}_i-\hat{Y}_i)^2.$$

Note that $$\hat{Y}_i=B_0+B_1x_i,$$ where $B_1=\hat{\alpha}$ and $B_0=\hat{\beta}$ are given by $$B_1=\frac{S_{xY}}{S_{xx}}=\frac{\sum_{i=1}^n\sum_{j=1}^{n_i}(x_i-\bar{x})(Y_{ij}-\bar{Y})}{\sum_{i=1}^n\sum_{j=1}^{n_i}(x_i-\bar{x})^2}=\frac{\sum_{i=1}^nn_i(x_i-\bar{x})(\bar{Y}_i-\bar{Y})}{\sum_{i=1}^nn_i(x_i-\bar{x})^2}$$ and $B_0=\bar{Y}-B_1\bar{x}$. Here, we have $$\bar{Y}=\frac{\sum_{i=1}^n\sum_{j=1}^{n_i}Y_{ij}}{N}=\frac{\sum_{i=1}^nn_i\bar{Y}_i}{\sum_{i=1}^nn_i}$$ and $$\bar{x}=\frac{\sum_{i=1}^n\sum_{j=1}^{n_i}x_i}{N}=\frac{\sum_{i=1}^nn_ix_i}{\sum_{i=1}^nn_i}.$$ It is as if there were $n_i$ repeated measurements of $\bar{Y}_i$ at each $x_i$. In particular, $B_0$ and $B_1$ are linear combinations of $\bar{Y}_1,\ldots,\bar{Y}_n$, and hence so is each $\hat{Y}_i$; therefore $SSE\,\mbox{(LOF)}$ is a function of $\bar{Y}_1,\ldots,\bar{Y}_n$ alone.

Now let $$S_i^2\mathrel{\mathop:}=\frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_i)^2,$$ so that $SSE\,\mbox{(pure)}=\sum_{i=1}^n(n_i-1)S_i^2$. Since $Y_{i1},\ldots,Y_{in_i}\sim N(\alpha x_i+\beta,\sigma^2)$ are independent, $S_i^2$ and $\bar{Y}_i$ are independent. Since $Y_{ij}$ and $Y_{k\ell}$ are independent for $i\neq k$, it follows that the vector $(S_1^2,\ldots,S_n^2)$ is independent of $(\bar{Y}_1,\ldots,\bar{Y}_n)$. As $SSE\,\mbox{(pure)}$ is a function of the former and $SSE\,\mbox{(LOF)}$ is a function of the latter, the two sums of squares are independent.
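Not a proof, but a Monte Carlo sanity check of the two distributional claims is easy to run: over repeated simulations, $SSE\,\mbox{(pure)}/\sigma^2$ and $SSE\,\mbox{(LOF)}/\sigma^2$ should have sample means near $N-n$ and $n-2$ (the chi-squared degrees of freedom), and the two should be essentially uncorrelated. The design points and parameters below are arbitrary choices:

```python
# Monte Carlo check: SSE(pure)/sigma^2 has mean N-n, SSE(LOF)/sigma^2 has
# mean n-2, and the two are (empirically) uncorrelated across replications.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 2.0, 4.0, 7.0])   # distinct x_i (arbitrary)
n_i = np.array([2, 3, 2, 4, 3])           # replicates per x_i (arbitrary)
N, n = n_i.sum(), x.size                  # here N = 14, n = 5
alpha, beta, sigma = 1.5, 0.5, 1.0        # true parameters (arbitrary)
X = np.repeat(x, n_i)

reps = 10000
pure = np.empty(reps)
lof = np.empty(reps)
for r in range(reps):
    Y = alpha * X + beta + rng.normal(0.0, sigma, size=N)
    b1, b0 = np.polyfit(X, Y, 1)          # least squares slope and intercept
    groups = np.split(Y, np.cumsum(n_i)[:-1])
    Ybar = np.array([g.mean() for g in groups])
    pure[r] = sum(((g - m)**2).sum() for g, m in zip(groups, Ybar)) / sigma**2
    lof[r] = (n_i * (Ybar - (b0 + b1 * x))**2).sum() / sigma**2

print(pure.mean())                    # should be close to N - n = 9
print(lof.mean())                     # should be close to n - 2 = 3
print(np.corrcoef(pure, lof)[0, 1])   # should be close to 0
```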