I have a question regarding the OLS estimator of $\sigma^2$. In Gujarati's book on econometrics, the author derives $E(\sum_{i=1}^n \hat u_i^2)$ (the expected sum of squared residuals) to be $(n-2)\sigma^2$. However, I can't understand how the author calculates $E\sum_{i=1}^n (u_i -\bar u)^2$ to be equal to $(n-1)\sigma^2$ when, for a PRL (population regression line), we assume under the Gauss–Markov assumptions that $\bar u$ should be equal to zero.
I do understand that $E(u_i|X)=0$, and it looks like Gujarati is talking about $E(u_i|Y)$ in that equation. But why then does it make a difference if by default we assume that the mean value of the error term in the population is zero? Is there any difference between $\bar u$ and $E(u)$?
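(A quick numerical illustration of the distinction being asked about, not from the book: $E(u)=0$ is a statement about the population, while $\bar u$ is the sample mean of $n$ realized errors and is a random variable, almost never exactly zero. The sample size and seed below are arbitrary choices.)

```python
import numpy as np

# Draw n realizations of an error term u with population mean E(u) = 0.
rng = np.random.default_rng(1)
u = rng.normal(loc=0.0, scale=1.0, size=20)

# The realized sample mean u-bar fluctuates around 0 but is not 0 itself.
print(u.mean())
```

Across repeated samples $\bar u$ averages out to $E(u)=0$, which is exactly why $\bar u$ and $E(u)$ must be kept distinct in the derivation below.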
Let $X_1, \dots, X_n$ be independent, identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}=\frac{1}{n} \sum_{i=1}^n X_i$. Let $S'^2$ be the "unnormalized" estimator of the variance, i.e. $S'^2 = \sum_{i=1}^n (X_i-\overline{X})^2$. Then $\mathbb{E}[S'^2]=(n-1)\sigma^2$. Proof:
$$\mathbb{E}[S'^2]=\mathbb{E} \sum_{i=1}^n (X_i-\mu+\mu-\overline{X})^2 \\ = \mathbb{E} \left ( \sum_{i=1}^n (X_i-\mu)^2 + 2(X_i-\mu)(\mu-\overline{X}) + (\mu-\overline{X})^2 \right ) \\ = n \left ( \sigma^2 - \frac{2}{n} \sigma^2 + \frac{1}{n} \sigma^2 \right ) \\ = (n-1) \sigma^2.$$
For the first term this is just summing up $n$ copies of $\sigma^2$. In the second term, write:
$$\mathbb{E}[(X_i-\mu)(\mu-\overline{X})]=\frac{1}{n} \sum_{j=1}^n \mathbb{E}[(X_i-\mu)(\mu-X_j)].$$
The term $j=i$ contributes $-\sigma^2$ to the sum; the others contribute nothing because of independence.
In the last term, write $\mathbb{E}[(\mu-\overline{X})^2]=\mathbb{E} \left [ \left ( \mu-\frac{1}{n} \sum_{i=1}^n X_i \right )^2 \right ]$. Use the multinomial theorem to expand the square. Cancel the cross terms using independence again, and then notice that you have $n$ copies of $\frac{1}{n^2} \sigma^2$ remaining. (There are shorter proofs of this: ultimately I am just using that variances of independent r.v.s add and $\operatorname{Var}(aX)=a^2 \operatorname{Var}(X)$ for constants $a$.)
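The identity $\mathbb{E}[S'^2]=(n-1)\sigma^2$ is easy to confirm by Monte Carlo. A minimal sketch (the choices of $n$, $\sigma$, the distribution, and the number of replications are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 10, 2.0, 200_000

# reps independent samples of size n with variance sigma^2.
x = rng.normal(loc=5.0, scale=sigma, size=(reps, n))

# S'^2 = sum of squared deviations from the sample mean, one value per sample.
s_prime_sq = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print(s_prime_sq.mean())        # close to (n-1) * sigma^2
print((n - 1) * sigma ** 2)     # 36.0
```

Dividing $S'^2$ by $n-1$ instead of $n$ is exactly the usual Bessel correction for the sample variance.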
I'm not sure exactly what is being calculated to obtain $(n-2)\sigma^2$.