I am taking a course in Econometrics, and I need help understanding how we arrive at the formula for the standard error of the regression, $$\hat{\sigma}^2=\frac{\sum{e_i^2}}{n-k}.$$
I understand Bessel's correction, which removes the bias inherent in the sample variance; the proof is available at Bessels Correction Proof of Correctness.
I also found Standard deviation of error in simple linear regression and
How to derive the standard error of linear regression coefficient,
but I could not find a proof of the expression above (the standard error of the regression).
I tried to expand the expression along the lines of the Bessel's correction proof, starting from
$$\sum e_i^2=\text{Total SS}- \text{Explained SS}.$$
Then I tried to expand the explained-sum-of-squares term, but I got stuck at
$$ \sum _{i=1}^n \operatorname {E} \left((\beta\mathbf{X}-\bar{y} )^2 \right) = \beta^2 \operatorname{E}(x^2)-2\beta\,\overline{xy}+\operatorname{E}(\bar{y}^2). $$
I don't know how to proceed. Can anyone please help?
Then I read this:
> The term "standard error" is more often used in the context of a regression model, and you can find it as "the standard error of regression". It is the square root of the sum of squared residuals from the regression - divided sometimes by sample size n (and then it is the maximum likelihood estimator of the standard deviation of the error term), or by $n−k$ ($k$ being the number of regressors), and then it is the ordinary least squares (OLS) estimator of the standard deviation of the error term.
on Standard Error vs. Standard Deviation of Sample Mean
Can anyone suggest a textbook where I can read about these derivations in more detail?
Here's one way. This will work only if you understand matrix algebra and the geometry of $n$-dimensional Euclidean space.
The model says $y_i = \alpha_0 + \sum_{\ell=1}^k \alpha_\ell x_{\ell i} + \varepsilon_i, \quad i=1,\ldots,n,$ where the errors $\varepsilon_i$ are uncorrelated with mean $0$ and common variance $\sigma^2$, and where
The $n\times(k+1)$ "design matrix" is $$ X= \begin{bmatrix} 1 & x_{11} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{bmatrix} $$ with independent columns and typically $n\gg k$.
The $(k+1)\times 1$ vector of coefficients to be estimated is $$ \alpha= \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix}. $$ The model can then be written as $Y= X\alpha+\varepsilon$, where $Y, \varepsilon \in\mathbb R^{n\times 1}$. Then $Y$ has expected value $X\alpha\in\mathbb R^{n\times 1}$ and variance $\sigma^2 I_n\in\mathbb R^{n\times n}$.
The "hat matrix" is $H = X(X^T X)^{-1} X^T$, an $n\times n$ matrix of rank $k+1$. The vector $\widehat Y = HY$ is the orthogonal projection of $Y$ onto the column space of $X$. It is also $\widehat Y=HY = X\widehat\alpha$, where $\widehat\alpha$ is the vector of least-squares estimates of the components of $\alpha$.
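The properties of $H$ used below are easy to verify numerically. Here is a small sanity check (the design matrix is made-up illustrative data, not anything from the post):

```python
# Sanity check of the hat matrix H = X (X^T X)^{-1} X^T:
# symmetric, idempotent, rank k+1, and fixes the column space of X.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                                 # n observations, k regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T                         # hat matrix

print(np.allclose(H, H.T))          # symmetric: H^T = H
print(np.allclose(H @ H, H))        # idempotent: H^2 = H
print(np.linalg.matrix_rank(H))     # rank is k+1
print(np.allclose(H @ X, X))        # H X = X, so H is the identity on col(X)
```

The last line, $HX = X$, is what makes $\widehat Y = HY$ an orthogonal projection onto the column space of $X$.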
The residuals are $\widehat\varepsilon_i = e_i = Y_i-\widehat Y_i = Y_i-(\widehat\alpha_0 + \sum_{\ell=1}^k \widehat\alpha_\ell x_{\ell i})$. These are observable estimates of the unobservable errors. The vector of residuals is $$ \widehat\varepsilon = e = (I-H)Y. $$ This has expected value $(I-H)\operatorname{E}(Y) = (I-H)X\alpha = 0$, since $HX=X$.
We seek \begin{align} & \operatorname{E}(\|\widehat\varepsilon\|^2) = \operatorname{E}(\|e\|^2) \\[10pt] = {} & \operatorname{E}\Big( \big((I-H)Y\big)^T \big((I-H)Y\big) \Big) \\[10pt] = {} & \operatorname{E} (Y^T (I-H) Y) \qquad \text{since } (I-H)^T = I-H = (I-H)^2. \text{ (Check that.)} \end{align} We've projected $Y$ onto the $(n-(k+1))$-dimensional column space of $I-H$. The expected value of the projection is $0$.
I claim the variance of the projection is just $\sigma^2$ times the identity operator on that $(n-(k+1))$-dimensional space. The reason for that is that $I-H$ is itself the identity operator on that $(n-(k+1))$-dimensional space, which is the orthogonal complement of the column space of $X$.
So it's as if we have a random vector $w$ in $(n-(k+1))$-dimensional space with expected value $0$ and variance $\sigma^2 I_{(n-(k+1))\times(n-(k+1))}$, and we're asking what $\operatorname{E}(\|w\|^2)$ is. And that is $\sigma^2(n-(k+1))$.
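For readers who prefer algebra to geometry, the same dimension count falls out of the standard trace trick, sketched here under the model's assumptions above ($\operatorname{E}(Y)=X\alpha$, $\operatorname{Var}(Y)=\sigma^2 I_n$):

```latex
\begin{align*}
\operatorname{E}\!\left(Y^T(I-H)Y\right)
  &= \operatorname{E}\!\left(\operatorname{tr}\!\big((I-H)\,YY^T\big)\right)
     && \text{since } x^T A x = \operatorname{tr}(A\,xx^T) \\
  &= \operatorname{tr}\!\big((I-H)\operatorname{E}(YY^T)\big) \\
  &= \operatorname{tr}\!\big((I-H)(\sigma^2 I_n + X\alpha\alpha^T X^T)\big) \\
  &= \sigma^2\operatorname{tr}(I-H)
     && \text{since } (I-H)X = 0 \\
  &= \sigma^2\big(n-(k+1)\big)
     && \text{since } \operatorname{tr}(H)=\operatorname{rank}(H)=k+1.
\end{align*}
```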
Hence the expected value of the sum of squares of residuals (which is the "unexplained" sum of squares) is $\sigma^2(n-(k+1))$. Dividing by $n-(k+1)$ therefore gives an unbiased estimator, $$\hat\sigma^2 = \frac{\sum e_i^2}{n-(k+1)},$$ which is the formula in your question once $k$ is taken to count the intercept among the regressors.
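This unbiasedness is easy to confirm by Monte Carlo. The sketch below uses made-up coefficients and simulated data (none of the numbers come from the post):

```python
# Monte Carlo check that SSR / (n - (k+1)) is unbiased for sigma^2.
import numpy as np

rng = np.random.default_rng(42)
n, k, sigma2 = 40, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design
alpha = np.array([1.0, -2.0, 0.5])                           # hypothetical true coefficients

reps = 20_000
estimates = np.empty(reps)
for r in range(reps):
    y = X @ alpha + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS fit
    e = y - X @ beta_hat                                     # residuals
    estimates[r] = e @ e / (n - (k + 1))                     # SSR / (n - k - 1)

print(estimates.mean())   # close to sigma2 = 4.0
```

Averaging over many simulated samples, the estimator's mean lands on $\sigma^2$, while dividing by $n$ instead would bias it downward by the factor $(n-(k+1))/n$.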