Let $Y \sim N(X\beta,\sigma^2I)$ where $\operatorname{rank}(X_{n\times p})=p \leq n$. The least squares estimate of $\sigma^2$ is $\hat{\sigma}^2=\frac{Y'(I-P)Y}{n-p}$, where $P=X(X'X)^{-1}X'$ is the projection matrix onto $C(X)$. By Chebyshev's inequality it is easy to show that $\hat{\sigma}^2\xrightarrow[]{P}\sigma^2$ as $n \to \infty$. But how do I show
Question: $\hat{\sigma}^2\xrightarrow[]{a.s.}\sigma^2$?
My approach: $Z=(I-P)Y \sim N(0,\sigma^2(I-P))$
$\hat{\sigma}^2=\frac{Z'Z}{n-p}=\frac{n}{n-p}\frac{\sum Z_i^2}{n}$. Although $\frac{n}{n-p} \rightarrow 1$, I can't apply the SLLN since the $Z_i$'s are not i.i.d., which is where I am stuck.
Further question: if we drop the normality assumption, can we still claim the above (a.s. convergence, or at least convergence in probability)?
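Before the theory, a quick numerical sanity check (not a proof): simulate the model, compute $\hat{\sigma}^2 = Y'(I-P)Y/(n-p)$ for growing $n$, and watch it settle near the true $\sigma^2$. The design matrix here (intercept plus one Gaussian regressor), the parameter values, and the seed are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                      # true error variance
beta = np.array([1.0, 2.0])       # true coefficients

def sigma2_hat(n):
    """Least squares estimate of sigma^2 from one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    # (I - P)Y is just the residual vector Y - X beta_hat, so we never
    # need to form the n x n projection matrix explicitly.
    e = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return e @ e / (n - X.shape[1])

estimates = {n: sigma2_hat(n) for n in (100, 1_000, 100_000)}
```

The fluctuations of $\hat{\sigma}^2$ around $\sigma^2$ shrink like $n^{-1/2}$, so the estimate at $n = 10^5$ is already very close to $4$.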
The model is given by
$$ y_i=X_i^{\top}\beta_0+\epsilon_i. $$
Denote $e:=(Y-X\hat \beta)=(I-P)Y=(I-P)(X\beta_0+\epsilon)=(I-P)\epsilon$. Then
$$ \hat{\sigma}^2=\frac{e^{\top}e}{n-p}=\frac{\epsilon^{\top}(I-P)\epsilon}{n-p}=\frac{n}{n-p}\left[\frac{\epsilon^{\top}\epsilon}{n}-\frac{\epsilon^{\top}X}{n}\left(\frac{X^{\top}X}{n} \right)^{-1}\frac{X^{\top}\epsilon}{n} \right]. $$
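The identity above is pure linear algebra, so it can be verified numerically on a single simulated dataset: the residual-based form $e^{\top}e/(n-p)$ and the expanded bracket form agree to machine precision. The dimensions and distributions below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta0 = np.arange(1.0, p + 1)
eps = rng.normal(size=n)
Y = X @ beta0 + eps

# Residual form: e = Y - X beta_hat = (I - P) eps
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat
lhs = e @ e / (n - p)

# Expanded form: (n/(n-p)) [ eps'eps/n - (eps'X/n)(X'X/n)^{-1}(X'eps/n) ]
bracket = (eps @ eps / n
           - (eps @ X / n) @ np.linalg.solve(X.T @ X / n, X.T @ eps / n))
rhs = n / (n - p) * bracket
```

Both sides are the same scalar, which is what lets the argument reduce to SLLNs for each averaged term in the bracket.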
You don't need distributional assumptions to show that $\hat{\sigma}^2=\sigma^2+o_{a.s.}(1)$. You only need appropriate moment assumptions on $\epsilon_i$ and $X_i$ so that one of the SLLNs applies to each averaged term. Then
\begin{align} \hat{\sigma}^2&=\frac{n}{n-p}\left[\frac{1}{n}\sum \epsilon_i^2-\frac{1}{n}\sum(\epsilon_iX_i^{\top}) \left(\frac{1}{n}\sum (X_iX_i^{\top}) \right)^{-1}\frac{1}{n}\sum(X_i\epsilon_i) \right] \\ &=(1+o(1))\left[(\sigma^2+o_{a.s.}(1))-o_{a.s.}(1)O_{a.s.}(1)o_{a.s.}(1)\right] \\[0.8em] &=\sigma^2+o_{a.s.}(1). \end{align}
For example, if $(X_i,\epsilon_i)$ are i.i.d. with $\mathsf{E}[X_i\epsilon_i]=0$ and $\mathsf{E}\epsilon_i^2=\sigma^2$ (both implied by the model), it suffices to assume that $\mathsf{E}\epsilon_i^2<\infty$ and $\mathsf{E}X_{i,j}^2<\infty$ for $j=1,\dots,p$ (also $\mathsf{E}[X_iX_i^{\top}]$ must be invertible).
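To illustrate that normality plays no role, here is the same estimator with markedly non-normal errors: a centered exponential, which has mean $0$ and variance $1$ but is skewed. With only the second-moment conditions above satisfied, $\hat{\sigma}^2$ still lands near $\sigma^2 = 1$ for large $n$. The specific design and sample size are again my own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20_000, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
# Exp(1) shifted to mean 0: variance 1, heavily skewed, not Gaussian.
eps = rng.exponential(scale=1.0, size=n) - 1.0
Y = X @ np.array([0.5, -1.0]) + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat
s2 = e @ e / (n - p)   # should be close to the true variance, 1.0
```

The same experiment with any other error law having a finite second moment behaves the same way, which is exactly the content of the distribution-free argument above.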