Consider the standard linear model $$y_i = \beta^TX_i+\varepsilon_i,\quad \begin{cases}i=1,\dots,n \\ \beta\in \mathbb R^p\\ E(\varepsilon_i\mid X_i) = 0\\ \mathrm{Var}(\varepsilon_i\mid X_i) = \sigma^2 \end{cases}$$ where the target of inference is $\beta$. Assume that the empirical covariance $n^{-1}X^TX$ is invertible almost surely, where $X$ is the $n\times p$ matrix whose $i$th row is $X_i$. This holds if, for example, $X_i \overset{\text{i.i.d.}}{\sim}\mathcal N(0,\Sigma)$ with $\Sigma$ non-singular and $n\geq p$. Let $\hat \beta$ be the least squares estimator, namely $$\hat \beta = (X^TX)^{-1}X^Ty$$
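As a concrete illustration of this setup, here is a minimal NumPy sketch that draws $X$ and $\varepsilon$ as above and computes $\hat\beta$ by solving the normal equations (the specific values of $n$, $p$, $\sigma$, and $\beta$ are my own, chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 0.5
beta = np.array([1.0, -2.0, 0.5])          # true coefficients (illustrative)

# Rows X_i ~ N(0, Sigma) i.i.d. with Sigma non-singular, so X^T X is
# invertible almost surely once n >= p.
Sigma = np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Errors with E(eps_i | X_i) = 0 and Var(eps_i | X_i) = sigma^2.
eps = rng.normal(0.0, sigma, size=n)
y = X @ beta + eps

# Least squares estimator: solve (X^T X) beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the linear system rather than forming $(X^TX)^{-1}$ explicitly is numerically preferable, but gives the same $\hat\beta$.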
I'm trying to show that, for a random $X$,
$$E(\hat\beta)=\beta$$
$$\mathrm{Var}(\hat\beta)=\sigma^2 E[(X^TX)^{-1}]$$
I believe I was able to show that $E(\hat\beta)=\beta$. However, I am pretty lost on how to get the variance. I was able to show that
$$\mathrm{Var}(\hat\beta\mid X)=\sigma^2 (X^TX)^{-1}$$
for a fixed $X$.
But I don't know how to get the variance of $\hat\beta$ for a random $X$. How do I arrive at $\mathrm{Var}(\hat\beta)=\sigma^2 E[(X^TX)^{-1}]$?
I would really appreciate some help showing me how to get there, because I simply don't see how. Thank you for your time and help!
One has $$E(\hat \beta \mid X) = \beta+(X^TX)^{-1}X^TE(\varepsilon\mid X) = \beta,$$ and thus $E(\hat \beta) = \beta$ by the tower property. Straightforward algebra then shows $$\hat \beta \hat \beta^T = \beta\beta^T+\beta\varepsilon^T X (X^TX)^{-1}+(X^TX)^{-1}X^T\varepsilon\beta^T+(X^TX)^{-1}X^T\varepsilon\varepsilon^TX(X^TX)^{-1}$$ Conditioning on $X$, the two cross terms vanish because $E(\varepsilon\mid X) = 0$, and since the errors are uncorrelated with conditional variance $\sigma^2$ we have $E(\varepsilon\varepsilon^T\mid X) = \sigma^2 I_n$, so $$ E(\hat \beta\hat \beta^T \mid X) = \beta \beta^T + \sigma^2(X^TX)^{-1}X^TX(X^TX)^{-1} = \beta \beta^T + \sigma^2(X^TX)^{-1} $$ Taking expectations and subtracting $E(\hat\beta)E(\hat\beta)^T = \beta\beta^T$ gives $$\mathrm{Cov}(\hat \beta) = E(\hat\beta\hat\beta^T) - \beta\beta^T = \sigma^2 E[(X^TX)^{-1}]$$
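The identity $\mathrm{Cov}(\hat\beta)=\sigma^2 E[(X^TX)^{-1}]$ can be checked numerically. The following Monte Carlo sketch (all parameter values are my own choices) redraws $X$ and $\varepsilon$ on each replication, compares the empirical covariance of $\hat\beta$ with $\sigma^2$ times the sample average of $(X^TX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, reps = 50, 2, 1.0, 5000
beta = np.array([1.0, -1.0])

betas = np.empty((reps, p))
inv_gram = np.zeros((p, p))
for r in range(reps):
    X = rng.standard_normal((n, p))        # random design, redrawn each time
    eps = rng.normal(0.0, sigma, size=n)
    y = X @ beta + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    betas[r] = XtX_inv @ X.T @ y           # least squares estimate for this draw
    inv_gram += XtX_inv
inv_gram /= reps                           # Monte Carlo estimate of E[(X^T X)^{-1}]

emp_cov = np.cov(betas, rowvar=False)      # empirical Cov(beta_hat) over draws
pred_cov = sigma**2 * inv_gram             # sigma^2 E[(X^T X)^{-1}]
```

Up to Monte Carlo error, `emp_cov` and `pred_cov` agree entry by entry, whereas $\sigma^2(X^TX)^{-1}$ from any single draw of $X$ would not.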
Remark: to pass from the assumption $E(\varepsilon_i\mid X_i) = 0$ to $E(\varepsilon\mid X) = 0$, one would expect (using independence across observations) that $$E(\varepsilon_1\mid \sigma(X_1,X_2)) = E(\varepsilon_1 \mid X_1)$$ One way to verify this is to check it on rectangles $B = \{X_1 \in B_1,X_2\in B_2\} \in \sigma(X_1,X_2)$ and then apply a monotone class / $\pi$-$\lambda$ argument.