Find $\mathrm{Var}(\hat\beta \mid X)$: Linear Regression


Given that $X$ has full rank, I have $\hat\beta=(X^TX)^{-1}X^TY$.

Now I want to find $\mathrm{Var}(\hat\beta \mid X)$:

$$\begin{aligned}
\mathrm{Var}(\hat\beta \mid X) &= \mathrm{Var}\bigl((X^TX)^{-1}X^TY \mid X\bigr) \\
&= (X^TX)^{-1}X^T\,\mathrm{Var}(Y\mid X)\,\bigl((X^TX)^{-1}X^T\bigr)^T \\
&= (X^TX)^{-1}X^T\,\sigma^2 I\,X\,\bigl((X^TX)^{-1}\bigr)^T \\
&= \sigma^2 I\,\bigl((X^TX)^{-1}\bigr)^T.
\end{aligned}$$
Using the property $\bigl((X^TX)^{-1}\bigr)^T=(XX^T)^{-1}$ I have

$$\mathrm{Var}(\hat\beta | X)=\sigma^2 I(XX^T)^{-1}$$

However my notes say it should be:

$$\mathrm{Var}(\hat\beta | X)=\sigma^2 I(X^TX)^{-1}$$

I would be right if $X^TX$ were symmetric (edit: sorry, I meant if $X^TX=XX^T$), but is that true? I tried computing $2\times 2$ matrices with full rank and the two products were not equal.
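For example, a quick check in NumPy (a minimal sketch; the $2\times 2$ matrix below is just an arbitrary full-rank example) shows the two products differ:

```python
import numpy as np

# An arbitrary full-rank 2x2 matrix (hypothetical example).
X = np.array([[1.0, 2.0],
              [3.0, 5.0]])

XtX = X.T @ X   # [[10, 17], [17, 29]]
XXt = X @ X.T   # [[ 5, 13], [13, 34]]

print(XtX)
print(XXt)
print(np.allclose(XtX, XXt))  # False: X^T X != X X^T in general
```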

Best Answer

Your notes are correct.

Consider the model $y_i = \boldsymbol{x}_i^T \boldsymbol{\beta} + e_i$, $i=1,\ldots,n$, where $\boldsymbol{x}_i$ is a $p$-vector of (real-valued) auxiliary data, $\boldsymbol{\beta} \in \mathbb{R}^p$ is an unknown parameter vector, and $e_i$ is a random error term. Define $\boldsymbol{y}=(y_1, \ldots, y_n)^T$, $\boldsymbol{e}=(e_1, \ldots, e_n)^T$, and $\boldsymbol{X}=(\boldsymbol{x}_1^T, \ldots, \boldsymbol{x}_n^T)^T$, the $(n \times p)$ matrix whose $i$-th row is $\boldsymbol{x}_i^T$. We impose the following assumptions.

Assumption 1: (strict exogeneity): $\mathbb{E}[e_i \vert \boldsymbol{X}] = 0$, for all $i=1,\ldots,n$

Assumption 2: (no multicollinearity): The rank of the $(n \times p)$ matrix $\boldsymbol{X}$ is $p$.

Assumption 3: (spherical error variance):

  • (homoscedasticity): $\mathbb{E}[e_i^2 \vert \boldsymbol{X}]=\sigma^2 > 0$ (an unknown parameter) for all $i=1,\ldots,n$

  • (uncorrelatedness): $\mathbb{E}[e_ie_j \vert \boldsymbol{X}] = 0$ for all $i,j=1,\ldots,n$, $i \neq j$

Let $\boldsymbol{b}$ denote the OLS estimator of $\boldsymbol{\beta}$ under our model, given by
\begin{equation*}
\boldsymbol{b} = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y},
\end{equation*}
and put $\boldsymbol{A} = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T$ (the matrix inversion is justified by Assumption 2); denote by $\boldsymbol{I}$ the identity matrix of size $(n \times n)$. For the variance of $\boldsymbol{b}$ we have
\begin{align*}
\mathrm{Var}[\boldsymbol{b} \vert \boldsymbol{X}] &= \mathrm{Var}[\boldsymbol{b}-\boldsymbol{\beta} \vert \boldsymbol{X}] && \text{since } \boldsymbol{\beta} \text{ is not random}\\
&= \mathrm{Var}[\boldsymbol{A} \boldsymbol{e} \vert \boldsymbol{X}] && \text{since } \boldsymbol{b} - \boldsymbol{\beta} = \boldsymbol{A}\boldsymbol{e}\\
&= \boldsymbol{A}\, \mathrm{Var}[\boldsymbol{e} \vert \boldsymbol{X}]\, \boldsymbol{A}^T && \text{take } \boldsymbol{A} \text{ out of } \mathrm{Var}\\
&= \boldsymbol{A}\, \mathbb{E}[\boldsymbol{e}\boldsymbol{e}^T \vert \boldsymbol{X}]\, \boldsymbol{A}^T && \text{by Assumption 1}\\
&= \boldsymbol{A}(\sigma^2 \boldsymbol{I})\boldsymbol{A}^T && \text{by Assumption 3}\\
&= \sigma^2 \boldsymbol{A}\boldsymbol{A}^T \\
&= \sigma^2 (\boldsymbol{X}^T\boldsymbol{X})^{-1},
\end{align*}
where, in the last equality, we have used $\boldsymbol{A}\boldsymbol{A}^T = (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T\boldsymbol{X}\,(\boldsymbol{X}^T\boldsymbol{X})^{-1}=(\boldsymbol{X}^T\boldsymbol{X})^{-1}$ (note that $\boldsymbol{A}^T = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}$, since $(\boldsymbol{X}^T\boldsymbol{X})^{-1}$ is symmetric).
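As a numerical sanity check of this result, here is a small Monte Carlo sketch in NumPy (the sizes $n$ and $p$, the coefficients, and $\sigma$ are arbitrary choices, not part of the derivation): it holds $\boldsymbol{X}$ fixed, redraws $\boldsymbol{e}$ repeatedly, and compares the empirical covariance of $\boldsymbol{b}$ with $\sigma^2(\boldsymbol{X}^T\boldsymbol{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0              # arbitrary sizes and noise level
X = rng.normal(size=(n, p))           # fixed design matrix (full column rank a.s.)
beta = np.array([1.0, -2.0, 0.5])     # arbitrary true coefficients

XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                     # A = (X^T X)^{-1} X^T

reps = 20000
b_draws = np.empty((reps, p))
for r in range(reps):
    e = rng.normal(scale=sigma, size=n)   # spherical errors: iid N(0, sigma^2)
    y = X @ beta + e
    b_draws[r] = A @ y                    # OLS estimate for this draw

emp_cov = np.cov(b_draws, rowvar=False)    # empirical Var(b | X)
theo_cov = sigma**2 * XtX_inv              # sigma^2 (X^T X)^{-1}
print(np.max(np.abs(emp_cov - theo_cov)))  # small, shrinking as reps grows
```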

Note: above, we have used (the trick)
\begin{align*}
\boldsymbol{b} -\boldsymbol{\beta} &= (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y} - \boldsymbol{\beta} \\
&= (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T (\boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{e}) - \boldsymbol{\beta} \\
&= \boldsymbol{\beta} + (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{e} - \boldsymbol{\beta} \\
&= (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{e} \\
&= \boldsymbol{A}\boldsymbol{e}.
\end{align*}
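This identity is algebraic rather than statistical, so it holds exactly for every single draw; a minimal NumPy sketch (with arbitrary dimensions and parameters) confirms it up to floating-point round-off:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4                          # arbitrary dimensions
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
e = rng.normal(size=n)
y = X @ beta + e                      # generate one draw from the model

A = np.linalg.inv(X.T @ X) @ X.T      # A = (X^T X)^{-1} X^T
b = A @ y                             # OLS estimate

# b - beta equals A e exactly, up to floating-point round-off
print(np.allclose(b - beta, A @ e))   # True
```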

Further comments on your question.

Let $\boldsymbol{X}$ denote any real $(n \times p)$ matrix (see our definition above). Then $\boldsymbol{X}^T \boldsymbol{X}$ is a symmetric $(p \times p)$ matrix, while $\boldsymbol{X} \boldsymbol{X}^T$ is a symmetric $(n \times n)$ matrix. Hence, in the regression setting with $n \neq p$, your identity $\bigl((\boldsymbol{X}^T\boldsymbol{X})^{-1}\bigr)^T=(\boldsymbol{X}\boldsymbol{X}^T)^{-1}$ cannot hold: the two sides do not even have the same dimensions, and for $n > p$ the matrix $\boldsymbol{X}\boldsymbol{X}^T$ has rank at most $p$ and is therefore not invertible. Even in the case $n = p$, the matrix $\boldsymbol{X}$ must be very special for your identity to hold. The property you actually need is $\bigl((\boldsymbol{X}^T\boldsymbol{X})^{-1}\bigr)^T=(\boldsymbol{X}^T\boldsymbol{X})^{-1}$, which holds because $\boldsymbol{X}^T\boldsymbol{X}$ is symmetric and the inverse of a symmetric matrix is symmetric; substituting this into your own derivation gives exactly $\sigma^2(\boldsymbol{X}^T\boldsymbol{X})^{-1}$.
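To make the dimension argument concrete, here is a short NumPy sketch with a non-square $\boldsymbol{X}$ (the sizes are arbitrary): $\boldsymbol{X}^T\boldsymbol{X}$ is $(p \times p)$, $\boldsymbol{X}\boldsymbol{X}^T$ is $(n \times n)$, both are symmetric, and with $n > p$ the larger product is not even invertible.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 2                       # n != p, arbitrary sizes
X = rng.normal(size=(n, p))

XtX = X.T @ X                     # (p x p), symmetric, invertible if rank(X) = p
XXt = X @ X.T                     # (n x n), symmetric, rank p < n => singular

print(XtX.shape, XXt.shape)                               # (2, 2) (5, 5)
print(np.allclose(XtX, XtX.T), np.allclose(XXt, XXt.T))   # True True
print(np.linalg.matrix_rank(XXt))                         # 2, so X X^T is not invertible
```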