How to derive the standard error of the $k^{th}$ coefficient?


Given a multiple regression with the usual assumptions satisfied, with $X \in \mathbb{R}^{n \times p}$

$$ y = X \beta + e $$

I know that the covariance matrix of the estimator is $\sigma^2 (X^TX)^{-1}$. What I want is the variance of the $k^{th}$ coefficient, i.e. the corresponding entry of this covariance matrix.

I've seen from resource A (cited below) that this value is the $k^{th}$ diagonal entry of $\sigma^2 (X^TX)^{-1}$. More interestingly (and this is what I want to prove), this value also equals:

$$ \dfrac{\sigma^2}{(1-R^2_k) \sum_{i=1}^n (x_{ik} - \bar{x_k})^2 } $$

Here, $x_k$ is the $k^{th}$ column of $X$, $\bar{x_k}$ is its mean, and $R^2_k$ is the $R^2$ from regressing $x_k$ on $X_{(k)}$ ($=X$ with the $k^{th}$ column removed).
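This claimed equality can be spot-checked numerically. The sketch below is my own construction (random made-up data, NumPy): it compares the $k^{th}$ diagonal entry of $\sigma^2 (X^TX)^{-1}$ against the formula above, with an intercept column included so the centered definition of $R^2_k$ applies.

```python
import numpy as np

# Numerical check (my construction, not from the post): with an intercept
# column in X, the k-th diagonal of sigma^2 (X^T X)^{-1} should equal
# sigma^2 / ((1 - R_k^2) * sum_i (x_ik - xbar_k)^2).
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 3 regressors
sigma2 = 2.5
k = 2  # check an arbitrary non-intercept coefficient

diag_k = sigma2 * np.linalg.inv(X.T @ X)[k, k]

x_k = X[:, k]
X_mk = np.delete(X, k, axis=1)                    # X with column k removed
beta = np.linalg.lstsq(X_mk, x_k, rcond=None)[0]  # regress x_k on the rest
x_k_hat = X_mk @ beta
R2_k = 1 - np.sum((x_k - x_k_hat) ** 2) / np.sum((x_k - x_k.mean()) ** 2)
claimed = sigma2 / ((1 - R2_k) * np.sum((x_k - x_k.mean()) ** 2))

assert np.isclose(diag_k, claimed)
```

The two quantities agree to floating-point precision, which at least confirms the identity is worth proving.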

Here is my (failed) attempt at this derivation:

Using resource B (cited below), the $k^{th}$ diagonal value of $\sigma^2 (X^TX)^{-1}$ is:

$$\sigma^2 [x_k^Tx_k - x_k^T X_{(k)} (X_{(k)}^TX_{(k)})^{-1}X_{(k)}^T x_k ] ^ {-1}$$
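As a sanity check on this block-inverse (Schur complement) identity, here is a small numerical sketch of my own with random data (not from Resource B):

```python
import numpy as np

# Check (my own sketch): the k-th diagonal of (X^T X)^{-1} equals
# 1 / (x_k^T x_k - x_k^T X_(k) (X_(k)^T X_(k))^{-1} X_(k)^T x_k),
# a special case of the Schur-complement form of a block-matrix inverse.
rng = np.random.default_rng(1)
n, p = 30, 5
X = rng.normal(size=(n, p))
k = 3

direct = np.linalg.inv(X.T @ X)[k, k]

x_k = X[:, k]
X_mk = np.delete(X, k, axis=1)  # X with column k removed
schur = x_k @ x_k - x_k @ X_mk @ np.linalg.inv(X_mk.T @ X_mk) @ (X_mk.T @ x_k)

assert np.isclose(direct, 1.0 / schur)
```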

Since $X_{(k)} (X_{(k)}^TX_{(k)})^{-1}X_{(k)}^T x_k$ is the projection of $x_k$ onto the column space of $X_{(k)}$, we can write

$$ X_{(k)} (X_{(k)}^TX_{(k)})^{-1}X_{(k)}^T x_k = \hat{x_k}, $$

where $\hat{x_k}$ is the vector of fitted values from regressing $x_k$ on $X_{(k)}$.

So our diagonal value simplifies to:

$$ \sigma^2 [x_k^Tx_k - x_k^T X_{(k)} (X_{(k)}^TX_{(k)})^{-1}X_{(k)}^T x_k ] ^ {-1}= \sigma^2 [ x_k^Tx_k - x_k^T \hat{x_k} ] ^ {-1}$$

Now, since (omitting the observation subscript $i$, so $x_k$ stands for $x_{ik}$ inside sums)

$$ R^2_k = 1 - \dfrac{ \sum (x_k - \hat{x_k} )^2 }{ \sum (x_k - \bar{x_k} )^2 } $$

$$ \dfrac{\sigma^2}{(1-R^2_k) \sum (x_{k} - \bar{x_k})^2 } $$

will simplify to:

$$ \dfrac{\sigma^2}{ \sum (x_{k} - \hat{x_k})^2 } $$

So in summary, we want to show that:

$$ \sigma^2 [ x_k^Tx_k - x_k^T \hat{x_k} ] ^ {-1} = \dfrac{\sigma^2}{ \sum (x_{k} - \hat{x_k})^2 }$$
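Before tackling the algebra, this target identity can itself be confirmed numerically. The sketch below (my own, with random data) projects $x_k$ onto the column space of $X_{(k)}$ and compares both sides:

```python
import numpy as np

# Check (my construction) of the identity to be shown:
# x_k^T x_k - x_k^T x_hat_k = sum_i (x_ik - x_hat_ik)^2,
# where x_hat_k is the projection of x_k onto col(X_(k)).
rng = np.random.default_rng(2)
n = 40
X_mk = rng.normal(size=(n, 3))  # stand-in for X with column k removed
x_k = rng.normal(size=n)

P = X_mk @ np.linalg.inv(X_mk.T @ X_mk) @ X_mk.T  # projection onto col(X_(k))
x_hat = P @ x_k

lhs = x_k @ x_k - x_k @ x_hat
rhs = np.sum((x_k - x_hat) ** 2)
assert np.isclose(lhs, rhs)
```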

But then,

$$ \sigma^2 [ x_k^Tx_k - x_k^T \hat{x_k} ] ^ {-1} = \dfrac{\sigma^2}{\sum (x_k)^2 - \sum x_k \hat{x_k} } $$

whereas

$$ \dfrac{\sigma^2}{ \sum (x_{k} - \hat{x_k})^2 } = \dfrac{\sigma^2}{\sum x_k^2 - 2\sum x_k \hat{x_k} + \sum \hat{x_k}^2 }. $$

Could someone please help me find where I am making a mistake, and if this is not the right approach, help me derive it correctly?

Resource A: http://people.stern.nyu.edu/wgreene/MathStat/GreeneChapter4.pdf (page 40)

Resource B: About the diagonal entries of an inverse matrix

Best answer:

You are on the right track. Putting the subscript $i$ back into your equation, you are trying to show that $$x_k^T x_k-x_k^T \hat x_k =\sum_i (x_{ik}-\hat x_{ik})^2. \tag1$$ The RHS of (1) can be written in matrix form as $(x_k-\hat x_k)^T(x_k-\hat x_k)$. So comparing this to the LHS, it is enough to show that $$\hat x_k^T(x_k-\hat x_k)=0.\tag2$$ But (2) is a consequence of:

Claim: In least squares regression with the usual assumptions, the predicted response vector $\hat y$ is orthogonal to the residual vector $y-\hat y$.

Proof: Write $\hat y=Hy$ where $H:= X(X^TX)^{-1}X^T$ is symmetric (i.e., $H^T=H$) and idempotent (i.e., $H^2=H$). Therefore $$\hat y^T(y-\hat y)=(Hy)^T(y-Hy)=y^TH^T(I-H)y=y^TH(I-H)y=0.$$
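The orthogonality claim can also be verified numerically. This small sketch (my own construction, random data) builds the hat matrix $H$ and checks both the orthogonality and the symmetry/idempotence it relies on:

```python
import numpy as np

# Check of the claim: with H = X (X^T X)^{-1} X^T, the fitted
# vector H y is orthogonal to the residual vector y - H y.
rng = np.random.default_rng(3)
n, p = 25, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y

# Orthogonality of fitted values and residuals:
assert np.isclose(y_hat @ (y - y_hat), 0.0)
# H is symmetric and idempotent, which is what makes the inner product vanish:
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
```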