Providing the theoretical steps to prove the formula for the confidence interval in simple linear regression

This is the exact question:

Explain from first principles the theoretical steps required to prove the formula for the confidence interval for the mean response in the simple linear regression model corresponding to a value $x_0$ of the predictor. You may assume without proof that $\operatorname{cov}\left(\overline y, \widehat b\right) =0.$

I don't really understand the question, because it's not asking me to prove anything, just to describe what I'd do if I were trying to prove it. Can somebody help me out?

Here's a short version of the answer. You have independent observations $$ Y_i \sim N(a+ bx_i, \sigma^2) \quad\text{for } i = 1,\ldots, n, $$ i.e. the expected value of $Y_i$ given the value of $x_i$ is $a+bx_i,$ and the error is normally distributed with variance $\sigma^2.$

Thus $Y_i$ is treated as random and $x_i$ is not. The justification for not treating $x_i$ as random is often (not always) that one speaks of the conditional distribution of $Y_i$ given $x_i.$

Let $\overline x = (x_1+\cdots+x_n)/n$ and $\overline Y = (Y_1+\cdots+Y_n)/n.$ Then the least-squares estimates of $b$ and $a$ are $$ \widehat{\,b\,} = \dfrac{\sum_{i=1}^n(x_i-\overline x)(Y_i - \overline Y)}{\sum_{i=1}^n (x_i - \overline x)^2}, \qquad \widehat a = \overline Y - \widehat{\,b\,}\,\overline x. $$ Here $\widehat a$ and $\widehat {\,b\,}$ are random, whereas $a$ and $b$ are not. Although $$ \frac 1 {\sigma^2} \sum_{i=1}^n (Y_i - (a+bx_i))^2 \sim \chi^2_n, $$ replacing $a$ and $b$ by their estimates costs two degrees of freedom: $$ \frac 1 {\sigma^2} \sum_{i=1}^n (Y_i - (\widehat a+\widehat{\,b\,}x_i))^2 \sim \chi^2_{n-2}. \tag 1 $$ Just why $(1)$ has that distribution is something I would normally explain with matrix algebra.
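As a concrete illustration of the two estimates and of the residual sum of squares that appears in $(1)$, here is a minimal Python sketch. The function name `fit_simple_linear` and the sample data are hypothetical and not part of the original answer; it only evaluates the formulas above and proves nothing about their distribution.

```python
def fit_simple_linear(x, y):
    """Least-squares fit of y = a + b*x.

    Returns (a_hat, b_hat, rss), where rss is the residual sum of
    squares, i.e. sigma^2 times the quantity in (1).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length samples with at least 3 points")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sum of squared deviations of the predictor (denominator of b_hat).
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    # Sum of cross-deviations (numerator of b_hat).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar

    # Residuals are taken around the fitted line, not the true line.
    rss = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    return a_hat, b_hat, rss


# Hypothetical data, for illustration only.
a_hat, b_hat, rss = fit_simple_linear([1, 2, 3], [1, 2, 2])
print(a_hat, b_hat, rss)
```

Dividing `rss` by $n-2$ rather than $n$ gives the usual unbiased estimate of $\sigma^2$, which is where the $n-2$ degrees of freedom in $(1)$ show up in practice.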

Now notice that at the predictor value $x_0$ from the question (which need not be one of the observed $x_i$) $$ \widehat a + \widehat{\,b\,}x_0 = \overline Y + \widehat{\,b\,} (x_0 - \overline x) \sim N( a+bx_0,\,\underbrace{\operatorname{var}(\,\overline Y\,) + \operatorname{var}(\,\widehat{\,b\,}\,) (x_0-\overline x)^2\,}_{\text{no covariance term}} \,). \tag 2 $$ There is no covariance term because of the fact mentioned in the question, namely $\operatorname{cov}(\overline Y, \widehat{\,b\,}) = 0.$
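To make the mean and variance in $(2)$ explicit, here is a short worked step. The symbols $S_{xx}$ and $c_i$ are notation introduced for this sketch, not taken from the original answer, and $x_0$ denotes the predictor value of interest. Because $\sum_{i=1}^n (x_i - \overline x)\,\overline Y = 0,$ the slope estimate is a linear combination of the $Y_i$: $$ \widehat{\,b\,} = \sum_{i=1}^n c_i Y_i, \qquad c_i = \frac{x_i - \overline x}{S_{xx}}, \qquad S_{xx} = \sum_{i=1}^n (x_i - \overline x)^2. $$ Since the $Y_i$ are independent with common variance $\sigma^2,$ $$ \operatorname{var}(\,\overline Y\,) = \frac{\sigma^2}{n}, \qquad \operatorname{var}(\,\widehat{\,b\,}\,) = \sigma^2 \sum_{i=1}^n c_i^2 = \frac{\sigma^2}{S_{xx}}, $$ so the variance in $(2)$ is $$ \sigma^2\left( \frac 1 n + \frac{(x_0 - \overline x)^2}{S_{xx}} \right). $$ For the mean, $\operatorname{E}(\,\overline Y\,) = a + b\overline x$ and $\operatorname{E}(\,\widehat{\,b\,}\,) = b,$ which gives $a + b\overline x + b(x_0 - \overline x) = a + bx_0.$ Normality holds because a linear combination of independent normal random variables is itself normal.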

The one additional fact needed is that the quantities mentioned in $(1)$ and $(2)$ are independent.

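Then go back to the fact that if $Z \sim N(0,1)$ is independent of a $\chi^2_{n-2}$ random variable, then $\dfrac Z {\sqrt{\chi^2_{n-2}/(n-2)}}$ has a t-distribution with $n-2$ degrees of freedom. Take $Z$ to be the standardized version of $(2)$ and the chi-square variable to be $(1),$ so that the unknown $\sigma$ cancels, and do a bunch of routine algebra.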
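The routine algebra can be sketched as follows, under the model above. Here $S_{xx} = \sum_{i=1}^n (x_i - \overline x)^2,$ $s^2$ is the residual sum of squares divided by $n-2,$ and $t_{n-2,\,\alpha/2}$ is the upper $\alpha/2$ point of the t-distribution; this notation is introduced for the sketch, not taken from the original answer. Using $\operatorname{var}(\,\overline Y\,) = \sigma^2/n$ and $\operatorname{var}(\,\widehat{\,b\,}\,) = \sigma^2/S_{xx},$ the ratio becomes $$ T = \frac{\widehat a + \widehat{\,b\,}x_0 - (a + bx_0)}{s\,\sqrt{\dfrac 1 n + \dfrac{(x_0 - \overline x)^2}{S_{xx}}}} \sim t_{n-2}, $$ and solving $|T| \le t_{n-2,\,\alpha/2}$ for $a + bx_0$ gives the $100(1-\alpha)\%$ interval $$ \widehat a + \widehat{\,b\,}x_0 \;\pm\; t_{n-2,\,\alpha/2}\; s\,\sqrt{\frac 1 n + \frac{(x_0 - \overline x)^2}{S_{xx}}}. $$ A minimal Python sketch of that final formula follows. The function name `mean_response_ci` and the data are hypothetical, and the critical value `t_crit` is assumed to be supplied by the caller, for example from a t-table.

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Confidence interval for the mean response a + b*x0.

    t_crit is the upper alpha/2 point of the t-distribution with
    n - 2 degrees of freedom, supplied by the caller (an assumption
    of this sketch, to keep it free of third-party dependencies).
    Returns (lower, upper).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length samples with at least 3 points")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    # Least-squares estimates of slope and intercept.
    b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a_hat = y_bar - b_hat * x_bar

    # s estimates sigma, using n - 2 degrees of freedom as in (1).
    rss = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))

    centre = a_hat + b_hat * x0
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return centre - half_width, centre + half_width


# Hypothetical data with n = 5, so n - 2 = 3 degrees of freedom.
# 3.182 is the tabulated upper 2.5% point of t with 3 degrees of freedom.
lower, upper = mean_response_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1],
                                x0=3.5, t_crit=3.182)
print(lower, upper)
```

If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` is one way to obtain `t_crit` instead of a table. Note that the interval is narrowest at $x_0 = \overline x$ and widens as $x_0$ moves away from the centre of the data.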