We have a new observation $x_0$, whose response will be $Y_0 = \beta_0+\beta_1x_0+\epsilon_0$. We want to predict $Y_0$.
The estimator that we use is $\hat{Y}_0 = \hat{\beta}_0+\hat{\beta}_1x_0$.
The book goes on to find $E[\hat{Y}_0-Y_0]=0$ and $\operatorname{var}[\hat{Y}_0-Y_0]=\sigma^2(1+h_{00})$. It also says that since $Y_0$ is a random variable and it is normally distributed, we can write $$0.95 = P\left(-c_1 < \frac{\hat{Y}_0 - Y_0}{\hat{\sigma} \sqrt{1+h_{00}}} < c_2 \right)$$ and then goes on to find the prediction interval.
So a couple of my questions are:
- How do we know $Y_0$ is normally distributed? But more importantly, how do we know it is a random variable?
- If $Y_0$ is a random variable, then why are the responses $Y_i = \beta_0+\beta_1x_i+\epsilon_i$ not treated as random variables? (Or are they?)
- Finally, if $Y_0$ and $Y_i$ are treated in the same way, i.e. they have the same distribution, how come their estimators $\hat{Y}_0 \equiv \hat{\beta}_0+\hat{\beta}_1x_0$ and $\hat{Y}_i \equiv \hat{\beta}_0+\hat{\beta}_1x_i$ are not the same thing?
They're all random variables. In simple linear regression each response, including the future response, is a normally distributed random variable because of the $\epsilon$ term, which by assumption has a normal distribution with zero mean.
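To make that concrete, here is a minimal simulation sketch (the coefficient and noise values below are made-up illustrations, not anything from your book): at a fixed $x_0$, repeated draws of $Y_0 = \beta_0 + \beta_1 x_0 + \epsilon_0$ scatter normally around the mean response, which is exactly what it means for $Y_0$ to be a random variable.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values: Y0 = b0 + b1*x0 + eps, with eps ~ N(0, sigma^2)
b0, b1, sigma, x0 = 2.0, 0.5, 1.0, 4.0

# Many independent realizations of the response at the SAME x0
draws = b0 + b1 * x0 + rng.normal(0, sigma, 100_000)

print(draws.mean())  # near the mean response b0 + b1*x0 = 4.0
print(draws.std())   # near sigma = 1.0
```

Each $Y_i$ in your sample behaves the same way at its own $x_i$; the only randomness comes from the $\epsilon$ term.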
When you derive formulas for statistical inference (confidence intervals, prediction intervals, etc.) you are manipulating random variables. The statement "let $X_1, X_2,\ldots, X_n$ be a random sample..." is an assertion that the $X$'s are random variables. This is the reason why you consult the normal table or the $t$ table or whatever table when finding critical values for confidence intervals -- you manipulate those random variables into a "pivotal" form (involving the random sample and the parameter of interest) that has that particular distribution.
The approach behind constructing confidence intervals for a parameter is to make a probability statement on this pivot, say $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1-\alpha$. You then rearrange the inequality into an equivalent statement -- a confidence interval -- with the parameter at the center, for which the probability statement still holds. Once you observe your random sample, you plug in the observed values into your formula, and presto! you've got the observed value for your confidence interval. The same logic applies to constructing prediction intervals for a future observation.
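Here is a sketch of that recipe for the prediction-interval case, computed by hand with numpy/scipy (the data are simulated and the true coefficients are made-up assumptions): pivot $\frac{\hat Y_0 - Y_0}{\hat\sigma\sqrt{1+h_{00}}} \sim t_{n-2}$, then rearrange around $Y_0$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data from Y = b0 + b1*x + eps (illustrative values)
b0, b1, sigma, n = 2.0, 0.5, 1.0, 30
x = rng.uniform(0, 10, n)
y = b0 + b1 * x + rng.normal(0, sigma, n)

# OLS estimates of the coefficients
xbar = x.mean()
Sxx = ((x - xbar) ** 2).sum()
b1_hat = ((x - xbar) * (y - y.mean())).sum() / Sxx
b0_hat = y.mean() - b1_hat * xbar

# Residual variance estimate (n - 2 df in simple linear regression)
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = (resid ** 2).sum() / (n - 2)

# Prediction interval at a new point x0
x0 = 5.0
h00 = 1 / n + (x0 - xbar) ** 2 / Sxx        # leverage of the new point
se_pred = np.sqrt(sigma2_hat * (1 + h00))   # sd estimate of Y0_hat - Y0
t_crit = stats.t.ppf(0.975, n - 2)          # critical value for 95%

y0_hat = b0_hat + b1_hat * x0
lower, upper = y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred
print(f"95% prediction interval at x0={x0}: ({lower:.2f}, {upper:.2f})")
```

Note the $1 + h_{00}$ inside the square root: the extra $1$ is the variance contribution of the unobserved $\epsilon_0$, which is why a prediction interval is always wider than the confidence interval for the mean response at the same $x_0$.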
To answer your last question: as formulas, $\hat Y_0$ and $\hat Y_i$ are the same thing; the difference is in what they target. $\hat Y_i$ is the fitted value for a response you have already observed, while $\hat Y_0$ predicts a future observation that hasn't been observed yet. In both cases the beta hats involve only the observed data, since the only prediction you can make about a future observation is one based on the responses you've observed.