Prediction intervals with OLS and indicator variables


Suppose I have a model like so, call it the first model:

$$E[y] = \beta_0+\beta_1x+\beta_2x_m+\beta_3(x\cdot x_m) $$

where $x_m$ is an indicator variable. I fit it using ordinary least squares.

Instead of fitting this, I can fit two separate models: one to the data points where $x_m = 1$ and one to those where $x_m = 0$. Let's call this the second model.

When I do this with some data my output shows that the estimates from both the first and second model are the same but the prediction intervals for the second model are slightly wider. Why is that?

In the formula for the prediction intervals there is a $\frac 1 n$ term, which would be smaller for the model with the indicator, but intuitively that doesn't make sense: I have more data, but the extra data are not applicable, being of a different class.


There is 1 solution below


The prediction intervals are wider when the two groups are fit separately because each population variance is then estimated from a smaller sample, so there is greater uncertainty about it. In the first model, only a single population variance is estimated, and it is based on a larger sample than either of the samples used to estimate the two separate population variances in the second model. Being based on a larger sample, it has less uncertainty, hence a shorter prediction interval. Notice the larger denominator in that case: the residual degrees of freedom are $n-4$ for the first model, versus $n_0-2$ and $n_1-2$ for the two separate fits, where $n_0$ and $n_1$ are the group sizes and $n=n_0+n_1$.
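
To see which pieces of the interval actually change, here is a minimal NumPy sketch on simulated data. The sample sizes, coefficients, noise level, the point `x0`, and the helper `ols` are all assumptions made up for illustration; they are not the asker's data or code. For a new point with regressor row $x_0$, the half-width of the interval is $t_{\nu}\, s\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}$, so the sketch compares the point prediction, the leverage term $x_0^\top (X^\top X)^{-1} x_0$, the variance estimate $s^2$, and the residual degrees of freedom $\nu$ between the two approaches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: sizes, coefficients and noise level are illustrative assumptions.
n0, n1 = 15, 25
x = rng.uniform(0.0, 10.0, n0 + n1)
xm = np.concatenate([np.zeros(n0), np.ones(n1)])
y = 1.0 + 2.0 * x + 0.5 * xm - 1.0 * x * xm + rng.normal(0.0, 1.0, n0 + n1)


def ols(X, y):
    """Least-squares fit: coefficients, variance estimate s^2, residual df, (X'X)^-1."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    return beta, resid @ resid / df, df, xtx_inv


# First model: one pooled fit with the indicator and the interaction.
X_pooled = np.column_stack([np.ones_like(x), x, xm, x * xm])
b_p, s2_p, df_p, inv_p = ols(X_pooled, y)

# Second model: a separate straight-line fit within each group.
fits = {}
for g in (0, 1):
    mask = xm == g
    X_g = np.column_stack([np.ones(mask.sum()), x[mask]])
    fits[g] = ols(X_g, y[mask])

# Compare the ingredients of the prediction interval at a hypothetical point x0.
x0 = 4.0
results = {}
for g in (0, 1):
    b_g, s2_g, df_g, inv_g = fits[g]
    row_p = np.array([1.0, x0, g, x0 * g])  # regressor row in the pooled model
    row_g = np.array([1.0, x0])             # regressor row in the separate model
    results[g] = {
        "pred_pooled": row_p @ b_p,
        "pred_sep": row_g @ b_g,
        "lev_pooled": row_p @ inv_p @ row_p,
        "lev_sep": row_g @ inv_g @ row_g,
    }
    print(f"group {g}: prediction {results[g]['pred_pooled']:.4f} vs {results[g]['pred_sep']:.4f}, "
          f"leverage {results[g]['lev_pooled']:.4f} vs {results[g]['lev_sep']:.4f}, "
          f"s^2 {s2_p:.4f} (df {df_p}) vs {s2_g:.4f} (df {df_g})")
```

In this sketch the point predictions and the leverage terms agree between the two approaches up to floating-point error, so the $\frac 1 n$-type term does not shrink because of the other class's data. What differs is the variance estimate and its degrees of freedom: the pooled $s^2$ is the degrees-of-freedom-weighted average of the two separate estimates, and the $t$ quantile is taken at $n-4$ rather than $n_g-2$ degrees of freedom.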