Data point on regression line: Effect on estimates (simple linear regression)


I am concerned with the simple linear regression model,

$y_k = a + bx_k + \epsilon_k$,

where the errors $\epsilon_k$, $k=1,\dots,n$, are i.i.d. normal with mean $0$, and $n$ is the number of observations.

I know the usual estimators of $a$ and $b$ from maximum likelihood (ML) and least squares (LS), which coincide.

What I am interested in is the effect of a new data point $(x_{n+1},y_{n+1})$ placed on the regression line, that is, $y_{n+1}=\hat{a}+\hat{b}x_{n+1}$, where $\hat{a}$ and $\hat{b}$ are the ML estimates based on the first $n$ observations. Intuitively, I would expect the new estimates $\tilde{a}$ and $\tilde{b}$ based on all $n+1$ observations to equal the old ones. I have attempted some tedious calculations with the known closed-form expressions for the MLE, without success.
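The conjecture can at least be checked numerically before trying to prove it. The following is a minimal sketch, not a proof: it uses the closed-form LS estimates $\hat b = S_{xy}/S_{xx}$ and $\hat a = \bar y - \hat b\,\bar x$, and the helper name `fit_line`, the simulated true values $a=1$, $b=2$, $\sigma=0.5$, and the choice $x_{n+1}=7.3$ are hypothetical, picked only for illustration.

```python
import random


def fit_line(xs, ys):
    """Closed-form LS estimates (a_hat, b_hat); equal to the MLE under iid normal errors.

    Assumes the xs are not all equal (otherwise s_xx is zero).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Simulated data from y = a + b x + eps with assumed a = 1, b = 2, sigma = 0.5.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(20)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.5) for x in xs]

a_hat, b_hat = fit_line(xs, ys)

# Add one point lying exactly on the fitted line, then refit on n + 1 points.
x_new = 7.3
y_new = a_hat + b_hat * x_new
a_tilde, b_tilde = fit_line(xs + [x_new], ys + [y_new])

print(a_hat, a_tilde)
print(b_hat, b_tilde)
```

The two printed pairs agree up to floating-point rounding, which is consistent with the conjecture (for this one data set only).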

Is this conjecture right, and if so, is it easily proved?

In addition, I would expect that, e.g., $y_{n+1}>\hat{a}+\hat{b}x_{n+1}$ implies $\tilde{b}>\hat{b}$. Is this true?

Edit: The last statement is, of course, meant under the assumption that $x_{n+1}>\max\{x_1,\dots,x_n\}$.

Best answer:

Yes, your first expectation is right. In the least-squares formulation, the original regression line has, by definition, the least sum of squared errors over the original data points, and it has zero error at the new data point, whereas any other line has a non-negative squared error there. Hence the original regression line also has the least sum of squared errors over all $n+1$ points.
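The same argument written as a chain of inequalities (the notation $S_n$, $S_{n+1}$ for the sums of squared errors is introduced here only for illustration, and we assume $x_1,\dots,x_n$ are not all equal, so that $(\hat a,\hat b)$ is the unique minimizer of $S_n$). With $S_n(a,b)=\sum_{k=1}^{n}(y_k-a-bx_k)^2$, for every $(a,b)$:

$$S_{n+1}(a,b) = S_n(a,b) + (y_{n+1}-a-bx_{n+1})^2 \;\ge\; S_n(a,b) \;\ge\; S_n(\hat a,\hat b) = S_{n+1}(\hat a,\hat b).$$

The last equality uses $y_{n+1}=\hat a+\hat b x_{n+1}$. If $(a,b)$ also minimizes $S_{n+1}$, equality must hold throughout, so $S_n(a,b)=S_n(\hat a,\hat b)$, and uniqueness of the minimizer of $S_n$ gives $(a,b)=(\hat a,\hat b)$. Therefore $(\tilde a,\tilde b)=(\hat a,\hat b)$.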

The additional expectation in the last paragraph does not hold in general, though. Adding a point above the original regression line shifts the line upwards at that point, but whether this increases or decreases the slope depends on the point's $x$ coordinate. If the new point has the highest $x$ coordinate, the slope increases; if it has the lowest $x$ coordinate, the slope decreases. (Or perhaps you meant to imply that the points are ordered by increasing $x$ coordinate, so that the $(n+1)$-th point added has the highest $x$ coordinate?)
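The dependence on position can be made precise with the standard one-observation update for least squares (a Sherman–Morrison / recursive least-squares step, which is not part of the original answer). Writing $r = y_{n+1}-\hat a-\hat b x_{n+1}$, $\bar x$ for the mean of $x_1,\dots,x_n$ and $S_{xx}=\sum_{k=1}^{n}(x_k-\bar x)^2$, it gives

$$\tilde b-\hat b = \frac{r\,(x_{n+1}-\bar x)}{\frac{n+1}{n}S_{xx}+(x_{n+1}-\bar x)^2},$$

so the slope goes up exactly when the point is above the line and $x_{n+1}>\bar x$. The highest $x$ coordinate always exceeds $\bar x$ (when the $x_k$ are not all equal), which is why that case works. The sketch below compares this formula with a direct refit; `fit_line`, `slope_change` and the data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Closed-form LS estimates (a_hat, b_hat); the xs must not all be equal."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def slope_change(xs, ys, x_new, y_new):
    """b_tilde - b_hat predicted by the one-point update formula, without refitting."""
    n = len(xs)
    a_hat, b_hat = fit_line(xs, ys)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    r = y_new - (a_hat + b_hat * x_new)  # residual of the new point w.r.t. the old line
    d = x_new - x_bar                    # new x relative to the old mean
    return r * d / (s_xx * (n + 1) / n + d * d)


# Illustrative data only: the mean of xs is 3.2, min is 0 and max is 10.
xs = [0.0, 1.0, 2.0, 3.0, 10.0]
ys = [0.5, 1.4, 2.6, 3.1, 9.8]
a_hat, b_hat = fit_line(xs, ys)

# Put the new point 1 unit ABOVE the old line at an x above the mean,
# at the mean, and below the mean (never the max or the min).
for x_new in [5.0, 3.2, 1.0]:
    y_new = a_hat + b_hat * x_new + 1.0
    _, b_tilde = fit_line(xs + [x_new], ys + [y_new])
    print(f"x_new={x_new}: refit change {b_tilde - b_hat:+.5f}, "
          f"formula {slope_change(xs, ys, x_new, y_new):+.5f}")
```

With these numbers the slope increases for $x_{n+1}=5$ and decreases for $x_{n+1}=1$, even though neither is an extreme $x$ value, and it is unchanged (up to rounding) when $x_{n+1}=\bar x$.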