Reliability of linear regression for predicting the future


When we have a set of data, where X is the cause, and Y is the effect, we can use linear regression to predict values for Y, based on values of X.

I have learned that you may only safely apply this to values of X that fall within the range of X covered by the input data.

Can we also use linear regression to make reliable predictions of Y for values of X that lie outside this range, and if so, what can we say about the reliability of these predictions?

I would love some answers and possibly some interesting sources on this subject.


0
On BEST ANSWER

XKCD explains it perfectly in this comic:

XKCD Weddings & Extrapolation

0
On

In general, this is not advised, because the uncertainty of such an extrapolation cannot be quantified: the functional relationship between X and Y could change drastically outside the range of your data. However, if you assume that the observed linear relationship holds outside that range, you can form a prediction interval.
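To make the prediction interval concrete, here is a minimal sketch for simple (one-predictor) linear regression, using the standard formula y_hat ± t * s * sqrt(1 + 1/n + (x_new - x_bar)^2 / Sxx). The data values, the function names `fit_line` and `prediction_interval`, and the critical value 2.306 (the 97.5% quantile of Student's t with 8 degrees of freedom) are assumptions chosen for illustration, not part of the original answer. The interval is only meaningful if the linear model and the usual error assumptions really do hold at `x_new`.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def prediction_interval(xs, ys, x_new, t_crit):
    """Return (lower, point prediction, upper) for a new observation at x_new.

    t_crit is the Student-t critical value for n - 2 degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # The (x_new - x_bar)^2 term makes the interval widen away from the data.
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    y_hat = intercept + slope * x_new
    return y_hat - t_crit * se_pred, y_hat, y_hat + t_crit * se_pred

# Made-up illustrative data: n = 10 points, so 8 degrees of freedom.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT_95 = 2.306  # two-sided 95% critical value for 8 degrees of freedom

inside = prediction_interval(xs, ys, 5.5, T_CRIT_95)    # centre of the data
outside = prediction_interval(xs, ys, 20.0, T_CRIT_95)  # far outside the data
print("inside :", inside)
print("outside:", outside)
```

The interval at x = 20 comes out wider than the one at x = 5.5, because the (x_new - x_bar)^2 term grows with distance from the observed data. That widening only reflects uncertainty in the fitted line; it does not account for the possibility that the relationship stops being linear outside the observed range.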

1
On

All machine learning approaches, including linear regression, may suffer from overfitting the data. To avoid this, one can include a penalty or regularization term that penalizes "unlikely" parameter values. You may wish to google the term "Lasso" or look here http://statweb.stanford.edu/~tibs/lasso.html
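As a rough illustration of what such a penalty does, the sketch below solves the Lasso problem for a single centred predictor, where minimising (1/(2n)) * sum((y - b*x)^2) + lam * |b| has a closed-form solution by soft thresholding. The function names `soft_threshold` and `lasso_slope`, the data, and the penalty values are assumptions made for this example; real problems with several predictors would normally use an established library implementation.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam, clipping at zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_slope(xs, ys, lam):
    """Lasso slope for one predictor, after centring x and y.

    Minimises (1 / (2n)) * sum((y - b * x)^2) + lam * |b| over b.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    xc = [x - x_bar for x in xs]
    yc = [y - y_bar for y in ys]
    rho = sum(x * y for x, y in zip(xc, yc)) / n  # scaled covariance term
    z = sum(x * x for x in xc) / n                # scaled variance of x
    return soft_threshold(rho, lam) / z

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

for lam in (0.0, 1.0, 5.0, 100.0):
    print(lam, lasso_slope(xs, ys, lam))
```

With `lam = 0` this reduces to the ordinary least-squares slope. As `lam` grows the slope is pulled towards zero, and a large enough penalty sets it to exactly zero.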