Simple linear regression assumption


What does it mean for a distribution to be homoscedastic (i.e., $\sigma(Y|X = x) = \sigma$) in the context of simple linear regression?

Why do we need this assumption in simple linear regression?

What will happen to the regression if the distribution is not homoscedastic?



Best answer:

As an example consider the following data-set:

[Figure: red data points with a thick blue linear fit and dashed guide lines; the scatter around the line widens as $x$ increases]

The thick blue line is a linear fit to the red points. I added the dashed lines as a visual guide, and you can see that the dispersion around the prediction grows with $x$; that is, $\sigma$ is not constant. In fact, $\sigma$ appears to depend on $x$.

If this is the case, then:

$$ \sigma(Y|X = x) \not= \sigma $$

This is an example where the homoscedasticity assumption fails. In these situations you should be careful, since the best linear unbiased estimator (BLUE) of the coefficients is no longer provided by ordinary least squares (OLS).
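A fan-shaped data set like the one in the figure is easy to simulate. The sketch below (a toy model of my own choosing, with noise standard deviation proportional to $x$) generates such data and fits it with OLS; the residual spread visibly grows with $x$, which is the signature of heteroscedasticity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data whose noise standard deviation grows with x,
# mimicking the fan-shaped scatter in the figure above.
n = 500
x = rng.uniform(1.0, 10.0, size=n)
true_intercept, true_slope = 2.0, 3.0
sigma_x = 0.5 * x  # sd proportional to x -> heteroscedastic
y = true_intercept + true_slope * x + rng.normal(0.0, sigma_x)

# OLS fit (degree-1 polynomial); polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)  # close to the true values (2, 3)

# The residual spread grows with x: the dispersion for large x
# is clearly bigger than for small x.
resid = y - (intercept + slope * x)
low = resid[x < 5.0].std()
high = resid[x >= 5.0].std()
print(low, high)
```

Note that the fitted coefficients still land near the true values; the failure here is in the constant-variance assumption, not in the line itself.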

Second answer:

Homoscedasticity means that the variance is constant and does not depend on the predictor, in this case $X$. If the variance of the response $Y$ changed with $X$, ordinary least squares would still produce a fit, but the usual standard errors and tests based on it would be misleading.

Third answer:

Heteroscedasticity means that the variance of the noise term $\epsilon_i$ is non-constant, i.e., $\epsilon_i | X \sim \mathcal{N}(0,\sigma_i^2)$; it does not mean that the variance of $\epsilon_i$ necessarily depends on $X$.

Why do we need this assumption?

Actually, we don't. The least squares method works fine with both constant and non-constant variance. For reasons of convergence and overall model stability, it is a good (actually crucial) thing to have finite variance of the error term (and of the $X$s).

What will happen to the regression if a distribution is not homoscedastic?

Let us start with what will not happen. The OLS estimators can be derived as usual, and they remain unbiased. However, they will no longer be "best": their variance is larger than the best achievable among unbiased estimators. Another problem is statistical inference on the coefficients: tests based on comparison with $F$ distribution values are no longer valid, because the test statistic is no longer a ratio of chi-squared random variables. This problem can be addressed by applying WLS (weighted least squares) instead of ordinary LS.
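The efficiency gain from weighting can also be seen in a small simulation. The sketch below (again a toy setup of my own, with noise sd proportional to $x$ and the weights $w_i = 1/\sigma_i^2$ assumed known) compares the sampling spread of the OLS slope with the WLS slope obtained from the weighted normal equations $(X'WX)\,b = X'Wy$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Compare OLS with WLS when the noise sd is proportional to x.
# WLS weights each observation by 1/sigma_i^2, restoring efficiency.
n, reps = 200, 2000
x = rng.uniform(1.0, 10.0, size=n)
sigma = 0.5 * x
w = 1.0 / sigma**2                    # weights, assumed known here
X = np.column_stack([np.ones(n), x])  # design matrix [1, x]

ols_slopes = np.empty(reps)
wls_slopes = np.empty(reps)
for r in range(reps):
    y = 2.0 + 3.0 * x + rng.normal(0.0, sigma)
    # OLS: unweighted least squares solution
    ols_slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # WLS: solve the weighted normal equations (X'WX) b = X'Wy
    XtW = X.T * w
    wls_slopes[r] = np.linalg.solve(XtW @ X, XtW @ y)[1]

# Both estimators are unbiased, but WLS has a visibly smaller
# sampling standard deviation across replicates.
print(ols_slopes.std(), wls_slopes.std())
```

In practice the $\sigma_i^2$ are rarely known and the weights must themselves be estimated, e.g. from a model of the residual variance; the simulation above sidesteps that by construction.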