Distribution of residual term in regression.


In regression analysis, for the classical linear regression model, the residual term is a random variable that is independent of x and y and normally distributed. But I found u ~ N written in some places and u ~ NID in others. I cannot understand the difference, so can someone explain the meaning of NID (normally and independently distributed)? (Sorry for my bad English.)


There are 2 best solutions below

2
On

In linear regression with Gaussian (and possibly heteroscedastic) noise, the model assumes that, for $n$ observations of data and each $i \in [n]$,

$$Y_i = \beta X_i + \epsilon_i,$$

where $\epsilon_i$ is the error term for the $i$th observation (note that the residual $e_i$ is an estimate of $\epsilon_i$), such that $\epsilon_i \sim N(0,\sigma^2_i)$; the classical model is the homoscedastic special case $\sigma^2_i = \sigma^2$. NID means "normally and independently distributed": on top of normality, it states that for all $i \in [n]$, $\epsilon_i$ is independent of $\epsilon_j$ for $j \neq i$ (i.e. errors are independent across observations). Writing only $\epsilon_i \sim N$ specifies the marginal distribution of each error and says nothing about dependence between errors.
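
To make the difference between the two notations concrete, here is a short sketch. It assumes the homoscedastic classical case, i.e. a common variance $\sigma^2$ for every observation (an assumption made for illustration), and writes $I_n$ for the $n \times n$ identity matrix. The first statement constrains each error separately; the second constrains the whole vector of errors jointly:

$$\epsilon_i \sim N(0,\sigma^2) \ \text{ for each } i \in [n] \quad \text{(marginal distributions only; } \operatorname{Cov}(\epsilon_i,\epsilon_j) \text{ is left unspecified)},$$

$$\epsilon_i \sim NID(0,\sigma^2) \iff (\epsilon_1,\dots,\epsilon_n)^\top \sim N(0,\sigma^2 I_n).$$

So NID is the stronger statement: it implies the first line, but the first line does not imply it.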

Note that the residual $e_i$ is not necessarily independent of $Y_i$; this depends on how we estimate $\epsilon_i$. Most of the time, residuals are defined as $e_i = Y_i - \hat{Y}_i,$ where $\hat{Y}_i$ is the prediction for the $i$th observation, generated as

$$\hat{Y}_i = \hat{\beta}X_i.$$

In this case, $e_i$ is not independent of $Y_i$, since $e_i$ is computed from $Y_i$ itself.
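
The dependence between residuals and $Y$ can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes the no-intercept model $Y_i = \beta X_i + \epsilon_i$ from above with homoscedastic Gaussian noise, fits $\hat{\beta}$ by ordinary least squares, and uses the usual convention residual = observed minus fitted. The function names, sample size, and parameter values are hypothetical choices for illustration.

```python
import random


def fit_through_origin(x, y):
    """OLS slope for the no-intercept model Y_i = beta * X_i + eps_i."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def simulate(n=500, beta=2.0, sigma=1.0, seed=0):
    """Draw one sample, fit the model, and return x, y and the residuals."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # independent N(0, sigma^2) errors
    y = [beta * xi + ei for xi, ei in zip(x, eps)]
    beta_hat = fit_through_origin(x, y)
    resid = [yi - beta_hat * xi for xi, yi in zip(x, y)]  # observed minus fitted
    return x, y, resid


x, y, resid = simulate()
resid_dot_x = dot(resid, x)      # zero up to rounding: residuals are orthogonal to X
resid_dot_y = dot(resid, y)      # equals the residual sum of squares, hence positive
resid_ss = dot(resid, resid)

print("sum(e_i * X_i) =", resid_dot_x)
print("sum(e_i * Y_i) =", resid_dot_y)
print("sum(e_i ** 2)  =", resid_ss)
```

By construction the residuals are orthogonal to $X$, while $\sum_i e_i Y_i = \sum_i e_i^2 > 0$, so the residuals cannot be unrelated to $Y$.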

0
On

"In regression analysis for classical linear regression model the residual term is independent of x and y." The error term must be assumed to be independent of x if the regression is to be unbiased, but it cannot be independent of y since it is a stochastic driver of y (y = a + b*x + error). The assumption that the error term is normally distributed is in general NOT required, unless the sample is small and the researcher wishes to rely upon "exact" t-statistics.

Beware, the answer shown above is semi-informed gibberish, from someone who doesn't know what "heteroscedastic" means, nor how a regression residual is defined.
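
The claim in this answer that normality is not needed for unbiasedness can be illustrated with a small simulation. This is only a sketch under stated assumptions: the model is y = a + b*x + error as written above, the errors are drawn independently of x from a skewed, mean-zero distribution (a centered exponential, chosen here purely for illustration), and the function names, sample size, and number of replications are hypothetical.

```python
import random


def ols_slope(x, y):
    """OLS slope estimate for the model y = a + b*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def mean_slope_estimate(a=1.0, b=2.0, n=50, reps=2000, seed=1):
    """Average the OLS slope over many simulated samples with non-normal errors."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total = 0.0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Skewed, clearly non-normal errors with mean zero, independent of x.
        err = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        y = [a + b * xi + ei for xi, ei in zip(x, err)]
        total += ols_slope(x, y)
    return total / reps


avg_b_hat = mean_slope_estimate()
print("true b = 2.0, average estimated b =", avg_b_hat)
```

The average estimate should land very close to the true slope even though the errors are far from Gaussian. Normality matters for the exact small-sample distribution of the t-statistic, which this sketch does not examine.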