Difference between Residual and Error in Regression


Can anyone please explain the difference between the residual and the error in regression problems? As far as I can tell, both are the same:

$Y = Z\beta + \epsilon$. Here $\epsilon$ is the error.

The residual is defined as $\hat{\epsilon} = Y - \hat{Y}$, and since $\hat{Y} = Z\beta$,

$\implies \hat{\epsilon} = Y - \hat{Y} = Z\beta + \epsilon - Z\beta = \epsilon.$

I am just not able to understand the difference between the residual and the error. Any help on this is highly appreciated.


The error $\epsilon$ is a theoretical representation of random "noise" in your model; this is also known as irreducible error as it represents the amount by which you expect observations to deviate from a perfect model.

The residuals $\hat{\epsilon}$ measure how much your observations deviate from your fitted model. They are an estimate of the error $\epsilon$.

In statistics, the $\hat{}$ symbol is usually placed over values that estimate model parameters using observed data. That's what's going on here; $\hat{\epsilon}$ is just your estimate for $\epsilon$ given your data.
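If it helps to see this numerically, here is a minimal sketch (made-up data and coefficients, using numpy's least-squares solver): the residuals track the true errors closely but are not equal to them, because the fit uses $\hat\beta$ rather than the unknown $\beta$.

```python
# Illustrative sketch: simulate data with known errors, fit by least squares,
# and compare the residuals to the true (normally unobservable) errors.
import numpy as np

rng = np.random.default_rng(0)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with intercept
beta = np.array([1.0, 2.0])                             # true coefficients (unobservable in practice)
eps = rng.normal(scale=0.5, size=n)                     # true errors (unobservable in practice)
Y = Z @ beta + eps                                      # observed responses

beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)        # fitted coefficients
resid = Y - Z @ beta_hat                                # residuals: observable estimates of eps

print(np.corrcoef(eps, resid)[0, 1])   # close to 1: residuals track the errors
print(np.allclose(resid, eps))          # False: they are not identical
```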


Start with a simpler problem: Suppose $X_1,\ldots,X_n$ are uncorrelated (a weaker assumption than that of independence) and all have the same expected value $\mu$ and the same variance $\sigma^2<+\infty$ (a weaker assumption than that of identical distribution).

Let $\overline X = (X_1+\cdots+X_n)/n.$ This is the sample mean, and can easily be shown to have expected value $\mu$ and variance $\sigma^2/n.$

\begin{align} \text{The } {\bf\text{errors}} \text{ are } \varepsilon_i & = X_i-\mu. \\[4pt] \text{The } {\bf\text{residuals}} \text{ are } \widehat{\varepsilon\,}_i & = X_i - \overline X. \end{align}

The errors are uncorrelated.

The residuals are forced to satisfy the linear constraint $\widehat{\varepsilon\,}_1+\cdots+\widehat{\varepsilon\,}_n = 0,$ so they are negatively correlated.

The residuals are observable estimates of the unobservable errors, when one can observe the data $X_1,\ldots,X_n$ but not the population parameters $\mu$ and $\sigma.$
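A quick numerical check of this (made-up values, not part of the argument above): the residuals $X_i-\overline X$ sum to exactly zero, while the errors $X_i-\mu$ do not.

```python
# Sketch for the sample-mean case: residuals satisfy the sum-to-zero constraint,
# errors do not (and in practice mu is unknown, so the errors are unobservable).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.0, 10
X = rng.normal(loc=mu, scale=sigma, size=n)

errors = X - mu            # unobservable in practice
residuals = X - X.mean()   # observable

print(errors.sum())                       # some nonzero value
print(np.isclose(residuals.sum(), 0.0))   # True: the linear constraint holds
```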

Now consider $Y_i = \alpha+\beta_1 x_{1,i} + \cdots + \beta_p x_{p,i} + \varepsilon_i$ for $i=1,\ldots,n,$ where $\varepsilon_1,\ldots,\varepsilon_n$ are uncorrelated and each has expected value $0$ and variance $\sigma^2<+\infty,$ and $p\ll n.$

Let $\widehat\alpha, \widehat\beta_1,\ldots,\widehat\beta_p$ be the least-squares estimates of $\alpha, \beta_1,\ldots,\beta_p.$

\begin{align} \text{The } {\bf\text{errors}} \text{ are } \varepsilon_i & = Y_i - \big( \alpha+\beta_1 x_{1,i} + \cdots + \beta_p x_{p,i} \big). \\[4pt] \text{The } {\bf\text{residuals}} \text{ are } \widehat{\varepsilon\,}_i & = Y_i - \widehat Y_i = Y_i - \big( \widehat\alpha+\widehat\beta_1 x_{1,i} + \cdots + \widehat\beta_p x_{p,i} \big). \end{align}

The errors are uncorrelated.

The residuals are correlated in a way that depends on the $x\text{s}.$ They are forced to satisfy the $p+1$ linear constraints $\widehat{\varepsilon\,}_1+\cdots+\widehat{\varepsilon\,}_n = 0,$ and $\widehat{\varepsilon\,}_1 x_{j,1}+\cdots+\widehat{\varepsilon\,}_n x_{j,n} = 0,$ for $j=1,\ldots,p.$
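The same check works in the regression case. Here is a small sketch (hypothetical data) verifying the $p+1$ constraints: the least-squares residuals sum to zero and are orthogonal to each column of $x$ values.

```python
# Sketch checking the p+1 linear constraints on least-squares residuals.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))                # the x_{j,i} values
design = np.column_stack([np.ones(n), X])  # intercept column plus predictors
beta = np.array([1.0, 0.5, -2.0, 3.0])     # true alpha, beta_1, ..., beta_p
Y = design @ beta + rng.normal(size=n)     # errors with expected value 0

coef_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)
resid = Y - design @ coef_hat

# True: sum(resid) = 0 and sum(resid * x_j) = 0 for each j = 1, ..., p
print(np.allclose(design.T @ resid, 0.0))
```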

And again, the residuals are observable estimates of unobservable errors.