How can we interpret a residual plot when we have many variables?


In residual plots, we try to visualize and judge whether the linearity assumption of a linear regression model holds. One way to do this is to plot the error term (the residuals) against the independent variable, say $x$. If linearity holds, we might expect the residuals to lie in a band of uniform width (constant variance), provided we have enough data points, and to be well spread within that band, as in the diagram below. But suppose we have more than two independent variables; in that case it is not possible to make such a plot and interpret it. Is there an alternative way to do this?

[Diagram: residual plot]


Best answer
  1. Usually you plot the residuals against the predicted values $\hat{y}$. The predicted values are linear combinations of the $x$s, i.e., $\hat{y}_i = \sum_{j=1}^p x_{ij} \hat{\beta}_j$ for $i=1,\dots,n$, so this yields a single two-dimensional plot no matter how many predictors the model has.

  2. You are mixing up constant variance, which is the uniform width, with linearity. You may observe constant width together with a systematic nonlinear trend, or vice versa: heteroscedasticity with a perfectly linear trend.

  3. You may consider a formal test for detecting particular nonlinear structures, e.g., polynomial terms, such as the Ramsey RESET test: https://en.wikipedia.org/wiki/Ramsey_RESET_test
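To make points 1 and 2 concrete, here is a minimal sketch in Python with NumPy on simulated data (the data, the coefficients, and the helper names `fit_ols` and `binned_residuals` are all hypothetical). It fits an OLS model with four predictors and, instead of drawing the plot, summarizes the residuals-vs-fitted relationship by the mean and standard deviation of the residuals within equal-count bins of $\hat{y}$. In case (a) the truth is nonlinear in $x_1$, which should show up as a U-shaped pattern in the bin means; in case (b) the mean is linear but the noise grows with $x_1$, so the bin means should stay near zero while the bin standard deviations increase.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept column; returns fitted values and residuals."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    fitted = Xd @ beta
    return fitted, y - fitted

def binned_residuals(fitted, resid, n_bins=5):
    """Mean and SD of the residuals in equal-count bins of the fitted values.

    A systematic pattern in the bin means points to a nonlinear trend;
    a pattern in the bin SDs points to non-constant variance.
    """
    bins = np.array_split(np.argsort(fitted), n_bins)
    means = np.array([resid[b].mean() for b in bins])
    sds = np.array([resid[b].std(ddof=1) for b in bins])
    return means, sds

# Hypothetical simulated data: four predictors, x1 dominates the linear part.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
mean_lin = 1 + X @ np.array([3.0, 0.5, 0.5, 0.0])

# (a) Mean is nonlinear in x1; noise has constant variance.
y_nl = mean_lin + 1.5 * X[:, 0] ** 2 + rng.normal(size=n)
# (b) Mean is linear; noise SD grows with x1 (heteroscedasticity).
y_het = mean_lin + 0.5 * np.exp(0.5 * X[:, 0]) * rng.normal(size=n)

means_nl, sds_nl = binned_residuals(*fit_ols(X, y_nl))
means_het, sds_het = binned_residuals(*fit_ols(X, y_het))
print("nonlinear:       means", means_nl.round(2), "SDs", sds_nl.round(2))
print("heteroscedastic: means", means_het.round(2), "SDs", sds_het.round(2))
```

For the actual picture, scatter the `fitted` values against the residuals returned by `fit_ols` (for example with matplotlib's `plt.scatter`); the binned summary is just a numeric stand-in for what the eye looks for in that plot.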
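The RESET test from point 3 can be sketched as follows, assuming the common formulation that augments the regression with powers of the fitted values (here $\hat{y}^2$ and $\hat{y}^3$) and F-tests whether those added terms are jointly zero. This is a hand-rolled illustration using NumPy and `scipy.stats.f`; the function name `reset_test`, the `max_power` parameter, and the simulated data are hypothetical, not taken from any particular library.

```python
import numpy as np
from scipy import stats

def reset_test(X, y, max_power=3):
    """Ramsey RESET: F-test for adding yhat**2..yhat**max_power to an OLS fit."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), X])          # original (restricted) design
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    yhat = X0 @ b0
    rss0 = np.sum((y - yhat) ** 2)

    # Augment the design with powers of the fitted values.
    X1 = np.column_stack([X0] + [yhat ** k for k in range(2, max_power + 1)])
    b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss1 = np.sum((y - X1 @ b1) ** 2)

    q = max_power - 1                              # number of added regressors
    df2 = n - X1.shape[1]                          # residual df of augmented model
    F = ((rss0 - rss1) / q) / (rss1 / df2)
    return F, stats.f.sf(F, q, df2)                # upper-tail F p-value

# Hypothetical simulated data: one linear truth, one quadratic in the index.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))
index = X @ np.array([2.0, -1.0, 0.5])
y_lin = 1 + index + rng.normal(size=n)
y_quad = 1 + index + 0.5 * index ** 2 + rng.normal(size=n)

F_lin, p_lin = reset_test(X, y_lin)
F_quad, p_quad = reset_test(X, y_quad)
print(f"linear truth:    F = {F_lin:.2f}, p = {p_lin:.3g}")
print(f"quadratic truth: F = {F_quad:.2f}, p = {p_quad:.3g}")
```

Because the added regressors are functions of $\hat{y}$ only, the test works the same way for any number of predictors, which matches the spirit of point 1; like any such test, it only has power against the kinds of misspecification that powers of $\hat{y}$ can pick up.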