Does scatterplot matrix "work" with quadratic variables?


Basically, I want to plot a scatterplot matrix using a few variables. For simplicity, let's say my model is: $$z=\alpha_0 + \alpha_1w+\alpha_2x+\alpha_3y+\alpha_4y^2 + \epsilon$$ When I plot the matrix, all of the explanatory variables appear to have a (strong) positive relationship with the response variable.

Also, given the data, I ran the regression and eliminated some irrelevant variables. I found that $w$ is irrelevant, and the final regression is: $$z=0.98234+1.02852x+0.38271y-0.83721y^2+\epsilon$$ I know that this final regression is right, because $y^2$ is supposed to have a negative relationship with the response variable. So why does the scatterplot matrix fail to capture this?

Best answer

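In a regression with several predictors, the sign of a predictor's pairwise correlation with the response is not necessarily the same as the sign of its coefficient in the fitted equation. Each panel of a scatterplot matrix shows the relationship between two variables with all the others ignored, whereas a regression coefficient describes the effect of one predictor with the others held fixed. When the predictors are correlated with each other (as $y$ and $y^2$ usually are), the two can disagree.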

This is discussed here: https://stats.stackexchange.com/questions/34151/positive-correlation-and-negative-regressor-coefficient-sign
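To make this concrete, here is a small simulated sketch. Everything in it is an assumption chosen for illustration, not the asker's data: the coefficients, the noise levels, and in particular the made-up dependence of $x$ on $y^2$. It should show $y^2$ having a positive pairwise correlation with $z$ even though its coefficient in the fitted model is negative.

```python
import numpy as np

# Simulated data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n = 2000
y = rng.uniform(0.0, 2.0, n)

# Hypothetical dependence between predictors: x tends to rise with y^2.
x = 3.0 * y**2 + rng.normal(0.0, 0.5, n)

# True model has a NEGATIVE coefficient on y^2.
eps = rng.normal(0.0, 0.1, n)
z = 1.0 + 1.0 * x + 0.4 * y - 0.8 * y**2 + eps

# Pairwise correlation of y^2 with z: what a scatterplot panel reflects.
corr_y2_z = np.corrcoef(y**2, z)[0, 1]

# Multiple regression by least squares; columns are 1, x, y, y^2.
X = np.column_stack([np.ones(n), x, y, y**2])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)

print("pairwise corr(y^2, z):", corr_y2_z)
print("fitted coefficient on y^2:", coef[3])
```

In this setup, larger $y^2$ goes along with larger $x$, and $x$ pushes $z$ up by more than the $-0.8y^2$ term pulls it down, so the panel for $y^2$ against $z$ slopes upward while the fitted coefficient stays negative.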

It may help to consider the two-variable case (one predictor and one response). In this special case the sign of the correlation coefficient is always the same as the sign of the slope in the fitted equation, but the size of the slope alone does not tell you how strong the correlation is.

One set of data could have $y=100 + 0.01x$ and have a very strong positive correlation (most of the data on the line).

A second set of data could have $y=100 + 20x$ and have a very weak positive correlation (data scattered widely either side of the line).
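The two data sets described above can be simulated as a rough check. The noise levels, the range of $x$, and the sample size below are assumptions picked only to reproduce the "tight" and "widely scattered" pictures.

```python
import numpy as np

# Simulated data; noise levels and sample size are illustrative assumptions.
rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0.0, 10.0, n)

# Data set A: tiny slope, almost no scatter around the line.
y_a = 100.0 + 0.01 * x + rng.normal(0.0, 0.001, n)

# Data set B: large slope, very wide scatter around the line.
y_b = 100.0 + 20.0 * x + rng.normal(0.0, 300.0, n)

# Fitted slopes from a degree-1 polynomial fit (highest power first).
slope_a = np.polyfit(x, y_a, 1)[0]
slope_b = np.polyfit(x, y_b, 1)[0]

# Pearson correlation coefficients.
r_a = np.corrcoef(x, y_a)[0, 1]
r_b = np.corrcoef(x, y_b)[0, 1]

print("A: slope", slope_a, "correlation", r_a)
print("B: slope", slope_b, "correlation", r_b)
```

Both slopes and both correlations come out positive, but the data set with the much smaller slope has the much stronger correlation, because correlation depends on how large the scatter is relative to the trend, not on the slope by itself.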