Linear regression and standardization


I am trying to use a linear regression to model an expected value Y for an input X.

X and Y are on very different scales, so I was converting both to standard (z) scores, doing my calculation (finding the predicted Y when X is at a certain level), and then converting the result back to the raw scale for Y.

However, I am getting numbers that make me think I'm doing something wrong.

Is this a valid way to predict what a "raw score" would be, or should I be doing this another way? It's been many years since I've taken statistics.

Best answer:

If you standardize both your $X$ and your $Y$, then you are predicting the values of standardized $Y$, in which case you can convert it back to its original scale after prediction. You can also only standardize your predictor $X$ while leaving your $Y$ unstandardized (this is more common) in order to predict $Y$ based on the way $X$ deviates from its mean.
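The round trip described above (standardize both variables, fit, predict on the z-score scale, then back-transform) can be sketched in Python with NumPy. The data here are made up for illustration; the key point is that the back-transformed prediction agrees with fitting the regression on the raw data directly, so standardization should not change your answers.

```python
import numpy as np

# Hypothetical data: X and Y on very different scales.
rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)
y = 3000.0 + 40.0 * x + rng.normal(0, 50, size=200)

# Standardize both variables (z-scores).
x_mean, x_sd = x.mean(), x.std()
y_mean, y_sd = y.mean(), y.std()
zx = (x - x_mean) / x_sd
zy = (y - y_mean) / y_sd

# Least-squares fit on the standardized scale. Because both variables
# are centered, the intercept is 0 and the slope equals the correlation.
slope = np.polyfit(zx, zy, 1)[0]

# Predict standardized Y for a new raw X, then convert back to raw Y.
x_new = 60.0
z_pred = slope * (x_new - x_mean) / x_sd
y_pred = z_pred * y_sd + y_mean
print(y_pred)
```

If your back-transformed prediction differs noticeably from a regression fit on the raw data, the mismatch is usually in the back-transformation step (e.g. using the wrong mean or standard deviation, or forgetting to multiply by $s_Y$ before adding $\bar{Y}$).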

Standardization usually involves centering and/or scaling. Centering transforms your variables so that their mean is $0$ whereas scaling transforms your variables so that their standard deviation is $1$.
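A minimal illustration of those two operations on a made-up vector: after centering the mean is $0$, and after additionally scaling the standard deviation is $1$.

```python
import numpy as np

v = np.array([10.0, 20.0, 30.0, 40.0])
centered = v - v.mean()        # centering: mean becomes 0
scaled = centered / v.std()    # scaling: standard deviation becomes 1
print(scaled.mean(), scaled.std())
```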

If your values don't make sense, first check that no mistakes are being made during standardization or when converting predictions back to the raw scale. You should also check that your data satisfy the assumptions of linear regression. Issues like non-normality of the residuals or heteroscedasticity won't necessarily invalidate the fitted line itself, but they can distort standard errors and intervals and are often a sign that a straight-line model is a poor fit.
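Two crude checks along those lines can be done with nothing but NumPy; this is a sketch on simulated data, not a substitute for proper diagnostic plots or tests. The residuals from a least-squares fit with an intercept should average to zero by construction, and under constant error variance the magnitude of the residuals should be roughly uncorrelated with $X$.

```python
import numpy as np

# Simulated data that does satisfy the linear model assumptions.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, size=300)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, size=300)

# Fit y = a + b*x by least squares and compute residuals.
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

# Residuals average to zero when the model includes an intercept.
print(resid.mean())

# Crude heteroscedasticity check: the correlation between |residual|
# and x should be near zero if the error variance is constant.
print(np.corrcoef(np.abs(resid), x)[0, 1])
```

For real work, a residual-vs-fitted plot and a Q-Q plot of the residuals are the usual first diagnostics.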