After normalizing data, how do I predict y using multiple regression analysis?


I want to predict yield (y). My independent variables are rain (x1) and soil (x2):

Yield (y): 25000, 26000, 27000, 28000, 29000

Rain_mm (x1): 1000, 875, 852, 1005, 1250

Soil (x2): 0, 0, 1, 1, 2

Next, I normalized this data to put it on the same scale, using this code:

from sklearn.preprocessing import Normalizer
import pandas

url = "data.csv"
dataframe = pandas.read_csv(url)
array = dataframe.values
# separate the array into input (X) and output (y) components
X = array[:, 1:3]   # rain and soil columns
y = array[:, 0]     # yield column
scaler = Normalizer().fit(X)
normalizedX = scaler.transform(X)

Next, I applied the regression formula using these normalizedX values.

Suppose we now have the b0, b1, and b2 values. I want to predict yield using the regression model:

yield = b0 + b1 * rain + b2 * soil

for example: b0 = 1.25, b1 = 0.45, b2 = -0.36
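With those example coefficients (which are hypothetical values from the question, not a fitted model), the prediction is just the linear combination below. Note that whatever scale the coefficients were fitted on is the scale the prediction comes out on:

```python
# Hypothetical coefficients from the question, not fitted to the real data
b0, b1, b2 = 1.25, 0.45, -0.36

def predict_yield(rain, soil):
    # yield = b0 + b1 * rain + b2 * soil
    return b0 + b1 * rain + b2 * soil

# The output is on whatever scale (raw or normalized) the model was trained on
print(predict_yield(1000, 0))  # 451.25
```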

Now I am confused: when I use this regression equation to predict yield (y), all the predicted values come out on the normalized scale. I want to convert these numbers back to the original yield scale.

for example: if I plug in rain = 1000 and soil = 0, then yield should come out as 25000 (or some other predicted number), not a value in the 0 to 1 range.

Can anyone help me get that? Or am I doing something wrong? Any hints?

BEST ANSWER

I'll begin by stating that normalization (scaling/shifting) of the training data isn't strictly necessary for a regression model, and typically won't affect the accuracy of predictions for new inputs. I say typically because in cases where the inputs or the predicted values approach the upper or lower limits of their digital representation you may experience truncation errors or data type overflow, though with a dynamically typed language such as Python these worries are mostly hidden from the user.

I would encourage you to perform the regression fit on the unnormalized training data. If you insist on the normalization pre-processing, then use a sklearn.preprocessing class that implements the inverse_transform() method (such as StandardScaler or MinMaxScaler). You can fit the scaler to the training targets and later apply inverse_transform() to the predicted output value(s) to map them back to the original yield scale.
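A minimal sketch of that workflow, using the question's data hard-coded into arrays (instead of reading data.csv) and StandardScaler for both the features and the target:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Data from the question: columns are rain (x1) and soil (x2)
X = np.array([[1000, 0], [875, 0], [852, 1], [1005, 1], [1250, 2]], dtype=float)
y = np.array([25000, 26000, 27000, 28000, 29000], dtype=float)

# Scale features and target with a class that supports inverse_transform()
x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y.reshape(-1, 1))

model = LinearRegression()
model.fit(x_scaler.transform(X), y_scaler.transform(y.reshape(-1, 1)))

# Predict in the scaled space, then map back to the original yield scale
new_x = np.array([[1000.0, 0.0]])
scaled_pred = model.predict(x_scaler.transform(new_x))
pred = y_scaler.inverse_transform(scaled_pred)
print(pred)  # on the original yield scale (tens of thousands), not 0-1
```

Because ordinary least squares is equivariant under this kind of affine scaling, the round-tripped prediction matches what you would get by fitting directly on the raw data, which is why the normalization step buys you nothing here.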

The class you are using, Normalizer, performs L2-norm normalization on a per-sample (per-row) basis by default, which is not appropriate for this regression problem: the fitted weights in your model will see different scales for the same feature across the training samples. I don't expect your regression model to accurately predict the outcome of a new input vector when the Normalizer pre-processing class is employed.
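A quick demonstration of that per-row behavior on the question's data: each sample (row) is rescaled to unit L2 norm, so each row gets a different divisor and a feature column no longer shares one common scale.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1000, 0], [875, 0], [852, 1], [1005, 1], [1250, 2]], dtype=float)
Xn = Normalizer().fit_transform(X)  # each *row* is rescaled to unit L2 norm

print(np.linalg.norm(Xn, axis=1))  # all 1.0: every sample lies on the unit sphere
print(Xn[:, 1])  # soil column: each value divided by its own row's norm
```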