I want to predict yield (y). My independent variables are rain (x1) and soil (x2):
yield (y): 25000, 26000, 27000, 28000, 29000
Rain_mm (x1): 1000, 875, 852, 1005, 1250
Soil (x2): 0, 0, 1, 1, 2
Next, I normalized this data so everything is on the same scale, using this code:
from sklearn.preprocessing import Normalizer
import pandas
url = "data.csv"
dataframe = pandas.read_csv(url)
array = dataframe.values
# separate array into input and output components
X = array[:,0:3]
scaler = Normalizer().fit(X)
normalizedX = scaler.transform(X)
Next, I used the regression formula with these normalizedX values.
Suppose we now have b0, b1, and b2 values. I want to predict yield using the regression model:
yield = b0 + b1 * rain + b2 * soil
For example: b0 = 1.25, b1 = 0.45, b2 = -0.36
Now I am confused: when I use this regression equation to predict yield (y), all the predicted values come out on the normalized scale, and I want to convert these numbers back to the original yield scale.
For example, if I plug in rain = 1000 and soil = 0, the predicted yield should come out around 25000 (or whatever the model predicts), not a number in the 0 to 1 range.
Can anyone help me with how to get that? Or am I doing something wrong? Any hints?
I'll begin by stating that normalization (scaling/shifting) of the data used to train a regression model isn't strictly necessary, and won't typically affect the accuracy of predictions for new input values. I say typically because in cases where the inputs or the predicted values approach the upper/lower limits of digital representation you may experience truncation errors or data type overflow, though with a dynamically typed language such as Python these worries are mostly hidden from the user.
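To illustrate, here is a small sketch (using the example numbers from the question) showing that an ordinary least-squares fit gives the same predictions whether or not the features are standardized first, provided the new inputs go through the same scaler:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# the question's small dataset: columns are rain, soil
X = np.array([[1000, 0], [875, 0], [852, 1], [1005, 1], [1250, 2]], dtype=float)
y = np.array([25000, 26000, 27000, 28000, 29000], dtype=float)

# fit directly on the raw, unscaled data
raw = LinearRegression().fit(X, y)

# fit on standardized features, predicting through the same scaler
scaler = StandardScaler().fit(X)
std = LinearRegression().fit(scaler.transform(X), y)

new = np.array([[900.0, 1.0]])
print(raw.predict(new))
print(std.predict(scaler.transform(new)))
# both pipelines produce (numerically) the same prediction, on the
# original yield scale, because the target y was never rescaled
```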
I would encourage you to perform the regression fitting on the unnormalized training data. If you insist on the normalization pre-processing, then use a sklearn.preprocessing class that implements the inverse_transform() method (such as StandardScaler). You can then fit the scaler to the training targets and later apply inverse_transform() to the predicted output value(s).
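If you do want to keep a scaling step, one approach is to scale X and y with separate StandardScaler instances and undo the target scaling after prediction. A minimal sketch using the question's data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# the question's small dataset: columns are rain, soil
X = np.array([[1000, 0], [875, 0], [852, 1], [1005, 1], [1250, 2]], dtype=float)
y = np.array([25000, 26000, 27000, 28000, 29000], dtype=float)

# fit a separate scaler for the features and for the target,
# so the target scaling can be inverted later
x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y.reshape(-1, 1))

model = LinearRegression().fit(
    x_scaler.transform(X), y_scaler.transform(y.reshape(-1, 1))
)

# predict on the scaled axes, then map back to original yield units
new = np.array([[1000.0, 0.0]])
y_scaled = model.predict(x_scaler.transform(new))
y_pred = y_scaler.inverse_transform(y_scaled)
print(y_pred)  # a yield on the original scale (tens of thousands), not 0-1
```

The key point is that `inverse_transform()` on the target scaler is what maps the model's scaled output back to the original yield units.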
The class you are using, Normalizer, performs L2-norm normalization on a per-sample (per-row) basis by default. That is not appropriate for a regression problem, because the fitted weights in your model will see a different scale for a given feature across the training samples. I would not expect your regression model to accurately predict the outcome of a new input vector when the Normalizer pre-processing class is employed.
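You can see this per-row behaviour directly: Normalizer divides each sample by its own L2 norm, so the same rain value ends up rescaled by a different factor in every row:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# three samples from the question's data: columns are rain, soil
X = np.array([[1000.0, 0.0],
              [875.0, 0.0],
              [852.0, 1.0]])

Xn = Normalizer().fit_transform(X)  # each ROW divided by its own L2 norm
print(Xn)

# every transformed row has unit length, so the "rain" column no longer
# has a consistent scale across samples -- rows 1 and 2 both become [1, 0]
# even though their rain values differ
row_norms = np.linalg.norm(Xn, axis=1)
print(row_norms)
```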