Given that the solution to linear (or polynomial) regression can be written as:
$\textbf{w} = (X^{T}X)^{-1}X^{T}Y$
It is standard practice in machine learning to scale each column of the training set by that column's maximum absolute value, forcing each column $j$ to have values $-1 \leq x_{j} \leq 1$. I am having trouble figuring out how this scaling translates into the weight vector.
Let's say we scale every entry by the same constant $c$, i.e. replace $X$ with $cX$. Then we are left with the following:

$\textbf{w} = ((cX)^{T}(cX))^{-1}(cX)^{T}Y = \frac{1}{c}(X^{T}X)^{-1}X^{T}Y$
The factor of $\frac{1}{c}$ effectively cancels out when we predict an output for a scaled input. But how does this work out when each column is multiplied by a different scalar? Does it place some bound on the weights as a result?
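A quick numerical sketch of the constant-$c$ case (the data and seed are arbitrary, chosen just for illustration): fitting on $cX$ shrinks the weights by $\frac{1}{c}$, and predictions on the scaled inputs match predictions from the original fit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ols(X, Y):
    # Solve the normal equations (X^T X) w = X^T Y
    return np.linalg.solve(X.T @ X, X.T @ Y)

c = 4.0
w = ols(X, Y)
w_scaled = ols(c * X, Y)

# Weights shrink by 1/c ...
assert np.allclose(w_scaled, w / c)
# ... and predictions on the scaled inputs are unchanged.
assert np.allclose((c * X) @ w_scaled, X @ w)
```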
Exactly the same idea. Here's another way to look at it.

Let's write out the regression equation:
$Y=\alpha + \beta_1x_1 + \beta_2x_2 +\epsilon$
Let $c_1,c_2$ be the scaling of $x_1$ and $x_2$, respectively. The regression equation can then be re-written as:
$Y=\alpha + (\beta_1c_1)\frac{x_1}{c_1} + (\beta_2c_2)\frac{x_2}{c_2} +\epsilon$
Thus, regressing on the scaled variable $\frac{x_j}{c_j}$ simply multiplies the corresponding coefficient by $c_j$; the fitted values (and the intercept $\alpha$) are unchanged. No special bound is imposed on the weights beyond this rescaling.
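The per-column version can be checked numerically as well (again with arbitrary illustrative data): dividing each column by its own maximum absolute value, as in the question, multiplies each slope by that column's scale and leaves the intercept alone.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
Y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

def ols(A, Y):
    # Solve the normal equations (A^T A) beta = A^T Y
    return np.linalg.solve(A.T @ A, A.T @ Y)

# Fit with an explicit intercept column.
A = np.column_stack([np.ones(len(X)), X])
beta = ols(A, Y)

# Scale each column by its maximum absolute value, as in the question.
c = np.abs(X).max(axis=0)
A_scaled = np.column_stack([np.ones(len(X)), X / c])
beta_scaled = ols(A_scaled, Y)

# Intercept unchanged; each slope multiplied by its column's scale c_j.
assert np.allclose(beta_scaled[0], beta[0])
assert np.allclose(beta_scaled[1:], beta[1:] * c)
```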