This is a ridge regression problem. The following two problems are equivalent:
$(w_\lambda, b_\lambda ) = \operatorname{argmin}_{w,b}\left\{\sum_{i=1}^m (y_i-b-w^Tx_i)^2+\lambda w^Tw\right\}$
$(w_\lambda, b_\lambda ) = \operatorname{argmin}_{w,b}\left\{\sum_{i=1}^m (y_i-b-w^T(x_i-\bar x))^2+\lambda w^Tw\right\}$
where:
- $\bar x$ is the average of the input data.
- $\lambda$ defines a trade-off between the error on the data and the norm of the vector $w$ (the degree of regularization).
- (I'm assuming $b$ is a bias term)
I can't work out why, mathematically, centering the data (which is what I assume is happening here) has no effect on the minimizing $w$.
I'm not looking for the full answer, just a push in the right direction. Intuitively, I expect it's because simply shifting every data point by the same amount has no effect on the final $w$ vector, since $w$ captures the relationship between the data points, not their absolute position. Showing this mathematically, however, is where I'm stuck.
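To convince myself the claim at least holds numerically, I wrote a quick check (a sketch using numpy; `ridge_with_bias` is just my own helper, solving the normal equations with the bias left unpenalized):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, lam = 50, 3, 2.0
X = rng.normal(size=(m, d)) + 5.0  # add an offset so centering actually changes the inputs
y = rng.normal(size=m)

def ridge_with_bias(X, y, lam):
    # Augment with a column of ones for the bias b.
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    # Penalize only w, not b (zero in the bias position).
    D = np.eye(X.shape[1] + 1)
    D[0, 0] = 0.0
    theta = np.linalg.solve(A.T @ A + lam * D, A.T @ y)
    return theta[0], theta[1:]  # b, w

b1, w1 = ridge_with_bias(X, y, lam)                    # raw inputs
b2, w2 = ridge_with_bias(X - X.mean(axis=0), y, lam)   # centered inputs

print(np.allclose(w1, w2))  # True: same w either way
print(b1, b2)               # the biases differ
```

The $w$'s agree to machine precision, while the two biases differ (by $w^T\bar x$, as far as I can tell), which matches my intuition that only $b$ absorbs the shift.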