Why does regularization have an effect in linear classifiers?


I'm struggling to understand how regularization, for example using the $l_1$ or $l_2$ norm, has any effect on linear classification problems.

Consider a simple binary classification task: we want to find a weight vector $w$ that classifies data points $x_1, \dots, x_n$ with labels $y_i \in \{-1, +1\}$, where the predicted label $\hat{y}_i$ is $+1$ if $w^T x_i \geq 0$ and $-1$ if $w^T x_i < 0$. Suppose we train using the logistic loss with $l_1$ regularization:

$$L(w) = \sum_{i=1}^{n} \log\left(1 + \exp(-y_i w^T x_i)\right) + \frac{\lambda}{2} \|w\|_1.$$

Why does the regularization term have any effect on the resulting weight vector? For any value of $\lambda$, couldn't we just shrink the magnitude of $w$ enough that the regularization term becomes negligible, while keeping $w$ proportional to the optimal weight vector for the training set? For example, if the optimal $w$ is $[1, 1]$, we could scale it down to $[0.00001, 0.00001]$ and thereby ignore the effect of regularization. I don't see why the parts of the loss function that penalise misclassification wouldn't simply take priority, since they have a far greater effect on the total loss.
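To make the comparison concrete, here is a small sketch of the scaling I have in mind. The data, the labels, and $\lambda = 0.1$ are all made up for illustration; the loss function is the one from the question:

```python
import numpy as np

# Hypothetical toy data: two features, labels in {-1, +1}.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, -1, 1, -1])

def loss(w, lam=0.1):
    """Logistic loss summed over the data, plus the l1 penalty from the question."""
    margins = y * (X @ w)  # y_i * w^T x_i for each point
    return np.sum(np.log1p(np.exp(-margins))) + (lam / 2) * np.sum(np.abs(w))

w_opt = np.array([1.0, 1.0])        # some candidate weight vector
w_small = 1e-5 * w_opt              # same direction, tiny magnitude

print("loss at w_opt:  ", loss(w_opt))
print("loss at w_small:", loss(w_small))
```

Both vectors classify every point identically (the sign of $w^T x_i$ is unchanged by positive scaling), and the penalty on `w_small` is essentially zero, so my intuition says `w_small` should be at least as good.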

Any help would be appreciated. Thank you.