I'm struggling to understand how regularisation, for example using the $l_1$ or $l_2$ norm, has any effect on linear classification problems.
Consider a simple binary classification task: we want to find a weight vector $w$ to classify data points $x_1, \dots, x_n$, where the predicted value is $\hat{y} = +1$ if $w^T x_i \geq 0$ and $\hat{y} = -1$ if $w^T x_i < 0$. Suppose we train with the logistic loss plus $l_1$ regularization, so that $L(w) = \sum_{i=1}^{n} \log(1 + \exp(-y_i w^T x_i)) + \frac{\lambda}{2}\|w\|_1$. Why does the regularization have any effect on the resulting weight vector? For any value of $\lambda$, could we not just shrink the magnitude of $w$ far enough that the regularization term has a negligible effect on the loss, while keeping $w$ proportional to the optimal $w$ for the training set? For example, if the optimal $w$ is $[1, 1]$, just scale it down by the necessary factor to get $[0.00001, 0.00001]$, thereby ignoring the effects of regularization. I don't see how the parts of the loss function that penalise misclassification would not simply take full priority, since they have a significantly greater effect on the resulting loss.
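To make the comparison concrete, here is a small sketch of what I mean; the two-point dataset and the value of $\lambda$ are made up purely for illustration. It evaluates the loss above once for a full-scale $w$ and once for the same direction shrunk down:

```python
import math

# Made-up toy dataset: two linearly separable points with labels +1 / -1.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
lam = 1.0  # arbitrary regularisation strength, just for illustration


def total_loss(w, lam):
    """Logistic loss summed over the data, plus the (lam/2)*||w||_1 penalty."""
    data_term = sum(
        math.log(1.0 + math.exp(-yi * (w[0] * xi[0] + w[1] * xi[1])))
        for xi, yi in zip(X, y)
    )
    penalty = (lam / 2.0) * (abs(w[0]) + abs(w[1]))
    return data_term + penalty


print(total_loss([1.0, 1.0], lam))          # the "optimal" direction at full scale
print(total_loss([0.00001, 0.00001], lam))  # same direction, shrunk to dodge the penalty
```

My intuition is that the second call should come out lower because its penalty term is essentially zero, but I'm not sure that's what actually happens.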
Any help would be appreciated. Thank you.