I'm reading Deep Learning (Ian Goodfellow and Yoshua Bengio) and I'm stuck on this section. The authors try to show how $L^2$ norm regularization affects a simple linear model.
Minimizing the sum of squared errors with an added $L^2$ regularization term leads to:
$$w = (X^TX + \alpha I)^{-1}X^Ty$$
The matrix $X^TX$ is proportional to the variance-covariance matrix of the data ($\frac{1}{m}X^TX$). The authors continue: "The diagonal entries of this matrix $(X^TX)$ correspond to the variance of each input feature. We can see that $L^2$ regularization causes the learning algorithm to perceive the input $X$ as having higher variance, which makes it shrink the weights on features whose covariance with the output is low compared to this added variance".
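To make the setup concrete, here is a small numpy sketch (my own toy example, not from the book) that computes the closed-form solution above with and without the $\alpha I$ term. One feature drives the output, the other is pure noise, so its covariance with $y$ is small; the feature names and the value of `alpha` are just choices I made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
x1 = rng.normal(size=m)  # feature strongly related to the output
x2 = rng.normal(size=m)  # feature with low covariance with the output
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.1, size=m)  # y depends on x1 only

alpha = 50.0  # regularization strength, chosen arbitrarily to make the effect visible
I = np.eye(X.shape[1])

# w = (X^T X)^{-1} X^T y              (ordinary least squares)
# w = (X^T X + alpha*I)^{-1} X^T y    (the regularized solution from the book)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = np.linalg.solve(X.T @ X + alpha * I, X.T @ y)

print("OLS weights:  ", w_ols)
print("ridge weights:", w_ridge)
```

Running this, the regularized weights are uniformly pulled toward zero, and the weight on `x2` (whose covariance with the output is small compared to the added $\alpha$) ends up negligible next to the weight on `x1`.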
I understand why $L^2$ regularization increases the perceived variance by $\alpha$, but why should this reduce the weights whose covariance with the output is low compared to this added variance? Sorry for my bad English; I hope the question is clear. Both qualitative and quantitative explanations are welcome.