L2-norm with estimated weights


Suppose I'm performing linear regression. My lecturer said the formula below can be used to estimate the weight vector in the presence of the L2-norm term in the loss function, but he didn't elaborate. I have 2 questions. When is it a good idea to do so, and why? And if I do gradient descent manually, do I have to update these weights in addition to the "normal" ones?

$$w = (X^TX + λI_p)^{-1}X^Ty$$

$X$ - design matrix

$y$ - target vector from the training data $(x, y)$

$I_p$ - identity matrix, where $p$ is the dimension of the weight vector

$\lambda$ - regularization coefficient
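As a quick sanity check, the formula can be evaluated directly with numpy. The data below is hypothetical (synthetic, with assumed true weights and noise level), just to show the computation:

```python
import numpy as np

# Hypothetical synthetic data: 100 samples, 3 features, assumed true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

lam = 0.1               # regularization coefficient lambda
p = X.shape[1]

# w = (X^T X + lambda * I_p)^{-1} X^T y
# Solve the linear system instead of forming the inverse explicitly.
w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(w)
```

With a small $\lambda$ the result stays close to the ordinary least-squares solution; increasing $\lambda$ shrinks the entries of `w` toward zero.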

On BEST ANSWER

This is called Ridge regression. The original motivation for Ridge regression was to deal with high (even perfect) collinearity between the explanatory variables. The larger the $\lambda$, the more stable the model, and the less information is used from the design matrix $X^TX$. Another effect of the so-called $l_2$ regularization is shrinkage of the weights $w$: the larger the $\lambda$, the smaller the weights. This introduces bias but reduces variance (the bias-variance trade-off). Regarding the second question: there is a closed-form solution, $(X^TX + \lambda I)^{-1} X^Ty$ (the formula you quoted), so gradient descent is not needed. If you do use gradient descent, the regularizer simply adds a term to the gradient of the loss; there are no extra weights to update.
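To illustrate the last point, here is a sketch (with assumed synthetic data, step size, and iteration count) showing that gradient descent on the ridge loss converges to the same $w$ as the closed-form formula, with no additional weights involved:

```python
import numpy as np

# Hypothetical synthetic data with assumed true weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=200)

lam = 1.0
n, p = X.shape

# Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient descent on the ridge loss ||y - Xw||^2 + lam * ||w||^2.
# The regularizer only adds 2*lam*w to the gradient; the same weight
# vector w is updated, nothing extra.
w = np.zeros(p)
lr = 1e-3  # assumed step size, small enough for this problem
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
    w -= lr * grad

print(np.max(np.abs(w - w_closed)))  # difference shrinks toward zero
```

The closed form is preferable when $p$ is small enough to solve the $p \times p$ system; iterative methods matter mainly for very large or sparse problems.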