Proof regarding Ridge and Lasso regularization


I have a problem understanding this exercise. I would be very happy to receive a little help here. Thanks!

Best answer:

You have some data $\mathcal{D} = \{ (x_i, y_i) \}_{i=1,\ldots, n}$, but instead of fitting a model $f(x) = \beta x$, we duplicate the predictor variable. You can imagine it as taking the original dataset, e.g. $$\mathcal{D} = \{ (1,2), (4,8), (7, 14) \}$$ and duplicating each $x_i$ to get $$\mathcal{D}' = \{ (1,1,2) , (4,4,8), (7,7,14) \}$$

A linear model for $\mathcal{D}'$ would look like $f(x_1, x_2) = \beta_1 x_1 + \beta_2 x_2.$ Since we know $x_1 = x_2 = x,$ the linear model is more simply written as $f(x) = \beta_1 x + \beta_2 x.$ The RSS for the linear model is $ \sum_i | y_i - f(x_i) |^2 = \sum_i | y_i - (\beta_1+\beta_2) x_i |^2.$ The ridge regression penalty on such a model is $\lambda(\beta_1^2 + \beta_2^2)$ and the lasso penalty is $\lambda(|\beta_1| + |\beta_2|).$

a)

The loss in the Ridge regression model is $$L(\beta_1, \beta_2) = \sum_i | y_i - (\beta_1 + \beta_2) x_i|^2 + \lambda (\beta_1^2 + \beta_2^2)$$

Now suppose that $\hat{\beta_1}, \hat{\beta_2}$ minimize the loss. The RSS depends only on the sum $\beta_1 + \beta_2,$ which is unchanged when both coefficients are replaced by their average, while for the penalty term $$2\left( \frac{ \hat{\beta_1} + \hat{\beta_2} }{2} \right)^2 = \frac{ (\hat{\beta_1} + \hat{\beta_2})^2 }{2} \leq \hat{\beta_1}^2 + \hat{\beta_2}^2,$$ using the fact that $0 \leq (x-y)^2$ with equality if and only if $x=y.$ Hence $$L\left( \frac{ \hat{\beta_1} + \hat{\beta_2} }{2}, \frac{ \hat{\beta_1} + \hat{\beta_2}}{2} \right) \leq L(\hat{\beta_1}, \hat{\beta_2})$$ with equality if and only if $\hat{\beta_1} = \hat{\beta_2}.$ Since by assumption $L(\hat{\beta_1}, \hat{\beta_2})$ is minimal, we must have equality. So we see that the optimal solution always has $\hat{\beta_1} = \hat{\beta_2}.$
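As a quick numeric sanity check (not part of the original exercise, and the penalty strength `lam` is an arbitrary choice), we can fit the ridge model on the toy dataset $\mathcal{D}'$ via the closed form $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$ and confirm the two coefficients come out equal:

```python
# Hypothetical numeric check: fit ridge on D' via the closed form
#   beta_hat = (X^T X + lam * I)^{-1} X^T y
# and confirm that the two coefficients are equal.

lam = 1.0                                      # assumed penalty strength
data = [(1.0, 2.0), (4.0, 8.0), (7.0, 14.0)]   # the dataset D from the text

# With the predictor duplicated, X^T X = [[s, s], [s, s]] where
# s = sum(x_i^2), and X^T y = [t, t] where t = sum(x_i * y_i).
s = sum(x * x for x, _ in data)
t = sum(x * y for x, y in data)

# Solve the 2x2 normal equations (X^T X + lam * I) beta = X^T y directly.
a11, a12 = s + lam, s
a21, a22 = s, s + lam
det = a11 * a22 - a12 * a21
beta1 = (a22 * t - a12 * t) / det
beta2 = (a11 * t - a21 * t) / det

print(beta1, beta2)  # identical: ridge splits the weight evenly
```

Algebraically, both coefficients reduce to $\lambda t / \det = t/(2s+\lambda),$ matching the result that ridge forces $\hat{\beta_1} = \hat{\beta_2}.$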

b)

The loss in the Lasso regression model is $$L(\beta_1, \beta_2) = \sum_i | y_i - (\beta_1 + \beta_2) x_i|^2 + \lambda (|\beta_1| + |\beta_2|)$$ and you can see for yourself why, for a given $\beta,$ all pairs $\beta_1, \beta_2$ such that $\beta_1 + \beta_2 = \beta$ and having the same sign yield the same loss: the RSS depends only on the sum, and when the coefficients share a sign the penalty is $\lambda(|\beta_1| + |\beta_2|) = \lambda|\beta|.$ So there are an infinite number of pairs $(\hat{\beta_1}, \hat{\beta_2})$ which optimize the loss function. A concrete example of this statement is that the linear model $f(x_1, x_2) = 2x_1 + 3x_2$ has the same RSS and same Lasso penalty as $f(x_1, x_2) = 3x_1 + 2x_2,$ because in this problem $x_1 = x_2 = x.$
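The non-uniqueness is easy to check numerically. In this hypothetical sketch (the value of `lam` and the split $\beta_1 + \beta_2 = 5$ from the concrete example above are arbitrary choices), every same-sign split of the sum gives exactly the same lasso loss:

```python
# Hypothetical check: with x1 = x2 = x, any split of beta = beta1 + beta2
# into parts of the same sign gives exactly the same lasso loss, so the
# minimizer is not unique.

lam = 0.5                                      # assumed penalty strength
data = [(1.0, 2.0), (4.0, 8.0), (7.0, 14.0)]   # the dataset D from the text

def lasso_loss(b1, b2):
    """RSS plus the lasso penalty lam * (|b1| + |b2|)."""
    rss = sum((y - (b1 + b2) * x) ** 2 for x, y in data)
    return rss + lam * (abs(b1) + abs(b2))

# Each pair below has beta1 + beta2 = 5 with both parts nonnegative,
# so the RSS and the penalty lam * 5 are the same in every case.
print(lasso_loss(2.0, 3.0))
print(lasso_loss(3.0, 2.0))
print(lasso_loss(0.0, 5.0))
```

All three calls print the same value, confirming that the loss surface is flat along the line $\beta_1 + \beta_2 = \beta$ within each sign quadrant.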

There is a major lesson to take from this exercise. It is quite common to see someone perform a Lasso regression and interpret the optimal parameters of the model as measures of how important each feature is in predicting the target. As we see from this example, if linear relationships exist between the features, then the parameters cannot be interpreted that way.