How Does $L_1$ Regularization Present Itself in Gradient Descent?


If we incorporate an $L_1$ regularization term into gradient descent, how does the update rule change? It's easy to write down the optimization objective, but I'm not sure what the corresponding update rule should be.


BEST ANSWER

The problem is that the gradient of the $L_1$ norm does not exist at $0$, so you need to be careful there. Write the regularized cost as

$$ E_{L_1} = E + \lambda\sum_{k=1}^N|\beta_k| $$

where $E$ is the unregularized cost function ($E$ stands for error), whose gradient I will assume you already know how to compute.

As for the regularization term, note that if $\beta_k > 0$ then $|\beta_k| = \beta_k$ and its derivative is $+1$; similarly, when $\beta_k < 0$ the derivative is $-1$. In summary,

$$ \frac{\partial |\beta_k|}{\partial \beta_l} = {\rm sgn}(\beta_k)\delta_{kl} $$

so that

$$ \frac{\partial E_{L_1}}{\partial \beta_l} = \frac{\partial E}{\partial \beta_l} + \lambda\sum_{k=1}^N {\rm sgn}(\beta_k)\delta_{kl} = \frac{\partial E}{\partial \beta_l} + \lambda {\rm sgn}(\beta_l) $$
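Plugging this into a plain gradient-descent step (with $\eta$ denoting the learning rate, a symbol not used above) gives the update rule the question asks about:

$$ \beta_l \leftarrow \beta_l - \eta\left(\frac{\partial E}{\partial \beta_l} + \lambda\,{\rm sgn}(\beta_l)\right) $$

At $\beta_l = 0$, where the derivative does not exist, a common convention is to take ${\rm sgn}(0) = 0$, which is a valid subgradient; proximal methods (soft-thresholding) handle that point more carefully.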
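As a minimal sketch of this update, assuming a generic parameter vector and a placeholder `grad_E` for the gradient of the unregularized cost (the names here are illustrative, not taken from the question):

```python
import numpy as np

def l1_gradient_step(beta, grad_E, lam, eta):
    """One (sub)gradient-descent step on E + lam * ||beta||_1."""
    # np.sign returns 0 at 0, i.e. the subgradient 0 is used there.
    return beta - eta * (grad_E + lam * np.sign(beta))

# Example usage with made-up numbers:
beta = np.array([0.5, -1.2, 0.0])
grad_E = np.array([0.1, -0.3, 0.2])   # gradient of E at beta (placeholder values)
beta = l1_gradient_step(beta, grad_E, lam=0.01, eta=0.1)
```

Note that this plain subgradient step rarely makes coefficients exactly zero; if exact sparsity is the goal, proximal updates (soft-thresholding) are usually preferred.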

ANSWER

It changes the direction in which you descend.

You may have a look at this PDF, Steepest Descent Direction for Various Norms; it shows the descent direction for a few different norms.
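To make that concrete (a standard result, not quoted from the PDF): the normalized steepest descent direction under the Euclidean norm is the negative gradient, while under the $\ell_\infty$ norm it is the negative sign vector of the gradient,

$$ \Delta x = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2} \quad (\ell_2), \qquad \Delta x = -{\rm sgn}\big(\nabla f(x)\big) \quad (\ell_\infty), $$

and under the $\ell_1$ norm the step moves along the single coordinate whose partial derivative has the largest magnitude.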