How Does $ {L}_{1} $ Regularization Present Itself in Gradient Descent?

Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)

If we incorporate $ {L}_{1} $ loss into gradient descent, how does the update rule change? It is easy to write down the optimization objective, but I am not sure what to put for the update rule.

There are 2 solutions below.
It changes the direction in which you descend.
You may have a look at this PDF: Steepest Descent Direction for Various Norms.
It shows the descent direction for a few different norms.
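To make the norm-dependence concrete, here is a small sketch of the normalized steepest descent direction $\arg\min_{\|v\|\le 1} \nabla f^{T} v$ under three common norms. These formulas follow from the standard definition; they are an illustration supplied here, not an excerpt from the linked PDF:

```python
import numpy as np

def steepest_descent_direction(grad, norm="l2"):
    """Direction v minimizing grad . v subject to ||v|| <= 1 for the given norm."""
    if norm == "l2":
        return -grad / np.linalg.norm(grad)  # usual negative-gradient direction
    if norm == "l1":
        v = np.zeros_like(grad)
        i = np.argmax(np.abs(grad))          # only the largest-|gradient| coordinate moves
        v[i] = -np.sign(grad[i])
        return v
    if norm == "linf":
        return -np.sign(grad)                # every coordinate moves by +/- 1
    raise ValueError(f"unknown norm: {norm}")

g = np.array([3.0, -1.0, 0.5])
for n in ("l2", "l1", "linf"):
    print(n, steepest_descent_direction(g, n))
```

Note that under the $\ell_1$ norm the steepest descent direction is a single coordinate step (along the largest-magnitude partial derivative), which is quite different from the usual $\ell_2$ negative-gradient step.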
The problem is that the gradient of the $ {L}_{1} $ norm does not exist at $0$, so you need to be careful. The regularized objective is
$$ E_{L_1} = E + \lambda\sum_{k=1}^N|\beta_k| $$
where $E$ is the cost function ($E$ stands for error), for which I will assume you already know how to calculate the gradient.
As for the regularization term, note that if $\beta_k > 0$ then $|\beta_k| = \beta_k$ and the gradient is $+1$; similarly, when $\beta_k < 0$ the gradient is $-1$. In summary,
$$ \frac{\partial |\beta_k|}{\partial \beta_l} = {\rm sgn}(\beta_k)\delta_{kl} $$
so that
$$ \frac{\partial E_{L_1}}{\partial \beta_l} = \frac{\partial E}{\partial \beta_l} + \lambda\sum_{k=1}^N {\rm sgn}(\beta_k)\delta_{kl} = \frac{\partial E}{\partial \beta_l} + \lambda\,{\rm sgn}(\beta_l) $$
The update rule therefore gains an extra $\lambda\,{\rm sgn}(\beta_l)$ term: with learning rate $\eta$ (and some convention for ${\rm sgn}(0)$, commonly $0$),
$$ \beta_l \leftarrow \beta_l - \eta\left(\frac{\partial E}{\partial \beta_l} + \lambda\,{\rm sgn}(\beta_l)\right) $$
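As a minimal sketch of this update, assuming a least-squares cost $E = \frac{1}{2}\|X\beta - y\|_2^2$ (the cost, learning rate, and synthetic data below are illustrative choices, not part of the answer above), with `np.sign` supplying the ${\rm sgn}(0)=0$ convention:

```python
import numpy as np

def l1_subgradient_descent(X, y, lam=0.1, eta=0.01, n_iters=1000):
    """Gradient descent on E + lam * ||beta||_1 with E = 0.5 * ||X beta - y||^2.

    np.sign acts as the subgradient of |.|, using sgn(0) = 0 by convention.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad_E = X.T @ (X @ beta - y)           # gradient of the least-squares cost
        subgrad = grad_E + lam * np.sign(beta)  # add lambda * sgn(beta_l) per coordinate
        beta -= eta * subgrad                   # the modified update rule
    return beta

# Illustrative usage with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)
print(l1_subgradient_descent(X, y))
```

One caveat: because ${\rm sgn}$ jumps at zero, this plain subgradient step tends to oscillate around zero rather than setting coefficients exactly to zero, which is part of why the non-differentiability at $0$ requires care.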