When to use L2 regularization?


We know that L1 and L2 regularization are solutions to avoid overfitting.

L1 regularization can lead to sparsity, and therefore avoids fitting to the noise; L2 regularization does not produce sparse solutions.

So I wonder when there is a need to use L2 regularization?

The sparsity assumption may not apply to every problem; in fact, there are many problems where it does not. In those cases, an $L^1$ regularization technique has little effect on the efficiency of the linear model. $L^2$ regularization, on the other hand, often gives better performance when using the model, while being simple to compute and coming with some useful theoretical results.

Let $x \in \mathbb{C}^n$, $y \in \mathbb{C}^m$ and $A \in \mathcal{M}_{m \times n}( \mathbb{C})$.

The usual least-squares problem is to find $x^*$ such that:

$$\|Ax^*-y\|_2^2=\min_x \|Ax-y\|_2^2$$

$\|\cdot\|_2$ being the Euclidean norm.
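As a concrete sketch, this least-squares problem can be solved directly in NumPy (the matrix $A$ and vector $y$ below are made-up illustrative data):

```python
import numpy as np

# Illustrative overdetermined system: 4 equations, 2 unknowns (made-up data).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])

# x_star minimizes ||A x - y||_2^2.
x_star, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
```

At the optimum, the residual $Ax^*-y$ is orthogonal to the columns of $A$ (the normal equations $A^*(Ax^*-y)=0$).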

That problem can be solved analytically by taking the pseudo-inverse of $A$, denoted $A^+$ here. We know that $x^* = A^+y$. The pseudo-inverse can be computed through the Singular Value Decomposition (SVD). The SVD of $A$ is given by:

$$A=USV^*$$

with $U$ and $V$ unitary matrices and $S$ a rectangular matrix whose only possibly non-zero entries lie on the diagonal. These entries are non-negative real numbers, called the singular values of $A$.

Then $A^+=VS^+U^*$, where $S^+$ is also diagonal (and rectangular); its diagonal coefficients are the inverses of the non-zero singular values of $A$ (the zero singular values stay zero).
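A minimal NumPy sketch of this construction (the tolerance used to decide what counts as "non-zero" is an arbitrary choice here):

```python
import numpy as np

def pinv_via_svd(A):
    """Pseudo-inverse A^+ = V S^+ U^* built from the SVD A = U S V^*."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    # Invert only the singular values that are not (numerically) zero.
    s_inv = np.zeros_like(s)
    nonzero = s > 1e-12 * s.max()
    s_inv[nonzero] = 1.0 / s[nonzero]
    return Vh.conj().T @ (s_inv[:, None] * U.conj().T)

# Rank-deficient example: only one non-zero singular value.
A = np.array([[2.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
```

On this example the zero singular value is simply skipped, which is exactly what $S^+$ prescribes.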

Suppose one of the singular values is very small relative to the others; taking its inverse then makes one of the coefficients of $S^+$ very large. The problem is that small singular values are often associated with noise. With real-world data, which is always noisy, applying the pseudo-inverse may therefore amplify that noise. $L^2$ regularization is one way to prevent this phenomenon.
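To see the amplification concretely, here is a small constructed example (the sizes, singular values, and noise level are all made up): a matrix with singular values $1$ and $10^{-6}$, hit with noise of size $10^{-4}$ aligned with the weak direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build A = U S V^* with singular values 1 and 1e-6 (made-up size: 5x2).
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = U @ np.diag([1.0, 1e-6]) @ V.T

x_true = np.array([1.0, 1.0])
y_clean = A @ x_true
noise = 1e-4 * U[:, 1]          # noise along the small singular direction

x_clean = np.linalg.pinv(A) @ y_clean
x_noisy = np.linalg.pinv(A) @ (y_clean + noise)
# The tiny singular value's inverse (1e6) turns 1e-4 noise into an error of size 100.
```

With clean data the pseudo-inverse recovers $x$ exactly; with noise, the $10^{-6}$ singular value's inverse multiplies the perturbation by $10^6$.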

The $L^2$ regularization problem can be written this way:

$$\min_x \|Ax-y\|_2^2+ \| \Gamma x\|_2^2,$$

with $\Gamma$ an $n \times n$ complex matrix (the Tikhonov matrix).

If $A^*A+\Gamma^*\Gamma$ is invertible (which is typically the case for an overdetermined problem), the solution is given by:

$$\bar x = (A^*A+\Gamma^*\Gamma)^{-1}A^*y.$$

The singular values of $A$ are the square roots of the eigenvalues of $A^*A$ (which is Hermitian and positive semi-definite, so it can be diagonalized with real non-negative eigenvalues).

Suppose, for simplicity, that $\Gamma = \sqrt{a}\, I$ with $I$ the identity matrix and $a$ a positive coefficient, so that $\Gamma^*\Gamma = aI$ (this is the choice commonly made when using $L^2$ regularization, i.e. ridge regression). The eigenvalues of $A^*A+aI$ are equal to the eigenvalues of $A^*A$ plus $a$. In other words:

$$A^*A+aI = Q (D+aI)Q^*,$$

with $D$ a diagonal matrix made from the squares of the singular values of $A$ and $Q$ a unitary matrix.

Then there is no longer any problem with inverting possibly very small singular values, since the quantity $a$ is added to each squared singular value (for a reasonable choice of $a$). If the problem is underdetermined, the reasoning stays the same, but it is cheaper to use the equivalent form $\bar x = A^*(AA^*+aI)^{-1}y$, which only requires inverting an $m \times m$ matrix.
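A sketch of the regularized solve on the same kind of ill-conditioned, made-up example (the value of $a$ is an arbitrary illustrative choice):

```python
import numpy as np

def ridge_solve(A, y, a):
    """Solve min_x ||A x - y||^2 + a ||x||^2 via (A^* A + a I) x = A^* y."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + a * np.eye(n), A.conj().T @ y)

# Ill-conditioned example: singular values 1 and 1e-6, noise on the weak direction.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
V, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = U @ np.diag([1.0, 1e-6]) @ V.T
x_true = np.array([1.0, 1.0])
y = A @ x_true + 1e-4 * U[:, 1]

x_ridge = ridge_solve(A, y, a=1e-4)
x_plain = np.linalg.pinv(A) @ y
```

The plain pseudo-inverse amplifies the $10^{-4}$ noise into an error of size $100$, while the regularized solution stays bounded, at the price of a small bias along the weak direction.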

Note that a similar way to solve a least-squares problem while avoiding noise amplification is the TSVD (truncated SVD): compute the SVD and, before forming the pseudo-inverse, set to $0$ every singular value smaller than some threshold.
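A sketch of TSVD on the same style of constructed example (the threshold value is an arbitrary illustrative choice):

```python
import numpy as np

def tsvd_solve(A, y, threshold):
    """Least squares via truncated SVD: singular values below `threshold`
    are treated as noise and discarded before inverting."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s >= threshold
    s_inv[keep] = 1.0 / s[keep]
    return Vh.conj().T @ (s_inv * (U.conj().T @ y))

# Singular values 1 and 1e-6; noise of size 1e-4 on the weak direction.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((5, 2)))
V0, _ = np.linalg.qr(rng.standard_normal((2, 2)))
A = U0 @ np.diag([1.0, 1e-6]) @ V0.T
x_true = np.array([1.0, 1.0])
y = A @ x_true + 1e-4 * U0[:, 1]

x_tsvd = tsvd_solve(A, y, threshold=1e-3)  # drops the 1e-6 direction
```

Unlike ridge, which shrinks every component smoothly, TSVD makes a hard cut: directions below the threshold are discarded entirely.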