Understanding L2 Regularization Formula


I am currently following the Machine Learning Crash Course on TensorFlow and came across this formula:

$$L_2\text{ regularization term} = \|\boldsymbol w\|_2^2 = {w_1^2 + w_2^2 + \cdots + w_n^2}$$
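For concreteness, with $\boldsymbol w = (3, 4)$ I would compute this term as $3^2 + 4^2 = 25$, if I am reading the sum correctly.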

I am having trouble understanding its notation.

I understand that the double bars, $\|\boldsymbol w\|$, denote some kind of norm, but what kind of norm is this? I am only familiar with the Euclidean norm.

Secondly, I don't understand why there is a subscript and a superscript, $_2^2$, attached to the term. I would have thought the subscript should be $n$ instead of $2$.

There are 2 answers below.

Best Answer

The general notation for the $p$-norm, with $p \in [1,+\infty)$, of a vector $v \in \mathbb{R}^n$ is this:

$$ \| v \|_p = \sqrt[p]{\sum^n_{i=1} |v_i|^p}. $$

It is easy to see that $\| v\|_2$ is indeed the Euclidean norm (let $p=2$ in the formula above); that is, the Euclidean norm is the $2$-norm.
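As a quick check, here is a minimal NumPy sketch (my addition, not part of the original answer) that computes the $p$-norm from the formula above and confirms that $p=2$ matches NumPy's built-in Euclidean norm:

```python
# Minimal sketch (NumPy assumed): compute the p-norm from the formula
# above and check that p = 2 agrees with NumPy's Euclidean norm.
import numpy as np

def p_norm(v, p):
    # (sum_i |v_i|^p)^(1/p) for p in [1, inf)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0, 12.0])

print(p_norm(v, 1))       # 1-norm: 3 + 4 + 12 = 19
print(p_norm(v, 2))       # 2-norm: sqrt(9 + 16 + 144) = 13
print(np.linalg.norm(v))  # NumPy's default vector norm is the 2-norm: 13
```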

Then squaring produces

$$ \| v\|_2^2 = (\|v\|_2)^2 = \sum^n_{i=1} v_i^2 = v_1^2 + v_2^2 + \cdots + v_n^2 $$

which is what you have specified.
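Numerically (continuing the same sketch, again my addition), squaring the $2$-norm removes the root and leaves a plain sum of squares, which is also the dot product of $v$ with itself:

```python
import numpy as np

v = np.array([3.0, -4.0, 12.0])

print(np.linalg.norm(v) ** 2)  # (||v||_2)^2 = 169
print(np.sum(v ** 2))          # v_1^2 + v_2^2 + ... + v_n^2 = 169
print(np.dot(v, v))            # v . v, the same sum of squares
```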

Another Answer

If you read Boyd and Vandenberghe's *Convex Optimization*, chapter six treats regularization and least-squares problems. Regularization starts from a problem like this:

$$ \textrm{minimize (w.r.t. } \mathbb{R}^{2}_{+}) \quad (\| Ax -b\|,\ \|x \|) $$

This is called a bi-criterion problem, which is a convex optimization problem.

Regularization has a general pattern which looks like this:

$$ \textrm{minimize} \quad \| Ax -b\| + \gamma \|x \| $$

where $ \gamma \in (0,\infty) $ is our regularization parameter. In the case of $\ell_{2}$ regularization we have

$$ \textrm{minimize} \quad \| Ax -b\|_{2} + \gamma \|x \|_{2} $$

where the 2-norm here is $\|x \|_{2} = \left( \sum_{i=1}^{m} |x_{i} |^{2} \right)^{\frac{1}{2}}$.

The superscript simply means that the norm is squared:

$$ \| x \|_{2}^{2} = \sum_{i=1}^{m} |x_{i} |^{2} $$
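To tie this back to the question: in practice the squared forms are usually the ones minimized. Here is a minimal sketch (my own addition, not from Boyd's text; it assumes the common squared variant, i.e. Tikhonov regularization / ridge regression, which, unlike the non-squared problem above, has a closed-form solution):

```python
# Minimal ridge-regression sketch (assumed setup, not from Boyd's text):
# minimize ||Ax - b||_2^2 + gamma * ||x||_2^2, whose minimizer is
# x = (A^T A + gamma * I)^(-1) A^T b.
import numpy as np

def ridge(A, b, gamma):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

for gamma in (0.0, 1.0, 100.0):
    x = ridge(A, b, gamma)
    # Increasing gamma shrinks ||x||_2 at the cost of a larger residual.
    print(f"gamma={gamma:6.1f}  residual={np.linalg.norm(A @ x - b):.3f}  "
          f"norm={np.linalg.norm(x):.3f}")
```

Sweeping $\gamma$ like this traces out exactly the trade-off between $\|Ax-b\|$ and $\|x\|$ that the bi-criterion formulation above describes.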