What will be the value of sub gradient at $0$ for function $|x|$

I am learning about lasso regression and came across the problem of taking the gradient of $|x|$ at $0$. I came to know about the subgradient, but could not understand what its value is at $0$.

In lasso regression, we use $\mathcal{L}_1$-regularisation, which means adding the absolute values of the weights (coefficients) to the cost. So, in order to run the gradient descent algorithm, we have to take gradients, but the gradient of the absolute value function does not exist at $0$. I then read about something called subgradients: for values less than $0$ the subgradient is $-1$, and for values greater than $0$ it is $+1$. But what is the value of the subgradient at $0$ for the function $|x|$?
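To make the question concrete, here is a small sketch of the piecewise rule I described (assuming NumPy; `abs_subgradient` is my own illustrative helper, and `at_zero` is a placeholder for the value I am asking about):

```python
import numpy as np

def abs_subgradient(x, at_zero=0.0):
    # Piecewise rule: +1 for x > 0, -1 for x < 0.
    # At x = 0 the gradient does not exist; `at_zero` stands in for
    # whatever value the subgradient takes there (the question).
    return np.where(x > 0, 1.0, np.where(x < 0, -1.0, at_zero))

xs = np.linspace(-2.0, 2.0, 5)  # [-2, -1, 0, 1, 2]
print(abs_subgradient(xs))  # -1 for negatives, +1 for positives, at_zero at x = 0
```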

There are 2 answers below.


The function $f(x) = |x|$ has subtangents at $x=0$ with slopes between $-1$ and $1$. Hence, $m$ is a subgradient at $x_0=0$ if $|m|\leq 1$.

In order to derive this, we start from the definition of a subgradient: $m(x-x_0) \leq f(x)-f(x_0)$ for all $x\in \mathbb{R}$. With $x_0 = 0$ and $f(x) = |x|$, this becomes

$$m(x-x_0)\leq f(x)-f(x_0)$$ $$mx \leq |x|$$ $$\text{for } x>0 \text{ (divide by } x>0\text{): } m\leq 1 $$ $$\text{for } x<0 \text{ (divide by } -x>0\text{): } -m\leq 1, \text{ i.e. } m \geq -1 $$ $$\text{for } x=0 \text{: } 0 \leq 0 $$ $$\implies m \in [-1,1].$$

Conversely, we can start with $|m| \leq 1$ and show that every such $m$ is a subgradient by verifying the definition $m(x-x_0) \leq f(x)-f(x_0)$ with $x_0 = 0$. Assume $|m| \leq 1$; then

$$m(x-x_0)=m(x-0)=mx \leq |mx| = |m||x|\leq 1\cdot |x| =|x|-|0|=f(x)-f(x_0)$$ $$\implies m(x-x_0) \leq f(x) -f(x_0).$$
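Both directions can be sanity-checked numerically. A minimal sketch, assuming NumPy; the inequality is tested on a grid of points, which illustrates (but does not prove) the result:

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 1001)

def is_subgradient(m, x0=0.0):
    # m is a subgradient of f(x) = |x| at x0 iff
    # m*(x - x0) <= f(x) - f(x0) for all x (checked here on a grid).
    return bool(np.all(m * (xs - x0) <= np.abs(xs) - np.abs(x0) + 1e-12))

print([is_subgradient(m) for m in (-1.0, -0.5, 0.0, 0.5, 1.0)])  # all True
print(is_subgradient(1.1), is_subgradient(-1.1))                 # False False
```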


@MachineLearner's solution is the standard approach and applies the definition of $\partial f(x)$, that is, the subdifferential (set of all subgradients) of a function $f$ at the point $x$. Here is another approach that can be an instructive exercise in subdifferential rules.

We can compute the subdifferential at $x = 0$ by using the identity $|x| = \max \{-x, x\}$. Let $f_1(x) = x$ and $f_2(x) = -x$. Since $f_1(0) = f_2(0)$, both pieces are active at $0$, so by the max rule of subdifferential calculus (Theorem 10.3), $$ \begin{aligned} \partial f(0) & = \text{ConvexHull}\big(\partial f_1(0) \cup \partial f_2(0)\big)\\ & = \text{ConvexHull}(\{1\} \cup \{-1\})\\ &= \{ \alpha \cdot 1 + (1-\alpha)(-1) : \alpha \in [0, 1]\}\\ &= [-1, 1]. \end{aligned} $$

To go from the first line to the second line, we use the fact that, if $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$, that is, the singleton set containing the gradient.
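The max rule can likewise be sanity-checked numerically: every convex combination of the two active gradients $1$ and $-1$ should satisfy the subgradient inequality at $0$. A small sketch, assuming NumPy; a grid check for illustration, not a proof:

```python
import numpy as np

xs = np.linspace(-3.0, 3.0, 601)

# Gradients of the two active pieces f1(x) = x and f2(x) = -x at x = 0:
g1, g2 = 1.0, -1.0

# Every convex combination m = alpha*g1 + (1 - alpha)*g2 lies in [-1, 1]
# and satisfies m*x <= |x|, the subgradient inequality at x0 = 0.
for alpha in np.linspace(0.0, 1.0, 11):
    m = alpha * g1 + (1 - alpha) * g2
    assert np.all(m * xs <= np.abs(xs) + 1e-12)

print("all 11 convex combinations verified")
```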