I have an example which seems to contradict the definition of the subgradient of the L1 norm. Obviously I made a mistake, but I can't see where.
We start with the definition of a subgradient (from Section 11.4.2 in Statistical Learning with Sparsity): given a convex function $f: \mathbb{R}^p \rightarrow \mathbb{R}$, we say that $z \in \mathbb{R}^p$ is a subgradient at $\beta$, denoted by $z \in \partial f(\beta)$, if we have
$$ f(\beta + \Delta) \geq f(\beta) + \langle z, \Delta \rangle $$
for all $\Delta \in \mathbb{R}^p$. In the case $f(\beta) = \|\beta\|_1$, it can be seen that $z \in \partial \|\beta\|_1$ if and only if $z_j = \operatorname{sign}(\beta_j)$ for all $j = 1, 2, \dots, p$, where we allow $\operatorname{sign}(0)$ to be any number in the interval $[-1,1]$.
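As a quick numerical sanity check of this characterization, here is a minimal Python sketch (NumPy assumed; the helper name `is_subgradient` and its sampling parameters are invented for illustration). It tests the defining inequality along many randomly drawn directions $\Delta$, which is a necessary but of course not a conclusive check:

```python
import numpy as np

def is_subgradient(f, beta, z, n_trials=10_000, scale=10.0, tol=1e-9):
    """Test f(beta + d) >= f(beta) + <z, d> for many random directions d.
    Passing is necessary but not sufficient for z to be in the subdifferential."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    z = np.atleast_1d(np.asarray(z, dtype=float))
    rng = np.random.default_rng(0)
    for _ in range(n_trials):
        d = rng.uniform(-scale, scale, size=beta.shape)
        if f(beta + d) < f(beta) + z @ d - tol:
            return False, d  # a direction d that violates the inequality
    return True, None

l1 = lambda x: np.abs(x).sum()

print(is_subgradient(l1, beta=[1.0], z=[1.0]))  # True: sign(1) = 1
print(is_subgradient(l1, beta=[1.0], z=[0.5]))  # False: some d in (-4/3, 0) breaks it
print(is_subgradient(l1, beta=[0.0], z=[0.3]))  # True: any z in [-1, 1] works at beta = 0
```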
Now my example: let $f(\beta) = \|\beta\|_1$ with $p = 1$, and take $\beta = 1$ and $\Delta = 1$. For this particular $\Delta$, the inequality above seems to hold for every $z \leq 1$, for instance for $z = \frac{1}{2}$:
$$ 2 = f(\beta + \Delta) \geq f(\beta) + \langle z, \Delta \rangle = 1 + \frac{1}{2}. $$ However, this contradicts the iff from above, which says that for $\beta = 1 \neq 0$ the only subgradient is $z = \operatorname{sign}(1) = 1$.
Where did I make a mistake? Any hints are greatly appreciated!
The inequality $f(\beta + \Delta) \geq f(\beta) + \langle z, \Delta \rangle$ is supposed to hold for all values of $\Delta$, not just $\Delta = 1$.
If $\Delta = -1$, then $$ 0 = f(\beta + \Delta) \ngeq f(\beta) + \langle z, \Delta \rangle = 1 - \frac12. $$ So $z = \frac12 \notin \partial f(1)$.
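To see the same failure numerically, here is a small sketch (Python with NumPy assumed, purely illustrative) that evaluates both sides of the inequality at $\beta = 1$, $z = \frac12$ for a few choices of $\Delta$:

```python
import numpy as np

l1 = lambda x: np.abs(x).sum()
beta, z = 1.0, 0.5

# The subgradient inequality has to hold for *every* Delta.
# With z = 1/2 it already fails at Delta = -1 (in fact on the whole interval (-4/3, 0)).
for delta in [-2.0, -1.0, -0.5, 0.5, 1.0]:
    lhs = l1(np.array([beta + delta]))         # f(beta + Delta)
    rhs = l1(np.array([beta])) + z * delta     # f(beta) + <z, Delta>
    print(f"Delta = {delta:+.1f}: lhs = {lhs:.2f}, rhs = {rhs:.2f}, holds: {lhs >= rhs}")
```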
Visually, if you plot the graph of $y = f(\beta) = |\beta|$ together with the line $y = f(1) + \frac12(\beta - 1)$, you can see that the line rises above the graph just to the left of $\beta = 1$, so $\frac12$ is not a subgradient of $f$ at $\beta = 1$ (a subgradient's line must stay below the graph everywhere).
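If it helps, here is a small plotting sketch (matplotlib assumed; purely illustrative) showing $|\beta|$, the candidate line with slope $\frac12$ through $(1, 1)$, and the true supporting line with slope $1$:

```python
import numpy as np
import matplotlib.pyplot as plt

b = np.linspace(-2, 3, 400)
plt.plot(b, np.abs(b), label=r"$f(\beta) = |\beta|$")
plt.plot(b, 1 + 0.5 * (b - 1), "--", label=r"$1 + \frac{1}{2}(\beta - 1)$ (candidate $z = 1/2$)")
plt.plot(b, 1 + 1.0 * (b - 1), ":", label=r"$1 + (\beta - 1)$ (subgradient $z = 1$)")
plt.scatter([1], [1], color="k", zorder=3)  # the point (beta, f(beta)) = (1, 1)
plt.legend()
plt.show()
```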