The Sub Differential of the Separable Sum $ R \left( w \right) = \sum^{d}_{j=1} \left| w_j \right|$ ($ {L}_{1} $ Norm of a Vector)


Recall the definition of a sub-differential:

$$\partial F(w_0) = \{ v : \forall w, F(w)-F(w_0) \geq v \cdot (w - w_0)\} $$

Intuitively, $v \in \partial F(w_0)$ means that the affine function $w \mapsto F(w_0) + v \cdot (w - w_0)$ passes through the point $(w_0, F(w_0))$ and lies everywhere on or below the graph of $F$.
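As a quick sanity check of this definition (a minimal numerical sketch; the function, test points, and helper name are illustrative, not part of the question), one can test the subgradient inequality for $F(w) = |w|$ at $w_0 = 0$ over a grid of points:

```python
# Check the subgradient inequality F(w) - F(w0) >= v * (w - w0)
# over a finite grid of test points.
def is_subgradient(F, w0, v, test_points):
    return all(F(w) - F(w0) >= v * (w - w0) for w in test_points)

F = abs
pts = [x / 10.0 for x in range(-50, 51)]  # grid on [-5, 5]

assert is_subgradient(F, 0.0, 0.5, pts)      # any v in [-1, 1] works at 0
assert not is_subgradient(F, 0.0, 1.5, pts)  # v outside [-1, 1] fails
```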

Then if that is the definition of $\partial F(w)$, what is the sub-differential of the separable sum:

$$R(w) = \sum^{d}_{j=1} |w_j|$$

I was told that the answer is:

$$\partial R(w) = (\partial R(w_1), ..., \partial R(w_d) )$$

However, I was unsure how to prove this, or even verify it, for several reasons.

1) I don't think I know how to start such a proof. I usually start by applying the definition, but here I am unsure how to apply it and make progress.

2) I don't think I even understand what the statement $\partial R(w) = (\partial R(w_1), \ldots, \partial R(w_d))$ means. I am having a hard time parsing it because, to me, the subdifferential $\partial R(w)$ is a set of vectors. Does the statement mean the set of vectors constructed componentwise from the sets $\partial R(w_j)$? If so, great, I understand the statement; but how do I prove it?


The notation $\partial R(w_i)$ is a little confusing, so I'll generalize slightly and use a different notation:
$$ F(w) = \sum_{i=1}^d f(w_i) $$
Then we claim that the subdifferential of $F$ is
$$ \partial F(w) = (\partial f(w_1), \partial f(w_2), \ldots, \partial f(w_d)) = \{ (v_1, v_2, \ldots, v_d) \in \mathbb{R}^d : v_i \in \partial f(w_i) \text{ for all } i \} $$
Note that $\partial F(w)$ is a subset of $\mathbb{R}^d$, while each $\partial f(w_i)$ is a subset of $\mathbb{R}$.

To prove $\partial F(w) \supseteq (\partial f(w_1), \partial f(w_2), \ldots, \partial f(w_d))$, suppose $v \in (\partial f(w_1), \partial f(w_2), \ldots, \partial f(w_d))$, i.e. $v_i \in \partial f(w_i)$ for every $i$. Then for all $u \in \mathbb{R}^d$ we have:
$$ F(u)-F(w)=\sum_{i=1}^d \bigl(f(u_i)-f(w_i)\bigr) \ge \sum_{i=1}^d v_i(u_i-w_i) = v \cdot (u-w). $$
Hence $v \in \partial F(w)$.

To prove $\partial F(w) \subseteq (\partial f(w_1), \partial f(w_2), \ldots, \partial f(w_d))$, suppose $v \notin (\partial f(w_1), \partial f(w_2), \ldots, \partial f(w_d))$. Then for some $j$, $v_j \notin \partial f(w_j)$, so there exists $u_j$ such that:
$$ f(u_j) - f(w_j) < v_j(u_j - w_j). $$
Now choose $u \in \mathbb{R}^d$ with $u_j$ as above and $u_i = w_i$ for $i \neq j$. Then
$$ F(u)-F(w)=\sum_{i=1}^d \bigl(f(u_i)-f(w_i)\bigr) = f(u_j)-f(w_j) < v_j(u_j - w_j) = v\cdot(u-w). $$
Hence $v \notin \partial F(w)$.
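To make this counterexample construction concrete (a small numerical sketch; the specific numbers are illustrative, not from the proof): take $f(x) = |x|$, $w = (1, 0)$, and $v = (1, 2)$, so $v_2 = 2 \notin \partial f(0) = [-1, 1]$.

```python
# F(w) = sum_i |w_i|, the separable sum with f = abs.
def F(w):
    return sum(abs(x) for x in w)

w = [1.0, 0.0]
v = [1.0, 2.0]  # v_2 = 2 is NOT a subgradient of |.| at w_2 = 0
u = [1.0, 1.0]  # agrees with w except in coordinate 2

lhs = F(u) - F(w)                                         # = 1
rhs = sum(vi * (ui - wi) for vi, ui, wi in zip(v, u, w))  # = 2
assert lhs < rhs  # the subgradient inequality fails, so v is not in the subdifferential of F at w
```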

So, in summary, this reduces the problem to finding the subdifferential of a one-dimensional function. In this case $f(x) = |x|$, the absolute value function, and one can show that
$$ \partial f(x) = \begin{cases} \{-1\} & x < 0\\ [-1,1] & x = 0\\ \{1\} & x > 0 \end{cases} $$
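Putting the pieces together (a sketch; the helper names are hypothetical): a vector $v$ lies in $\partial R(w)$ for $R(w) = \sum_i |w_i|$ exactly when each component $v_i$ lies in the interval above for the corresponding $w_i$.

```python
def abs_subdiff(x):
    """Subdifferential of f(x) = |x|, returned as an interval (lo, hi)."""
    if x < 0:
        return (-1.0, -1.0)
    if x > 0:
        return (1.0, 1.0)
    return (-1.0, 1.0)

def in_l1_subdiff(v, w):
    """Check v in dR(w) for R(w) = sum_i |w_i|, componentwise."""
    return all(lo <= vi <= hi
               for vi, (lo, hi) in zip(v, (abs_subdiff(wi) for wi in w)))

w = [2.0, 0.0, -3.0]  # subdifferentials: {1}, [-1, 1], {-1}
assert in_l1_subdiff([1.0, 0.3, -1.0], w)
assert not in_l1_subdiff([1.0, 0.3, 1.0], w)  # last entry must be -1
```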