I was going through this page : https://www.stats.ox.ac.uk/~lienart/blog_opti_basics.html , and at the end of part 1 "Subgradient and First-order Optimality Condition", the author says:
Before moving on, it is useful to note (and not too hard to convince oneself) that the following inclusion holds for the subdifferential of a sum: $\sum_i \partial f_i \subseteq \partial \sum_i f_i$.
Can anyone explain what this means? If $f_i$ is not differentiable, it can have multiple subgradients at a point, right? So what does the sum of the subdifferentials of the functions $f_i$ even mean? And how do we show the result above?
Suppose $\xi_k \in \partial f_k(x)$, then $f_k(z) \ge f_k(x) + \langle \xi_k, z-x \rangle$ for all $z$.
Summing these inequalities over $k$ gives $\sum_k f_k(z) \ge \sum_k f_k(x) + \langle \sum_k \xi_k, z-x \rangle$ for all $z$, and so $\sum_k \xi_k \in \partial \left( \sum_k f_k \right)(x)$.
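As a quick numerical sanity check of the inclusion, here is a sketch with a hypothetical choice of functions, $f_1(x) = |x|$ and $f_2(x) = x^2$, at the kink $x = 0$. There $\partial f_1(0) = [-1,1]$ and $\partial f_2(0) = \{0\}$, so every $\xi \in [-1,1]$ should satisfy the subgradient inequality for the sum:

```python
import numpy as np

# Hypothetical example functions: f1 is non-differentiable at 0, f2 is smooth.
f1 = lambda x: abs(x)
f2 = lambda x: x**2
f = lambda x: f1(x) + f2(x)

x = 0.0
zs = np.linspace(-5.0, 5.0, 1001)

# ∂f1(0) = [-1, 1] and ∂f2(0) = {0}, so each xi in [-1, 1] should be a
# subgradient of f = f1 + f2 at 0, i.e. f(z) >= f(0) + xi * (z - 0) for all z.
for xi in np.linspace(-1.0, 1.0, 21):
    assert all(f(z) >= f(x) + xi * (z - x) - 1e-12 for z in zs)

print("all candidate subgradients verified")
```

This only spot-checks the inequality on a grid, of course; the argument above is the actual proof.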
Note that for convex functions in a practical context, equality typically holds; failure of equality is pathological in some sense.
For example, a sufficient condition for equality (Rockafellar) is $\bigcap_k \operatorname{ri} ( \operatorname{dom} f_k ) \neq \emptyset$.
As an aside, the notion of subdifferential can be extended to locally Lipschitz functions (the Clarke subdifferential), where the containment goes in the opposite direction (ignoring pathologies): $\partial \sum_k f_k(x) \subset \sum_k \partial f_k(x)$. An easy example is $f_1(x) = \max(0,x)$, $f_2(x) = -f_1(-x) = \min(0,x)$, in which case $f_1(x)+f_2(x) = x$, so $\partial \sum_k f_k(0) = \{1\}$ while $\sum_k \partial f_k(0) = [0,1] + [0,1] = [0,2]$. Of course, $f_2$ is not convex, so there is no contradiction here.
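The one-sided slopes in this example can be checked numerically; this sketch just approximates the limiting slopes that generate the Clarke subdifferentials at $0$:

```python
# f1(x) = max(0, x) and f2(x) = -f1(-x) = min(0, x); their sum is the identity.
f1 = lambda x: max(0.0, x)
f2 = lambda x: -f1(-x)
s = lambda x: f1(x) + f2(x)

h = 1e-8
# One-sided slopes of f1 at 0: 0 from the left, 1 from the right,
# so the Clarke subdifferential of f1 at 0 is the convex hull [0, 1].
left = (f1(0.0) - f1(-h)) / h    # ≈ 0
right = (f1(h) - f1(0.0)) / h    # ≈ 1

# f2 has the same one-sided slopes at 0, so sum_k ∂f_k(0) = [0,1] + [0,1] = [0,2],
# while s(x) = x is smooth with ∂s(0) = {1}.
slope_of_sum = (s(h) - s(-h)) / (2 * h)   # ≈ 1

print(left, right, slope_of_sum)
```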