To start, I would like to state both definitions for sub-gradient and where they came from.
Definition 1
Let $f$ be $\mathbb E \mapsto \mathbb {\bar R}$ with $x\in \text{dom}(f)$ then $v\in \mathbb E$ is a sub-gradient at $x$ when: $$ f(y) \ge f(x) + \langle v, y- x\rangle + o(\Vert y - x\Vert) \text{ as }y \rightarrow x $$ There is no subgradient for $x\not\in \text{dom}(f)$
My comments:
$o$ being the small o notations, it means that $\lim_{y\rightarrow x}o(\Vert y - x\Vert)/\Vert y - x\Vert = 0$. This is depicting the pictures of infinitely many trajectories of particles $y$ approaching $x$ in the domain of $\text{dom}(f)$ with some "headlights" pointing along the direction of their velocities (a cone 90-degree cone headlights) intersecting and forms the set of all possible sub-gradient. It's just the polar of the tangent cone at $x$ for the epigraph of $f$.
Source: Dmitriy Drusvyatskiy's course notes at the University of Washington in his Convex Analysis class.
Definition 2
Let $f$ be proper and $x\in \text{dom}(f)$, then $g$ is a sub-gradient of $f$ at $x$ if:
$$ f(y) \ge f(x) + \langle g, y - x\rangle \;\forall y \in \mathbb E $$ There is no subgradient for $x\not\in \text{dom}(f)$
Source: <First Order Method in Optimizations> by SIAM
Comments
These 2 definitions are obviously different, and under the case where $f$ is convex, they are the same. There are several details here, firstly the first definition doesn't mention about $f$ being proper. In addition, the second one is describing all the normal of a hyperplane touching the epigraph of $f$ which separates the epigraph of $f$ from very other points outside of the convex closure of the epigraph. In fact, the second definition is a smaller set of sub-gradients compared to the first one.
My questions are:
- Firstly, are they both correct? if so do we have different types of sub-gradient?
- If $f$ is not convex, which one should we use and why the difference between them would be important? (Or may it's not important at all when $f$ is non-convex)