Defining the Subgradient with Polar of the Tagent Cone on the Epigraph, or the Convex Hull

64 Views Asked by At

To start, I would like to state both definitions for sub-gradient and where they came from.

Definition 1

Let $f$ be $\mathbb E \mapsto \mathbb {\bar R}$ with $x\in \text{dom}(f)$ then $v\in \mathbb E$ is a sub-gradient at $x$ when: $$ f(y) \ge f(x) + \langle v, y- x\rangle + o(\Vert y - x\Vert) \text{ as }y \rightarrow x $$ There is no subgradient for $x\not\in \text{dom}(f)$

My comments:

$o$ being the small o notations, it means that $\lim_{y\rightarrow x}o(\Vert y - x\Vert)/\Vert y - x\Vert = 0$. This is depicting the pictures of infinitely many trajectories of particles $y$ approaching $x$ in the domain of $\text{dom}(f)$ with some "headlights" pointing along the direction of their velocities (a cone 90-degree cone headlights) intersecting and forms the set of all possible sub-gradient. It's just the polar of the tangent cone at $x$ for the epigraph of $f$.

Source: Dmitriy Drusvyatskiy's course notes at the University of Washington in his Convex Analysis class.


Definition 2

Let $f$ be proper and $x\in \text{dom}(f)$, then $g$ is a sub-gradient of $f$ at $x$ if:

$$ f(y) \ge f(x) + \langle g, y - x\rangle \;\forall y \in \mathbb E $$ There is no subgradient for $x\not\in \text{dom}(f)$

Source: <First Order Method in Optimizations> by SIAM

Comments

These 2 definitions are obviously different, and under the case where $f$ is convex, they are the same. There are several details here, firstly the first definition doesn't mention about $f$ being proper. In addition, the second one is describing all the normal of a hyperplane touching the epigraph of $f$ which separates the epigraph of $f$ from very other points outside of the convex closure of the epigraph. In fact, the second definition is a smaller set of sub-gradients compared to the first one.

My questions are:

  1. Firstly, are they both correct? if so do we have different types of sub-gradient?
  2. If $f$ is not convex, which one should we use and why the difference between them would be important? (Or may it's not important at all when $f$ is non-convex)