Subgradient along one direction of a convex function and its directional derivative


I am considering the following two quantities and trying to prove that they are equal to each other: $$ u = \inf_{\lambda>0}\left\{\frac{f(\boldsymbol{x} + \lambda\boldsymbol{p}) - f(\boldsymbol{x})}{\lambda}\right\} $$ $$ v =\sup\left\{\boldsymbol{p}^{T}\boldsymbol{c}:\boldsymbol{c}\in\partial f(\boldsymbol{x})\right\} $$ where $f:R^{n}\to R$ ($-\infty < f < +\infty$) is any closed proper convex function and $\boldsymbol{x}$ is an arbitrary point in ri(dom$f$). Note that $u$ can be viewed as the (one-sided) directional derivative of $f$ at $\boldsymbol{x}$ along $\boldsymbol{p}$, while $v$ is the largest projection of the subgradients of $f$ at $\boldsymbol{x}$ onto $\boldsymbol{p}$. Because of convexity I think they should be the same, so I want to prove $u = v$.
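Before attempting a proof, here is a quick numerical sanity check on a simple nondifferentiable case (my own toy example: $f(x) = |x|$ at the kink $x = 0$ with $p = 1$; the grids and sample sizes are arbitrary). Here $\partial f(0) = [-1, 1]$, so both $u$ and $v$ should come out as $1$:

```python
import numpy as np

# Toy example: f(x) = |x| at the kink x = 0, direction p = 1.
x, p = 0.0, 1.0
f = np.abs

# u = inf over lambda > 0 of the difference quotient, approximated on a grid.
lams = np.logspace(-6, 2, 200)                 # grid of lambda > 0
u = ((f(x + lams * p) - f(x)) / lams).min()

# v = sup over subgradients; here partial f(0) = [-1, 1], sampled densely.
cs = np.linspace(-1.0, 1.0, 2001)
v = (p * cs).max()

print(u, v)   # -> 1.0 1.0
```

For this example every difference quotient equals $1$, so the infimum and the supremum agree exactly, consistent with the claim $u = v$.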

Since $f$ is not necessarily differentiable at $\boldsymbol{x}$, I am trying to derive a contradiction from each of $u<v$ and $u>v$. If $u < v$, then for $\epsilon = (v - u)/2 > 0$ there exists $\boldsymbol{c}\in\partial f(\boldsymbol{x})$ such that $\boldsymbol{p}^{T}\boldsymbol{c} > v - \epsilon = u + \epsilon$, and $$ f(\boldsymbol{x} + \lambda\boldsymbol{p}) \ge f(\boldsymbol{x}) + \lambda\boldsymbol{p}^{T}\boldsymbol{c} > f(\boldsymbol{x}) + \lambda(u + \epsilon), \quad \forall \lambda > 0 $$ and thus $$ \frac{f(\boldsymbol{x} + \lambda\boldsymbol{p}) - f(\boldsymbol{x})}{\lambda} > u + \epsilon $$ for every $\lambda > 0$, which contradicts the definition of $u$ as an infimum. But this is as far as I can get. I believe that $u = v$ holds for a convex function $f$, but I don't know how to find a contradiction when $u > v$. I tried to construct a new subgradient $\boldsymbol{w}$ such that $\boldsymbol{p}^{T}\boldsymbol{w} = u$, and also to extract a contradiction from convexity, but failed.

I am wondering whether it is true that $u = v$ and, if so, how to get this result.




I have come up with a proof myself, and I think it meets the requirements of the question. It goes as follows: consider sequences $\{\boldsymbol{x}_k\}$ and $\{g_k\}$ defined by $\boldsymbol{x}_k = \boldsymbol{x} + \lambda_k\boldsymbol{p}$ and $g_k = f(\boldsymbol{x}) + \lambda_k u$. If $\lambda_k\to 0^+$ as $k\to +\infty$, then $(\boldsymbol{x}_k, g_k)$ forms a sequence approximating $(\boldsymbol{x}, f(\boldsymbol{x}))$ in the space $R^{n + 1}$. At the same time consider the set (slightly different from the epigraph): $$E = \{(\boldsymbol{y}, t)\in R^{n+1}: \boldsymbol{y}\in \text{ri(dom$f$)}, \ t > f(\boldsymbol{y})\}$$ It is easy to see that $E$ is open and convex because $f$ is convex, and that $(\boldsymbol{x}_k, g_k)\not\in E$ because $f(\boldsymbol{x}_k) \ge g_k$ by the definition of $u$.

Now we can apply a separation theorem to the sequence and $E$: for every $k$ there exist $r_k$, $\alpha_k$ and a nonzero vector $\boldsymbol{q}_k\in R^n$ such that $$ \boldsymbol{q}_k^{T}\boldsymbol{x}_k + r_k g_k \ge \alpha_k > \boldsymbol{q}_k^{T}\boldsymbol{y} + r_k t, \quad \forall (\boldsymbol{y}, t)\in E $$ Apparently $\|[\boldsymbol{q}_k^T, r_k, \alpha_k]\| > 0$, so without loss of generality we may normalize so that $\|[\boldsymbol{q}_k^T, r_k, \alpha_k]\| = 1$. Then there exists a convergent subsequence $\{[\overline{\boldsymbol{q}}_k^T, \overline{r}_k, \overline{\alpha}_k]\}$ with limit $[\overline{\boldsymbol{q}}^T, \overline{r}, \overline{\alpha}]$. Letting $k\to\infty$, the separating inequality becomes: $$ \overline{\boldsymbol{q}}^{T}\boldsymbol{x} + \overline{r}f(\boldsymbol{x}) \ge \overline{\alpha} \ge \overline{\boldsymbol{q}}^{T}\boldsymbol{y} + \overline{r}t, \quad \forall (\boldsymbol{y}, t)\in E $$ Now I want to prove that $\boldsymbol{d} = -\overline{\boldsymbol{q}}/\overline{r} \in \partial f(\boldsymbol{x})$ and that $\boldsymbol{d}^{T}\boldsymbol{p} \ge u$.

First, letting $t\to\infty$ shows that $\overline{r}\le 0$. Now consider the case $\overline{r} = 0$. As $\boldsymbol{x}\in \text{ri(dom$f$)}$, there exists $\epsilon > 0$ such that $\boldsymbol{y} = \boldsymbol{x} + \epsilon\overline{\boldsymbol{q}} \in \text{dom$f$}$. If $\overline{r} = 0$, choosing this $\boldsymbol{y}$ gives $0\ge \epsilon\|\overline{\boldsymbol{q}}\|^2$, which forces $\overline{\boldsymbol{q}} = 0$ and thus $\overline{\alpha} = 0$. But this is impossible because $\|[\overline{\boldsymbol{q}}^T, \overline{r}, \overline{\alpha}]\| = 1$. Hence $\overline{r} < 0$. Similarly, we have $\overline{r}_k < 0$ for all $k$. Note that for every $\delta > 0$ we have $(\boldsymbol{y}, f(\boldsymbol{y}) + \delta)\in E$, so $$ f(\boldsymbol{y}) + \delta \ge f(\boldsymbol{x}) + \frac{1}{\overline{r}}\overline{\boldsymbol{q}}^{T}(\boldsymbol{x} - \boldsymbol{y}) = f(\boldsymbol{x}) + \boldsymbol{d}^{T}(\boldsymbol{y} - \boldsymbol{x}), \quad \forall \boldsymbol{y}, \forall \delta > 0 $$ Letting $\delta\to 0$ shows that $\boldsymbol{d}\in\partial f(\boldsymbol{x})$.

Now take $\boldsymbol{y} = \boldsymbol{x}$ and $t = f(\boldsymbol{x}) + \delta$: $$ \overline{\boldsymbol{q}}_k^{T}\boldsymbol{x}_k + \overline{r}_k g_k \ge \overline{\alpha}_k > \overline{\boldsymbol{q}}_k^{T}\boldsymbol{x} + \overline{r}_k(f(\boldsymbol{x}) + \delta) $$ Note that $|\overline{r}_k| \le 1$, so the term $\overline{r}_k\delta$ vanishes as $\delta\to 0$. Recalling that $\boldsymbol{x}_k = \boldsymbol{x} + \lambda_k\boldsymbol{p}$ and $g_k = f(\boldsymbol{x}) + \lambda_k u$, we get $$ \lambda_k\overline{\boldsymbol{q}}_k^{T}\boldsymbol{p} + \lambda_k\overline{r}_k u \ge 0 $$ Since $\lambda_k > 0$, we have $\overline{\boldsymbol{q}}_k^{T}\boldsymbol{p} + \overline{r}_k u \ge 0$, and thus $\overline{\boldsymbol{q}}^{T}\boldsymbol{p} + \overline{r} u \ge 0$ in the limit. With $\overline{r} < 0$ we get $$ -\frac{1}{\overline{r}}\overline{\boldsymbol{q}}^{T}\boldsymbol{p} = \boldsymbol{d}^{T}\boldsymbol{p} \ge u $$ Therefore $v = \sup\{\boldsymbol{p}^{T}\boldsymbol{c}:\boldsymbol{c}\in\partial f(\boldsymbol{x})\} \ge \boldsymbol{p}^{T}\boldsymbol{d} \ge u$. Combined with the fact $v\le u$ proved in the question, this gives $u = v$ for closed proper convex functions.

The key point of this proof is how to construct the desired subgradient at $\boldsymbol{x}$, which is accomplished by the separation theorem; the convexity and closedness of the function guarantee that this succeeds.


Note that we can write $A=\operatorname{aff} ( \operatorname{dom} f) = \{x\} + L$, where $L$ is a linear subspace of $\mathbb{R}^n$, and that $f(y) = +\infty$ for $y \notin A$. Since $x \in \operatorname{ri} ( \operatorname{dom} f)$, $f$ restricted to $A$ is locally Lipschitz in a neighbourhood of $x$, say with rank $K$.

In particular, $f(x)$ is finite.

Let $f'(x;h) = \lim_{t \downarrow 0} {f(x+th) -f(x) \over t}$. For convenience, let $\gamma(h) = f'(x;h)$ and note that $\operatorname{dom} \gamma = L$. It is not hard to check that $\gamma$ is convex and positively homogeneous. Note that $\gamma(0) = 0$ and $x \in \operatorname{ri} ( \operatorname{dom} f)$, so $\gamma$ is proper (it satisfies $|\gamma(h)| \le K\|h\|$ on $L$ and equals $+\infty$ otherwise). A little work shows that $\operatorname{epi} \gamma$ is closed, hence $\gamma$ is closed as a function.

Note that if $g$ is a closed convex function then $g(x) = \sup_{a \le g, a \text{ affine}} a(x)$. If $g$ is in addition positively homogeneous, then $g(x) = \sup_{l \le g, l \text{ linear}} l(x)$.

Hence $\gamma(h) = \sup \{ \langle \xi, h \rangle | f'(x;d) \ge \langle \xi, d \rangle \text{ for all } d \}$.

Let $\Delta_1 = \{ \xi | f'(x;d) \ge \langle \xi, d \rangle \text{ for all } d \}$ and $\Delta_2 = \{ \xi | f(y)-f(x) \ge \langle \xi, y-x \rangle \text{ for all } y \}$. Note that both are closed, and a little work shows that $\Delta_1 = \Delta_2$; since $\Delta_2 = \partial f(x)$ by definition, $\Delta_1 = \partial f(x)$.

Hence $\gamma(h) = f'(x;h) = \sup \{ \langle \xi, h \rangle | \xi \in \partial f(x) \}$.
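As a concrete illustration of this identity (my own example, not part of the argument above): take $f(x) = |x|$ on $\mathbb{R}$ and $x = 0$. Then

```latex
\gamma(h) = f'(0;h) = \lim_{t \downarrow 0} \frac{|th| - |0|}{t} = |h|,
\qquad
\Delta_1 = \{\, \xi : |h| \ge \xi h \ \text{for all } h \,\} = [-1, 1] = \partial f(0),
```

and indeed $\sup \{ \xi h : \xi \in [-1,1] \} = |h| = \gamma(h)$.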


This is a blunt answer, simply quoting Rockafellar's "Convex Analysis".

Theorem 23.4 (p. 216). Let $f$ be a proper convex function. For $x \notin \operatorname{dom} f$, $\partial f(x)$ is empty. For $x \in \operatorname{ri} ( \operatorname{dom} f )$, $\partial f(x)$ is nonempty, $h \mapsto f'(x;h)$ is a closed and proper function, and $f'(x;h) = \sup_{ \xi \in \partial f(x)} \langle h, \xi \rangle = \delta^*(h \mid \partial f(x))$. Finally, $\partial f(x)$ is a nonempty bounded set iff $x \in ( \operatorname{dom} f )^\circ$, in which case $f'(x;h)$ is finite for every $h$.
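To see the theorem in action on a small nondifferentiable example (my own illustration, not from the book; the sampling grid is arbitrary): for $f(y) = \max(y_1, y_2)$ at $x = 0$, the subdifferential $\partial f(0)$ is the segment $\{(t, 1-t) : t \in [0,1]\}$ and $f'(0;h) = \max(h_1, h_2)$, which the support function reproduces:

```python
import numpy as np

# f(y) = max(y1, y2) at x = 0: partial f(0) = {(t, 1-t) : t in [0, 1]},
# and f'(0; h) = max(h1, h2) since max is positively homogeneous.
ts = np.linspace(0.0, 1.0, 1001)
subgrads = np.stack([ts, 1.0 - ts], axis=1)    # dense sample of partial f(0)

rng = np.random.default_rng(0)
for _ in range(5):
    h = rng.standard_normal(2)                 # random direction
    directional = max(h[0], h[1])              # f'(0; h) = max(h1, h2)
    support = (subgrads @ h).max()             # sup over sampled subgradients
    # linear in t, so the max sits at an endpoint of the grid -> exact match
    assert abs(directional - support) < 1e-9
print("ok")
```

The support-function value matches the directional derivative in every sampled direction, as the theorem asserts.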

I will see if I can find a slightly more self-contained proof.