Subgradient of a composition w/ affine $f$

1k Views Asked by At

Let $f:\mathbb{R}^n \to \mathbb{R} \cup \{ \infty \}$ be convex, w/ subgradient at x in its domain $\partial f(x):=\{ d:f(y)\geq f(x)+d^T (y-x),\forall y\in \mathbb{R}^n \}$.
Let $h(x'):=f(Ax'+b)$, where $A \in \mathbb{R}^{n \times m}$, then its subgradient is $$\partial h(x')=A^T \partial f(Ax'+b) $$ (So I googled for this result, but I'd like to convince myself it's true)
I think I can see "$\supseteq$", but I'm having trouble w/ "$\subseteq$" b/c by defn $$\partial h(x')=\{ d' \in \mathbb{R}^m : f(Ay'+b) \geq f(Ax'+b)+d'^T(y'-x'), \forall y' \in \mathbb{R}^m \}$$ while $$A^T \partial f(Ax'+b)=\{ A^T d:f(y) \geq f(Ax'+b)+d^T (y-Ax'-b),\forall y \in \mathbb{R}^n \}$$ Another approach I tried is to let $d' \in \partial h(x')$ but assume either $d' \notin Range(A^T)$ or $d'=A^T d$ for some $d \notin \partial f(Ax+b)$. Either assumption should lead to a contradiction.

2

There are 2 best solutions below

2
On BEST ANSWER

I have another answer, which is in the spirit of https://math.stackexchange.com/a/1310604/58577

For ease of notation, I use $b = 0$. The shift with $b \ne 0$ does not cause any difficulties.

Let $d' \in \partial h(x')$ be given. We define the sets \begin{align*} A &= \{(s, y) \in \mathbb{R} \times \mathbb{R}^n ; s < f(A \, x') - f(y)\} \\ B &= \{(s, y) \in \mathbb{R} \times \mathbb{R}^n ; \exists y' \in \mathbb{R}^m : A \, y' = y,\; \text{and}\; s > -d'^\top (y' - x').\} \end{align*} Note that $A$ is the shifted, strict hypograph of $f$, whereas $B$ is the preimage of the strict epigraph of $d'^\top( \cdot - x')$.

It is easy to see that these two sets are disjoint and $A$ is open. Hence, we can use a separation theorem to get $(t, -d) \in \mathbb{R} \times \mathbb{R}^n$, $(t,-d) \ne 0$, and $a \in \mathbb{R}$, such that \begin{equation*} (t, -d)^\top (s,y) \ge a \ge (t, -d)^\top (\tilde s,\tilde y) \end{equation*} for all $(s,y) \in B$ and $(\tilde s, \tilde y) \in A$. It is easy to see that $t > 0$ and w.l.o.g. we have $t = 1$.

By using $y = \tilde y = A \, x'$, $s \searrow 0$, $\tilde s \nearrow 0$, we find $a = -d^\top A \, x'$.

Now, we use arbitrary $(y, y')$, $(\tilde s, y) \in A$, $(s, A \, y') \in B$ and $s \searrow -d'^\top (y' - x')$, $\tilde s \nearrow f(A \, x' ) - f(y)$, we get \begin{equation*} -d'^\top (y' - x') - d^\top A \, y' \ge -d^\top A \, x' \ge f(A \, x') - f(y) - d^\top y. \end{equation*} The first inequality shows $A^\top d = d'$ and the second one $d \in \partial f(A x')$.

0
On

I have an answer which uses that $$f'(x; y) = \max_{d \in \partial f(x)}y^\top d,$$ where $f'(x;y)$ is the directional derivative of $f$ at $x$ in direction $y$.

Let $d'$ be given, such that $d' \not\in A^\top \partial f(Ax+b)$. Since $\partial f(Ax+b)$ is compact, the set $A^\top \partial f(Ax+b)$ is compact and convex. Hence, we can separate $d'$ and $A^\top \partial f(Ax+b)$. That is, there is $s \in \mathbb{R}^m$, such that $$s^\top d' > s^\top \tilde d\qquad \forall \tilde d \in A^\top \partial f(Ax+b).$$ This yields $$s^\top d' > s^\top A^\top d\qquad \forall d \in \partial f(Ax+b).$$ Now, we invoke the formula for the directional derivative and obtain $$\max_{\tilde d' \in \partial h(x)} s^\top \tilde d' = h'(x;s) = f'(A \, x + b; A \, s) = \max_{d \in \partial f(Ax+b)} (A\,s)^\top d < s^\top d'$$ This shows that $d' \not\in \partial h(x)$.