I encountered the following unsubstantiated fact in a proof:
$f$ is differentiable, strictly convex, and coercive. Thus we may conclude $$(\nabla f)^{-1} = \nabla f^*$$ where $f^*$ is the Legendre transformation of $f$
Loosely speaking, I see how this might work: if a function is differentiable then Legendre trasformation will return the slope at some point $x \in \text{dom} f$. If $f$ is strictly coercive, then $\nabla f$ is monotonic, in the appropriate sense. So an inverse is defined. Loosely, I think the role of coercivity is to make the map between $x \in \text{dom} f$ and its slope at $x$ bijective. I can't understand how coercivity is causing this though.
My questions:
- Can someone provide a heuristic description what coercivity is giving us?
- Provide a proof of the fact above (or reference)? Assume $f: \mathbb{R}^d \rightarrow \mathbb{R}$.
Recall that $f^*(y) = \sup \langle x,y\rangle -f(x)$. Assuming there is a minimizer $x^*$ we have, $f^*(y) = \langle x^*,y\rangle -f(x^*)$. At the same time, $x^*$ satisfies the optimality conditions $0=\nabla_x(\langle x,y\rangle -f(x))|_{x=x^*} = y-\nabla_x f(x^*)\implies y=\nabla_xf(x^*)$. If we take the gradient of $f^*(y)$ w.r.t. $y$ to get $\nabla_y f^*(y) = x^*(y)$. So we know that $\nabla f^*(y) = (\nabla f)^{-1}(x^*(y))$. But $x^*$ might not be unique.
So you want,
$$\arg\sup_x \langle x,y\rangle - f(x)=\{x^*\}\neq\emptyset$$
a singleton for all $y$. Coercivity in $x$ will ensure this since $f$ is also strictly convex (and thus $-f$ is strictly concave).
The injectivity of $\nabla f$ needed to define an inverse is a consequence of strict convexity and coercivity. Since $f$ is strictly convex, $\nabla f$ is strictly monotone (and thus injective). Notice that
$$\arg\sup_x \langle x,y\rangle - f(x) = \arg\min_x \langle x,-y\rangle + f(x)$$
Fix $y$ and let $\phi(x) = \langle x,-y \rangle + f(x)$. The sublevel sets of $\phi$ are bounded by coercivity and closed since $\phi$ is continuous. Taking any sublevel set (e.g. the sublevel set for $\phi(0)$), we have that $\phi$ is strictly convex and differentiable and thus admits a unique minimum on a compact set. Then, for any $y$, $x^*$ exists and is unique so that $\nabla f$ is surjective.