Legendre transform preserves strict convexity

220 Views Asked by At

Let $f:A\to \Bbb R$ be a strictly convex function of Class $C^2$ in a open and convex subset $A$ of $\Bbb R^n$.

Say that $g$ is its Legendre Transform defined as: $g(p) = \max_{A} (xp - f(x))$ Say also that we already know $g’(p) = x(p)$ where $p = \nabla f(x(p))$.

How can I prove that $g$ is thus strictly convex?

2

There are 2 best solutions below

0
On

The strict convexity of $f$ is equivalent to $f''(x)>0$ for all $x\in A\,.$

The maximum in $g(p)=\max\limits_{x\in A}(xp-f(x))$ is attained for the unique $x=x(p)$ that satisfies $$\tag{1} p=f'(x)\ $$ (do you know why that $x$ is unique?). Therefore, $$ g(p)=x(p)p-f(x(p))\,. $$ It seems to me you are wrongly assuming $g(p)=x(p)$ instead.

To prove strict convexity of $g$ we only have to show $g''>0\,:$ Clearly, $$ g'(p)=x'(p)p+x(p)-f'(x(p))x'(p)=x(p) $$ by (1). Therefore, using $x(p)=(f')^{-1}(p)\,,$ and the derivative of the inverse function, $$ g''(p)=x'(p)=\frac{1}{f''((f')^{-1}(p))}>0\,. $$

2
On

A lot of assumptions can be stripped from this. We don't require $f$ to be twice differentiable, or even strictly convex. I just want $f$ to be convex and differentiable.

If $g$ is not strictly convex, then there exist distinct points $y_1, y_2$ where $$g\left(\frac{y_1 + y_2}{2}\right) = \frac{g(y_1)}{2} + \frac{g(y_2)}{2}.$$ Let $z$ maximise $z^\top \frac{y_1 + y_2}{2} - f(z)$. That is $$g\left(\frac{y_1 + y_2}{2}\right) = z^\top \frac{y_1 + y_2}{2} - f(z) \ge x^\top \frac{y_1 + y_2}{2} - f(x)$$ for all $x \in A$. This further means, from our assumption, \begin{align*} \frac{g(y_1)}{2} + \frac{g(y_2)}{2} &= z^\top \frac{y_1 + y_2}{2} - f(z) \\ &= \frac{1}{2}(z^\top y_1 - f(z)) + \frac{1}{2}(z^\top y_1 - f(z)) \\ & \le \frac{g(y_1)}{2} + \frac{g(y_2)}{2}, \end{align*} hence $z$ also maximises $z^\top y_i - f(z)$ for $i = 1, 2$.

Now, since $$y_i^\top z - f(z) \ge y_i^\top x - f(x),$$ for all $x \in A$ and $i = 1, 2$, we then get $$f(x) \ge y_i^\top x - y_i^\top z + f(z),$$ the right hand side of which is a pair of affine functions in $x$, each achieving the value $f(z)$ at $x = z$. The graphs of these affine functions would each be a tangent hyperplane to $f$ at $z$. But, the tangent hyperplane is unique ($f$ is assumed to be differentiable, and $z \in A$, where $A$ is open), so we have a contradiction.