Proof that a log-of-sum-of-exponentials is a convex function


It's well known in statistical mechanics that the following is a convex function of the vector $\theta$: $$ A(\theta) = \log \left( \sum_{i=1}^\infty e^{\theta \cdot f(i)} \right) $$ where $f(i)$ is a vector function of $i$. In the context of statistical mechanics, $A$ is known as the log partition function.
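As a concrete illustration (a minimal numerical sketch, with a hypothetical choice of $f$ and a finite truncation of the infinite sum), one can evaluate $A$ and spot-check midpoint convexity:

```python
import numpy as np

# Hypothetical example: truncate the sum at N terms, with f(i) = (1/i, 1/i^2);
# any f works for a finite truncation, this choice is just for illustration.
def f(i):
    return np.array([1.0 / i, 1.0 / i**2])

def A(theta, N=50):
    # log-sum-exp computed stably by factoring out the largest exponent
    exponents = np.array([theta @ f(i) for i in range(1, N + 1)])
    m = exponents.max()
    return m + np.log(np.sum(np.exp(exponents - m)))

theta1 = np.array([0.3, -1.2])
theta2 = np.array([-0.7, 0.5])
mid = 0.5 * (theta1 + theta2)
# Midpoint convexity check: A(mid) <= (A(theta1) + A(theta2)) / 2
assert A(mid) <= 0.5 * (A(theta1) + A(theta2)) + 1e-12
```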

However, all the proofs of the convexity property that I know of rely on interpreting this function in terms of probability theory. One defines a probability distribution $p_i = e^{\theta\cdot f(i)-A(\theta)}$ and shows (for example) that the second partial derivatives of $A$ form a covariance matrix, which implies that the Hessian must be positive semidefinite.

For the sake of enhancing my understanding, I would like a more direct proof, one that proceeds directly from the mathematical form of $A$ as defined above, without considering a probability distribution. Is there a straightforward way to see that $A$ as defined above is a convex function of $\theta$, independently of its interpretation as a partition function in statistical mechanics?


Before you add another downvote, note that showing that the function $h_p(x) = \log \sum_{k=1}^p e^{x_k}$ is convex for finite $p$ is straightforward using the Hessian and the Cauchy–Schwarz inequality; see the addendum below (or see https://math.stackexchange.com/a/1190438/27978 and https://math.stackexchange.com/a/2089953/27978 for a few examples, and https://math.stackexchange.com/a/2418721/27978 for a different proof). This answer addresses the $p \to \infty$ and inner product aspects.

Since you have an inner product, I am assuming that $\theta$ lies in some Hilbert space $\mathbb{H}$.

If the functions $g_k:\mathbb{H} \to \mathbb{R}$ are convex, and the function $h :\mathbb{R}^p \to \mathbb{R}$ is convex and nondecreasing in each argument, then it is straightforward to show that the composition $\theta \mapsto h(g_1(\theta),\dots,g_p(\theta))$ is convex.

The 'log-sum-exp' function $h_p(x) = \log \sum_{k=1}^p e^{x_k}$ is convex: it is straightforward to show that its Hessian is positive semidefinite (see the addendum). It is also easy to show that $h_p$ is nondecreasing in each argument $x_k$.
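For a quick numerical sanity check (a sketch, not part of the proof): the Hessian of $h_p$ works out to $\operatorname{diag}(q) - q q^T$ where $q$ is the softmax of $x$, and its eigenvalues can be verified to be nonnegative:

```python
import numpy as np

# Sketch: the Hessian of h_p(x) = log sum_k exp(x_k) is diag(q) - q q^T,
# where q = softmax(x); we check numerically that it is positive semidefinite.
def lse_hessian(x):
    z = np.exp(x - x.max())
    q = z / z.sum()                      # softmax weights
    return np.diag(q) - np.outer(q, q)   # Hessian of log-sum-exp

rng = np.random.default_rng(0)
x = rng.normal(size=5)
eigvals = np.linalg.eigvalsh(lse_hessian(x))
# All eigenvalues nonnegative (one is exactly zero, along the all-ones direction)
assert eigvals.min() >= -1e-12
```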

If we let $g_k(\theta) = \langle \theta, f(k) \rangle$ (linear, hence convex), we see that the function $\alpha_p(\theta) = h_p(g_1(\theta),\dots,g_p(\theta))$ is convex.

If the functions $\alpha_p$ are convex and the pointwise limit $\alpha(\theta) = \lim_{p \to \infty} \alpha_p(\theta)$ exists, then the function $\alpha$ is easily shown to be convex.
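Explicitly, the limit step is a one-line check: for $\lambda \in [0,1]$, $$ \alpha(\lambda \theta_1 + (1-\lambda)\theta_2) = \lim_{p \to \infty} \alpha_p(\lambda \theta_1 + (1-\lambda)\theta_2) \le \lim_{p \to \infty} \left[ \lambda\, \alpha_p(\theta_1) + (1-\lambda)\, \alpha_p(\theta_2) \right] = \lambda\, \alpha(\theta_1) + (1-\lambda)\, \alpha(\theta_2). $$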

Combining the above shows that the function $A$ is convex.

Addendum: The function $A$ is not necessarily strictly convex. For example, with a single term ($p=1$) and $f(1) = \phi$ for some fixed $\phi$, $A$ has the form $A(\theta) = \log(e^{\langle \theta, \phi \rangle} ) = \langle \theta, \phi \rangle$, which is affine and hence not strictly convex.

Addendum (proof that $h_p$ is convex):

To prove that $h_p$ is convex, let $s(x) = \sum_{k=1}^p e^{x_k}$ and note that the Hessian is $h_p''(x) = {1 \over s^2(x)} \left( s(x) s''(x) - s'(x)s'(x)^T \right)$, where $s'(x)$ is the gradient of $s$ (the vector with entries $e^{x_k}$) and $s''(x) = \operatorname{diag}(e^{x_1},\dots,e^{x_p})$. It is sufficient to show that $s(x)\, v^T s''(x) v \ge (v^T s'(x))^2$ for all vectors $v$, or equivalently, to show $\sum_k e^{x_k} \sum_k v_k^2 e^{x_k} \ge \left( \sum_k v_k e^{x_k} \right)^2$.

Cauchy–Schwarz gives $\left( \sum_k v_k e^{x_k} \right)^2 = \left( \sum_k v_k e^{x_k \over 2}\, e^{x_k \over 2} \right)^2 \le \sum_k v_k^2 e^{x_k} \sum_k e^{x_k}$, which is exactly what is needed.
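The Cauchy–Schwarz step can be spot-checked numerically (a sketch with random inputs, purely as a sanity check):

```python
import numpy as np

# Numerical spot-check of the Cauchy-Schwarz step: for random x and v,
# (sum_k v_k e^{x_k})^2 <= (sum_k v_k^2 e^{x_k}) * (sum_k e^{x_k}).
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=6)
    v = rng.normal(size=6)
    w = np.exp(x)
    lhs = (v @ w) ** 2
    rhs = (v**2 @ w) * w.sum()
    assert lhs <= rhs * (1 + 1e-12)
```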