Show $f(x) = \ln\ (\sum_{i=1}^n e^{\langle a_i, x\rangle})$ is convex


Let $a_i \in \mathbb{R}^n, f(x) = \ln\ (\sum_{i=1}^n e^{\langle a_i, x\rangle })$. Show $f(x)$ is convex.

I don't really know where to start on this. Is it safe to say $x \in \mathbb{R}^n$ then or is that not necessary for inner products? I'm also confused because I know $\ln$ is concave so I don't see how ln of the sum would be convex.

I was thinking I needed to show the Hessian is symmetric positive semidefinite, but I'm not sure that's even possible from the info that I am given to work with. I am thinking showing that for all $x,y$, $f(\frac{x+y}{2}) \leq \frac{f(x) + f(y)}{2}$ might be best, but I am getting tripped up on the details there as well. Thoughts?


Best answer:

You must show that $f(\theta x+(1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$ for all $x,y$ and $\theta \in [0,1]$. The endpoint cases $\theta \in \{0,1\}$ are trivial, so take $\theta \in (0,1)$.

The trick is to use Hölder's inequality:

$$ \sum_{i=1}^n u_iv_i \le \Big(\sum_{i=1}^n|u_i|^p\Big)^{\frac{1}{p}} \cdot \Big(\sum_{i=1}^n|v_i|^q\Big)^{\frac{1}{q}} $$

with $1/p=\theta, 1/q=1-\theta$ so that $1/p+1/q=1$.
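If it helps, here is a quick numerical sanity check of Hölder's inequality with exactly these conjugate exponents (NumPy; the vectors and $\theta$ are made-up values, not anything from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(5)                    # nonnegative u_i
v = rng.random(5)                    # nonnegative v_i
theta = 0.3
p, q = 1 / theta, 1 / (1 - theta)    # conjugate exponents: 1/p + 1/q = 1

lhs = np.sum(u * v)
rhs = np.sum(np.abs(u) ** p) ** (1 / p) * np.sum(np.abs(v) ** q) ** (1 / q)
assert lhs <= rhs + 1e-12            # Hölder: sum u_i v_i <= ||u||_p ||v||_q
```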

Now the details:

Let $u_i=e^{\langle a_i, x\rangle}, v_i=e^{\langle a_i, y\rangle}$.

\begin{align*} f(\theta x+(1-\theta)y) &= \ln\Big(\sum_{i=1}^n e^{\theta \langle a_i, x\rangle + (1-\theta)\langle a_i, y\rangle}\Big) \\ &=\ln\Big(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\Big) \end{align*}

From Hölder's inequality (your $u_i$ and $v_i$ are positive, and $x\mapsto\ln x$ is an increasing function):

\begin{align*} \ln\Big(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\Big) &\le \ln\Big[\Big(\sum_{i=1}^n u_i^{\theta \cdot \frac{1}{\theta}}\Big)^{\theta} \cdot \Big(\sum_{i=1}^n v_i^{(1-\theta) \cdot \frac{1}{1-\theta}}\Big)^{1-\theta}\Big] \\ &= \theta \ln\sum_{i=1}^n u_i+(1-\theta)\ln\sum_{i=1}^n v_i \end{align*} which proves that $f(\theta x+(1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$.
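You can also sanity-check the convexity inequality numerically. A minimal sketch with NumPy, using made-up random $a_i$, $x$, $y$ (the helper `f` below is my own numerically stabilized log-sum-exp, not part of the problem statement):

```python
import numpy as np

def f(x, A):
    """f(x) = ln(sum_i exp(<a_i, x>)), where the rows of A are the a_i."""
    z = A @ x
    m = z.max()                       # subtract the max for numerical stability
    return m + np.log(np.sum(np.exp(z - m)))

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))           # four random vectors a_i in R^3
x, y = rng.normal(size=3), rng.normal(size=3)

# Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) along the segment.
for theta in np.linspace(0.0, 1.0, 11):
    lhs = f(theta * x + (1 - theta) * y, A)
    rhs = theta * f(x, A) + (1 - theta) * f(y, A)
    assert lhs <= rhs + 1e-9
```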


Picaud Vincent's way is better. I tried to solve it by showing the Hessian is positive semidefinite. I'll post it below in case the approach is helpful:

Let us assume that the $a_i$'s are linearly independent, since the Hessian depends smoothly on the $a_i$'s and linearly dependent $a_i$'s can be approximated by linearly independent ones.

Let us change variables $y=Ax$, where $A$ is the invertible matrix whose rows are the $a_i$'s, so that in the new variables $f(x)=g(y)=\ln\sum_{i=1}^n e^{y_i}$. The Hessian of $f$ is positive semidefinite if and only if that of $g$ is, which follows from the equality of the quadratic terms in their Taylor expansions: $$ \langle y-y_0,D^2 g(y_0)(y-y_0)\rangle = \langle x-x_0,D^2 f(x_0)(x-x_0)\rangle, $$ so it suffices to compute the Hessian in the new variables: $$ \frac{\partial g}{\partial y_i}=\frac{e^{y_i}}{\sum_k e^{y_k}},\qquad \frac{\partial^2 g}{\partial y_i\partial y_j}=\frac{e^{y_i}\delta_{ij}}{\sum_k e^{y_k}}-\frac{e^{y_i}}{\sum_k e^{y_k}}\cdot\frac{e^{y_j}}{\sum_\ell e^{y_\ell}}. $$ Denote $(Dg)_i=g_i=e^{y_i}/\sum_k e^{y_k}$, so that $\sum_i g_i = 1$; then the Hessian is simply $$ \frac{\partial^2 g}{\partial y_i\partial y_j}=g_i\delta_{ij}-g_ig_j. $$ The vector $v=(1,\dots,1)$ of 1's is an eigenvector with eigenvalue 0: $\sum_j (g_i\delta_{ij}-g_ig_j)=g_i-g_i\sum_j g_j=0$, using the identity $\sum_j g_j=1$. To finish, for any $w\in\mathbb{R}^n$, $$ \langle w, D^2g\,w\rangle=\sum_i g_i w_i^2-\Big(\sum_i g_i w_i\Big)^2\ge 0, $$ since this is the variance of $w$ under the probability distribution $(g_1,\dots,g_n)$ (equivalently, it is nonnegative by the Cauchy–Schwarz inequality). Hence the Hessian is positive semidefinite, and $g$, and therefore $f$, is convex.
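For what it's worth, a quick NumPy check of this Hessian formula: $\operatorname{diag}(g)-gg^\top$ is positive semidefinite with $(1,\dots,1)$ in its kernel (the point $y$ is a made-up random vector):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=5)
g = np.exp(y - y.max())
g /= g.sum()                          # softmax: g_i = e^{y_i} / sum_k e^{y_k}

# Hessian of g(y) = ln sum_i e^{y_i}:  g_i delta_ij - g_i g_j
H = np.diag(g) - np.outer(g, g)

ones = np.ones(5)
assert np.allclose(H @ ones, 0)                   # (1,...,1) is in the kernel
assert np.linalg.eigvalsh(H).min() >= -1e-10      # all eigenvalues >= 0 (PSD)
```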