How to show that $f:\mathbb{R}^n\to\mathbb{R},\quad f(x)=\log\big[\sum_{i=1}^{n}{e^{x_i}}\big]$ is convex on $\operatorname{dom} f=\mathbb{R}^n$


So far I have the following: set $y_i=e^{x_i}\quad\Rightarrow \frac{\partial y_i}{\partial x_i}=y_i\quad \Rightarrow$

$\nabla_x f=\frac{1}{\sum_{i=1}^{n}{y_i}}[y_1\; \dots\; y_n]^\top\quad\Rightarrow$

$\left(\nabla_x^2 f\right)_{ij}=\begin{cases} \dfrac{y_i\left(\sum_{k=1}^{n}{y_k}-y_i\right)}{\left(\sum_{k=1}^{n}{y_k}\right)^2}, & i=j\\[2ex] -\dfrac{y_iy_j}{\left(\sum_{k=1}^{n}{y_k}\right)^2}, & i\neq j \end{cases}$

Now I need to show that $\nabla_x^2 f$ is positive semidefinite, but I'm stuck here. I also tried the first-order conditions and a direct proof from the definition of convexity, but got stuck there too.
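As a quick numerical sanity check of the gradient and Hessian formulas above (a sketch using NumPy; the function name `logsumexp_grad_hess` is chosen here for illustration, it is not from the question):

```python
import numpy as np

def logsumexp_grad_hess(x):
    """Gradient and Hessian of f(x) = log(sum_i exp(x_i)), per the formulas above."""
    y = np.exp(x)
    S = y.sum()
    grad = y / S                                    # grad_i = y_i / sum_k y_k
    hess = (S * np.diag(y) - np.outer(y, y)) / S**2 # matches the two cases above
    return grad, hess

rng = np.random.default_rng(0)
x = rng.normal(size=5)
grad, H = logsumexp_grad_hess(x)

# The gradient entries are the softmax weights, so they sum to 1
assert np.isclose(grad.sum(), 1.0)
# All eigenvalues of the symmetric Hessian are nonnegative up to round-off
assert np.linalg.eigvalsh(H).min() >= -1e-12
```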


There are 3 answers below.

BEST ANSWER

Define $S=\sum_i y_i$; then $$\nabla^2 f(x) = S^{-2} \left( S \mathop{\textrm{diag}}(y) - yy^T \right).$$ To show $\nabla^2 f(x) \succeq 0$, it suffices to show that $v^T\left(\nabla^2 f(x)\right) v \geq 0$ for all $v\in\mathbb{R}^n$. We have $$ S^2\, v^T (\nabla^2 f(x) ) v = v^T (S \mathop{\textrm{diag}}(y) - yy^T) v = \left(\sum_i y_i\right) \left(\sum_i y_i v_i^2\right) - \left(\sum_i y_i v_i\right)^2.$$

This quantity is nonnegative. To see why, define $z_i=y_i^{1/2}$ and $w_i=y_i^{1/2}v_i$ for $i=1,2,\dots, n$. Then $$\sum_i y_i v_i = \sum_i z_i w_i \leq \|z\|_2 \|w\|_2$$ by the Cauchy-Schwarz inequality. Squaring both sides gives $$\left(\sum_i y_i v_i\right)^2 \leq \left( \sum_i z_i^2 \right) \left( \sum_i w_i^2 \right) = \left( \sum_i y_i \right) \left( \sum_i y_i v_i^2 \right),$$ which establishes that $$\left(\sum_i y_i\right)\left(\sum_i y_i v_i^2\right) - \left(\sum_i y_i v_i\right)^2 \geq 0 \quad\Longrightarrow\quad v^T(\nabla^2 f(x))v \geq 0.$$

So the Hessian is positive semidefinite everywhere. Note that it is not positive definite: $\nabla^2 f(x)\vec{1}=0$ for all $x$. This function is therefore not strictly convex; just "plain old" convex. :-)
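As a complement to the proof, a small numerical check (a sketch, not part of the original answer) that $v^T \nabla^2 f(x)\, v \ge 0$ for random directions $v$ and that $\vec{1}$ lies in the null space:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x = rng.normal(size=n)
y = np.exp(x)
S = y.sum()
H = (S * np.diag(y) - np.outer(y, y)) / S**2   # the Hessian derived above

# v^T H v >= 0 for a batch of random directions v
V = rng.normal(size=(1000, n))
quad = np.einsum('ij,jk,ik->i', V, H, V)       # quad[i] = V[i] @ H @ V[i]
assert quad.min() >= -1e-12

# H @ 1 = 0: the all-ones direction is flat, so H is PSD but not PD
assert np.allclose(H @ np.ones(n), 0.0)
```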


Sketch: Let $a,b\in \mathbb {R}^n.$ It suffices to show that $f(a+tb)$ is a convex function of $t\in \mathbb {R}.$ Differentiating twice with respect to $t$ shows that this amounts to proving

$$\left(\sum_{k=1}^{n}e^{a_k + tb_k}\right)\left(\sum_{k=1}^{n}b_k^2e^{a_k + tb_k}\right) \ge \left(\sum_{k=1}^{n}b_ke^{a_k + tb_k}\right)^2.$$

This inequality can be viewed as

$$\mu(E)\int_E f^2\,d\mu \ge \left(\int_E f\,d\mu\right)^2$$

for the right choices of these symbols: take $E=\{1,\dots,n\}$, $\mu(\{k\})=e^{a_k+tb_k}$, and $f(k)=b_k$; the inequality is then Cauchy-Schwarz.


Yet another approach, for those who would like to avoid the "messy" work with $n\times n$ Hessians. First, a remark.

Remark: It is easy to see that the function $g(t)=\ln(e^t+1)$ is convex and nondecreasing ($g'(t)=\frac{e^t}{e^t+1}>0$ and $g''(t)=\frac{e^t}{(e^t+1)^2}>0$). Since $g$ is convex and nondecreasing, the composition $g\circ F$ is convex for any convex $F$.

Let's prove convexity by induction on the dimension.

Base case $n=2$: the function $$ h_2(x)=\ln(e^{x_1}+e^{x_2})=\ln\Bigl((e^{x_1-x_2}+1)e^{x_2}\Bigr)=\ln(e^{x_1-x_2}+1)+x_2 $$ is convex by the remark above with $F(x)=x_1-x_2$ (affine, hence convex); adding the affine term $x_2$ preserves convexity.

Step ($n-1$ to $n$): apply the same trick with $x_n$: \begin{align} h_n(x)&=\ln(e^{x_1}+\ldots+e^{x_{n-1}}+e^{x_n})\\ &=\ln(e^{x_1-x_n}+\ldots+e^{x_{n-1}-x_n}+1)+x_n\\ &=\ln\bigl(e^{\ln(e^{x_1-x_n}+\ldots+e^{x_{n-1}-x_n})}+1\bigr)+x_n. \end{align} By the same remark, this function is convex provided $\ln(e^{x_1-x_n}+\ldots+e^{x_{n-1}-x_n})$ is convex. But the latter is $h_{n-1}(Ax)$, where $$ A=\begin{bmatrix} 1 & 0 & \ldots & 0 & -1\\ 0 & 1 & \ldots & 0 & -1\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \ldots & 1 & -1 \end{bmatrix}, $$ i.e. the composition of $h_{n-1}$ (convex by the induction hypothesis) with an affine map, which is convex.
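The decomposition used in the induction step can also be verified numerically. This sketch (with `h` playing the role of $h_n$, `g` as in the remark, and `A` the matrix above; all names chosen here) checks that $h_n(x)=g\bigl(h_{n-1}(Ax)\bigr)+x_n$:

```python
import numpy as np

def h(x):
    """log-sum-exp in any dimension."""
    return np.log(np.exp(x).sum())

def g(t):
    """The scalar function from the remark: g(t) = ln(e^t + 1)."""
    return np.log(np.exp(t) + 1.0)

rng = np.random.default_rng(3)
n = 4
x = rng.normal(size=n)

# A maps x to (x_1 - x_n, ..., x_{n-1} - x_n)
A = np.hstack([np.eye(n - 1), -np.ones((n - 1, 1))])

lhs = h(x)               # h_n(x)
rhs = g(h(A @ x)) + x[-1]  # g(h_{n-1}(Ax)) + x_n
assert np.isclose(lhs, rhs)
```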