The second derivative of $\log\left(\sum\limits_{i=1}^ne^{x_i}\right)$ seems negative, but I have to prove the function is convex

4.5k Views Asked by At

For the second derivative I got $$\frac{\partial^2}{\partial x_k x_j}\log \left(\sum_{i=1}^{n} e^{x_i}\right)=-\frac{e^{x_k}e^{x_j}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2},$$ where $j \neq k$, and $1 \le j,k \le n$.

This Hessian is negative, which can't allow $\log(\sum_{i=1}^{n}{e^{x_i}})$ to be convex, but I am asked to show that $\log(\sum_{i=1}^{n}{e^{x_i}})$ is convex. Obviously something is wrong here, so unless the second derivative I showed is actually positive, then I must have just gotten the wrong Hessian.

Someone helped me calculate that:

$$\frac{\partial^2}{\partial x_k^2}\log \left(\sum_{i=1}^{n} e^{x_i}\right)=\frac{e^{x_k}\left(\sum_{i=1}^{n} e^{x_i}-e^{x_k}\right)}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2},$$

2

There are 2 best solutions below

4
On BEST ANSWER

Simply a story of variance...

Using the short-hands $$t_i=s\cdot e^{x_i}\qquad s=\left(\sum_je^{x_j}\right)^{-1}$$ the diagonal and off-diagonal entries of the Hessian are $$t_i(1-t_i)\qquad\text{and}\qquad -t_it_j$$ respectively, hence it suffices to show that $Q(u)\geqslant0$ for every $u=(u_i)$, where $$Q(u)=\sum_i t_i(1-t_i)u_i^2- \sum_{i\ne j}t_it_ju_iu_j$$ But, by construction, $$\sum_it_i=1$$ hence $$Q(u)=\sum_it_iu_i^2\cdot\sum_it_i-\left(\sum_it_iu_i\right)^2$$ hence Cauchy-Schwarz inequality shows that indeed $Q(u)\geqslant0$.

0
On

There is nothing wrong. What you have only shown is not that the Hessian matrix $H$ is negative definite, but merely that the off-diagonal entries of $H$ are negative. The matrix $\pmatrix{1&-1\\ -1&1}$, for instance, has negative off-diagonal entries, but the matrix itself is positive semidefinite.

In your case, it only takes a little more work to show that the $i$-th diagonal entry of the Hessian matrix $H$ is given by $$ \frac{e^{x_i}}{\sum_{i=1}^ne^{x_i}}-\left(\frac{e^{x_i}}{\sum_{i=1}^ne^{x_i}}\right)^2. $$ Therefore $H$ is a (weakly) diagonally dominant matrix with a nonnegative diagonal, meaning that its eigenvalues have nonnegative real parts (Gershgorin disc theorem). As $H$ is also real symmetric, it has real eigenvalues. Hence all its eigenvalues of $H$ are nonnegative, i.e. $H$ is positive semidefinite.