Identity regarding convexity of the logistic loss function

920 Views Asked by At

I found the following identity regarding the logistic loss function in these lecture notes (slide 16) from Berkeley university:

$$\log(1 + e^{-z}) = \max_{0 \leq v \leq 1} -zv + v\log(v) + (1-v)\log(1-v)$$

But I could find neither proof nor attribution. Although the slides do not specify it, I am assuming that this is supposed to hold for all $z \in \mathbb{R}$. Does anyone know - if the identity does indeed hold - of a proof, or a reference to a proof?

For context, I am trying to prove that $\log(1 + e^{f(x)})$ is convex in $x$ when $f : \mathbb{R}^n \rightarrow \mathbb{R}$, $n \in \mathbb{N}$, is convex but not differentiable in $x$ (i.e. I cannot apply arguments based on the Hessian), as part of a larger proof. This identity should make the proof simple by substituting $z$ for $-f(x)$, as it becomes a maximum of a sum of convex functions, but I have no idea where this result comes from.

More specifically, the function $f$ I'm interested in is a maximum of linear functions. Maybe there is a simpler proof I am missing, any feedback welcome :) .

2

There are 2 best solutions below

3
On BEST ANSWER

For every fixed $z$, the derivative with respect to $v$ of the function in the maximum is $$-z+\log\left(\frac{v}{1-v}\right),$$ hence the function is decreasing for $v\leqslant\nu(z)$ and increasing for $v\geqslant\nu(z)$, where $\nu(z)$ in $(0,1)$ is such that $$\frac{\nu(z)}{1-\nu(z)}=\mathrm e^z.$$ In particular, the maximum in the RHS is either the value of this function at $v=0$ or its value at $v=1$, that is, $$\max\{0,-z\},$$ and not the desired LHS. However, the exact same approach shows that $$\log(1 + \mathrm e^{-z}) = \max_{0 \leqslant v \leqslant 1} -zv - v\log(v) - (1-v)\log(1-v).$$

0
On

It seems the identity is wrong after all. Taking the case of $z=0$, $\log(1 + e^{0}) = \log(2) \approx 0.3$, and plotting the function $v \mapsto v\log(v) + (1-v) \log(1-v)$ between for $0 < v < 1$ makes it clear that its supremum is $0$ (as $v$ tends to either $0$ or $1$) and not $\log(2)$.

Plotting this function with a few different values of $z$, I conjecture that the RHS is equal to $\max(0, -z)$, which is a good approximation to the logistic loss for large negative values of $z$, so the identity still makes some sense. However, I am not sure it will help in proving the convexity of the function I'm interested in.

EDIT: As Did's answer pointed out, the identity holds if one changes the signs to:

$$\log(1 + e^{-z}) = \sup_{0 < v < 1} -zv -v\log(v) - (1-v)\log(1-v) $$

And it can be proven using the same reasoning. This allows me to prove the convexity of the function I am interested in :) .