Show that $x, y \to -\log(\operatorname{sigmoid}(x) - \operatorname{sigmoid}(y))$ is convex for $x > y$

166 Views Asked by At

I've managed to essentially brute force the problem by calculating the Hessian of the function, and showing that its determinant and trace are non-negative.
This was done by using a change of variable to reduce the problem to showing that two certain polynomials are positive over a subset of $[0,1]^2$, proving that it's non-negative in a neighborhood of its zeros, and numerically checking that it's positive away from them.

This solution feels a bit too messy for me, so I was wondering if there isn't a cleaner approach one could use. (I'm aware we could use Sylvester's criterion to simplify the numerical step, but I'd like to avoid using that as well if possible.)

For reference, the expression of the Hessian is. $$H(x,y) = \begin{bmatrix} -s(1-s)(1-2s)(s-t) + s^2(1-s)^2 && -s(1-s)t(1-t) \\ -s(1-s)t(1-t) && t(1-t)(1-2t)(s-t) + t^2(1-t)^2 \end{bmatrix}.$$

where $s=\operatorname{sigmoid}(x),\ t=\operatorname{sigmoid}(y)$.

2

There are 2 best solutions below

1
On BEST ANSWER

Assume $0 \leq t < s \leq 1$.

Consider $f(s,t)=-s(1-s)(1-2s)(s-t) + s^2(1-s)^2 = s(1-s)(s^2+t-2st)$. Each factor is nonnegative (the infimum over $s$ for the third factor occurs at $s=t$), so the (1,1) position of the Hessian is nonnegative. Analogously, the (2,2) position is nonnegative, so the trace is nonnegative.

The determinant is $$\begin{align}g(s,t) &= -s(1-s)(1-2s)t(1-t)(1-2t)(s-t)^2 \\ & \qquad + s^2(1-s)^2t(1-t)(1-2t)(s-t) - t^2(1-t)^2s(1-s)(1-2s)(s-t) \\ &= -s(1-s)t(1-t)(s-t)^2(2st-s-t). \end{align}$$ For the first expression I wrote the Hessian as $(a+b)(c+d)-H_{12}^2$ and noticed that $bd=H_{12}^2$. Then I used this tool to simplify the expression (click more forms to see the one I copied). Now $s(1-s) \geq 0$, $t(1-t) \geq 0$, $(s-t)^2 \geq 0$, so for the Hessian to be nonnegative, it remains to be proven that $2st-s-t\leq0$. We have: $$\sup_{s,t}\{2st-s-t\} = \sup_t \sup_s\{ 2st-s-t \}= \sup_t \begin{cases}t-1 & \text{if } 2t-1\geq 0 \\ -2t(1-t) & \text{otherwise.}\end{cases}$$ When $2t-1\geq 0$, the derivative with respect to $s$ is positive, so the supremum is attained at the largest possible value for $s$ (which is $s=1)$. Conversely, in the second branch you plug in the smallest possible value ($s=t$). Both branches are nonpositive.

Et voila!

1
On

A possible approach is to use the fact that $\log \det X$ is concave for $X$ positive definite. For a proof of this statement see Boyd & Vandenberghe page 74.

Set $$ X = \begin{pmatrix} e^x & e^y \\ (1 + e^y)^{-1} & (1 + e^x)^{-1}\end{pmatrix}$$ such that $\det X = \text{sigmoid}(x) - \text{sigmoid}(y)$ and substitute $a = e^x$ and $b=e^y$. The characteristic polynomial is quadratic and it is a straightforward calculation to show that both eigenvalues of $X$ are positive if $\frac{a}{1+a} > \frac{b}{1+b}$.