Problem about cross entropy


I would like to know why rewriting

$$- x * z + \log(1 + \exp(x))$$

as

$$\max(x, 0) - x * z + \log(1 + \exp(-|x|))$$

ensures stability and avoids overflow.
Also, what is meant by stability?
Finally, why does the rewriting avoid overflow?


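About the overflow: when $x > 0$ is large, $\exp(x)$ overflows in floating point (in double precision this happens for $x$ above roughly $709$). Yet in exactly this regime $\exp(x)$ dominates, so $1 + \exp(x)\approx\exp(x)$ and the value we actually want is simply $$\log(1 + \exp(x))\approx x.$$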

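With the alternative formula, for $x > 0$, $$ \max(x,0) + \log(1 + \exp(-|x|)) = x + \log(1 + \exp(-|x|))\approx x + 0 = x, $$ and since $-|x| \le 0$ we have $\exp(-|x|)\in(0,1]$, which cannot overflow. The rewriting is exact, not an approximation: for $x > 0$, $\log(1 + \exp(x)) = \log\bigl(\exp(x)\,(1 + \exp(-x))\bigr) = x + \log(1 + \exp(-x))$, while for $x \le 0$ we have $\max(x,0) = 0$ and $-|x| = x$, so the two formulas coincide term by term.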
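To see the difference numerically, here is a minimal Python sketch of both formulas. The function names `naive_loss` and `stable_loss` are made up for illustration, and `math.log1p` is a small refinement of my own (it computes $\log(1 + y)$ accurately for tiny $y$); plain `math.log(1 + ...)` would avoid the overflow just as well.

```python
import math

def naive_loss(x, z):
    # Direct translation of -x*z + log(1 + exp(x)).
    # math.exp raises OverflowError once x exceeds roughly 709.
    return -x * z + math.log(1.0 + math.exp(x))

def stable_loss(x, z):
    # max(x, 0) - x*z + log(1 + exp(-|x|)).
    # The argument of exp is never positive, so exp returns a value in [0, 1].
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

if __name__ == "__main__":
    for x in (2.0, -2.0, 1000.0, -1000.0):
        try:
            naive = naive_loss(x, 1.0)
        except OverflowError:
            naive = "overflow"
        print(x, naive, stable_loss(x, 1.0))
```

For moderate $x$ the two functions agree up to rounding; for $x = 1000$ the naive version fails inside `math.exp`, while the stable version still returns a finite value.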