I would like to know why rewriting
$$- x * z + \log(1 + \exp(x))$$
as
$$\max(x, 0) - x * z + \log(1 + \exp(-|x|))$$
can ensure stability and avoid overflow?
Also, what is meant by stability?
Finally, why does the rewriting avoid overflow?
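About the overflow: when $x > 0$ is large, $\exp(x)$ overflows in floating point. Mathematically, though, in that case $1 + \exp(x)\approx\exp(x)$, so $$\log(1 + \exp(x))\approx x.$$
The alternative formula is the same quantity rewritten. For $x > 0$, $\log(1 + \exp(x)) = \log\big(\exp(x)\,(1 + \exp(-x))\big) = x + \log(1 + \exp(-x))$, so $$ \max(x,0) + \log(1 + \exp(-|x|)) = x + \log(1 + \exp(-|x|))\approx x + 0 = x, $$ and since $-|x| \le 0$, we have $\exp(-|x|) \le 1$, which cannot overflow. For $x \le 0$, $\max(x,0) = 0$ and $-|x| = x$, so the two formulas coincide, and $\exp(x) \le 1$ is harmless there anyway.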
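
To see the difference numerically, here is a minimal Python sketch of both formulas. The function names `naive_loss` and `stable_loss` are made up for illustration. The use of `math.log1p` in place of `log(1 + ...)` is my own small accuracy choice and is not part of the formula in the question.

```python
import math

def naive_loss(x, z):
    # Direct transcription of -x*z + log(1 + exp(x)).
    # math.exp raises OverflowError once x is large (around 710 for doubles).
    return -x * z + math.log(1.0 + math.exp(x))

def stable_loss(x, z):
    # Rewritten form: max(x, 0) - x*z + log(1 + exp(-|x|)).
    # The exponent -|x| is never positive, so exp(...) stays in [0, 1].
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

if __name__ == "__main__":
    # For moderate x the two forms agree up to rounding.
    print(naive_loss(2.0, 1.0), stable_loss(2.0, 1.0))

    # For large x only the rewritten form returns a value.
    try:
        naive_loss(1000.0, 0.0)
    except OverflowError as err:
        print("naive form failed:", err)
    print(stable_loss(1000.0, 0.0))
```

With plain Python floats the naive version fails loudly. With array libraries the same overflow typically shows up as `inf` instead of an exception, but the cause is the same.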