I used 2 separate cases. First case is when $z < 0$ and second case $z \geq 0$. I proved the first case but I have no clue how to deal with the second.
prove that $\log(1+e^z) = \log(1+e^{-|z|}) + \max (z, 0)$ :
we use separate cases. first case, when $ z < 0 $
- $\log(1+e^z)$
- $\operatorname{sign}(z) = -1$ because $z < 0$ therefore $z = -|z|$
- $\log(1+e^z) = \log(1+e^{-|z|})$
- $z < 0$ therefore $\max (z, 0) = 0$
- $\log(1+e^z) = \log(1+e^{-|z|}) + \max (z, 0)$
when $ z \geq 0$
- $\max (z, 0) = z$
- $\log(1+e^z) = \log(1+e^{-|z|}) + z$
- $\log(1+e^z) - \log(1+e^{-|z|}) = z$
- $\log(\frac{1+e^z}{1+e^{-|z|}}) = z$
am I on right path for the second case? how to prove the identity?
When $z \ge 0$, we have $$e^{-|z|} = e^{-z},$$ hence $$\log(1 + e^z) = \log(e^z e^{-z} + e^z) = \log \left(e^z (e^{-z} + 1)\right) = \log e^z + \log (1 + e^{-z}) = z + \log(1 + e^{-|z|}).$$
A crucial hint as to the reason why the expression $\log(1+e^{-|z|}) + \max(0,z)$ is more numerically stable than $\log(1+e^z)$ is seen in the $\max(0,z)$ term. When $z$ is positive and "large," $e^z$ will be extremely large; then adding $1$ will give you a number that is almost the same as $e^z$, and taking the logarithm will give you a number that is almost the same as $z$. But there is very little precision in this result because while you already knew that $\log(1 + e^z) \approx z$ when $z$ is large, you won't be able to preserve the error term with the naive computation.
Let us illustrate this with a numeric example. Suppose $z = 1000$, which by most computer programming languages is not even "large." Then the naive computation is $$\log(1 + e^{1000}).$$ You will note that $e^{1000} \approx 10^{434}$. Adding $1$ and computing the logarithm will give the result $$1000.00000000000000000$$ in most programming languages. But in fact, you can get a far more precise result, which is $$1000 + \log(1 + e^{-1000}) \approx 1000 + e^{-1000} \approx 1000 + 5.076 \times 10^{-435}.$$ The approximation that was used here is the series $$\log (1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \ldots, \quad |z| < 1,$$ which converges most rapidly when $|z|$ is close to $0$. As this is certainly the case for $z = e^{-1000}$, with just one term of this series expansion we now have well over $400$ decimal digits of precision. This result simply is not attainable through the naive calculation. In some programming languages, for sufficiently large $z$, the floating-point calculation of $\log(1 + e^z)$ will result in overflow because $e^z$ will be too large to express; e.g., select $z = 10^{100}$.