Difficulty understanding the concept of underflow in deep learning


I'm currently reading the deep learning book by Ian Goodfellow. In chapter 4, there is a paragraph about underflow: "One example of a function that must be stabilized against underflow and overflow is the softmax function. The softmax function is often used to predict the probabilities associated with a multinoulli distribution. The softmax function is defined to be $$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}.$$

Consider what happens when all the $x_i$ are equal to some constant $c$. Analytically, we can see that all the outputs should be equal to $1/n$. Numerically, this may not occur when $c$ has large magnitude. If $c$ is very negative, then $\exp(c)$ will underflow."

My understanding is that underflow happens when $\mathrm{softmax}(x)_i$ (namely $1/n$) is a tiny number. So why does underflow occur when $c$ is a very large negative value?

thanks~

1 Answer

As mentioned in the comments, this is about floating-point precision in computer implementations. However, $\exp(c)$ doesn't become NaN or $\infty$; rather, it goes to 0. Mathematically $\exp$ is never 0, but on a computer with finite precision, taking $\exp$ of a very large negative number rounds to exactly 0.
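This is easy to observe directly. A minimal Python sketch (using only the standard library; the exact thresholds assume IEEE 754 double precision):

```python
import math

# exp of a moderately negative number is tiny but still representable
print(math.exp(-700))  # on the order of 1e-304, nonzero

# past roughly -745, not even subnormal doubles can represent the result,
# and exp underflows to exactly 0.0
print(math.exp(-800))  # 0.0
```

So $\exp(-800)$ is not "a very small number" to the machine; it is literally zero.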

More specifically, floating-point numbers approximate real numbers in a binary (base-2) analogue of scientific notation:

$$x \approx {1.b_1 b_2 b_3 \ldots b_n}_{(2)} \times 10_{(2)}^{e}$$

where the "(2)" means to take the given number as being in base-2 representation (so "10" above is not ten but two) and $b_j \in \{ 0, 1 \}$. It is not the precision $n$ of the significand that matters here, but the precision with which the exponent $e$ is stored. $e$ is essentially a finite-precision integer, so it can only get so negative. When it gets too negative, it "underflows": the computer sets the whole floating-point number to zero, because it cannot distinguish a number closer to zero than that lowest (most negative) possible exponent will permit.

That is the simple picture. In the actual IEEE 754 floating-point standard there is a process called "gradual underflow": when the exponent attains its lowest (most negative) value, the "1" before the binary point (which is not actually stored but implied) becomes a "0", so that leading zeroes can appear in the $b_j$, buying a little more dynamic range at the cost of lost precision. This is like writing $0.005 \times 10^{-5}$ in "improper" decimal scientific notation. Once you go $n$ more powers of 2 down, though, all the figures become zero, the whole number is zero, and underflow is complete. For IEEE 754 doubles, underflow of $\exp$ begins at an argument of about $-709$.

If this happens in the softmax, the more serious problem is that you get a zero in the denominator, which is guaranteed if ALL the $\exp(x_i)$ underflow, and that is exactly what happens when you set all the $x_i$ to the same underflowing constant value $c$. The computer then sees $\frac{0}{0}$, which is indeterminate and which IEEE 754 stipulates be set to NaN (Not-a-Number), so your program crashes or emits incorrect results.
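Gradual underflow can be seen from Python as well. A short sketch, assuming IEEE 754 doubles (which is what CPython's `float` uses on essentially all platforms):

```python
import sys

# smallest positive *normal* double: exponent at its minimum,
# implicit leading "1" still in place
print(sys.float_info.min)  # 2.2250738585072014e-308

# gradual underflow: halving it gives a subnormal number, still nonzero
subnormal = sys.float_info.min / 2
print(subnormal > 0.0)  # True

# the smallest subnormal is 5e-324; halving that underflows completely to 0.0
print(5e-324 / 2 == 0.0)  # True
```

Between `sys.float_info.min` and `5e-324` the significand loses one bit of precision per halving, which is exactly the "leading zeroes in the $b_j$" trade-off described above.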
Mathematically, of course, the output is never 0: with all $x_i = c$, each output is $\exp(c) / (n \exp(c)) = 1/n$, which is where the $\frac{1}{n}$ in the passage comes from. The zero denominator and the NaN are purely artifacts of finite precision.
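The standard stabilization discussed in the book is to subtract $\max_i x_i$ from every input before exponentiating: this leaves the mathematical value of the softmax unchanged while ensuring the largest argument to $\exp$ is 0. A plain-Python sketch contrasting the two versions:

```python
import math

def naive_softmax(x):
    exps = [math.exp(v) for v in x]
    s = sum(exps)
    return [e / s for e in exps]  # s == 0.0 if every exp underflows

def stable_softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]  # largest argument is now 0
    s = sum(exps)                        # s >= 1, so no zero denominator
    return [e / s for e in exps]

x = [-800.0] * 4  # all inputs equal to a very negative constant c

try:
    naive_softmax(x)
except ZeroDivisionError:
    # pure-Python floats raise here; NumPy arrays would instead give NaN
    print("naive softmax: 0/0")

print(stable_softmax(x))  # [0.25, 0.25, 0.25, 0.25], i.e. 1/n
```

Note that Python raises `ZeroDivisionError` for `0.0 / 0.0` rather than silently producing the IEEE 754 NaN; array libraries like NumPy follow the IEEE behavior and return NaN with a warning.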

PS. I'd consider this for a move to one of the comp. sci.-related fora on StackExchange. There's math but it also seems more strongly related to computer programming.