Preventing underflow, log sum exp trick

10.9k Views Asked by At

I have some difficulties with understanding the schema to prevent underflow, which is very often mentioned as The log-sum-exp trick, the partial decription is The log-sum-exp trick.

In short, I will describe the general idea.

Assume you need to calculate $w_i=\frac{\prod_{j}^{n}p_{ij}}{\sum_{i}^{n}\prod_{j}^{n}p_{ij}}$, where $p_{ij}$ might be very small value, therefore the overall product is very small and might cause underflow when calculating on computer.

Let's apply $\log$ and $\exp$ to the enumerator and denominator and $z_i=\sum_{j}^{} \log p_{ij}$, then

$w_i = \frac{e^{z_i}}{\sum_{j}^{n} e^{z_j}}$, of course, ensure that $p_i \neq 0$.

The problem with resulting formula is still $e^{z_i}$ might have small value and might cause underflow.

So far everything is ok.

Then $m = \max_i(z_i)$

$w_{i} = \frac{e^{z_i}}{\sum_{j}^{}e^{z_j}}= \frac{e^{z_i-m}}{\sum_{j}^{}e^{z_j-m}}=$

$=0\ if\ z_i-m<-k$

$=\frac{e^{z_i-m}}{\sum_{j:z_j-m \geq -k}^{}e^{z_j-m}}$, otherwise

,when $k$ is some value such that $e^{-k}$ doesn't cause underflow.

The question is why do we need to use $m$ value.

In my opinion, $w_{i} = \frac{e^{z_i}}{\sum_{j}^{}e^{z_j}}$ with the $k$ value, would work just fine, positive values of $z_j$ don't cause the problem, and the negative are filtered out by $-k$, what the reason to artificially decrease the value of $z_j$ by m.

1

There are 1 best solutions below

1
On BEST ANSWER

If you decrease all $z_i$ by $m$, you essentially multiply numerator and denominator by $e^{-m}$, which leaves the fraction unchanged - except that the computations now work without underflow. More precisely, the $m$ is chosen in such a manner that at least one $z_i-m$ is $zero$ (and the others still negative), so that one $e^{z_i-m}$ is simpy $1$; all values that are still very small are ignored (chopped off by comparing with $-k$), which is of no harm as values causing underflow are negligible compared to at least the one $1$ value.