How can I evaluate this function in a numerically stable way?


I have a cross-entropy-like function that I want to evaluate. However, I run into numerical instabilities due to overflow.

I want to compute the cross-entropy of the following function:

$$\sigma(p,q) = \frac{2}{1+e^{d}} \text{ with } d=||p-q||_2^2$$

To be specific, for $z \in \{0,1\}$:

$$-z\log(\sigma(p,q))-(1-z)\log(1-\sigma(p,q))$$

I know that there are efficient ways to do this for the standard sigmoid

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

which is also described elsewhere.

Proceeding in a similar manner, I derived the following equivalent formula:

$$z\log\left(\frac{1}{2}\right) + \log(1+e^d) + (z-1)\log(e^d-1)$$

How do I go on from here?

There are problems both for $d\approx 0$ and for $d\gg 0$.


BEST ANSWER

For $x \approx 0$ use the functions $\mathrm{log1p}(x) = \log(1+x)$ and $\mathrm{expm1}(x) = e^x-1$ if available. The other expressions can be computed with $$\log(1+e^d) = \mathrm{log1p}(e^d)$$ and $$\log(e^d-1) = \log(\mathrm{expm1}(d))$$

If these functions are not available, use the first terms of the Maclaurin series. I guess that today most programming languages have them, because they are recommended operations of the IEEE 754 floating-point standard.

For large $d$ you can simplify $e^d\pm 1 \approx e^d$ and get $\log(1+e^d) \approx \log e^d = d.$

For the complete equivalent formula and large $d$, I computed the asymptotic expansion with Maple as

$$z d - z \log(2)+(2-z)e^{-d}-\tfrac{1}{2}ze^{-2d} + O(e^{-3d})$$ But I do not know if there is significant cancellation.

ANSWER

Because the issues differ, it seems advisable to examine the two flavors of the function $\mathrm{f_{z}}(d)$ separately. We have

$$\mathrm{f_{0}}(d) = \log (1 + \exp(d)) - \log (\exp(d) - 1)$$ and $$\mathrm{f_{1}}(d) = \log (1 + \exp(d)) + \log \left(\frac{1}{2}\right)$$

Most programming environments offer the standard mathematical functions $\mathrm{log1p}(x) = \log(1 + x)$ and $\mathrm{expm1}(x) = \exp(x) - 1$, which compute these quantities accurately for arguments $x$ near zero. However, this still leaves issues of subtractive cancellation and premature overflow that need to be resolved.

When evaluating the stated mathematical formula for $\mathrm{f_{1}}(d)$ with IEEE-754 binary floating-point arithmetic, the term $\exp(d)$ will cause overflow in intermediate computation as $d$ increases before $\mathrm{f_{1}}(d)$ itself overflows. How to get around this is shown in this draft publication (note that similar analysis is shown in comments below gammatester's answer):

Martin Mächler, "Accurately Computing $\log(1 − \exp(− \lvert a \rvert))$. Assessed by the Rmpfr package", ETH Zurich, Oct. 2012. (online)

It points out that for sufficiently large $x$, $\log(1 + \exp(x)) = x + \exp(-x)$ to within the precision of a given floating-point format. For IEEE-754 (2008) binary64, usually called double precision, the condition $x \gt 18.1$ has to be satisfied for the substitution to be suitable. This gives us

$$\mathrm{f_{1}}(d) \approx \exp(-d) + \log\left( \frac{1}{2} \right) + d$$

However, there is another issue with $\mathrm{f_{1}}(d)$: accuracy diminishes as $d$ approaches zero when the mathematical formula is evaluated directly in floating-point arithmetic. A workaround is to expand the function locally. After first noticing that $\mathrm{f_{1}}(d) \approx \frac{1}{2}d$ near zero, I determined additional terms of the expansion numerically by examining the residual after each term was added. The resulting approximation is:

$$\mathrm{f_{1}}(d) \approx \frac{1}{2}d + \frac{1}{8}d^{2} - \frac{1}{192}d^4 + \frac{1}{2880}d^6$$

I have experimentally determined, by comparing a double-precision implementation against a multi-precision math package configured for 1000 bits of precision, that this approximation is suitable for $d \lt 0.036$. This polynomial should be evaluated using Horner's scheme, that is, with nested multiply-adds.

By using the above approximations for $d \in [0,0.036)$, $[0.036, 18.1)$, and $[18.1, \infty)$, $\mathrm{f_{1}}(d)$ can be computed with relative error $\lt 5\cdot 10^{-15}$ across the entire input domain, assuming high-quality implementations of $\mathrm{log1p}$ and $\mathrm{expm1}$ are available. I used Intel's MKL for my work. To prevent detrimental interference from the compiler, code should be compiled with the strictest available floating-point mode (for example, /fp:strict or -fp-model strict in the case of the Intel C compiler). My actual C implementation for reference:

/* compute f1(d) = log(1+exp(d))+log(0.5) in numerically stable fashion */
double f1 (double d)
{
    double r;
    if (d < 0.036) { /* small arguments: avoid subtractive cancellation */
        double d2 = d * d;
        r = (((1.0/2880.0) * d2 - (1.0/192.0)) * d2 + 0.125) * d2 + 0.5 * d;
    } else if (d >= 18.1) { /* large arguments: avoid premature overflow */
        r = exp (-d) + log (0.5) + d;
    } else { /* medium-size arguments: use mathematical formula directly */
        r = log1p (exp (d)) + log (0.5);
    }
    return r;
}

As for $\mathrm{f_{0}}(d)$, there is an obvious and major problem with subtractive cancellation as $d$ becomes larger, causing $\log(1+\exp(d)) \approx \log(\exp(d)-1)$. We can eliminate the numerically undesirable subtraction by simple algebraic re-arrangement:

$$\log(1 + \exp(d)) - \log(\exp(d) -1) = \log\left(1 + \dfrac {2}{\exp(d) -1}\right)$$

In their answer, gammatester provides an asymptotic expansion of $\mathrm{f_{z}}(d)$ that for $\mathrm{f_{0}}$ simplifies to

$$\mathrm{f_{0}}(d) \approx 2\exp(-d)$$

For simplicity, we can choose the switch-over point $d=18.1$ as in $\mathrm{f_{1}}(d)$. Testing shows very good accuracy of this arrangement: the maximum relative error is less than $3\cdot10^{-16}$ across the entire input domain. The corresponding C implementation looks as follows:

/* compute f0(d) = log(1+exp(d))-log(exp(d)-1) in numerically stable fashion */ 
double f0 (double d)
{
    double r;
    if (d >= 18.1) { /* large arguments: use asymptotic approximation */
        r = 2.0 * exp (-d);
    } else { 
        r = log1p (2.0 / (expm1 (d)));
    }         
    return r;
}

On platforms where the $\mathrm{log1p}$ and $\mathrm{expm1}$ functions aren't available, they can be emulated in a relatively straightforward manner using the existing $\log$ and $\exp$ functions, with only a minor impact on accuracy. Exemplary C code is shown below. It is extremely important that the compiler does not re-order the mathematical operations.

/* David Goldberg, "What every computer scientist should know about floating-
   point arithmetic", ACM Computing Surveys, Volume 23, Issue 1, March 1991. */
double my_log1p (double x)
{
    double s, t;
    t = 1.0 + x;
    if (t == 1.0) { /* x below working precision: log(1+x) ~= x */
        t = x;
    } else {
        s = t - 1.0;  /* s: portion of x actually absorbed into 1.0 + x */
        t = log (t);
        if (x < 1.0) { /* correct for the rounding error in 1.0 + x */
            t = x * t;
            t = t / s;
        } /* for x >= 1, log(1+x) is well-conditioned; no correction needed */
    }
    return t;
}

/* William Kahan's trick */
double my_expm1 (double x)
{
    double s, t;
    s = exp (x);
    t = s - 1.0;
    if (t == 0.0) { /* x below working precision: exp(x)-1 ~= x */
        t = x;
    } else if (fabs (x) < 1.0) { /* correct for rounding error in exp(x) */
        s = log (s);  /* s ~= x, but consistent with the rounded exp(x) */
        t = x * t;
        t = t / s;    /* t = x * (exp(x)-1) / log(exp(x)) */
    }
    return t;
}