I'm trying to understand how computers calculate (approximate) the exponential function.
I'm reading the code of the Cephes library, specifically this code.
From what I understand, it finds a power of two such that:
$$e^x = e^f * 2^k$$ Where k is an integer and f a fraction. The fractional part is then calculated with Padè. The fractional part is pretty clear to me, what I don't understand is how it decides which k and f to use.
In the comments it points out that
$$ a:\; e^x = e^f * 2^k = e^f * e^{k*ln(2)} = e^{f + k*ln(2)} $$ And thus $$ b: \; x = f + k*ln(2) $$
In the code it gets to k by doing:
$$ k = floor( x / ln(2) + 0.5) $$
What is the reasoning behind this? Looks like it solves expression b for k and then drops the f term.
Then f is calculated by doing:
$$ f = x - (k * 0.693145751953125) - (k * 0.00000142860682030941723212)$$
i don't understand where these constants come from, there is no comment in the code about them specifically.
The first one (called C1 in the code) seems to be close to ln(2). The other one is a mystery.
Could you help me figure this out?
$\lfloor x+0.5 \rfloor$ is a rounding idiom, it rounds $x$ to the nearest integer. In the argument reduction for $\exp$, this is used to minimize the magnitude of $f$. With $k := \lfloor x / \ln 2 + 0.5\rfloor$, $f$ will be in $[-\ln\sqrt{2}, \ln\sqrt{2}]$.
If we were to give self-explanatory names to the "magical" constants
C1andC2, we might call themln2_hiandln2_lo. This is a floating-point representation of the mathematical value $\ln 2$, withln2_hiholding the most significant bits andln2_loholding the least significant bits. Their sum represents $\ln 2$ to better than double precision. The purpose of this is to mitigate the issue of subtractive cancellation during the argument reduction and deliver a reduced argument $f$ that is as accurate as possible.Mathematically, $\mathrm{ln2\_hi} := \lfloor 2^{16} \cdot \ln 2 + 0.5\rfloor / 2^{16}$ and $\mathrm{ln2\_lo} := \ln 2 - \mathrm{ln2\_hi}$. If we look at the bit pattern of
ln2_hias an IEEE-754 double-precision (binary64) number, we find that it has 37 trailing zero bits in the significand. At the same time, when restricting to cases that do not overflow the representable range of an IEEE-754 double-precision number or underflow to zero, $k$ is in $[-1074, 1024]$. This means that only 11 leading bits in the significand of a double-precision number are needed to represent $k$, and therefore the product of $k$ and $\mathrm{ln2\_hi}$ can be computed exactly as a double precision number. The subsequent subtraction $x- k\cdot\mathrm{ln2\_hi}$ then is also exact by the Sterbenz lemma.