Mapping logarithmic ranges to linear

348 Views Asked by At

I'm trying to map the table 3.2.5 from RFC1951 from its distance values to code values. The table as follows (ignore the bits column, ranges are inclusive):

         Code Bits Dist  Code Bits   Dist     Code Bits Distance
         ---- ---- ----  ---- ----  ------    ---- ---- --------
           0   0    1     10   4     33-48    20    9   1025-1536
           1   0    2     11   4     49-64    21    9   1537-2048
           2   0    3     12   5     65-96    22   10   2049-3072
           3   0    4     13   5     97-128   23   10   3073-4096
           4   1   5,6    14   6    129-192   24   11   4097-6144
           5   1   7,8    15   6    193-256   25   11   6145-8192
           6   2   9-12   16   7    257-384   26   12  8193-12288
           7   2  13-16   17   7    385-512   27   12 12289-16384
           8   3  17-24   18   8    513-768   28   13 16385-24576
           9   3  25-32   19   8   769-1024   29   13 24577-32768

I need a mathematical function that maps the distance values to the code values e.g. f(777) = 19, f(9000) = 26. Both input and output values are integer and I know there must be some expression that would work using Log2 functions. The range of distance doubles every 2 codes (except code 0 and 1) e.g. there are 4096 distinct values for codes 26 and 27, 8192 for 28 and 29, etc.

Exceptions can be made for codes 0 and 1

3

There are 3 best solutions below

0
On

In the end, I figured the simplest solution is to just map all the values rather than calculating it each time. In python:

DISTANCE_CODES = bytearray([255, 0, 1, 2, 3, 4, 4, 5, 5] +
                           4 * [6] + 4 * [7] + 8 * [8] + 8 * [9] +
                           16 * [10] + 16 * [11] + 32 * [12] + 32 * [13] +
                           64 * [14] + 64 * [15] + 128 * [16] + 128 * [17] +
                           256 * [18] + 256 * [19] + 512 * [20] + 512 * [21] +
                           1024 * [22] + 1024 * [23] + 2048 * [24] +
                           2048 * [25] + 4096 * [26] + 4096 * [27] +
                           8192 * [28] + 8192 * [29])

and then just reference the value by code = DISTANCE_CODES[distance]. Note that distance cannot be 0 and the value at DISTANCE_CODES[0] is invalid but necessary as python index starts from 0.

1
On

Every second entry is a power of two, but the entry in between is situated at the midpoint of consecutive powers of two (e.g. 24 is halfway between 16 and 32). This strongly resembles IEEE-754 binary floating-point formats, with exponents assigned based on the base-2 logarithm, but the significand progressing linearly within each binade.

This can be exploited by representing the distance in IEEE-754 binary32 format, mapped to the C\C++ float type, for example. Then extract from the underlying binary bits the relevant exponent bits for the logarithmic portion, and the most significant bit of the significand (mantissa). Since the upper bound of distance for of each code interval is $2^n$ rather than $2^{n}-1$, a small correction must be made to the distance prior to bit extraction. Subtracting $0.5$ provides a suitable correction to map the upper bound correctly. The lowest interval cannot be handled by this general algorithm and must be special cased.

The ISO-C99 code below demonstrates the computation in function code() and also includes a test framework for checking proper operation. Most programming languages provide a mechanism for re-interpreting the bits of a floating-point operand as an (unsigned) integer. In C and C++ the canonical way of doing so is invoking memcpy(), which in this context typically gets compiled to a single instruction for most hardware platforms. A quick test with Compiler Explorer shows that code() compiles to fewer than ten instructions on ARM64 and x86-64.

If $d$ is the distance, and $c$ is the code to be generated, code() can be mathematically described as:

$$c=\begin{cases} 0, &x\le 1\\ \left \lfloor \left(d - \frac{1}{2}\right) 2^{1- \lfloor \log_2{(d - \frac{1}{2})} \rfloor} \right \rfloor + 2 \left \lfloor\log_2{\left(d-\frac{1}{2}\right)} \right \rfloor -2, &x \gt 1 \end{cases} $$ A straightforward translation into common programming languages is going to be functional but likely quite slow.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// reinterpret bit pattern of IEEE-754 'binary32' as a 32-bit unsigned integer
uint32_t float_as_uint32 (float a) 
{
    uint32_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

// compute code based on semi-logarithmic partitioning of input domain
int code (int dist)
{
    const int FP32_MANT_BITS = 23;
    float fd = dist - 0.5f;
    uint32_t i = float_as_uint32 (fd) - float_as_uint32 (1.0f);
    return (dist <= 1) ? 0 : (i >> (FP32_MANT_BITS - 1));
}

int main (void)
{
    int dist [2*30] = {
        1,1,
        2,2,
        3,3,
        4,4,
        5,6,
        7,8,
        9,12,
        13,16,
        17,24,
        25,32,
        33,48,
        49,64,
        65,96,
        97,128,
        129,192,
        193,256,
        357,384,
        385,512,
        513,768,
        769,1024,
        1025,1536,
        1537,2048,
        2049,3072,
        3073,4096,
        4097,6144,
        6145,8192,
        8193,12288,
        12289,16384,
        16385,24576,
        24577,32768
    };
    for (int i = 0; i < 30; i++) {
        printf ("code=%d  dist=%d\n", code (dist[2*i]), dist[2*i]);
        printf ("code=%d  dist=%d\n\n", code (dist[2*i+1]), dist[2*i+1]);
    }
    return 0;
}
5
On

Hat tip to @nuffja for noticing there are transitions at the linear half-way point, not logarithmic.

In "math":

$$f(n) = \log(n-1) \ \ / \ \log(2)$$

$$\text{code}(n \ge 1) = \begin{cases} n-1, & n \lt 3 \\[2ex] \lfloor f \rfloor + \lfloor f+a \rfloor, & n \ge 3 \end{cases}$$

where $a = 1 - \log(3/2) \ \ / \ \log(2)$, $n$ is a positive integer and $\lfloor f \rfloor$ is the floor function (rounds down to integer).

In Python:

import numpy as np

def fun(n):
    if n < 3:
        result = n - 1
    else:
        f = np.log(n-1) / np.log(2)   # needs n-1
        a = 1 - np.log(1.5) / np.log(2)   # halfway point https://math.stackexchange.com/a/4433235/284619
        y0 = f.astype(int)   # ensures integer results
        y1 = (f + a).astype(int)
        result = y0 + y1
    return result