$a-b = (e + (a<<s)-(b<<s))>>s$, why it does not work for some $e$?

26 Views Asked by At

If I subtract 2 numbers shifted left to a number and shift right, I get the original number. Example:

$$30 - 1 = ((30<<7) - (1<<7)) >> 7 = 29$$

or in other words

$$a-b = ((a<<s)-(b<<s))>>s$$

The $s$ is chosen carefully so $a<<s$ and $b<<s$ are so big that a small number $e$ added to $(a<<s)-(b<<s)$ won't interfere with the result. In theory, this small number would get added to the lower bits, which would disappear when shifted right again:

$$a-b = (e + (a<<s)-(b<<s))>>s $$

However, in practice, this does not happen:

$$ ((30<<7) - (1<<7)) >> 7 = 29\\ (-4 + (30<<7) - (1<<7)) >> 7 = 28 $$ why the results are different?

1

There are 1 best solutions below

0
On

Your assumption that $e$ added to ... won't interfere" does not hold for negative $e$ because a negative number has (formally) infintely many set high bits when written in 2's complement. For example, take 8-bit numbers then

-2 = 11111110

in 2's complement.


What you are trying to do is to manually form a bit-field where the lower $k$ bits encode $e$ and the upper $n-k$ bits encode $a-b$. You can do that, but you'll have to mask out the upper bits of $e$. When you are decoding the packed value, you'll have to restore the upper bits of $e$ again, i.e. you'll have to sign-extend the signed $k$-bit number $e$ to a signed $n$-bit number.

Here is an example output of a Python3 code. C/C++ would be mostly the same$^1$. $e$ is encoded in the lower 4 bits, and $ab$ is encoded in the upper 8 bits:

ab=+11=000000001011, e= +3=000000000011  =>  enc=000010110011  =>  ab=+11, e= +3
ab=-11=111111110101, e= +3=000000000011  =>  enc=111101010011  =>  ab=-11, e= +3
ab=+11=000000001011, e= -3=111111111101  =>  enc=000010111101  =>  ab=+11, e= -3
ab=-11=111111110101, e= -3=111111111101  =>  enc=111101011101  =>  ab=-11, e= -3
                   bit 3 is the sign-bit of encoded e ---^

Notice how the higher bits are all '1''s for negative values.

The respective Python3 follows. encode encodes e into the 4 LSBs and ab into the upper 8 bits. As e will be encoded in signed 4-bit, it may range from $-8$ to $7$. decode decodes the values and sign-extends e as needed.


$^1$In C/C++ you might raise Undefined Behaviour when you shift a signed value left and it overflows. Hence, in C/C++ always operate on unsigned values internally to avoid UB. But in C/C++ one would likely prefer bit-fields.


#!/usr/bin/env python3

def encode (ab, e, bits_e):
    # Encode E in lower BITS_E bits and AB in upper bits.
    mask_e = (1 << bits_e) - 1
    e_max = mask_e >> 1
    # E is supposed to be signed
    assert e <= e_max and e >= -e_max-1
    return (ab << bits_e) | (e & mask_e)

def decode (val, bits_e):
    # Decode E from the lower BITS_E bits of VAL and AB from upper bits.
    mask_e = (1 << bits_e) - 1
    ab, e = val >> bits_e, val & mask_e
    if 1 & (e >> (bits_e - 1)):
        # if the sign-bit is set, then sign-extend E.
        e |= -1 << bits_e
    return ab, e

def bits (x):
    # 12-bit binary representation as string.
    return "{0:b}".format(x & 0xfff).zfill(12)

def show (ab, e, bits_e):
    enc = encode (ab, e, bits_e)
    dec_ab, dec_e = decode (enc, bits_e)

    print ("ab=%+3d=%s, e=%+3d=%s  =>  enc=%s  =>  ab=%+3d, e=%+3d"
           % (ab, bits(ab), e, bits(e), bits(enc), dec_ab, dec_e))

show (+11, 3, 4)
show (-11, 3, 4)
show (+11, -3, 4)
show (-11, -3, 4)

Note: Instead of sign-extending by means of e |= -1 << bits_e you can also do e -= 1 << bits_e if preferred.