How to convert from floating-point binary to decimal in half precision (16 bits)?


I'm trying to convert a 16-bit (half-precision) binary floating-point number to decimal, but I am completely failing to do so.

The binary I'm trying to convert is $0101011101010000$. My current method is:

Separation: $0|10101|1101010000$

Sign = 0

Mantissa = $1.1101010000$

Exponent = $21 - (2^4 - 1) = 6 $

Mantissa Denormalised = $1110101.0000$

This gives an answer of 117. Is this actually correct or am I making a mistake in my method?


0
On BEST ANSWER

You are right.
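As a check on the steps in the question, here is the same calculation written with the usual half-precision formula, $(-1)^{s} \times (1 + f/2^{10}) \times 2^{e - 15}$. The symbols $s$, $f$ and $e$ are labels chosen here for the sign bit, the 10-bit fraction field and the 5-bit exponent field. The fraction bits $1101010000$ are $848$ in decimal and the exponent bits $10101$ are $21$:

$$(-1)^{0} \times \left(1 + \frac{848}{1024}\right) \times 2^{21 - 15} = 1.828125 \times 64 = 117$$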

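You can check this automatically with Python and NumPy:

import struct
import numpy as np

# Pack the bit pattern as a native-endian unsigned 16-bit integer ...
raw = struct.pack("H", int("0101011101010000", 2))
# ... and reinterpret those two bytes as a half-precision float.
print(np.frombuffer(raw, dtype=np.float16)[0])

and you get: 117.0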
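If NumPy is not available, the same check can probably be done with the standard library alone. This is a minimal sketch, assuming Python 3.6 or later, where the `struct` module accepts the `e` format character for IEEE 754 half precision. The names `bits` and `value` are chosen here for illustration only.

```python
import struct

bits = int("0101011101010000", 2)  # the pattern from the question (0x5750)

# "<H" packs the integer as 2 little-endian bytes; "<e" reads the same
# 2 bytes back as an IEEE 754 binary16 value.
value = struct.unpack("<e", struct.pack("<H", bits))[0]
print(value)  # 117.0
```

Using an explicit `<` on both sides keeps the byte order consistent regardless of the machine the code runs on.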

0
On

Your formula produces the correct result, 117.0, in this case, but it may fail for subnormal numbers (exponent field all zeros) and for NaNs and +/- infinity (exponent field all ones).

>>> float_from_unsigned16(int("0101011101010000", 2))
117.0

where float_from_unsigned16(n) is defined (in Python) as:

def float_from_unsigned16(n):
    assert 0 <= n < 2**16
    sign = n >> 15
    exp = (n >> 10) & 0b011111
    fraction = n & (2**10 - 1)
    if exp == 0:
        if fraction == 0:
            return -0.0 if sign else 0.0
        else:
            return (-1)**sign * fraction / 2**10 * 2**(-14)  # subnormal
    elif exp == 0b11111:
        if fraction == 0:
            return float('-inf') if sign else float('inf')
        else:
            return float('nan')
    return (-1)**sign * (1 + fraction / 2**10) * 2**(exp - 15)

See binary16.py
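To see why the special cases matter, the sketch below compares a decoder that only applies the normal-number formula (as in the question) with the standard library's interpretation of the same bits. The helper names `naive_decode` and `reference_decode` are made up for this illustration, and the `e` format character assumes Python 3.6 or later.

```python
import math
import struct

def naive_decode(n):
    """Normal-number formula only: implicit leading 1, bias 15."""
    sign = n >> 15
    exp = (n >> 10) & 0b11111
    fraction = n & 0b1111111111
    return (-1) ** sign * (1 + fraction / 2**10) * 2.0 ** (exp - 15)

def reference_decode(n):
    """Let the standard library interpret the same 16 bits as binary16."""
    return struct.unpack("<e", struct.pack("<H", n))[0]

# 0x5750: the value from the question (a normal number)
# 0x0001: smallest positive subnormal
# 0x7C00: +infinity
# 0x7E00: a NaN
for bits in (0x5750, 0x0001, 0x7C00, 0x7E00):
    print(f"{bits:016b}  naive={naive_decode(bits)!r}  "
          f"reference={reference_decode(bits)!r}")
```

The two decoders agree on the normal number but disagree on the other three patterns: the naive formula treats the reserved exponent fields as ordinary exponents, so it returns a finite value where the reference returns a subnormal, infinity or NaN.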

0
On

Thank you @jfs. I wrote a Ruby version:

def float_from_unsigned16(input, debug: false)
  raise 'INPUT OUT OF RANGE' unless (0...2**16).include?(input)

  sign = input >> 15
  exp = (input >> 10) & 0b011111
  fraction = Float(input & (2**10 - 1))

  $stdout&.puts "sign (#{sign.zero? ? '+' : '-'}) fraction #{fraction} exp #{exp}" if debug

  if exp.zero?
    return (sign.zero? ? 0.0 : -0.0) if fraction.zero?

    (-1)**sign * fraction / 2**10 * 2**-14
  elsif exp == 0b11111
    raise 'NaN' unless fraction.zero?

    sign.zero? ? Float::INFINITY : -Float::INFINITY
  else
    (-1)**sign * (1 + fraction / 2**10) * 2**(exp - 15)
  end
end