Go CBOR encoder: Episode 9, floating point numbers
This is a tutorial on how to write a CBOR encoder in Go, where we’ll learn more about reflection and type introspection.
Make sure to read the previous episodes; each one builds on the last:
- Episode 1, getting started
- Episode 2, booleans
- Episode 3, positive integers
- Episode 4, reflect and pointers
- Episode 5, strings
- Episode 6, negative integers and arrays
- Episode 7, maps
- Episode 8, structs
This episode is about floating point numbers. There are 3 kinds of floating point numbers supported by CBOR:
- 16-bit floats (half precision)
- 32-bit floats (single precision)
- 64-bit floats (double precision)
Go only supports float32 and float64 natively, so to support 16-bit numbers we will build the 16-bit values ourselves. We’ll implement 32- and 64-bit floats first, and then do the 16-bit numbers. We’ll minimize the size of the output by encoding numbers as tightly as possible: we’ll use 64-bit numbers only when a smaller type would lose precision. We don’t want to lose information or precision; the encoded numbers have to be exact.
As usual we take some examples from the CBOR spec: we look for numbers that can only be represented as 32- and 64-bit floats and add test cases for them. We find that 100,000.0 can be encoded exactly as a float32, while 1.1 can only be represented by a float64.
We start with those two examples and add the new test:
func TestFloat(t *testing.T) {
	var cases = []struct {
		Value    float64
		Expected []byte
	}{
		{
			Value:    1.1,
			Expected: []byte{0xfb, 0x3f, 0xf1, 0x99, 0x99, 0x99, 0x99, 0x99, 0x9a},
		},
		{Value: 100000.0, Expected: []byte{0xfa, 0x47, 0xc3, 0x50, 0x00}},
	}
	for _, c := range cases {
		t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
			testEncoder(t, c.Value, c.Expected)
		})
	}
}
To decide whether to use float32 or float64 for a value we convert the value to float32 and compare it to the original float64 value. If both values are the same we can safely encode the number as a float32 without losing precision. Let’s add a new function writeFloat to do that:
const (
	// floating point types
	minorFloat16 = 25
	minorFloat32 = 26
	minorFloat64 = 27
)

func (e *Encoder) writeFloat(input float64) error {
	if float64(float32(input)) == input {
		if err := e.writeHeader(majorSimpleValue, minorFloat32); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, float32(input))
	} else {
		if err := e.writeHeader(majorSimpleValue, minorFloat64); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, input)
	}
}
We add writeFloat to our big switch statement on the input’s type:
switch x.Kind() {
...
case reflect.Float32, reflect.Float64:
	return e.writeFloat(x.Float())
}
go test confirms TestFloat passes. We are done with 32- and 64-bit floats. The first part was easy, but the second part won’t be this simple: there’s more work ahead of us.
Next let’s add support for 16-bit floats. As mentioned before, Go doesn’t support float16 natively, so we’ll generate the binary value ourselves. What kind of numbers can we store in a 16-bit float? A 16-bit float looks like this:
SEEEEEFFFFFFFFFF
S is the sign bit, 0 for positive, 1 for negative. EEEEE is the 5-bit exponent, and FFFFFFFFFF is the 10-bit fractional part.
According to the IEEE 754 spec the 5-bit exponent’s range is -14 to 15. If a number’s exponent is within those limits we can encode it as a 16-bit float.
The 10-bit fraction is quite a bit smaller than the 23-bit fraction of a 32-bit float. We may lose precision when we chop off the end of a number’s fractional part: if there’s a 1 anywhere in the dropped bits we lose precision. To prevent this we’ll check that everything past the 10th bit of the fraction is zero. In summary, we can encode a number as a 16-bit float if and only if:
- Its exponent is between -14 and 15
- Its fractional part doesn’t have any 1’s after its 10th bit
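For example, 100,000.0 = 1.52587890625 × 2^16 breaks the first rule: its exponent, 16, is just above the limit. And 1.1 breaks the second: its float64 fraction (0x199999999999a) has 1’s well past the 10th bit. Neither of our first two test values fits in a 16-bit float.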
Let’s write some code: first we break down our numbers into those parts. We add the function unpackFloat64 to extract a float64’s exponent and fractional part; the sign we’ll read later with math.Signbit. We unpack 64-bit floats because float64 is the highest-precision type we support, and every number we encode can be represented as a float64. We also add constants at the top to use for the bit masking and shifting operations:
const (
	float64ExpBits  = 11
	float64ExpBias  = 1023
	float64FracBits = 52
	expMask         = (1 << float64ExpBits) - 1
	fracMask        = (1 << float64FracBits) - 1
)

func unpackFloat64(f float64) (exp int, frac uint64) {
	var r = math.Float64bits(f)
	exp = int(r>>float64FracBits&expMask) - float64ExpBias
	frac = r & fracMask
	return
}
math.Float64bits gives us a uint64 containing the float64’s raw binary representation. We extract the exponent by shifting r right by float64FracBits and masking it with expMask to trim off the sign bit; the result is converted to an int and we subtract the exponent’s bias from it to get the real exponent value. The fractional part is extracted with a bit mask.
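As a quick sanity check, here’s roughly what unpackFloat64 gives us for our two test values (a throwaway snippet, not part of the encoder; the expected outputs follow from the IEEE 754 bit patterns of 1.1 and 100,000.0):
exp, frac := unpackFloat64(1.1)
fmt.Printf("exp=%d frac=%#x\n", exp, frac) // exp=0 frac=0x199999999999a

exp, frac = unpackFloat64(100000.0)
fmt.Printf("exp=%d frac=%#x\n", exp, frac) // exp=16 frac=0x86a0000000000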
We’ll refactor writeFloat to use unpackFloat64 and the number of trailing zeros in the fraction to determine which type we should use. For float16 the exponent has to be between -14 and 15, and since only the top 10 of the 52 fraction bits survive the conversion we need at least float16MinZeros = 52 - 10 = 42 trailing zeros at the end of the fractional part. For float32 we keep the round-trip conversion check from before.
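Alongside the float64 constants we’ll need a few float16 ones; a minimal sketch of how they could be defined, with values taken from the IEEE 754 half-precision layout:
const (
	float16ExpBias  = 15                                  // half-precision exponent bias
	float16FracBits = 10                                  // half-precision fraction bits
	float16MinZeros = float64FracBits - float16FracBits   // 52 - 10 = 42 trailing zeros required
)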
We use a switch case with the smallest type first and use float64 only if float16 and float32 don’t work:
func (e *Encoder) writeFloat(input float64) error {
	var (
		exp, frac     = unpackFloat64(input)
		trailingZeros = bits.TrailingZeros64(frac)
	)
	if trailingZeros > float64FracBits {
		trailingZeros = float64FracBits
	}
	switch {
	case (-14 <= exp) && (exp <= 15) && (trailingZeros >= float16MinZeros):
		// FIXME write float16 here
		return ErrNotImplemented
	case float64(float32(input)) == input:
		if err := e.writeHeader(majorSimpleValue, minorFloat32); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, float32(input))
	default:
		if err := e.writeHeader(majorSimpleValue, minorFloat64); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, input)
	}
}
go test still passes, because we haven’t added any test to verify that 16-bit floats work. Let’s add support for 16-bit floats and add a test case for it.
1.0 is an easy number to represent with 16 bits, so we start with that:
...
{Value: 1.0, Expected: []byte{0xf9, 0x3c, 0x00}},
...
To write 16-bit floats we add a new method writeFloat16 that takes the three parameters needed to build a 16-bit float: the sign bit, the exponent, and the fraction. We turn them into a single 16-bit integer and write the value to the output:
func (e *Encoder) writeFloat16(negative bool, exp uint16, frac uint64) error {
	if err := e.writeHeader(majorSimpleValue, minorFloat16); err != nil {
		return err
	}
	var output uint16
	if negative {
		output = 1 << 15 // set sign bit
	}
	// place the (already biased) exponent above the 10 fraction bits
	output |= exp << float16FracBits
	// keep only the top 10 bits of the 52-bit float64 fraction
	output |= uint16(frac >> (float64FracBits - float16FracBits))
	return binary.Write(e.w, binary.BigEndian, output)
}
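Once it’s hooked up (below), encoding 1.0 passes a positive sign, a biased exponent of 0 + 15 = 15, and an all-zero fraction, so output becomes 0b0_01111_0000000000 = 0x3c00. After the 0xf9 header that’s exactly the {0xf9, 0x3c, 0x00} we put in the test.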
Finally we hook up writeFloat16 in writeFloat’s switch. For float16 we check the exponent’s range and that we’re not dropping any 1’s at the end of the fraction; for float32 we keep the round-trip conversion check; if neither case matches we fall back to float64:
func (e *Encoder) writeFloat(input float64) error {
	var (
		exp, frac     = unpackFloat64(input)
		trailingZeros = bits.TrailingZeros64(frac)
	)
	switch {
	case (-14 <= exp) && (exp <= 15) && (trailingZeros >= float16MinZeros):
		return e.writeFloat16(math.Signbit(input), uint16(exp+float16ExpBias), frac)
	case float64(float32(input)) == input:
		if err := e.writeHeader(majorSimpleValue, minorFloat32); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, float32(input))
	default:
		if err := e.writeHeader(majorSimpleValue, minorFloat64); err != nil {
			return err
		}
		return binary.Write(e.w, binary.BigEndian, input)
	}
}
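With this in place, each of our test values takes a different branch: 1.0 (exponent 0, all-zero fraction) goes out as a float16; 100,000.0 (exponent 16, too big for float16) survives the round trip through float32; and 1.1 satisfies neither check, so it falls through to float64.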
Our encoder handles float16, so we’ve covered all 3 floating point number types. It looks like we’re done with floats, but there are still more cases and special numbers we have to take care of. In the next episode we’ll add support for special values: zero, infinity, Not a Number (NaN), and subnormal numbers.
Check out the repository with the full code for this episode.