# Go CBOR encoder: Episode 10, special floating point numbers

This is a tutorial on how to write a CBOR encoder in Go, where we’ll learn more about reflection and type introspection.

Make sure to read the previous episodes first; each episode builds on the previous one:

- Episode 1, getting started
- Episode 2, booleans
- Episode 3, positive integers
- Episode 4, reflect and pointers
- Episode 5, strings
- Episode 6, negative integers and arrays
- Episode 7, maps
- Episode 8, structs
- Episode 9, floating point numbers

In the previous episode we added floating point number support to our encoder.

We minimized the size of the output without losing precision. There’s still room for improvement though: we encode regular floating point numbers as 16-bit numbers when possible, but the IEEE 754 standard also defines special values that can be packed more efficiently:

- Subnormal numbers, also called denormal numbers, denormalized numbers, or subnumbers. They include 0, which can’t be encoded accurately as a regular floating point number
- Infinities
- Not a Number

The way the encoder works now, these special values are all encoded as 32-bit or 64-bit floats, even though many of them can be encoded as 16-bit numbers without losing information.

We’ll start with infinite values, then not a number values, and finish with subnormal numbers.

There are two infinite values: positive and negative. The only thing that differs between them is the sign bit; the exponent is all 1’s, and the fractional part is all 0’s. Infinite values are easy to detect in Go with the math.IsInf function. To handle them we add an if block with math.IsInf at the beginning of the writeFloat function, and write a 16-bit float with all 1’s in the exponent and all 0’s in the fractional part:

```
func (e *Encoder) writeFloat(input float64) error {
	if math.IsInf(input, 0) {
		return e.writeFloat16(math.Signbit(input), (1<<float16ExpBits)-1, 0)
	}
	...
}
```

NaN, or Not a Number, is similar to the infinities, but its fractional part is non-zero and can vary. The fractional part of a NaN carries some information; we’ll copy it as is and just chop off the end, since all the important information is in the first few bits. We add the following to the second switch statement in writeFloat:

```
func (e *Encoder) writeFloat(input float64) error {
	...
	var (
		exp, frac = unpackFloat64(input)
	)
	...
	switch {
	case math.IsNaN(input):
		return e.writeFloat16(math.Signbit(input), 1<<float16ExpBits-1, frac)
	...
	}
}
```

And that’s all we need for not a number. To verify we implemented it correctly we add the corresponding test cases from the CBOR spec in cbor_test.go:

```
func TestFloat(t *testing.T) {
	var cases = []struct {
		Value    float64
		Expected []byte
	}{
		...
		{Value: math.Inf(1), Expected: []byte{0xf9, 0x7c, 0x00}},
		{Value: math.NaN(), Expected: []byte{0xf9, 0x7e, 0x00}},
		{Value: math.Inf(-1), Expected: []byte{0xf9, 0xfc, 0x00}},
		...
	}
	for _, c := range cases {
		t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
			testEncoder(t, c.Value, c.Expected)
		})
	}
}
```

We now store infinities and not a number values tightly, but here comes the hard part: subnormal numbers. There’s a lot of bit fiddling ahead.

When the bits of the exponent are all 0’s, we have a subnormal number, and zero is one of them. Zero needs a special encoding because it cannot be represented precisely when the fractional part is prefixed by a 1, as it is with regular floating point numbers. Even if the fractional part were all zeros and the exponent very small, a regular floating point number couldn’t precisely represent 0, because there’s always a 1 somewhere in its digits (like 0.000…01). Therefore we have subnormal numbers, which start with a 0 instead of a 1, to represent zero precisely and other very small numbers more accurately.

Let’s start by encoding zero and negative zero efficiently. Negative zero is zero with its sign bit set to one. Here are the two test cases we add to our TestFloat test in cbor_test.go:

```
...
{Value: 0.0, Expected: []byte{0xf9, 0x00, 0x00}},
{Value: math.Copysign(0, -1), Expected: []byte{0xf9, 0x80, 0x00}},
...
```

To get a negative zero in Go we have to use the math.Copysign function, because the compiler turns the expression -0.0 into a positive zero. We turn the if statement at the beginning into a switch, with an additional case to detect zero, and encode it as a 16-bit float to save space:

```
func (e *Encoder) writeFloat(input float64) error {
	switch {
	case input == 0:
		return e.writeFloat16(math.Signbit(input), 0, 0)
	case math.IsInf(input, 0):
		...
	}
	...
}
```

We don’t check if the input equals -0 because -0 equals 0. Zeros are done!

What other numbers can we represent as subnormal numbers? Let’s learn more about them and how they differ from regular numbers. Here’s the formula for regular 16-bit floating point numbers:

$$(-1)^{\text{signbit}} \times 2^{\text{exponent} - 15} \times 1.\text{significantbits}_2$$

For 16-bit subnormal numbers the formula turns into:

$$(-1)^{\text{signbit}} \times 2^{-14} \times 0.\text{significantbits}_2$$

Regular numbers are prefixed with a 1 bit, and subnormal numbers start with a 0 bit. This means that by shifting the bits of the fractional part to the right, we can represent regular numbers with an exponent lower than -14 as subnormal numbers. We’ll use the smallest 16-bit subnormal number, 5.960464477539063e-8, as an example. Its regular floating point representation is:

$$2^{-24} \times 1.0000000000_2$$

The fractional part is all zeros and the exponent is -24. How can we represent it as a 16-bit floating point number when the exponent is set to -14 and can’t be changed? We shift the fractional part, including its leading 1, to the right: every time we shift it right by 1 bit, it’s equivalent to lowering the exponent by 1.

For our example we shift the fractional part by 10 bits, which is equivalent to lowering the exponent by 10, from -14 to -24:

$$2^{-24} \times 1.0000000000_2 = 2^{-14} \times 0.0000000001_2$$

As long as we can shift the fractional part to the right without dropping any 1’s, we can represent the number as a 16-bit float. In summary, to encode a value as a 16-bit subnormal number we have to:

- Verify that the exponent and the number of trailing zeros are within the ranges required to encode the input precisely
- Add a leading 1 at the head of the fractional part, since subnormal numbers don’t have the implicit leading 1 that regular numbers do
- Shift the fractional part to match the number’s exponent

The smallest possible 16-bit subnormal number is one of the examples in the CBOR spec. Let’s add it to the TestFloat test suite:

```
...
{Value: 5.960464477539063e-8, Expected: []byte{0xf9, 0x00, 0x01}},
...
```

To check if we have a number that can be encoded as a subnormal number we add a predicate function subnumber() with two parameters: the exponent, and the number of trailing zeros in the fractional part. It verifies that the exponent is within the range of what’s representable by a subnormal number, and that we don’t drop any 1 from the fractional when we cut it:

```
func subnumber(exp int, zeros int) bool {
	var d = -exp + float16MinBias
	var canFitFractional = d <= zeros-float64FracBits+float16FracBits
	return d >= 0 && d <= float16FracBits && canFitFractional
}
```
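To get a feel for the predicate, we can run it on a few exponent and trailing zeros pairs. The constant values below are assumptions, since they are defined in an earlier episode: 52 and 10 fraction bits, and -14 for float16MinBias, which is the value the arithmetic in subnumber() needs for 2^-24 to be accepted.

```go
package main

import "fmt"

// Assumed values for constants defined in earlier episodes.
const (
	float64FracBits = 52
	float16FracBits = 10
	float16MinBias  = -14
)

// subnumber is copied from the tutorial.
func subnumber(exp int, zeros int) bool {
	var d = -exp + float16MinBias
	var canFitFractional = d <= zeros-float64FracBits+float16FracBits
	return d >= 0 && d <= float16FracBits && canFitFractional
}

func main() {
	cases := []struct {
		name       string
		exp, zeros int
	}{
		{"2^-24", -24, 52},     // fraction is all zeros
		{"1.5*2^-16", -16, 51}, // one fraction bit set, 51 trailing zeros
		{"1.5*2^-24", -24, 51}, // the set bit would be shifted out
		{"2^-25", -25, 52},     // exponent too small
		{"1.0", 0, 52},         // exponent too large
	}
	for _, c := range cases {
		fmt.Printf("%-10s %v\n", c.name, subnumber(c.exp, c.zeros))
	}
}
```

Only the first two cases pass: their exponents are within 10 of -14, and shifting their fractional parts doesn't drop a 1.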

Then we add a case statement at the beginning of the second switch, so that we encode the value as a 16-bit subnormal number when possible, and fall back to a 32-bit float otherwise:

```
func (e *Encoder) writeFloat(input float64) error {
	...
	var (
		exp, frac     = unpackFloat64(input)
		trailingZeros = bits.TrailingZeros64(frac)
	)
	if trailingZeros > float64FracBits {
		trailingZeros = float64FracBits
	}
	switch {
	...
	case subnumber(exp, trailingZeros):
		// this number can be encoded as a 16-bit subnormal number
		frac |= 1 << float64FracBits
		frac >>= uint(-exp + float16MinBias)
		return e.writeFloat16(math.Signbit(input), 0, frac)
	case float64(float32(input)) == input:
		...
	}
}
```

Let’s take a closer look step by step. When subnumber() matches, we build the new fractional part by prefixing the fractional part with a 1: this is the implicit 1 prefix from the regular number formula:

```
frac |= 1 << float64FracBits
```

Then we shift the fractional part by the difference between the number’s exponent and the fixed exponent, which is -14 for 16-bit subnormal numbers:

```
frac >>= uint(-exp + float16MinBias)
```

Finally we write the number as a 16-bit floating point number with a zero exponent:

```
return e.writeFloat16(math.Signbit(input), 0, frac)
```

One last run of `go test` confirms that everything works. We now tightly pack all special float values, and with the subnormal numbers optimization we just implemented we also pack $2^{10}$ numbers more efficiently as 16-bit floats.

We successfully encoded one of the most complex types Go natively supports. Next time we’ll implement a custom type: timestamps.

Check out the repository with the full code for this episode.