2018-04-20T11:02:55-07:00

Go CBOR encoder: Episode 3, positive integers

In the previous episode we wrote a CBOR encoder that can handle the values nil, true, and false. Next we’ll focus on positive integers.

To proceed we have to learn more about how values are encoded. A CBOR object’s type is determined by the first 3 bits of its first byte. That first byte is called the header: it describes the data type and tells the decoder how to decode what follows. Sometimes the 5 leftover bits of the header hold the value itself, but most of the time they hold more information about the type.

For example: the encoded nil value is a single byte with the value 246, in binary that’s 0b11110110. The first 3 bits are all 1’s, that’s 7 in decimal, so the nil value’s major type is 7, which corresponds to the “simple values” major type. The last 5 bits are 0b10110, or 22 in decimal: that’s the additional value, which tells us which simple value we have, in our case nil. To summarize: the nil value’s major type is 7, and the additional value 22 identifies it as nil. Here’s how you’d reconstruct the header for nil from the major type and the additional value:

byte(majorType << 5) | additionalValue

The booleans true and false have the same major type as nil, 7, and their additional values are 21 and 20 respectively (20 is false, 21 is true). We’d build the two boolean headers from the major type and additional value like this:

fmt.Printf("%x\n", byte(7 << 5) | 20)   // prints f4
fmt.Printf("%x\n", byte(7 << 5) | 21)   // prints f5
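Going the other way is just as short. Here is a minimal sketch that splits a header byte back into its two parts; splitHeader and makeHeader are hypothetical helpers written for this illustration, they are not part of the encoder we’re building:

```go
package main

import "fmt"

// splitHeader is a hypothetical helper: it splits a CBOR header byte into
// its major type (top 3 bits) and its additional value (low 5 bits).
func splitHeader(h byte) (major, additional byte) {
	return h >> 5, h & 0x1f
}

// makeHeader is the inverse operation: it packs a major type and an
// additional value into a single header byte.
func makeHeader(major, additional byte) byte {
	return major<<5 | additional
}

func main() {
	// The headers for false, true, and nil.
	for _, h := range []byte{0xf4, 0xf5, 0xf6} {
		major, additional := splitHeader(h)
		fmt.Printf("0x%02x: major type %d, additional value %d\n", h, major, additional)
	}
}
```

Running it on 0xf6 gives back major type 7 and additional value 22, the two numbers we used to describe nil.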

Positive integers have their own major type: 0. The 5 bits left in the header can only hold values from 0 to 31, therefore the encoding of integers is more complex than that of booleans and nil. The first 24 additional values are reserved for the integers 0 to 23; for integers bigger than 23 we have to write extra bytes to the output to encode them. To indicate how much data is needed to decode the integer we have the special additional values 24, 25, 26, and 27: they correspond to 8-, 16-, 32-, and 64-bit integers respectively.

For example, to encode 500 we need at least a 2-byte integer, because 500 is too big to be represented as a single byte. So the first byte would have major type 0 and additional value 25 to tell the decoder: “hey, what follows is a two-byte positive integer”. The header would look like this: 0b000_11001, followed by the two bytes 0x01 0xf4, that’s 500 encoded as a 16-bit big-endian integer.
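To make the 500 example concrete, here is a small sketch that builds those three bytes by hand. The function encodeUint16 is a name made up for this illustration, and it always uses the 16-bit form, even for values that would fit in something smaller:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint16 is a hypothetical helper: it always emits a header with
// major type 0 and additional value 25, followed by v as a 16-bit
// big-endian integer.
func encodeUint16(v uint16) []byte {
	out := make([]byte, 3)
	out[0] = 0<<5 | 25 // major type 0, additional value 25
	binary.BigEndian.PutUint16(out[1:], v)
	return out
}

func main() {
	fmt.Printf("% x\n", encodeUint16(500)) // prints 19 01 f4
}
```

The header 0b000_11001 is 0x19, and 0x01 0xf4 is 500, so the output matches the bytes described above.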

Start with the easy case: integers from 0 to 23. We add a method called writeHeader to cbor.go that writes the single byte header to the output. To avoid using magic numbers all over our code we’ll also set some constants for the types we can encode thus far. We add the following to cbor.go:

const (
    // major types
    majorPositiveInteger = 0
    majorSimpleValue     = 7

    // simple values == major type 7
    simpleValueFalse = 20
    simpleValueTrue  = 21
    simpleValueNil   = 22
)

func (e *Encoder) writeHeader(major, minor byte) error {
    h := byte((major << 5) | minor)
    _, err := e.w.Write([]byte{h})
    return err
}

We use writeHeader to get rid of the magic numbers we put in the Encode method in the previous episode. Our Encode method looks tidier now:

func (e *Encoder) Encode(v interface{}) error {
    switch v.(type) {
    case nil:
        return e.writeHeader(majorSimpleValue, simpleValueNil)
    case bool:
        var minor byte
        if v.(bool) {
            minor = simpleValueTrue
        } else {
            minor = simpleValueFalse
        }
        return e.writeHeader(majorSimpleValue, minor)
    }
    return ErrNotImplemented
}

Our mini-refactoring is done. We run go test to check that everything still works, and it does. Now that the code is cleaned up and verified, we add tests for the small integers in cbor_test.go:

func TestIntSmall(t *testing.T) {
    for i := 0; i <= 23; i++ {
        // i is an int: it must be converted to a byte to go in a []byte
        testEncoder(t, uint64(i), nil, []byte{byte(i)})
    }
}

We loop from 0 to 23, build the expected output for each value, and check that it matches what the encoder gives us. In this case the expected output is a single byte: the major type 0 in the top 3 bits, and our value i in the rest.

Some of you may have noticed that we turn our value i into a uint64 when we pass it to testEncoder instead of passing a plain int. That’s because Go has many integer types, like uint64, int16, and plain int. Unfortunately these are all distinct types as far as Go’s type system is concerned, and each one requires extra code to handle. We will handle the other integer types later; for now we’ll stick to uint64.
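A quick way to see that these types really are distinct is to run a few values through a type switch. This is only an illustration; describe is a hypothetical function, not something the encoder needs:

```go
package main

import "fmt"

// describe is a hypothetical helper: it reports which case of a type
// switch a value lands in.
func describe(v interface{}) string {
	switch v.(type) {
	case uint64:
		return "uint64"
	case int:
		return "int"
	default:
		return "something else"
	}
}

func main() {
	fmt.Println(describe(uint64(1))) // matches the uint64 case
	fmt.Println(describe(1))         // an untyped constant becomes an int
	fmt.Println(describe(int16(1)))  // matches neither case
}
```

The same number 1 ends up in three different branches depending on its type, which is why a case uint64: clause alone won’t catch an int.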

Small integers are easy to implement: in Encode’s switch statement we add a case uint64: clause, and if the integer is between 0 and 23 we output the header with the right additional value. That’s all:

    case uint64:
        i := v.(uint64)
        if i <= 23 {
            // small integers fit in the header's additional value
            return e.writeHeader(majorPositiveInteger, byte(i))
        }
    }

A quick run with go test confirms TestIntSmall works. Time to work on the extended integers: as usual we’ll write the tests first. To get good coverage, we’re going to copy the examples given in the appendix of the CBOR spec for our tests.

We’ll use subtests to make it easier to track which test fails. Subtests allow you to define multiple sub-tests with different names inside a single test function. Our subtests’ names will be the numbers we’re checking. For example, to test the integer 10 we’d do something like this:

func TestExample(t *testing.T) {
    t.Run(
        "10",                 // name of the subtest
        func(t *testing.T) {  // function to execute
            testEncoder(t, uint64(10), nil, []byte{0x0a})
        },
    )
}

When we run go test with this example we’ll have a test named “TestExample/10”. We could add another call to t.Run() with the string “foo” as its name to create another subtest named “TestExample/foo”.

Let’s replace this example with real tests. We’ll use a table to store our test cases, iterate over it, and verify each result. Our test values and expected outputs are taken from the CBOR spec examples:

func TestIntBig(t *testing.T) {
    var cases = []struct {
        Value    uint64
        Expected []byte
    }{
        {Value: 0, Expected: []byte{0x00}},
        {Value: 1, Expected: []byte{0x01}},
        {Value: 10, Expected: []byte{0x0a}},
        {Value: 23, Expected: []byte{0x17}},
        {Value: 24, Expected: []byte{0x18, 0x18}},
        {Value: 25, Expected: []byte{0x18, 0x19}},
        {Value: 100, Expected: []byte{0x18, 0x64}},
        {Value: 1000, Expected: []byte{0x19, 0x03, 0xe8}},
        {Value: 1000000, Expected: []byte{0x1a, 0x00, 0x0f, 0x42, 0x40}},
        {
            Value: 1000000000000,
            Expected: []byte{
                0x1b, 0x00, 0x00, 0x00, 0xe8, 0xd4, 0xa5, 0x10, 0x00,
            },
        },
        {
            Value: 18446744073709551615,
            Expected: []byte{
                0x1b, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
            },
        },
    }

    for _, c := range cases {
        t.Run(fmt.Sprintf("%d", c.Value), func(t *testing.T) {
            testEncoder(t, c.Value, nil, c.Expected)
        })
    }
}

If we run the tests as they are now, the ones with numbers less than 24 will pass, but all the bigger numbers will fail with a not implemented error:

--- PASS: TestIntBig/0 (0.00s)
--- PASS: TestIntBig/1 (0.00s)
--- PASS: TestIntBig/10 (0.00s)
--- PASS: TestIntBig/23 (0.00s)
--- FAIL: TestIntBig/24 (0.00s)
	cbor_test.go:18: err: &errors.errorString{s:"Not Implemented"} != <nil> with 0x18
--- FAIL: TestIntBig/25 (0.00s)
	cbor_test.go:18: err: &errors.errorString{s:"Not Implemented"} != <nil> with 0x19
--- FAIL: TestIntBig/100 (0.00s)
	cbor_test.go:18: err: &errors.errorString{s:"Not Implemented"} != <nil> with 0x64
...

Big CBOR integers have 2 parts: a header to determine the type, followed by the value encoded as a big-endian integer. For example 25 is encoded as 0x1819, that’s 2 bytes: the header is 0x18, or 24 in decimal, which corresponds to an 8-bit integer type. The second byte is 0x19, or 25 in decimal: the integer we encoded. To reiterate: the header gives us the type of the value, and the bytes following the header are the value being encoded.

The first thing we’ll do is add a helper function to write our native integers as big-endian integers. It takes an interface{} as parameter instead of an integer because the package encoding/binary uses the type of the value it writes to determine how much data to write. For example passing the value 1 typed as a uint16 to binary.Write will output 2 bytes: 0x0001. This allows us to convert our integer to the right type before handing it to binary.Write, so that it is encoded with the correct size:
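To see how the type drives the size of the output, here is a small standalone sketch. The helper bigEndianBytes is hypothetical and only exists for this demonstration; it writes into a bytes.Buffer instead of the encoder’s writer:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// bigEndianBytes is a hypothetical helper: it returns the bytes that
// binary.Write emits for v in big-endian order.
func bigEndianBytes(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// The same number 1, written with four different fixed-size types.
	for _, v := range []interface{}{uint8(1), uint16(1), uint32(1), uint64(1)} {
		b, err := bigEndianBytes(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%T: %d byte(s): % x\n", v, len(b), b)
	}

	// A plain int has no fixed size, so binary.Write refuses it.
	if _, err := bigEndianBytes(1); err != nil {
		fmt.Println("plain int:", err)
	}
}
```

Note the last call: binary.Write only accepts fixed-size values, so a plain int returns an error instead of being written. That’s one more reason to convert to an explicit type first.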

// writeHeaderInteger writes out a header created from the major and minor
// magic numbers, then writes the value v as a big-endian integer. The type
// of v determines how many bytes are written.
func (e *Encoder) writeHeaderInteger(major, minor byte, v interface{}) error {
    if err := e.writeHeader(major, minor); err != nil {
        return err
    }
    return binary.Write(e.w, binary.BigEndian, v)
}

We don’t want the big switch statement in the Encode method to become messy as we’re adding more code, so we create a new method for our encoder: writeInteger where we’ll put all the code to encode integers.

The writeInteger method encodes a single integer value, converting it to the smallest integer type that can hold it:

func (e *Encoder) writeInteger(i uint64) error {
    switch {
    case i <= 23:
        return e.writeHeader(majorPositiveInteger, byte(i))
    case i <= 0xff:
        return e.writeHeaderInteger(
            majorPositiveInteger, minorPositiveInt8, uint8(i),
        )
    case i <= 0xffff:
        return e.writeHeaderInteger(
            majorPositiveInteger, minorPositiveInt16, uint16(i),
        )
    case i <= 0xffffffff:
        return e.writeHeaderInteger(
            majorPositiveInteger, minorPositiveInt32, uint32(i),
        )
    default:
        return e.writeHeaderInteger(
            majorPositiveInteger, minorPositiveInt64, uint64(i),
        )
    }
}

As you can see we convert the value i to a different integer type depending on how big it is, to minimize the size of what we write to the output. The fewer bytes we use the better.
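The minorPositiveInt8 to minorPositiveInt64 constants used by writeInteger aren’t shown above. Going by the additional values 24, 25, 26, and 27 described earlier, they’d presumably be defined as in the sketch below. The function encodeUint is a hypothetical, standalone variant of writeInteger that returns a byte slice instead of writing to the encoder, so we can try the size selection on its own:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Additional values announcing the width of the integer that follows the
// header. The values are assumed from the 24..27 described in the text.
const (
	minorPositiveInt8  = 24
	minorPositiveInt16 = 25
	minorPositiveInt32 = 26
	minorPositiveInt64 = 27
)

// encodeUint is a hypothetical standalone variant of writeInteger. The
// major type of positive integers is 0, so the header byte is simply the
// additional value.
func encodeUint(i uint64) []byte {
	switch {
	case i <= 23:
		return []byte{byte(i)}
	case i <= 0xff:
		return []byte{minorPositiveInt8, byte(i)}
	case i <= 0xffff:
		out := []byte{minorPositiveInt16, 0, 0}
		binary.BigEndian.PutUint16(out[1:], uint16(i))
		return out
	case i <= 0xffffffff:
		out := make([]byte, 5)
		out[0] = minorPositiveInt32
		binary.BigEndian.PutUint32(out[1:], uint32(i))
		return out
	default:
		out := make([]byte, 9)
		out[0] = minorPositiveInt64
		binary.BigEndian.PutUint64(out[1:], i)
		return out
	}
}

func main() {
	for _, i := range []uint64{23, 24, 1000, 1000000, 1000000000000} {
		fmt.Printf("%d: % x\n", i, encodeUint(i))
	}
}
```

The outputs line up with the expected bytes in the TestIntBig table: 23 fits in the header, 24 needs one extra byte, 1000 needs two, and so on.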

Encode now looks like this:

func (e *Encoder) Encode(v interface{}) error {
    switch v.(type) {
    case nil:
        return e.writeHeader(majorSimpleValue, simpleValueNil)
    case bool:
        var minor byte
        if v.(bool) {
            minor = simpleValueTrue
        } else {
            minor = simpleValueFalse
        }
        return e.writeHeader(majorSimpleValue, minor)
    case uint64:
        return e.writeInteger(v.(uint64))
    }
    return ErrNotImplemented
}

Once we add this little bit of code our integer tests will pass:

--- PASS: TestIntBig (0.00s)
    --- PASS: TestIntBig/0 (0.00s)
    --- PASS: TestIntBig/1 (0.00s)
    --- PASS: TestIntBig/10 (0.00s)
    --- PASS: TestIntBig/23 (0.00s)
    --- PASS: TestIntBig/24 (0.00s)
    --- PASS: TestIntBig/25 (0.00s)
    --- PASS: TestIntBig/100 (0.00s)
    --- PASS: TestIntBig/1000 (0.00s)
    --- PASS: TestIntBig/1000000 (0.00s)
    --- PASS: TestIntBig/1000000000000 (0.00s)
    --- PASS: TestIntBig/18446744073709551615 (0.00s)

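Let’s add the integer types we ignored thus far to be more exhaustive with what our encoder supports. There’s a catch: inside a case clause that lists several types, v is still an interface{}, and asserting v.(uint64) on an int, for example, would panic. One way around it, sketched here, is to replace the case uint64: clause with two clauses and let reflect.ValueOf from the reflect package (which has to be imported) read the number for us:

    case uint, uint8, uint16, uint32, uint64:
        return e.writeInteger(reflect.ValueOf(v).Uint())
    case int, int8, int16, int32, int64:
        // negative values end up at ErrNotImplemented for now
        if i := reflect.ValueOf(v).Int(); i >= 0 {
            return e.writeInteger(uint64(i))
        }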

Now we can pass a positive int, int8, int16, int32, or int64, or any of the unsigned integer types, and it will work. We can’t handle negative numbers yet.
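One detail is worth spelling out: in a case clause that lists several types, the value keeps its interface{} type, so each concrete type has to be converted on its own. Here is a sketch of that conversion written with plain type-switch cases only. The helper toUint64 and the two error values are hypothetical names made up for this example, they’re not part of the encoder:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for this sketch.
var (
	errNegative   = errors.New("negative integer")
	errNotInteger = errors.New("not an integer")
)

// fromSigned rejects negative values and converts the rest to uint64.
func fromSigned(i int64) (uint64, error) {
	if i < 0 {
		return 0, errNegative
	}
	return uint64(i), nil
}

// toUint64 is a hypothetical helper: it converts any of Go's integer
// types to a uint64. Each type gets its own case so that x has a
// concrete type we can convert.
func toUint64(v interface{}) (uint64, error) {
	switch x := v.(type) {
	case uint:
		return uint64(x), nil
	case uint8:
		return uint64(x), nil
	case uint16:
		return uint64(x), nil
	case uint32:
		return uint64(x), nil
	case uint64:
		return x, nil
	case int:
		return fromSigned(int64(x))
	case int8:
		return fromSigned(int64(x))
	case int16:
		return fromSigned(int64(x))
	case int32:
		return fromSigned(int64(x))
	case int64:
		return fromSigned(x)
	}
	return 0, errNotInteger
}

func main() {
	for _, v := range []interface{}{uint8(200), int16(12), 42, -1, "nope"} {
		i, err := toUint64(v)
		fmt.Printf("%T(%v): %d, %v\n", v, v, i, err)
	}
}
```

It’s longer than a single multi-type clause, but every conversion is checked by the compiler and nothing can panic at runtime.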

That’s all for now. There’s a repository with the code for this episode. In the next episode we’ll introduce the reflect package to take care of pointers.