2018-05-20T09:40:32-07:00

Go CBOR encoder: Episode 6, negative integers and arrays

This is tutorial on how to write a CBOR encoder in Go. Its goal is to teach reflection and type introspection. I recommend you read the previous episodes before jumping into this one:

Our CBOR encoder only accepts unsigned integers at the moment, to support all integer types we have to handle negative numbers. Negative number encoding is similar to positive number with a different major type. The spec says:

Major type 1: a negative integer. The encoding follows the rules for unsigned integers (major type 0), except that the value is then -1 minus the encoded unsigned integer. For example, the integer -500 would be 0b001_11001 (major type 1, additional information 25) followed by the two bytes 0x01f3, which is 499 in decimal.

As usual the tests come first, we re-use the examples from the CBOR specification to add TestNegativeIntegers in cbor_test.go:

import "math"  // for math.MinInt64
...
func TestNegativeIntegers(t *testing.T) {
    var cases = []struct {
        Value    int64
        Expected []byte
    }{
        {Value: -1, Expected: []byte{0x20}},
        {Value: -10, Expected: []byte{0x29}},
        {Value: -100, Expected: []byte{0x38, 0x63}},
        {Value: -1000, Expected: []byte{0x39, 0x03, 0xe7}},
        {
            Value: math.MinInt64,
            Expected: []byte{
                0x3b, 0x7f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
            },
        },
    }

    for _, c := range cases {
        t.Run(fmt.Sprintf("%d", c.Value), func(t *testing.T) {
            testEncoder(t, c.Value, nil, c.Expected)
        })
    }
}

For the encoder to recognize all integers types we add a new case clause in Encode()’s switch statement with the additional integer kinds like reflect.Int. It checks the sign of the integer: If the integer is positive we write it as a positive number, if it’s negative we turn it into an unsigned integer using the formula -(x+1), and we write that number to the output:

const majorNegativeInteger = 1

...

func (e *Encoder) encode(x reflect.Value) error {
    ...
    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        var i = x.Int()
        if i < 0 {
            return e.writeInteger(majorNegativeInteger, uint64(-(i + 1)))
        } else {
            return e.writeInteger(majorPositiveInteger, uint64(i))
        }
    ...
}

8 lines of code was all we needed to support all integer types. That was easy, now we move onto something harder: arrays.

Arrays are the first composite type we add to the encoder. An array is a list of objects, it can contain any type of object like a JSON array:

[null, true, 1, "hello"]

Arrays have their own major type according to the spec:

Major type 4: an array of data items. Arrays are also called lists, sequences, or tuples. The array’s length follows the rules for byte strings (major type 2), except that the length denotes the number of data items, not the length in bytes that the array takes up. Items in an array do not need to all be of the same type. For example, an array that contains 10 items of any type would have an initial byte of 0b100_01010 (major type of 4, additional information of 10 for the length) followed by the 10 remaining items.

Because arrays can contain any type we’ll have to recursively encode objects, like we did in episode 4 with pointers.

Before we get started we’ll refactor how we recursively encode objects. Our encoder works with reflect.Value but the Encode() method takes an interface{} not a reflect.Value. When we call Encode() recursively we convert the reflect.Value into an interface which is then converted back into a reflect.Value. Those conversions aren’t efficient, so we’ll move all the code in the Encode() method into a new method called encode() —all lowercase— that takes a reflect.Value as parameter. Encode() is now just a call to this new method:

func (e *Encoder) Encode(v interface{}) error {
    return e.encode(reflect.ValueOf(v))
}

func (e *Encoder) encode(x reflect.Value) error {
    switch x.Kind() {
    ...
    case reflect.Ptr:
        if x.IsNil() {
            return e.writeHeader(majorSimpleValue, simpleValueNil)
        } else {
            // this replaces e.Encode(reflect.Indirect(x).Interface())
            return e.encode(reflect.Indirect(x))
        }
    ...
    }
    return ErrNotImplemented
}

With this small refactoring done let’s add tests based on the examples from the CBOR specification to our test suite, we have five cases:

[]
[1, 2, 3]
[1, [2, 3], [4, 5]]
[1, 2, 3, ... 25]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

We add TestArray to cbor_test.go that runs a subtest for each of the cases above:

func TestArray(t *testing.T) {
    var cases = []struct {
        Value    []interface{}
        Expected []byte
    }{
        {Value: []interface{}{}, Expected: []byte{0x80}},
        {Value: []interface{}{1, 2, 3}, Expected: []byte{0x83, 0x1, 0x2, 0x3}},
        {
            Value:    []interface{}{1, []interface{}{2, 3}, []interface{}{4, 5}},
            Expected: []byte{0x83, 0x01, 0x82, 0x02, 0x03, 0x82, 0x04, 0x05},
        },
        {
            Value: []interface{}{
                1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
                19, 20, 21, 22, 23, 24, 25,
            },
            Expected: []byte{
                0x98, 0x19, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
                0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12,
                0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x18, 0x18, 0x19,
            },
        },
    }

    for _, c := range cases {
        t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
            testEncoder(t, c.Value, nil, c.Expected)
        })
    }
}

To get the tests to pass we have to match all array and slice types, except byte array and byte slice. We already match arrays and slices in the previous episode when we implemented byte strings.

When we have an array-like object to encode, we pass it to a new method writeArray. It writes the header with the length of the array, then iterates over the array’s elements and encode them recursively. To iterate over the array all we need are the methods reflect.Value.Len and reflect.Value.Index, we write a simple for loop and retrieve each item with v.Index(i):

majorArray           = 4
...
func (e *Encoder) writeArray(v reflect.Value) error {
    if err := e.writeInteger(majorArray, uint64(v.Len())); err != nil {
        return err
    }
    for i := 0; i < v.Len(); i++ {
        if err := e.encode(v.Index(i)); err != nil {
            return err
        }
    }
    return nil
}

Let’s hook up writeArray to the main switch statement in encode(). We want to match array and slice not made of bytes. To achieve this we just need to add a call to writeArray after the if statement to check if we have a byte string in the reflect.Slice’s case clause. We literally add a single line to cbor.go:

func (e *Encoder) encode(x reflect.Value) error {
    switch x.Kind() {
    ...
    case reflect.Array:
        // Create slice from array
        var n = reflect.New(x.Type())
        n.Elem().Set(x)
        x = reflect.Indirect(n).Slice(0, x.Len())
        fallthrough
    case reflect.Slice:
        if x.Type().Elem().Kind() == reflect.Uint8 {
            return e.writeByteString(x.Bytes())
        }
        // We don’t have a byte string therefor we have an array
        return e.writeArray(x)
    ...
    }
    return ErrNotImplemented
}

TestArray successfully runs, we are done with arrays. Check out the repository with the full code for this episode.

With the addition of arrays our encoder can now encode complex data structures. We’re about to make it even better and dive deeper into Go reflection with the next major type: maps. See you in the next episode.