Go CBOR encoder: Episode 7, maps
This is a tutorial on how to write a CBOR encoder in Go, where we’ll learn more about reflection and type introspection in Go.
Read the previous episodes, each episode builds on the previous one:
- Episode 1, getting started
- Episode 2, booleans
- Episode 3, positive integers
- Episode 4, reflect and pointers
- Episode 5, strings
- Episode 6, negative integers and arrays
CBOR has a object or map type like JSON: it’s an ordered list of key/value pairs. We’ll use it to encode two different kinds of Go types: maps and structs. We’ll implement maps first and add support for structs in the next episode.
Here’s what the spec says about maps:
Major type 5: a map of pairs of data items. Maps are also called tables, dictionaries, hashes, or objects (in JSON). A map is comprised of pairs of data items, each pair consisting of a key that is immediately followed by a value. The map’s length follows the rules for byte strings (major type 2), except that the length denotes the number of pairs, not the length in bytes that the map takes up. For example, a map that contains 9 pairs would have an initial byte of 0b101_01001 (major type of 5, additional information of 9 for the number of pairs) followed by the 18 remaining items. The first item is the first key, the second item is the first value, the third item is the second key, and so on. […]
As usual we’ll use the examples from the CBOR RFC to write the tests. Maps are challenging to test because CBOR maps are ordered while Go maps aren’t. The order in which Go maps keys are returned is unspecified according to Value.MapKeys’s documentation:
MapKeys returns a slice containing all the keys present in the map, in unspecified order.
It means testEncoder cannot verify maps with more than one key/value pair in it, because it expects a unique result. Consider this map:
{1: 2, 3: 4}
They are multiple valid CBOR encoding for this map, because Go maps’ items can be in any order. With the example above the first key could either be 1 or 3.
We’ll have to check the different possibilities in the tests. For example from the CBOR spec we see that:
{1: 2, 3: 4}
Turns into:
0xa201020304
Here’s the breakdown of the output:
0xa2 → header for a map of 2 pairs
0x01 → first key: 1
0x02 → first value: 2
0x03 → second key: 3
0x04 → second value: 4
Because the map has two elements there’s another valid CBOR encoding for it with 3 as the first key and 1 as the second key like this:
0xa2 → header for a map of 2 pairs
0x03 → first key: 3
0x04 → first value: 4
0x01 → second key: 1
0x02 → second value: 2
Our tests will have to handle unordered keys in the output. To verify the results we will search for every individual key/value pair in the encoded map to ensure all the values are here regardless of the order.
We don’t need to worry about this for maps with less than two entries, we have two examples from the CBOR spec like that:
- The empty map:
{}
- Array containing a map with a single element:
["a", {"b": "c"}]
We’ll use testEncoder for those two cases. Let’s add a new test function TestMap in cbor_test.go with two subtests for the easy examples:
func TestMap(t *testing.T) {
// {}
t.Run("{}", func(t *testing.T) {
testEncoder(t, map[struct{}]struct{}{}, nil, []byte{0xa0})
})
// ["a", {"b": "c"}]
t.Run("[\"a\", {\"b\": \"c\"]", func(t *testing.T) {
testEncoder(
t,
[]interface{}{"a", map[string]string{"b": "c"}},
nil,
[]byte{0x82, 0x61, 0x61, 0xa1, 0x61, 0x62, 0x61, 0x63},
)
})
}
Now we’ll add what’s needed for multi-item maps: we verify the header’s major type, and the map length, then search all key-value pairs in the output. The tests cases we’ll use to test maps are:
{1: 2, 3: 4}
{"a": 1, "b": [2, 3]}
{"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"}
To verify unordered maps the test needs the list of encoded key-value pairs. In our previous tests the test cases where stored in a structure like this:
struct {
Value interface{}
Expected []byte
}
We’ll change it to hold what we need to verify the map, we’ll turn Expected from a slice of bytes into a slice of slice of bytes. The length of Expected is the size of the map. Items in Expected are encoded key-value pairs to look-up in the result:
struct {
Value interface{}
Expected [][]byte
}
We add the new test cases and the code to check the result in the TestMap function:
var cases = []struct {
Value interface{}
Expected [][]byte
}{
{
Value: map[int]int{1: 2, 3: 4},
Expected: [][]byte{
[]byte{0x01, 0x02}, // {1: 2}
[]byte{0x03, 0x04}, // {3: 4}
},
},
{
Value: map[string]interface{}{"a": 1, "b": []int{2, 3}},
Expected: [][]byte{
[]byte{0x61, 0x61, 0x01}, // {"a": 1}
[]byte{0x61, 0x62, 0x82, 0x02, 0x03}, // {"b": [2, 3]}
},
},
{
Value: map[string]string{
"a": "A", "b": "B", "c": "C", "d": "D", "e": "E",
},
Expected: [][]byte{
[]byte{0x61, 0x61, 0x61, 0x41}, // {"a": "A"}
[]byte{0x61, 0x62, 0x61, 0x42}, // {"b": "B"}
[]byte{0x61, 0x63, 0x61, 0x43}, // {"c": "C"}
[]byte{0x61, 0x64, 0x61, 0x44}, // {"d": "D"}
[]byte{0x61, 0x65, 0x61, 0x45}, // {"e": "E"}
},
},
}
Each case the test will extract the major type and length of the map from the header using a bit mask, then verify their values. Our test cases have less than 23 elements in them, so the header will be only one byte. If we had a case with more than 23 elements we would have to change the code accordingly.
Let’s add the loop to iterate over the cases and verify what’s in the header:
for _, c := range cases {
t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
var buffer bytes.Buffer
if err := NewEncoder(&buffer).Encode(c.Value); err != nil {
t.Fatalf("err: %#v != nil with %#v", err, c.Value)
}
var (
header = buffer.Bytes()[0]
result = buffer.Bytes()[1:]
lengthMask = ^uint8(0) >> 3 // bit mask to extract the length
length = header & lengthMask
)
if header>>5 != majorMap {
t.Fatalf("invalid major type: %#v", header)
}
if int(length) != len(c.Expected) {
t.Fatalf("invalid length: %#v != %#v", length, len(c.Expected))
}
}
}
We haven’t verified the map’s content yet, let’s add it: we search for each pair in the encoder’s output, then remove it from the output. Once we’re done verifying all the key-values, we check if the slice is empty to ensure there’s nothing left-over in the output. We add that code is at the end of the loop:
for _, c := range cases {
t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
...
// Iterate over the key/values we expect in the map
for _, kv := range c.Expected {
if !bytes.Contains(result, kv) {
t.Fatalf("key/value %#v not found in result", kv)
}
// remove the value from the result
result = bytes.Replace(result, kv, []byte{}, 1)
}
// ensure we got everything is the map
if len(result) > 0 {
t.Fatalf("leftover in result: %#v", result)
}
}
}
Tests are done, now let’s get them working. To encode the map we’ll write its size in the header, then recursively encode each key followed by its Then we’ll add a case clause matching reflect.Map in the encode’s switch statement and call writeMap from it:
const majorMap = 5
...
func (e *Encoder) writeMap(v reflect.Value) error {
if err := e.writeInteger(majorMap, uint64(v.Len())); err != nil {
return err
}
for _, key := range v.MapKeys() {
e.encode(key)
e.encode(v.MapIndex(key))
}
return nil
}
func (e *Encoder) encode(x reflect.Value) error {
switch x.Kind() {
...
case reflect.Map:
return e.writeMap(x)
}
}
As you can see we didn’t have to add much code to encode maps. The real challenge was the tests. Implemention was easy this time, but it won’t be next time: we’ll work with structs in the next episode and it’ll be a big one.
Check out the repository with the full code for this episode.