
Go CBOR encoder: Episode 8, structs

This is a tutorial on how to write a CBOR encoder in Go, where we’ll learn more about reflection and type introspection.

Read the previous episodes first, since each episode builds on the previous one.


To encode structs we’ll mimic what the standard JSON encoder does and encode structs into maps of strings to values. For example, if we pass the following to the JSON encoder (which only encodes exported fields, hence the capitalized names):

struct {
    A int
    B string
}{
    A: 1,
    B: "hello",
}

It outputs:

{"A":1,"B":"hello"}
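For readers who want to try the JSON behavior we are mimicking, here is a minimal runnable sketch. The helper name marshal is only for illustration. It also shows that encoding/json silently skips unexported fields, so a struct with lowercase field names encodes to an empty object.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// marshal returns the JSON encoding of x as a string.
// (Hypothetical helper, used only in this illustration.)
func marshal(x interface{}) (string, error) {
    b, err := json.Marshal(x)
    return string(b), err
}

func main() {
    // Exported fields are encoded under their Go names.
    out, err := marshal(struct {
        A int
        B string
    }{A: 1, B: "hello"})
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // {"A":1,"B":"hello"}

    // Unexported fields are invisible to encoding/json.
    out, err = marshal(struct {
        a int
        b string
    }{a: 1, b: "hello"})
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // {}
}
```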

The struct kind differs from the map kind we implemented in the previous episode: a struct’s fields are ordered and its keys are always strings. Because the keys are strings, we can’t use all the examples from the spec like we did with maps; we can only use the examples with string-only keys. This leaves us with these three test cases:

{}
{"a": 1, "b": [2, 3]}
{"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"}

On the flip side, because the keys are ordered, we don’t have to look for each individual pair in the output like we did with maps. We can use the function testEncoder as is for our test. Let’s add TestStruct to cbor_test.go:

func TestStruct(t *testing.T) {
    var cases = []struct {
        Value    interface{}
        Expected []byte
    }{
        {Value: struct{}{}, Expected: []byte{0xa0}},
        {
            Value: struct {
                a int
                b []int
            }{a: 1, b: []int{2, 3}},
            Expected: []byte{
                0xa2, 0x61, 0x61, 0x01, 0x61, 0x62, 0x82, 0x02, 0x03,
            },
        },
        {
            Value: struct {
                a string
                b string
                c string
                d string
                e string
            }{"A", "B", "C", "D", "E"},
            Expected: []byte{
                0xa5, 0x61, 0x61, 0x61, 0x41, 0x61, 0x62, 0x61, 0x42, 0x61,
                0x63, 0x61, 0x43, 0x61, 0x64, 0x61, 0x44, 0x61, 0x65, 0x61,
                0x45,
            },
        },
    }

    for _, c := range cases {
        t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
            testEncoder(t, c.Value, c.Expected)
        })
    }
}
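To make the expected bytes in the second test case easier to read, here is a small sketch that splits each CBOR initial byte into its major type (the high 3 bits) and its additional information (the low 5 bits). The function name splitInitialByte is made up for this illustration; it isn’t part of the encoder.

```go
package main

import "fmt"

// splitInitialByte splits a CBOR initial byte into its major type
// (high 3 bits) and additional information (low 5 bits).
func splitInitialByte(b byte) (major, info byte) {
    return b >> 5, b & 0x1f
}

func main() {
    // Expected encoding of {"a": 1, "b": [2, 3]}.
    data := []byte{0xa2, 0x61, 0x61, 0x01, 0x61, 0x62, 0x82, 0x02, 0x03}
    // Offsets of the initial bytes; offsets 2 and 5 are the string
    // payloads "a" and "b", so we skip them.
    for _, i := range []int{0, 1, 3, 4, 6, 7, 8} {
        major, info := splitInitialByte(data[i])
        fmt.Printf("0x%02x: major type %d, additional info %d\n", data[i], major, info)
    }
}
```

The first byte 0xa2 is major type 5 (map) with 2 pairs, each 0x61 is major type 3 (text string) of length 1, and 0x82 is major type 4 (array) with 2 elements.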

To encode a struct, we’ll iterate over its fields by index using Value.NumField and Value.Field, like this:

var v = reflect.ValueOf(struct {
    AKey string
    BKey string
}{AKey: "a value", BKey: "b value"})

for i := 0; i < v.NumField(); i++ {
    fmt.Println(v.Field(i))
}

This prints:

a value
b value

We have the fields’ values, but we still need their names to write the map. The names aren’t stored in the value itself; they are stored in its type. We’ll use v.Type().Field(i) to get a StructField holding the name of a particular field. For instance, if we added the following at the end of the listing above:

for i := 0; i < v.NumField(); i++ {
    fmt.Println(v.Type().Field(i).Name)
}

We’d get the name of each field printed at the end:

AKey
BKey

Let’s assemble all that into a new function writeStruct in cbor.go. writeUnicodeString writes the keys, and we encode the value recursively with the encode() method:

func (e *Encoder) writeStruct(v reflect.Value) error {
    if err := e.writeInteger(majorMap, uint64(v.NumField())); err != nil {
        return err
    }
    // Iterate over each field and write its key & value
    for i := 0; i < v.NumField(); i++ {
        if err := e.writeUnicodeString(v.Type().Field(i).Name); err != nil {
            return err
        }
        if err := e.encode(v.Field(i)); err != nil {
            return err
        }
    }
    return nil
}

We add a call to writeStruct in the main switch statement:

case reflect.Struct:
    return e.writeStruct(x)

A quick run of go test confirms everything works as intended:

$ go test -v
...
--- PASS: TestStruct (0.00s)
    --- PASS: TestStruct/{} (0.00s)
    --- PASS: TestStruct/{1_[2_3]} (0.00s)
    --- PASS: TestStruct/{A_B_C_D_E} (0.00s)
PASS
ok
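To look at the map layout outside the test suite, here is a deliberately tiny standalone sketch. It is not the Encoder from this series: encodeSmallStruct, header, and the constants are stand-ins made up for the example, and it only handles int fields in the range 0 to 23 and strings shorter than 24 bytes, so every header fits in one byte.

```go
package main

import (
    "fmt"
    "reflect"
)

// Major types used below (hypothetical names for this sketch).
const (
    majorText = 3
    majorMap  = 5
)

// header builds a one-byte CBOR header. Only valid for n < 24.
func header(major byte, n int) byte {
    return major<<5 | byte(n)
}

// encodeSmallStruct encodes a struct as a CBOR map of field names to
// values. It supports only small ints (0..23) and short strings.
func encodeSmallStruct(x interface{}) ([]byte, error) {
    v := reflect.ValueOf(x)
    if v.Kind() != reflect.Struct {
        return nil, fmt.Errorf("not a struct: %s", v.Kind())
    }
    if v.NumField() > 23 {
        return nil, fmt.Errorf("too many fields: %d", v.NumField())
    }
    out := []byte{header(majorMap, v.NumField())}
    for i := 0; i < v.NumField(); i++ {
        name := v.Type().Field(i).Name
        if len(name) > 23 {
            return nil, fmt.Errorf("field name too long: %s", name)
        }
        // Key: text string header followed by the field name.
        out = append(out, header(majorText, len(name)))
        out = append(out, name...)
        // Value: Int() and String() can read unexported fields too.
        f := v.Field(i)
        switch f.Kind() {
        case reflect.Int:
            if f.Int() < 0 || f.Int() > 23 {
                return nil, fmt.Errorf("field %s: int out of range", name)
            }
            out = append(out, byte(f.Int())) // major type 0
        case reflect.String:
            s := f.String()
            if len(s) > 23 {
                return nil, fmt.Errorf("field %s: string too long", name)
            }
            out = append(out, header(majorText, len(s)))
            out = append(out, s...)
        default:
            return nil, fmt.Errorf("field %s: unsupported kind %s", name, f.Kind())
        }
    }
    return out, nil
}

func main() {
    b, err := encodeSmallStruct(struct {
        a int
        b string
    }{a: 1, b: "hi"})
    if err != nil {
        panic(err)
    }
    fmt.Printf("% x\n", b) // a2 61 61 01 61 62 62 68 69
}
```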

Basic structs work, but we aren’t done yet. We’ll extend struct support by mimicking the standard JSON encoder and adding support for struct tags. Here’s a summary of the options the JSON encoder supports:

// Field appears in JSON as key "myName".
Field int `json:"myName"`

// Field appears in JSON as key "myName" and
// the field is omitted from the object if its value is empty[...]
Field int `json:"myName,omitempty"`

// Field appears in JSON as key "Field" (the default), but
// the field is skipped if empty.
// Note the leading comma.
Field int `json:",omitempty"`

// Field is ignored by this package.
Field int `json:"-"`

// Field appears in JSON as key "-".
Field int `json:"-,"`

We’ll implement these features with the cbor tag instead of json, like this:

Field int `cbor:"name,omitempty"`
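Struct tags are read through reflection, from the StructField of the type. As a quick sketch of the mechanics, the following reads the raw cbor tag of a field with StructTag.Lookup, which also tells us whether the tag was present at all. The record type and the cborTag helper are invented for this example.

```go
package main

import (
    "fmt"
    "reflect"
)

// record is a hypothetical type used only for this illustration.
type record struct {
    Field int `cbor:"name,omitempty"`
    Other int `json:"other"`
    Plain int
}

// cborTag returns the raw cbor tag of the named field of struct x and
// whether the tag was set. It panics if x is not a struct.
func cborTag(x interface{}, field string) (string, bool) {
    f, ok := reflect.TypeOf(x).FieldByName(field)
    if !ok {
        return "", false
    }
    return f.Tag.Lookup("cbor")
}

func main() {
    for _, name := range []string{"Field", "Other", "Plain"} {
        tag, ok := cborTag(record{}, name)
        fmt.Printf("%s: tag=%q set=%v\n", name, tag, ok)
    }
}
```

Tag.Get("cbor") returns the same string as Lookup but drops the boolean, so a missing tag and an empty tag look the same.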

Let’s write a test for the features we want to verify. We’ll re-use this example from the CBOR spec:

{"a": 1, "b": [2, 3]}

In TestStructTag we call testEncoder with a tagged struct and check the output. AField and BField are named a and b respectively, while all the other fields must be left out of the output:

func TestStructTag(t *testing.T) {
    testEncoder(t,
        struct {
            AField int   `cbor:"a"`
            BField []int `cbor:"b"`
            Omit1  int   `cbor:"c,omitempty"`
            Omit2  int   `cbor:",omitempty"`
            Ignore int   `cbor:"-"`
        }{AField: 1, BField: []int{2, 3}, Ignore: 12345},
        []byte{0xa2, 0x61, 0x61, 0x01, 0x61, 0x62, 0x82, 0x02, 0x03},
    )
}

If we run TestStructTag now, the struct won’t be encoded correctly: every field will be in the output, and the first two fields won’t have the right keys.

The encoding/json package implements best-in-class tagging: we are going to steal some of its code to save time. Why write something new when we have some battle-tested code available?

We’ll copy encoding/json/tags.go into our project and we’ll add the function isEmptyValue from encoding/json/encode.go to it. We’ll replace package json with package cbor at the top to import the new code into our package. The new file tags.go looks like this:

// Source: https://golang.org/src/encoding/json/tags.go
//
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package cbor

import (
    "reflect"
    "strings"
)

// tagOptions is the string following a comma in a struct field's "json"
// tag, or the empty string. It does not include the leading comma.
type tagOptions string

// parseTag splits a struct field's json tag into its name and
// comma-separated options.
func parseTag(tag string) (string, tagOptions) {
    if idx := strings.Index(tag, ","); idx != -1 {
        return tag[:idx], tagOptions(tag[idx+1:])
    }
    return tag, tagOptions("")
}

// Contains reports whether a comma-separated list of options
// contains a particular substr flag. substr must be surrounded by a
// string boundary or commas.
func (o tagOptions) Contains(optionName string) bool {
    if len(o) == 0 {
        return false
    }
    s := string(o)
    for s != "" {
        var next string
        i := strings.Index(s, ",")
        if i >= 0 {
            s, next = s[:i], s[i+1:]
        }
        if s == optionName {
            return true
        }
        s = next
    }
    return false
}

// Source for isEmptyValue:
//
// https://golang.org/src/encoding/json/encode.go
func isEmptyValue(v reflect.Value) bool {
    switch v.Kind() {
    case reflect.Array, reflect.Map, reflect.Slice, reflect.String:
        return v.Len() == 0
    case reflect.Bool:
        return !v.Bool()
    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        return v.Int() == 0
    case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
        return v.Uint() == 0
    case reflect.Float32, reflect.Float64:
        return v.Float() == 0
    case reflect.Interface, reflect.Ptr:
        return v.IsNil()
    }
    return false
}

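Copying code like this can hurt long-term maintenance: if the Go developers fix something upstream, we won’t get the fix until we copy it again. It’s OK for this exercise because we’re here to learn, not to ship! Here’s what each function does: parseTag splits a tag into the field name and its comma-separated options, tagOptions.Contains reports whether a given option such as omitempty is present, and isEmptyValue reports whether a value is empty (false, 0, a nil pointer or interface, or a zero-length array, map, slice, or string).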
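To get a feel for how the tag forms are interpreted, here is a condensed stand-in for parseTag combined with tagOptions.Contains. The function splitTag is not part of tags.go; it is a shorter rewrite for illustration that should give the same name and omitempty answer for the tag forms listed earlier.

```go
package main

import (
    "fmt"
    "strings"
)

// splitTag returns the name part of a tag and whether "omitempty" is
// among its comma-separated options.
// (Hypothetical condensed version of parseTag + tagOptions.Contains.)
func splitTag(tag string) (name string, omitEmpty bool) {
    parts := strings.Split(tag, ",")
    for _, opt := range parts[1:] {
        if opt == "omitempty" {
            omitEmpty = true
        }
    }
    return parts[0], omitEmpty
}

func main() {
    tags := []string{"a", "c,omitempty", ",omitempty", "-", "-,", ""}
    for _, tag := range tags {
        name, omit := splitTag(tag)
        fmt.Printf("tag=%q name=%q omitempty=%v\n", tag, name, omit)
    }
}
```

Note that the tag "-" parses to the name "-" like any other name; treating it as "ignore this field" is up to the caller, which is why "-," can still produce a key named "-".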

Let’s refactor writeStruct to handle tagging. We need to know how many elements are in our map before we write the header. For example if we had a struct with 3 fields but one of them had the tag cbor:"-" indicating the field must be ignored, the encoded map would only have 2 key-value pairs. Instead of iterating and writing key-values on the fly, we’ll parse the fields first, and write the result to the output second. We’ll build the list of fields to encode and then write the encoded map from that list.

We define a new type fieldKeyValue to hold our key-value pairs, iterate over each field in the struct, and skip the fields whose tag tells us to ignore them. Then we write the list of fields to the output. The new writeStruct function looks like this:

func (e *Encoder) writeStruct(v reflect.Value) error {
    type fieldKeyValue struct {
        Name  string
        Value reflect.Value
    }
    var fields []fieldKeyValue
    // Iterate over each field and add its key & value to fields
    for i := 0; i < v.NumField(); i++ {
        var fType = v.Type().Field(i)
        var fValue = v.Field(i)
        var tag = fType.Tag.Get("cbor")
        if tag == "-" {
            continue
        }
        name, opts := parseTag(tag)
        // with the option omitempty skip the value if it's empty
        if opts.Contains("omitempty") && isEmptyValue(fValue) {
            continue
        }
        if name == "" {
            name = fType.Name
        }
        fields = append(fields, fieldKeyValue{Name: name, Value: fValue})
    }
    // write map from fields
    if err := e.writeInteger(majorMap, uint64(len(fields))); err != nil {
        return err
    }
    for _, kv := range fields {
        if err := e.writeUnicodeString(kv.Name); err != nil {
            return err
        }
        if err := e.encode(kv.Value); err != nil {
            return err
        }
    }
    return nil
}

As you can see, we get each field’s tag via fType.Tag.Get("cbor"). We skip the field if its tag is “-”, or if it has the “omitempty” option and its value is empty.
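The selection logic can be tried on its own with the sketch below, which only returns the keys that would end up in the map. The names encodedKeys and tagged are invented for this example. One deliberate simplification: it uses reflect.Value.IsZero (Go 1.13+) in place of isEmptyValue, which is not quite the same thing, since an empty but non-nil slice or map is empty for isEmptyValue but not zero for IsZero.

```go
package main

import (
    "fmt"
    "reflect"
    "strings"
)

// tagged mirrors the struct used in TestStructTag.
type tagged struct {
    AField int   `cbor:"a"`
    BField []int `cbor:"b"`
    Omit1  int   `cbor:"c,omitempty"`
    Omit2  int   `cbor:",omitempty"`
    Ignore int   `cbor:"-"`
}

// encodedKeys returns the map keys that would be written for struct x,
// in field order. (Hypothetical helper; assumes IsZero is an acceptable
// stand-in for isEmptyValue.)
func encodedKeys(x interface{}) []string {
    v := reflect.ValueOf(x)
    var keys []string
    for i := 0; i < v.NumField(); i++ {
        f := v.Type().Field(i)
        tag := f.Tag.Get("cbor")
        if tag == "-" {
            continue // field explicitly ignored
        }
        parts := strings.Split(tag, ",")
        omitEmpty := false
        for _, opt := range parts[1:] {
            if opt == "omitempty" {
                omitEmpty = true
            }
        }
        if omitEmpty && v.Field(i).IsZero() {
            continue // empty value with omitempty
        }
        name := parts[0]
        if name == "" {
            name = f.Name // fall back to the Go field name
        }
        keys = append(keys, name)
    }
    return keys
}

func main() {
    fmt.Println(encodedKeys(tagged{AField: 1, BField: []int{2, 3}, Ignore: 12345}))
    fmt.Println(encodedKeys(tagged{AField: 1, BField: []int{2, 3}, Omit2: 7}))
}
```

The first call keeps only a and b, so the map header must announce 2 pairs even though the struct has 5 fields. This is why the fields are collected before the header is written.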

go test confirms that struct tagging is implemented correctly. Structs are done, and our encoder is getting closer to being usable by a third party. Only a few reflect.Kinds remain to be handled.

We’ll implement floating and complex numbers, and ignore the UnsafePointer kind since we can’t reliably encode it. We’ll cover floating point numbers in the next episode.

Check out the repository with the full code for this episode.