2020-01-12T10:59:02-08:00

Go CBOR encoder: Episode 11, timestamps

This is a tutorial on how to write a CBOR encoder in Go, where we’ll learn more about reflection and type introspection.

Make sure to read the previous episodes first; each episode builds on the previous one.

In the previous episode we improved floating point number support in our encoder. We have now implemented all the Go native types, so next we’ll implement a custom type: time.Time, a timestamp type from Go’s standard library. The CBOR format supports 3 timestamp types natively:

RFC3339 strings like “2019-02-01T17:45:23Z”
floating point epoch-based values
integer epoch-based values

The CBOR format has special values called tags to represent data with additional semantics, like timestamps. A tag’s header has major type 6 and encodes an integer that identifies the type of the tag’s content. Each tagged type has a unique integer identifier.

For example, URIs are represented as a tagged Unicode string: first comes a header with major type 6 (indicating a tagged value) that encodes the integer 32 (the identifier for URIs), followed by the URI encoded as a UTF-8 CBOR string.
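
To make the byte layout concrete, here is a minimal standalone sketch of the URI example. It doesn't use our Encoder: head and taggedURI are hypothetical helpers written for this illustration, and they only handle headers whose argument is smaller than 256.

```go
package main

import "fmt"

// head builds the initial byte(s) of a CBOR data item from a major type and
// an argument. Hypothetical helper: it only supports arguments below 256.
func head(major byte, arg uint8) []byte {
	if arg < 24 {
		// Small arguments fit in the low five bits of the first byte.
		return []byte{major<<5 | arg}
	}
	// 24 in the low five bits means the argument is stored in the next byte.
	return []byte{major<<5 | 24, arg}
}

// taggedURI encodes uri as tag 32 followed by a UTF-8 text string.
// It assumes len(uri) < 256.
func taggedURI(uri string) []byte {
	out := head(6, 32) // major type 6: tag, identifier 32: URI
	out = append(out, head(3, uint8(len(uri)))...) // major type 3: text string
	return append(out, uri...)
}

func main() {
	// The output starts with d8 20 (the tag) and 76 (a 22-byte text string).
	fmt.Printf("% x\n", taggedURI("http://www.example.com"))
}
```

The first two bytes are the tag header, the third is the string header, and the rest is the URI itself.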

How can we detect if we have a time.Time value in the encoder? Looking at time.Time’s definition we see that it’s a struct, a kind of value we already handle in the encoder. The reflect package lets us query and compare the types of values, so when we have a reflect.Struct kind we will check if the value’s type is time.Time and write a CBOR timestamp when that’s the case.

Getting time.Time’s type without allocating anything extra takes a bit of gymnastics. We can either do:

reflect.TypeOf(time.Time{})

Or:

reflect.TypeOf((*time.Time)(nil)).Elem()

In the first case we create an empty time.Time object and pass it, wrapped in an interface, to reflect.TypeOf, which returns its reflect.Type. In the second case we create a nil pointer to time.Time, ask reflect.TypeOf for the pointer’s type, and call Elem to retrieve the type it points to. We’ll use the second way because it doesn’t create an empty time.Time object and is therefore a bit more efficient.
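
As a quick check, the sketch below shows that both expressions yield the same reflect.Type. The names timeType and isTime are hypothetical and not part of our encoder; storing the type in a package-level variable is just one possible way to avoid recomputing it on every call.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// timeType is computed once when the package is initialized.
var timeType = reflect.TypeOf((*time.Time)(nil)).Elem()

// isTime reports whether v holds a time.Time value.
func isTime(v interface{}) bool {
	return reflect.TypeOf(v) == timeType
}

func main() {
	// Both ways of getting the type give the same reflect.Type.
	fmt.Println(reflect.TypeOf(time.Time{}) == timeType) // true
	fmt.Println(timeType, timeType.Kind())               // time.Time struct
	fmt.Println(isTime(time.Now()), isTime(struct{}{}))  // true false
}
```

reflect.Type values can be compared with ==, which is what makes the check in the encoder possible.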

In the main switch block we add a conditional statement in the reflect.Struct case to check if the struct’s type is time.Time:

case reflect.Struct:
	if x.Type() == reflect.TypeOf((*time.Time)(nil)).Elem() {
		return ErrNotImplemented
	}
	return e.writeStruct(x)

Timestamps have two tagged data item types: 0 for RFC3339 timestamps encoded as Unicode strings, and 1 for epoch-based timestamps (floating point and integer values). Let’s add a new function to write the timestamps: writeTime. We’ll handle string timestamps first, and implement scalar epoch-based timestamps second. Starting with RFC3339 strings, we look up the example from the spec and add our first test case:

func TestTimestamp(t *testing.T) {
    var rfc3339Timestamp, _ = time.Parse(time.RFC3339, "2013-03-21T20:04:00Z")

    var cases = []struct {
        Value    time.Time
        Expected []byte
    }{
        {
            Value: rfc3339Timestamp,
            Expected: []byte{
                0xc0, 0x74, 0x32, 0x30, 0x31, 0x33, 0x2d, 0x30, 0x33, 0x2d,
                0x32, 0x31, 0x54, 0x32, 0x30, 0x3a, 0x30, 0x34, 0x3a, 0x30,
                0x30, 0x5a,
            },
        },
    }

    for _, c := range cases {
        t.Run(fmt.Sprintf("%v", c.Value), func(t *testing.T) {
            testEncoder(t, c.Value, c.Expected)
        })
    }
}

Back in cbor.go we add a few header constants required to encode the new tagged types:

const (
    // major types
    ...
    majorTag             = 6
    ...

    // major type 6: tagged values
    minorTimeString = 0
    minorTimeEpoch  = 1
    ...
)

The function writeTime writes the tag’s header with minorTimeString to indicate a string follows, then it converts the timestamp into an RFC3339 string and writes it to the output:

func (e *Encoder) writeTime(v reflect.Value) error {
    if err := e.writeHeader(majorTag, minorTimeString); err != nil {
        return err
    }
    var t = v.Interface().(time.Time)
    return e.writeUnicodeString(t.Format(time.RFC3339))
}

We hook it up to the rest of the code by adding a call to writeTime in our main switch statement:

case reflect.Struct:
	if x.Type() == reflect.TypeOf((*time.Time)(nil)).Elem() {
		return e.writeTime(x)
	}
	return e.writeStruct(x)

A quick go test confirms that writing string timestamps works, so let’s get started with epoch-based timestamps.

Epoch-based timestamps are scalar values where 0 corresponds to the Unix epoch (January 1, 1970 at 00:00 UTC); they can be either integers or floating point numbers. We’ll minimize the size of our output by using the most compact representation that doesn’t lose precision. A timestamp can be written as an integer, a floating point number, or an RFC3339 string. If the timestamp’s timezone isn’t UTC we’ll have to use the largest representation, an RFC3339 string, because we need to encode the timezone information and we can’t do that with scalar timestamps. If the timestamp’s timezone is UTC or nil we can use a scalar timestamp, because epoch-based values are always expressed in UTC. We’ll use an integer when the timestamp is a whole number of seconds, and a floating point number otherwise.
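
The decision rules above can be sketched as a small standalone function. This is only an illustration: representation and examples are hypothetical names, and the real encoder writes bytes rather than returning a label.

```go
package main

import (
	"fmt"
	"time"
)

// examples holds one timestamp for each of the three representations.
var examples = []time.Time{
	time.Unix(1363896240, 0).UTC(),
	time.Unix(1363896240, 500000000).UTC(),
	time.Unix(1363896240, 0).In(time.FixedZone("UTC+7", 7*60*60)),
}

// representation names the most compact CBOR timestamp form for t.
func representation(t time.Time) string {
	switch {
	case t.Location() != time.UTC:
		// Scalar timestamps can't carry a timezone, so we need a string.
		return "RFC3339 string"
	case t.Nanosecond() == 0:
		// Whole number of seconds: an integer loses nothing.
		return "integer"
	default:
		return "float"
	}
}

func main() {
	for _, t := range examples {
		fmt.Printf("%s -> %s\n", t.Format(time.RFC3339Nano), representation(t))
	}
}
```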

First we add a condition to only use RFC3339 strings when the timestamp has a timezone that’s not UTC:

func (e *Encoder) writeTime(v reflect.Value) error {
    var t = v.Interface().(time.Time)
    if t.Location() != time.UTC && t.Location() != nil {
        if err := e.writeHeader(majorTag, minorTimeString); err != nil {
            return err
        }
        return e.writeUnicodeString(t.Format(time.RFC3339))
    }
    return ErrNotImplemented
}

Because we are changing the behavior of writeTime when the timezone is UTC, we have to fix the first test case to use a timestamp with a non-UTC timezone; otherwise the test will fail because writeTime returns ErrNotImplemented. We replace the Z (shorthand for the UTC timezone) at the end of rfc3339Timestamp with +07:00:

func TestTimestamp(t *testing.T) {
    var rfc3339Timestamp, _ = time.Parse(time.RFC3339, "2013-03-21T20:04:00+07:00")

    var cases = []struct {
        Value    time.Time
        Expected []byte
    }{
        {
            Value: rfc3339Timestamp,
            Expected: []byte{
                0xc0, 0x78, 0x19, 0x32, 0x30, 0x31, 0x33, 0x2d, 0x30, 0x33, 0x2d,
                0x32, 0x31, 0x54, 0x32, 0x30, 0x3a, 0x30, 0x34, 0x3a, 0x30, 0x30,
                '+', '0', '7', ':', '0', '0',
            },
        },
    }
    ...
}
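
The string header in the expected output changes along with the timezone: the UTC string is 20 bytes long and its length fits in the header’s first byte, while the +07:00 string is 25 bytes long and needs an extra length byte. The sketch below works this out; textHeader and roundTrip are hypothetical helpers for this illustration only, and textHeader assumes strings shorter than 256 bytes.

```go
package main

import (
	"fmt"
	"time"
)

// textHeader returns the CBOR header of a text string of n bytes
// (major type 3). Hypothetical helper limited to n < 256.
func textHeader(n int) []byte {
	if n < 24 {
		return []byte{3<<5 | byte(n)}
	}
	// 24 in the low five bits means the length is stored in the next byte.
	return []byte{3<<5 | 24, byte(n)}
}

// roundTrip parses an RFC3339 timestamp and formats it back the way
// writeTime does. It returns an empty string when parsing fails.
func roundTrip(s string) string {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return ""
	}
	return t.Format(time.RFC3339)
}

func main() {
	for _, s := range []string{"2013-03-21T20:04:00Z", "2013-03-21T20:04:00+07:00"} {
		out := roundTrip(s)
		fmt.Printf("%s: %d bytes, header % x\n", out, len(out), textHeader(len(out)))
	}
	// 2013-03-21T20:04:00Z: 20 bytes, header 74
	// 2013-03-21T20:04:00+07:00: 25 bytes, header 78 19
}
```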

Let’s implement floating point timestamps for when there’s no timezone information to encode. As usual we start by adding a test case for this from the spec:

func TestTimestamp(t *testing.T) {
    ...
    var cases = []struct {
        Value    time.Time
        Expected []byte
    }{
        ...
        {
            Value:    time.Unix(1363896240, 0.5*1e9).UTC(),
            Expected: []byte{0xc1, 0xfb, 0x41, 0xd4, 0x52, 0xd9, 0xec, 0x20, 0x00, 0x00},
        },
    }
    ...
}

Note that we had to call the .UTC() method on the time.Time object returned by time.Unix. Without it the object would have the computer’s local timezone associated with it; calling the UTC method gets us a UTC timestamp.
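
A short sketch of that difference, using only the standard time package. The names local, utc, and usesLocal are made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

var (
	local = time.Unix(1363896240, 0) // carries the machine's local timezone
	utc   = local.UTC()              // same instant, with the UTC location
)

// usesLocal reports whether t is attached to the machine's local timezone.
func usesLocal(t time.Time) bool {
	return t.Location() == time.Local
}

func main() {
	fmt.Println(usesLocal(local), usesLocal(utc)) // true false
	fmt.Println(utc.Location() == time.UTC)       // true
	fmt.Println(local.Equal(utc))                 // true: same instant in time
}
```

Both values represent the same instant; only the attached location differs, and that location is what writeTime looks at to choose between a string and a scalar timestamp.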

Since time.Time gives us its Unix time as an integer number of nanoseconds since the epoch (via the UnixNano method), we’ll have to convert it into a floating point number of seconds before writing it. To do this we define a constant holding the number of nanoseconds in a second, built from the time package’s units:

const nanoSecondsInSecond = time.Second / time.Nanosecond

Then we add the code after the block to handle string timestamps. We write the header with minorTimeEpoch as its sub-type to indicate we have a scalar timestamp, then write the converted value as a floating point number:

func (e *Encoder) writeTime(v reflect.Value) error {
    var t = v.Interface().(time.Time)
    if t.Location() != time.UTC && t.Location() != nil {
        if err := e.writeHeader(majorTag, minorTimeString); err != nil {
            return err
        }
        return e.writeUnicodeString(t.Format(time.RFC3339))
    }

    // write an epoch timestamp to preserve space
    if err := e.writeHeader(majorTag, minorTimeEpoch); err != nil {
        return err
    }
    var unixTimeNano = t.UnixNano()
    return e.writeFloat(
        float64(unixTimeNano) / float64(nanoSecondsInSecond))
}
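
To check the conversion against the test case, the sketch below computes the float for the spec’s example and prints its IEEE 754 bits, which should match the eight bytes after 0xfb in the expected output. epochSeconds and epochBits are hypothetical helpers, separate from our encoder.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// example is the timestamp used in the floating point test case.
var example = time.Unix(1363896240, 500000000).UTC()

// epochSeconds converts t to seconds since the Unix epoch as a float64.
func epochSeconds(t time.Time) float64 {
	return float64(t.UnixNano()) / float64(time.Second/time.Nanosecond)
}

// epochBits returns the IEEE 754 bit pattern of epochSeconds(t).
func epochBits(t time.Time) uint64 {
	return math.Float64bits(epochSeconds(t))
}

func main() {
	fmt.Printf("%.1f\n", epochSeconds(example)) // 1363896240.5
	fmt.Printf("%016x\n", epochBits(example))   // 41d452d9ec200000
}
```

Keep in mind that a float64 has a 53-bit mantissa, so for present-day dates not every nanosecond value survives the conversion exactly; half a second, as in this example, does.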

If the timestamp in seconds is a whole number we can write it as an integer timestamp without losing precision. Integers are usually more compact than floating point numbers, so we’ll always use them when possible. Another test case from the spec makes it into cbor_test.go:

func TestTimestamp(t *testing.T) {
    ...
    var cases = []struct {
        Value    time.Time
        Expected []byte
    }{
        ...
        {
            Value:    time.Unix(1363896240, 0).UTC(),
            Expected: []byte{0xc1, 0x1a, 0x51, 0x4b, 0x67, 0xb0},
        },
    }

    ...
}

To determine if we can write an integer timestamp we check whether the fractional part of the timestamp in seconds is zero. If it is, we convert unixTimeNano into seconds, pick the CBOR integer’s major type depending on the timestamp’s sign, and use writeInteger to write the timestamp:

const nanoSecondsInSecond = time.Second / time.Nanosecond

func (e *Encoder) writeTime(v reflect.Value) error {
    ...

    // write an epoch timestamp to preserve space
    if err := e.writeHeader(majorTag, minorTimeEpoch); err != nil {
        return err
    }
    var unixTimeNano = t.UnixNano()
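    if unixTimeNano%int64(nanoSecondsInSecond) == 0 {
        // whole number of seconds: write an integer timestamp
        var unixTime = unixTimeNano / int64(nanoSecondsInSecond)
        var sign byte = majorPositiveInteger
        if unixTime < 0 {
            // CBOR stores a negative integer n as the unsigned value -1 - n
            // (this assumes writeInteger writes its argument unchanged)
            sign = majorNegativeInteger
            unixTime = -(unixTime + 1)
        }
        return e.writeInteger(sign, uint64(unixTime))
    }
    // fractional seconds: fall back to a floating point timestamp
    return e.writeFloat(
        float64(unixTimeNano) / float64(nanoSecondsInSecond))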
}
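
One detail worth checking for dates before 1970: in CBOR, a negative integer n is stored under major type 1 as the unsigned value -1 - n, so -1 is stored as 0. The standalone sketch below shows the mapping. epochInteger is a hypothetical helper, and the two constants are local stand-ins that mirror the names used in our encoder.

```go
package main

import "fmt"

// Local stand-ins for the encoder's major type constants.
const (
	majorPositiveInteger byte = 0
	majorNegativeInteger byte = 1
)

// epochInteger returns the major type and the unsigned argument used to
// encode a whole number of seconds since the Unix epoch.
func epochInteger(sec int64) (major byte, arg uint64) {
	if sec >= 0 {
		return majorPositiveInteger, uint64(sec)
	}
	// -(sec + 1) equals -1 - sec and can't overflow, even for the
	// smallest int64.
	return majorNegativeInteger, uint64(-(sec + 1))
}

func main() {
	for _, sec := range []int64{1363896240, 0, -1, -86400} {
		major, arg := epochInteger(sec)
		fmt.Printf("%d -> major type %d, argument %d\n", sec, major, arg)
	}
}
```

With this mapping, one day before the epoch (-86400 seconds) is written with major type 1 and argument 86399.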

And that’s all we needed to do to support the non-native type time.Time!

We are done writing our CBOR encoder. If you would like to see other things covered, feel free to reach me at henry@precheur.org.