Sort order preserving serialization
FoundationDB is a distributed key-value store. From the database perspective, both keys and values are just bytes. Keys are sorted lexicographically and different chunks are assigned to different servers. Because of this locality property, the range read operation is efficient as most of the time the client would be able to fetch the data from a single server.
As the keys are just bytes, the application must serialize the keys and more importantly if we need to use range operation, then the serialization should preserve the order of the keys. Fortunately, all the clients implement the tuple layer which has this property. This post attempts to explain how various data types are encoded in a way that preserves their natural sort order.
I will represent bytes using their hex format for the rest of the blog post. Each data type is tagged with an identifier which uniquely identifies the type.
Null
Null is encoded with the tag
Byte String
Tag
It’s clear that order would be preserved for byte strings without
Because the terminator
Integer
Integers upto 8 bytes are supported. Integers get encoded into
variable size which depends on minium bytes required to represent
the integer. Tags
For positive integers, big-endian byte
order is used
which naturally preserves the sort order. For negative integers, the
sign is removed and bit complement of big-endian byte order is used
bit-complement(big-endian(abs(n)))
. The bit complement operation
reverses the sort order. If two integers get encoded into different
sized bytes, then the order would be preserved by tag values. The tag
value increases as the integer value increases.
-98344948949494949
-303040404040
-20404
-42
42
20404
303040404040
98344948949494949
Float
Tags
def transform(<<sign::big-integer-size(8), rest::binary>> = full) do
if (sign &&& 0x80) != 0x00 do
:binary.bin_to_list(full)
|> Enum.map(fn e -> 0xFF ^^^ e end)
|> IO.iodata_to_binary()
else
<<0x80 ^^^ sign>> <> rest
end
end
The value is first encoded in IEEE 754 binary format. For positive numbers just the sign bit should be flipped and for negative numbers all the bits should be flipped. This should provide total order as per the IEEE specification.
Tuple
Because of the way other values are encoded, a tuple can be encoded just by concatenating the encoded values. Note that the sort order is preserved only between values of same types. Specially comparison between types like Float and Integer won’t work.
(
, 42)
(
, 42)
This example illustrates why
Nested Tuple
Tuple type can’t be nested as there is no difference between (1, (2,
3))
and (1, 2, (3))
. Both values would get encoded as the same as
the flattened tuple (1, 2, 3)
. Nested tuple, as the name implies
supports arbitrary nesting. [
is used to represent nested tuple.
Tag
[1, [2, 3]]
[1, 2, [3]]
There are other data types like unicode string, arbitrary precision number, UUID, boolean etc, but all of them follow similar principles discussed so far. As such, they are not considered here.