User:Mrnuke/float serialization talk

How to properly serialize a float

Introduction

Serializing floating point data, i.e. writing it to a stream in a deterministic and reproducible form, is a non-trivial undertaking. When dealing with unsigned integers of known bit-width, the format is universally understood to the binary representation of the integer value. Although the byte order varies between architectures, and the bits may be numbered MSb first or LSb first, this is inconsequential as long as C code dealing with unsigned integers does not make assumptions about such ordering.

As long as the C code is structured to make a clear distinction between data streams and host integers, writing portable serialization/deserialization routines is fairly straightforward using Standard C operators with a deterministic behavior. For the purpose of this talk, a data stream is defined to be a sequence of bytes which contain data in a well-established format. For example, a big-endian 32-bit integer in the stream will be represented as a byte containing bits [31:24], followed by a byte containing bits [23:16], and so on. A host integer is defined to be an integer of known bit-width, that is represented in the arch-specific format for that integer. As a result, no assumptions should be made about the format of the host integer, not even the assumption that the byte-order is little or big endian.

Portable serialization of integer

It is fairly straightforward to serialize an integer into a data stream, using bit-shift operations, which are completely deterministic in the C standard.

inline static void h_to_le32(uint32_t val32, void *dest)
{
	uint8_t *b = dest;
	b[0] = (val32 >> 0) & 0xff;
	b[1] = (val32 >> 8) & 0xff;
	b[2] = (val32 >> 16) & 0xff;
	b[3] = (val32 >> 24) & 0xff;
};

This snipped will place bits [7:0] at the first byte, bits [15:8] at the second byte, and so on, corresponding to the representation in a little-endian stream. At no point was any assumption made about the host byte order, meaning that this snippet will produce identical results on a big-endian, small-endian, and mixed-endian machine. It also does non byte-address the integers, nor does it break pointer aliasing rules.

The only assumption this code makes is that bytes are 8-bit long, which is a universally reasonable assumption.

User:Mrnuke/float serialization talk

How to properly serialize a float

Introduction

Portable serialization of integer

Navigation menu

Search