User:Mrnuke/float serialization talk


How to properly serialize a float

Introduction

Serializing floating point data, i.e. writing it to a stream in a deterministic and reproducible form, is a non-trivial undertaking. When dealing with unsigned integers of known bit-width, the format is universally understood to be the binary representation of the integer value. Although the byte order varies between architectures, and the bits may be numbered MSb first or LSb first, this is inconsequential as long as C code dealing with unsigned integers does not make assumptions about such ordering.

As long as the C code is structured to make a clear distinction between data streams and host integers, writing portable serialization/deserialization routines is fairly straightforward using Standard C operators with a deterministic behavior. For the purpose of this talk, a data stream is defined to be a sequence of bytes which contain data in a well-established format. For example, a big-endian 32-bit integer in the stream will be represented as a byte containing bits [31:24], followed by a byte containing bits [23:16], and so on. A host integer is defined to be an integer of known bit-width, that is represented in the arch-specific format for that integer. As a result, no assumptions should be made about the format of the host integer, not even the assumption that the byte-order is little or big endian.

Portable serialization of integers

It is fairly straightforward to serialize an integer into a data stream using bit-shift operations, whose results are fully defined by the C standard for unsigned operands.

inline static void h_to_le32(uint32_t val32, void *dest)
{
	uint8_t *b = dest;
	b[0] = (val32 >> 0) & 0xff;
	b[1] = (val32 >> 8) & 0xff;
	b[2] = (val32 >> 16) & 0xff;
	b[3] = (val32 >> 24) & 0xff;
}

This snippet will place bits [7:0] at the first byte, bits [15:8] at the second byte, and so on, corresponding to the representation in a little-endian stream. At no point was any assumption made about the host byte order, meaning that this snippet will produce identical results on a big-endian, little-endian, and mixed-endian machine. It also does not byte-address the integer, nor does it break pointer aliasing rules.

The only assumption this code makes is that bytes are 8 bits long, which is a universally reasonable assumption.
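The reverse direction can be written in the same style. The following is a minimal sketch, assuming a C11 compiler; the name le32_to_h is hypothetical and simply mirrors h_to_le32, which is repeated here so that the example is self-contained. The static_assert makes the 8-bit byte assumption explicit.

```c
#include <assert.h>	/* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* The one assumption both routines share. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Host integer to little-endian stream (same logic as the snippet above). */
static inline void h_to_le32(uint32_t val32, void *dest)
{
	uint8_t *b = dest;

	b[0] = (val32 >> 0) & 0xff;
	b[1] = (val32 >> 8) & 0xff;
	b[2] = (val32 >> 16) & 0xff;
	b[3] = (val32 >> 24) & 0xff;
}

/* Little-endian stream to host integer (hypothetical counterpart). */
static inline uint32_t le32_to_h(const void *src)
{
	const uint8_t *b = src;

	/* Widen each byte before shifting so no bits are lost. */
	return ((uint32_t)b[0] << 0) |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

As with h_to_le32, the host byte order never appears: the value is rebuilt purely from arithmetic on the stream bytes.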

Difficulties in serializing floating point numbers

The above snippet will not work if we replace 'uint32_t' with 'float'. The bit shift operators require integer operands, so applying them to a floating point number is a constraint violation that a conforming compiler must diagnose; it will typically not compile at all. Even if we manage to get a bit-for-bit representation of the float as a uint32_t, we still need to make one fatal assumption: that the host representation of the float is identical to the representation of the float in the data stream, that is, how many mantissa bits and exponent bits there are, whether the sign bit is placed in front of the exponent or attached to the mantissa, and so on.

The concerns about the floating point representation can be alleviated if the stream is specified to be in the IEEE-754 format, and we know the host uses the same format. Unfortunately, this already makes assumptions about the host representation, which we are able to avoid in the unsigned integer case. As we will see below, this is further complicated by other rules in the C standard.
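Part of the "host uses the same format" assumption can at least be checked at compile time. The sketch below assumes a C11 compiler and uses only the standard <float.h> parameters. It verifies that the host float has the size, radix, precision and exponent range of IEEE-754 binary32; it does not prove the bit layout or the byte order, so it narrows the assumption rather than removing it. An implementation may also define __STDC_IEC_559__ to claim conformance to Annex F of the C standard, but not every compiler defines it reliably.

```c
#include <assert.h>	/* static_assert (C11) */
#include <float.h>
#include <limits.h>

/* Refuse to build on hosts whose float cannot be IEEE-754 binary32. */
static_assert(CHAR_BIT == 8, "bytes must be 8 bits long");
static_assert(sizeof(float) == 4, "float must be 32 bits wide");
static_assert(FLT_RADIX == 2, "float must be a binary format");
static_assert(FLT_MANT_DIG == 24, "float must have a 24-bit significand");
static_assert(FLT_MAX_EXP == 128, "float must have the binary32 maximum exponent");
static_assert(FLT_MIN_EXP == -125, "float must have the binary32 minimum exponent");
```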

Wrong ways of serializing floats

First approach: alias as an integer

The obvious solution to the problem is to leverage the existing h_to_le32 function, by telling the compiler to treat the float as a 32-bit integer:

inline static void float_to_le(uint8_t *buf, float value)
{
	uint32_t *my_hack;

	my_hack = (uint32_t *)&value;	/* breaks the aliasing rules */
	h_to_le32(*my_hack, buf);
}

This method will work on many compilers, because, from the compiler's point of view, it makes most sense to do what we want it to do; however, this approach breaks the pointer aliasing rules in the C standard, and thus the behavior of the above snippet is undefined.

This approach also makes two other grave mistakes. First, it assumes that the host stores floating point numbers in IEEE-754 format, which may or may not be the case. Secondly, it assumes that the host byte order of the float is the same as that of an integer. This does not have to be the case.

Also, the compiler may have promoted the float 'value' to a double, in which case we are guaranteed to obtain incorrect results. Even if this function weren't inlined, the compiler could make such promotions, for example, with link-time optimizations.

Second attempt: memcpy to an integer

It is easy, and legal in C, to get around the aliasing problem by copying the memory contents of value to an integer:

inline static void float_to_le(uint8_t *buf, float value)
{
	uint32_t my_hack;

	memcpy(&my_hack, &value, sizeof(value));
	h_to_le32(my_hack, buf);
}

This approach solves the pointer aliasing problem; however, it suffers from all the other shortcomings of the previous approach, and as such, is not portable. Also note that the "promotion to double" concern of the previous approach is no longer an issue: because memcpy() reads the stored bytes of value, the compiler is no longer permitted to make such optimizations.
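For completeness, the matching deserialization can be sketched the same way. The names le32_to_h and le_to_float are hypothetical, and the code carries exactly the assumptions discussed above: the host float is IEEE-754 binary32 and shares its byte order with uint32_t. Under those assumptions, 1.0f corresponds to the stream bytes 00 00 80 3f.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Little-endian stream to host integer; no host byte order involved. */
static inline uint32_t le32_to_h(const void *src)
{
	const uint8_t *b = src;

	return ((uint32_t)b[0] << 0) |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/*
 * Little-endian binary32 stream to host float.
 * ASSUMPTION: the host float is IEEE-754 binary32 and has the same
 * byte order as uint32_t. Nothing here verifies that.
 */
static inline float le_to_float(const uint8_t *buf)
{
	uint32_t bits = le32_to_h(buf);
	float value;

	static_assert(sizeof(float) == sizeof(uint32_t),
		      "float must be 32 bits wide");
	/* Copy the object representation instead of aliasing it. */
	memcpy(&value, &bits, sizeof(value));

	return value;
}
```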

Second attempt, version B

It might be possible to utilize a union to alias the float as an integer:

static void float_to_le(void *dest, float value)
{
	union {
		float float_alias;
		uint32_t int_alias;
	} my_hack;

	my_hack.float_alias = value;
	h_to_le32(my_hack.int_alias, dest);
}

However, I have not checked whether the behavior of this approach is well-defined according to the C standard, or what gotchas, if any, apply to this.

Third attempt: no memcpy

If we wanted to avoid the call to memcpy(), we could use a char* to alias the float, an aliasing which is permitted in the C standard:

/*
 * Stores the float in little-endian BINARY32 IEEE-754 2008 format.
 */
static void float_to_le(uint8_t *buf, float value)
{
	char *old;

	old = (char *)&value;
#ifdef WORDS_BIGENDIAN
	buf[0] = old[3];
	buf[1] = old[2];
	buf[2] = old[1];
	buf[3] = old[0];
#else
	buf[0] = old[0];
	buf[1] = old[1];
	buf[2] = old[2];
	buf[3] = old[3];
#endif
}

Unfortunately, with this approach, we now need to explicitly know the host byte order. This snippet may work on little-endian and big-endian machines, but it will not work on anything else, and it will fail if the macro WORDS_BIGENDIAN is wrongly defined. It also assumes that floats are byte-addressable, which does not have to be the case; where they are not, the application will crash.

This approach relies on the same assumptions as before. Since WORDS_BIGENDIAN is decided at configuration time based on the byte order of 16-bit integers, it doesn't guarantee the byte order will be correct for floats. Also, the assumption that the host uses the IEEE-754 format is still deeply ingrained in this approach. In fact, this approach is less portable than the previous one, where we memcpy()d the float to an integer.

Fourth attempt: GFloatIEEE754 or unions

GFloatIEEE754 is a type provided by GLib, which attempts to solve this issue. It is defined as the union between a float and a bitfield reproducing the IEEE-754 representation. This approach, unfortunately, is a terrible idea. First, it uses bitfields to map to hardware bits, which is not guaranteed to work, due to the possibility of the compiler applying padding. The declaration of the union also depends on the host byte order (and bit order!). This approach introduces too many unknowns to be considered portable, and thus, an example snippet is not given here.

Portable approaches

Right now, the second approach is the most portable, at the expense of an extra memcpy for every float, but it is not completely safe. If we accept the assumptions that the host uses the IEEE-754 format, and that the host integer byte order is the same as the host float byte order, it is perfectly valid.
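Both assumptions can be probed at run time before relying on the memcpy approach. The following is a minimal sketch; host_float_is_binary32 is a hypothetical helper, not an existing libsigrok function. It copies a few floats with known binary32 encodings into a uint32_t and compares the bit patterns. A mismatch means a different format or a different float byte order; a match is strong evidence, not a proof, that the assumptions hold.

```c
#include <assert.h>	/* static_assert (C11) */
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "this check assumes 8-bit bytes");

/*
 * Hypothetical helper: returns 1 if memcpy()ing a float into a uint32_t
 * yields the IEEE-754 binary32 bit pattern for a few known values,
 * and 0 otherwise.
 */
static int host_float_is_binary32(void)
{
	static const struct {
		float value;
		uint32_t bits;
	} known[] = {
		{ 1.0f,     UINT32_C(0x3f800000) },
		{ -2.5f,    UINT32_C(0xc0200000) },
		{ 0.15625f, UINT32_C(0x3e200000) },
	};
	size_t i;

	if (sizeof(float) != sizeof(uint32_t))
		return 0;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		uint32_t bits;

		/* Same operation as the second approach performs. */
		memcpy(&bits, &known[i].value, sizeof(bits));
		if (bits != known[i].bits)
			return 0;
	}

	return 1;
}
```

A library could call such a check once at initialization and refuse to continue, or fall back to a slower path, when it returns 0.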

If we could obtain the sign bit, the exponent as an integer, and the significand as a bit-for-bit integer representation of the mantissa, we would then be able to shift all the bits to their correct positions in the data stream, and the serialization would work in a way truly independent of the host, without the need to make any assumption. Unfortunately, the C standard does not provide a way to do this without needing to use floating point math to extract these parameters.

A truly portable approach therefore involves heavy floating-point computations, as exemplified in this Stack Overflow answer. The question of whether the assumptions made by the second approach are sufficiently reasonable for libsigrok must be asked before exploring the truly portable approaches.
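To give an idea of what such an approach looks like, here is a minimal sketch, not taken from the linked answer. It uses the standard frexpf() and ldexpf() functions to split the value into sign, exponent and significand, and then assembles the binary32 bit pattern with integer shifts. The names float_to_binary32_bits and float_to_le_portable are hypothetical. The sketch assumes a binary host float (FLT_RADIX == 2), loses NaN payloads, and truncates rather than rounds if the host float is more precise than binary32.

```c
#include <assert.h>	/* static_assert (C11) */
#include <float.h>
#include <math.h>
#include <stdint.h>

static_assert(FLT_RADIX == 2, "the exact split below assumes a binary float");

static inline void h_to_le32(uint32_t val32, void *dest)
{
	uint8_t *b = dest;

	b[0] = (val32 >> 0) & 0xff;
	b[1] = (val32 >> 8) & 0xff;
	b[2] = (val32 >> 16) & 0xff;
	b[3] = (val32 >> 24) & 0xff;
}

/* Build the IEEE-754 binary32 bit pattern using arithmetic only. */
static uint32_t float_to_binary32_bits(float value)
{
	uint32_t sign = signbit(value) ? UINT32_C(0x80000000) : 0;
	uint32_t mant;
	float frac;
	int exp;

	if (isnan(value))
		return sign | UINT32_C(0x7fc00000);	/* quiet NaN, payload lost */
	if (isinf(value))
		return sign | UINT32_C(0x7f800000);
	if (value == 0.0f)
		return sign;				/* keeps -0.0 distinct */

	/* |value| = frac * 2^exp, with frac in [0.5, 1). */
	frac = frexpf(sign ? -value : value, &exp);

	if (exp > 128)		/* too large for binary32: infinity */
		return sign | UINT32_C(0x7f800000);

	if (exp < -125) {	/* subnormal: significand in units of 2^-149 */
		mant = (uint32_t)ldexpf(frac, exp + 149);
		return sign | mant;
	}

	/* Normal: scale frac to [2^23, 2^24) and drop the implicit 1. */
	mant = (uint32_t)ldexpf(frac, 24) & UINT32_C(0x007fffff);

	return sign | ((uint32_t)(exp + 126) << 23) | mant;
}

/* Serialize as little-endian binary32 without inspecting host bytes. */
static void float_to_le_portable(void *dest, float value)
{
	h_to_le32(float_to_binary32_bits(value), dest);
}
```

Compared to the memcpy approach, this costs several floating-point library calls per value, which is the trade-off raised above. On some toolchains the math functions require linking with -lm.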