User:Mrnuke/float serialization talk
How to properly serialize a float
Introduction
Serializing floating point data, i.e. writing it to a stream in a deterministic and reproducible form, is a non-trivial undertaking. When dealing with unsigned integers of known bit-width, the format is universally understood to be the binary representation of the integer value. Although the byte order varies between architectures, and the bits may be numbered MSb first or LSb first, this is inconsequential as long as C code dealing with unsigned integers does not make assumptions about such ordering.
As long as the C code is structured to make a clear distinction between data streams and host integers, writing portable serialization/deserialization routines is fairly straightforward using Standard C operators with a deterministic behavior. For the purpose of this talk, a data stream is defined to be a sequence of bytes which contain data in a well-established format. For example, a big-endian 32-bit integer in the stream will be represented as a byte containing bits [31:24], followed by a byte containing bits [23:16], and so on. A host integer is defined to be an integer of known bit-width, that is represented in the arch-specific format for that integer. As a result, no assumptions should be made about the format of the host integer, not even the assumption that the byte-order is little or big endian.
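The big-endian layout described above can be decoded without knowing anything about the host integer format. The following is only a minimal sketch; the name be32_to_h is an assumption made for illustration and is not used elsewhere in this talk.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a host integer from a big-endian stream.
 * src[0] carries bits [31:24], src[1] carries bits [23:16], and so on.
 */
static inline uint32_t be32_to_h(const void *src)
{
    const uint8_t *b = src;

    /* Widen each byte before shifting so no bits are lost. */
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Only shifts and ORs on unsigned values are used, so the result depends on the stream contents alone, not on how the host stores its integers.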
Portable serialization of integers
It is fairly straightforward to serialize an integer into a data stream, using bit-shift operations, which are completely deterministic in the C standard.
#include <stdint.h>

/* Store a host 32-bit unsigned integer as four little-endian stream bytes. */
static inline void h_to_le32(uint32_t val32, void *dest)
{
    uint8_t *b = dest;

    b[0] = (val32 >> 0) & 0xff;
    b[1] = (val32 >> 8) & 0xff;
    b[2] = (val32 >> 16) & 0xff;
    b[3] = (val32 >> 24) & 0xff;
}
This snippet will place bits [7:0] in the first byte, bits [15:8] in the second byte, and so on, corresponding to the representation in a little-endian stream. At no point was any assumption made about the host byte order, meaning that this snippet will produce identical results on a big-endian, little-endian, and mixed-endian machine. It also does not byte-address the integers, nor does it break pointer aliasing rules.
The only assumption this code makes is that bytes are 8 bits long, which is a universally reasonable assumption.
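The 8-bit byte assumption can be stated explicitly so that the build fails instead of silently misbehaving. The sketch below assumes a C11 compiler for _Static_assert; the inverse routine le32_to_h is a hypothetical name added here to show that reading back is just as host-independent as writing.

```c
#include <limits.h>
#include <stdint.h>

/* Fail the build early if a byte is not an octet (requires C11). */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Hypothetical inverse of h_to_le32(): src[0] carries bits [7:0]. */
static inline uint32_t le32_to_h(const void *src)
{
    const uint8_t *b = src;

    return ((uint32_t)b[0] << 0) | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```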
Difficulties in serializing floating point numbers
The above snippet will not work if we replace 'uint32_t' with 'float'. The bit-shift operators are only defined for integer operands, so a conforming compiler is required to diagnose a shift applied to a floating point number, and converting the float to an integer first converts its value, not its bit pattern. Even if we manage to get a bit-for-bit representation of the float as a uint32_t, we still need to make one fatal assumption: that the host representation of the float is identical to the representation of the float in the data stream, i.e. how many mantissa bits, how many exponent bits, whether the sign bit is placed in front of the exponent or attached to the mantissa, etc.
The concerns about the floating point representation can be alleviated if the stream is specified to be in the IEEE-754 format, and we know the host uses the same format. Unfortunately, this already makes assumptions about the host representation, which we are able to avoid in the unsigned integer case. As we will see below, this is further complicated by other rules in the C standard.
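Part of the "host uses the same format" assumption can be checked with the characteristics that <float.h> exposes. The following is a sketch under that idea; the function name is hypothetical, and passing the check only shows that the host float has the same size, radix, precision and exponent range as IEEE-754 binary32. It says nothing about byte order or about where the sign, exponent and mantissa fields sit in memory.

```c
#include <float.h>
#include <limits.h>

/*
 * Hypothetical check: returns 1 if the host 'float' has the size, radix,
 * precision and exponent range of IEEE-754 binary32, and 0 otherwise.
 * Byte order and field placement are NOT verified here.
 */
static inline int host_float_resembles_binary32(void)
{
    return sizeof(float) * CHAR_BIT == 32 &&
           FLT_RADIX == 2 &&
           FLT_MANT_DIG == 24 &&   /* 23 stored bits plus the implicit one */
           FLT_MIN_EXP == -125 &&
           FLT_MAX_EXP == 128;
}
```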
Wrong ways of serializing floats
First approach: alias as an integer
The obvious solution to the problem is to leverage the existing h_to_le32 function, by telling the compiler to treat the float as a 32-bit integer:
/* WRONG: reads a float object through a uint32_t pointer. */
static inline void float_to_le(uint8_t *buf, float value)
{
    uint32_t *my_hack;

    my_hack = (uint32_t *)&value;   /* dereferencing this is undefined */
    h_to_le32(*my_hack, buf);
}
This method will work on many compilers, because, from the compiler's point of view, it makes most sense to do what we want it to do; however, this approach breaks the pointer aliasing rules in the C standard, and thus the behavior of the above snippet is undefined.
This approach also makes two other grave mistakes. First, it assumes that the host stores floating point numbers in IEEE-754 format, which may or may not be the case. Secondly, it assumes that the host byte order of the float is the same as that of an integer. This does not have to be the case.
Also, since the behavior is undefined, nothing prevents the compiler from keeping 'value' in a wider format, for example as a double held in a register, in which case we obtain incorrect results. Even if this function weren't inlined, the compiler could make such transformations, for example, with link-time optimizations.
Second attempt: memcpy to an integer
It is easy, and legal in C, to get around the aliasing problem by copying the memory contents of value to an integer:
#include <string.h>

static inline void float_to_le(uint8_t *buf, float value)
{
    uint32_t my_hack;

    /* Copies the object representation; assumes sizeof(float) == 4. */
    memcpy(&my_hack, &value, sizeof(value));
    h_to_le32(my_hack, buf);
}
This approach solves the pointer aliasing problem; however, it suffers from all the other shortcomings of the previous approach, and as such, is not portable.
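The remaining assumptions of the memcpy approach cannot be removed this way, but they can at least be detected. The sketch below is an illustration, not a fix: the function name is hypothetical, it needs C11 for _Static_assert and C99 for the hexadecimal float literal, and it probes a single value, so a pass is evidence rather than proof that the host float matches IEEE-754 binary32 in integer byte order.

```c
#include <stdint.h>
#include <string.h>

/* memcpy() into a uint32_t only makes sense if the sizes agree. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits wide");

/*
 * Hypothetical probe: returns 1 if copying a float into a uint32_t yields
 * the IEEE-754 binary32 encoding of that value, and 0 otherwise.
 */
static inline int float_bits_match_binary32(void)
{
    const float probe = 0x1.921fb6p+1f;   /* nearest binary32 value to pi */
    uint32_t bits;

    memcpy(&bits, &probe, sizeof(bits));
    return bits == UINT32_C(0x40490fdb);  /* binary32 encoding of probe */
}
```

A serializer built on memcpy could refuse to run when this probe returns 0, rather than writing a corrupt stream.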
Third attempt: no memcpy
If we wanted to avoid the call to memcpy(), we could use a char* to alias the float, an aliasing which is permitted in the C standard:
/*
 * Stores the float in little-endian BINARY32 IEEE-754 2008 format.
 */
static void float_to_le(uint8_t *buf, float value)
{
    char *old;

    old = (char *)&value;
#ifdef WORDS_BIGENDIAN
    buf[0] = old[3];
    buf[1] = old[2];
    buf[2] = old[1];
    buf[3] = old[0];
#else
    buf[0] = old[0];
    buf[1] = old[1];
    buf[2] = old[2];
    buf[3] = old[3];
#endif
}
Unfortunately, with this approach, we now need to explicitly know the host byte order. This snippet may work on a little-endian machine. It may work on big-endian machines, but it will not work on anything else, and will fail if the macro WORDS_BIGENDIAN is wrongly defined. It also assumes that floats are byte-addressable, which does not have to be the case; where they are not, the application will crash.
It suffers from the same assumptions as before. Since WORDS_BIGENDIAN is decided at configuration time based on the byte-order of 16-bit integers, it doesn't guarantee the byte order will be correct for floats. Also the assumption that the host uses the IEEE-754 format is still deeply ingrained in this approach. In fact, this approach is less portable than the previous one, where we memcpy()d the float to an integer.
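One way to stop relying on an integer-derived WORDS_BIGENDIAN is to probe the byte order of a float directly at run time. The sketch below uses hypothetical names (enum float_order, probe_float_order) and still assumes that the host float is IEEE-754 binary32; any other layout is simply reported as unknown. It therefore detects the problem described above without making the third attempt portable.

```c
#include <string.h>

/* Hypothetical result type for the probe below. */
enum float_order { FLOAT_ORDER_LE, FLOAT_ORDER_BE, FLOAT_ORDER_UNKNOWN };

/*
 * Compare the in-memory bytes of a known float against the binary32
 * encoding 0x40490fdb written out in big- and little-endian order.
 */
static inline enum float_order probe_float_order(void)
{
    static const unsigned char be[4] = { 0x40, 0x49, 0x0f, 0xdb };
    static const unsigned char le[4] = { 0xdb, 0x0f, 0x49, 0x40 };
    const float probe = 0x1.921fb6p+1f;   /* nearest binary32 value to pi */

    if (sizeof(probe) != 4)
        return FLOAT_ORDER_UNKNOWN;
    if (memcmp(&probe, be, 4) == 0)
        return FLOAT_ORDER_BE;
    if (memcmp(&probe, le, 4) == 0)
        return FLOAT_ORDER_LE;
    return FLOAT_ORDER_UNKNOWN;           /* mixed-endian or not binary32 */
}
```

The probe value has four distinct bytes in its encoding, so a mixed-endian layout cannot be mistaken for either of the two supported orders.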