File format:sigrok/v3


This page describes the proposed file/stream format (v3) for storing and transmitting sigrok related data.

NOTE: This is work in progress and has not yet been implemented!

Motivation

The previous sigrok session file format (version 2) is a ZIP file containing multiple files (some metadata files and data files containing the actual samples). This works fine, but it also has some issues:

  • In order to get to the data you want, you need to decompress the whole file.
  • Appending to a file is not easily possible (and not efficient).
  • It doesn't support storing additional information for frontends (channel colors, and so on).
  • ...

Goals

The following list highlights some of the goals of the new file format (v3):

  • It must be able to store
    • arbitrary data (logic samples, and/or analog samples, and/or protocol decoder data, and more), as well as
    • arbitrary meta-/config-data and other extra information that may be useful to frontends (UI state data, user-configured probe colors, names, positions, and so on).
  • It must support and facilitate stream-oriented processing (save, load, transmission, compression/decompression, and so on).
  • It must support compression of the payload data.
  • It must be usable independent of hardware architecture (x86, ARM, PowerPC, MIPS, and so on), operating system, endianness, float representation, and so on. All data fields must be properly specified (endianness, signedness, size, format).
  • It must allow for sufficiently good performance for the common operations a frontend needs to perform on the data/file/stream (save, load, compress/uncompress, append, and so on) so that it doesn't become the bottleneck. This is especially important for stream-oriented devices which could otherwise lose samples if the processing on the host side is not sufficiently fast (Saleae Logic, Saleae Logic16, IKALOGIC ScanaPLUS, others).
  • It should be able to handle run-time changes in the data streams (via meta packets on the session bus), e.g. changing samplerates, changing probes, and so on.
  • It should have better compression properties than ZIP (e.g. using LZO or other algorithms; this is yet to be evaluated). What we ideally want out of the compression algorithm is:
    • Good and relatively fast compression results at only moderate CPU usage.
    • Very fast decompression (LZO is probably the best one here, as it's specifically designed for this).
    • Ideally, support for appending further data to already compressed data chunks (though this could be also implemented outside of the compression algorithm per se).
    • Open-source license and OS portability. There should be an open-source library or code chunk for compression/uncompression and it should be widely available in Linux distros, and portable to Windows, Mac OS X, FreeBSD, Android, and so on.

Specification

UUIDs

The format uses random UUIDs (version 4) as per RFC 4122 in various places. These UUIDs are always 16 bytes long.

A simple way to generate a random (version 4) UUID (ASCII and hex representation):

$ python3 -c 'import uuid; u = uuid.uuid4(); print(u); print(u.hex)'
14c49f22-f08a-4ef2-b3d7-82ee16c3d531
14c49f22f08a4ef2b3d782ee16c3d531

File/stream format

The format consists entirely of a stream of packets of various types.

These packets can be either written to or read from a file, buffer, pipe, socket, or any other source/destination.

Packet format

Every packet consists of four fields:

Field Length Description
Short-UUID 2 An ID (2 bytes, big-endian) that maps to a previously defined 16-byte packet type UUID. The Short-UUID values can range from 0x0000 to 0xffff, which allows for 65536 different Short-UUIDs in a single file/stream. The Short-UUIDs 0x0000 and 0x0001 are special and cannot be used for "normal" packets (see below), which leaves 65534 values for normal packet types. The reason for using a (Short-)UUID here instead of some simple index number is to allow clients to define and use their own special-purpose packet types as they see fit, without having to fear any conflicts with existing packet types (or packet types that someone else might add later).
Reference-ID 4 An ID (4 bytes, big-endian) that is assigned to this packet, so that other packets can reference it. Valid values: 0x00000001 - 0xffffffff. A value of 0x00000000 means that this packet doesn't have a Reference-ID. Note that a (Short-)UUID specifies a certain type of packet, whereas the Reference-ID identifies a specific individual packet. For example, there can be multiple different packets (different Reference-ID) that are of the same type (same Short-UUID).
Length 4 The length, in bytes, of the data in this packet (4 bytes, big-endian). The length does not include the length of the Short-UUID, Reference-ID or Length field, only the length of the Data field.
Data 0..n The actual payload data, max. 2^32 - 1 bytes (just under 4 GiB), as limited by the 4-byte Length field. For some packet types the Data field is optional (in that case it is completely omitted and the Length field is set to 0). The contents of the Data field are entirely dependent on (and vary with) the type of packet.

Using the common type-length-value idiom for each packet allows clients to easily skip over (ignore) any packets they do not know how to handle, and instead continue on to checking/handling the next packet.

Example packet with a 7-byte data field (Short-UUID is 0x55aa, Reference-ID is 0x00008ab2):

Short-UUID Reference-ID Length Data
55 aa 00 00 8a b2 00 00 00 07 11 22 33 44 55 66 77

Example packet without a data field (Short-UUID is 0x55aa, Reference-ID is 0x00005f31):

Short-UUID Reference-ID Length
55 aa 00 00 5f 31 00 00 00 00
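The common packet header can be handled with a fixed 10-byte big-endian structure. The following is a minimal Python sketch, assuming only the field layout described above; the function names encode_packet and iter_packets are hypothetical and not part of any sigrok library.

```python
import struct
from typing import Iterator, Tuple

# Common packet header: Short-UUID (u16), Reference-ID (u32), Length (u32),
# all big-endian, followed by Length bytes of Data.
HEADER = struct.Struct(">HII")


def encode_packet(short_uuid: int, ref_id: int, data: bytes = b"") -> bytes:
    """Serialize one packet; ref_id 0 means 'no Reference-ID'."""
    return HEADER.pack(short_uuid, ref_id, len(data)) + data


def iter_packets(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (short_uuid, ref_id, data) for every packet in buf.

    Because Length is always present, a reader can step over packets whose
    type it does not understand without interpreting their Data field.
    """
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated packet header")
        short_uuid, ref_id, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        data = buf[offset:offset + length]
        if len(data) != length:
            raise ValueError("truncated packet data")
        offset += length
        yield short_uuid, ref_id, data
```

Encoding the two example packets above with this sketch reproduces their byte sequences exactly. A reader that only cares about some packet types can simply move on whenever it sees a Short-UUID it does not recognize, which is the skipping behavior the type-length-value layout is meant to enable.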

PKT_MAP_UUIDS packet

This is a special packet that is used to map 16-byte UUIDs to 2-byte Short-UUIDs.

Since every packet carries only a 2-byte Short-UUID, which has no meaning until it has been mapped to a UUID, PKT_MAP_UUIDS must be the first packet in a file/stream; otherwise the client will not be able to interpret any other packets.

However, PKT_MAP_UUIDS can occur multiple times in a stream. Every time PKT_MAP_UUIDS is seen, mappings that were not yet defined are added to the list of mappings, and mappings that already existed will be overwritten with the respective new mapping.

Since PKT_MAP_UUIDS is a packet itself, it also consists of the four common fields Short-UUID/Reference-ID/Length/Data. The Short-UUID of PKT_MAP_UUIDS is always 0x0000.

The Data field has the following contents:

Field Length Description
Special Short-UUID for magic marker 2 A reserved special Short-UUID (2 bytes, big-endian) for the magic marker. Value: 0x0001.
Special UUID for magic marker 16 This is a special marker that can be used by the file utility (and other tools) to detect the file format easily. Contents: $sIgRoK$$SiGrOk$.
Short-UUID 1 2 The 2-byte Short-UUID (2 bytes, big-endian) with index 1 (valid values: 0x0002 to 0xffff) that will, from now on, map to the UUID specified below.
UUID 1 16 The UUID with index 1 (binary representation, 16 bytes, big-endian) which identifies the type of packet (globally unique).
Short-UUID 2 2 The 2-byte Short-UUID (2 bytes, big-endian) with index 2 (valid values: 0x0002 to 0xffff) that will, from now on, map to the UUID specified below.
UUID 2 16 The UUID with index 2 (binary representation, 16 bytes, big-endian) which identifies the type of packet (globally unique).
... ... ...

Important notes:

  • The Data field contains a list of Short-UUID to UUID mappings. Since every such pair is 18 bytes in size, the Length field of PKT_MAP_UUIDS can be used to deduce how many such mappings are contained in the Data field.
  • The special "magic marker" fields (2 + 16 bytes) are required to be in every PKT_MAP_UUIDS and are required to always be the first entries of PKT_MAP_UUIDS. The file format can thus easily be detected by looking at the unique bytes 10-27 (0-based offsets) in the file, i.e. the first 18 bytes of the Data field of the leading PKT_MAP_UUIDS. Additionally, the file always starts with the two bytes 0x00 0x00.
  • The special Short-UUID 0x0000 must not be used in any mapping, it is reserved for PKT_MAP_UUIDS itself.
  • The special Short-UUID 0x0001 must not be used in any mapping, it is reserved for the special "magic marker", see above.
  • There is no guarantee of any kind about which Short-UUIDs will be mapped (and to what). Specifically, a client cannot assume that Short-UUIDs start at 0x0002, and it cannot assume that Short-UUIDs are ordered in any way. The Short-UUIDs can have a completely random order and they can also have gaps.
  • Mappings are generally not static in nature. Every additional PKT_MAP_UUIDS that occurs can dynamically add or overwrite/change mappings, for example.
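The detection rule from the notes above can be sketched as a check on the first 28 bytes of a file or stream. This is a minimal Python illustration, assuming the magic marker text given in the PKT_MAP_UUIDS table; the names MAGIC, MAGIC_SHORT_UUID and looks_like_sigrok_v3 are hypothetical.

```python
MAGIC_SHORT_UUID = b"\x00\x01"   # reserved Short-UUID of the magic marker
MAGIC = b"$sIgRoK$$SiGrOk$"      # 16 bytes, as given in the table above


def looks_like_sigrok_v3(head: bytes) -> bool:
    """Check the first 28 bytes of a file/stream for the v3 signature."""
    if len(head) < 28:
        return False
    # Bytes 0-1: Short-UUID 0x0000 of the leading PKT_MAP_UUIDS packet.
    # Bytes 2-9: its Reference-ID and Length (not checked here).
    # Bytes 10-27: Short-UUID 0x0001 followed by the 16-byte magic marker.
    return head[0:2] == b"\x00\x00" and head[10:28] == MAGIC_SHORT_UUID + MAGIC
```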

Example packet:

Short-UUID Reference-ID Length Data
00 00 xx xx xx xx 00 00 00 48 00 01 24 73 49 67 52 6f 4b 24 24 53 69 47 72 4f 6b 24
77 a1 5a 17 72 eb 28 54 48 a8 a4 1c 73 97 d7 e9 22 3d
00 06 59 de f3 30 53 6a 46 b1 8e dd 62 f2 19 5d 1c 95
a3 9f ec 6b d7 63 c8 79 4a a7 a9 7a 7e df 0e 68 af c7

The above PKT_MAP_UUIDS maps three different UUIDs to the Short-UUIDs 0x77a1, 0x0006 and 0xa39f.
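Building and applying a PKT_MAP_UUIDS Data field can be sketched as follows. This is a minimal Python illustration, assuming the magic marker text from the table above and that a reader keeps its mappings in a plain dict; build_map_uuids and apply_map_uuids are hypothetical names. Python's uuid.UUID.bytes already yields the 16-byte big-endian (RFC 4122) representation the table asks for.

```python
import struct
import uuid
from typing import Dict

MAGIC_ENTRY = b"\x00\x01" + b"$sIgRoK$$SiGrOk$"  # 18 bytes, always first
ENTRY = struct.Struct(">H16s")  # Short-UUID + 16-byte UUID = 18 bytes


def build_map_uuids(mappings: Dict[int, uuid.UUID]) -> bytes:
    """Build the Data field of a PKT_MAP_UUIDS packet."""
    out = bytearray(MAGIC_ENTRY)
    for short_uuid, u in mappings.items():
        if short_uuid in (0x0000, 0x0001):
            raise ValueError("Short-UUIDs 0x0000 and 0x0001 are reserved")
        out += ENTRY.pack(short_uuid, u.bytes)  # big-endian binary UUID
    return bytes(out)


def apply_map_uuids(data: bytes, table: Dict[int, uuid.UUID]) -> None:
    """Add new and overwrite existing entries in table from a Data field."""
    if len(data) % ENTRY.size or data[:ENTRY.size] != MAGIC_ENTRY:
        raise ValueError("malformed PKT_MAP_UUIDS")
    for off in range(ENTRY.size, len(data), ENTRY.size):
        short_uuid, raw = ENTRY.unpack_from(data, off)
        table[short_uuid] = uuid.UUID(bytes=raw)
```

Feeding the three mappings from the example into build_map_uuids yields a 0x48-byte Data field whose entries after the magic marker match the example dump. Because apply_map_uuids updates an existing dict, a later PKT_MAP_UUIDS naturally adds new mappings and overwrites old ones, as required above.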

sigrok packets

The following packets are currently defined for use in projects hosted on sigrok.org.

The "names" (e.g. "SIGROK_PKT_LOGIC") are for documentation purposes only, the (Short-)UUIDs are what actually matters. The names are prefixed with SIGROK_ to make it clear that other 3rd-party software may define their own additional packet types with arbitrary contents and for arbitrary purposes.

One of the reasons for splitting up different properties into many small packets (SIGROK_PKT_CH_TYPE, SIGROK_PKT_CH_NAME, and so on) is that this allows for future additions (of e.g. various other channel properties), without the need to change an existing packet format. Additional packets for e.g. the channel color (for use in UIs) that also back-reference a SIGROK_PKT_CH packet can be added without the need for protocol/format changes or version field bumps.

Guidelines for sigrok packets:

In principle there are two ways to handle the case of having to change the information or format of things we want to describe/transport (e.g. in case additional information needs to be added or some information/format needs to change):

  • A new packet type (new UUID) could be added, which is defined to contain the new/changed fields.
  • A "packet version" field in an existing packet could be bumped (the packet UUID would remain the same, though), and the other fields within the packet would change format and/or size and/or semantics.

Our current guideline of when to use which of the above methods (i.e., when to add some information as new packet or as additional field in an existing packet) is as follows:

  • If the thing that we want to describe/transport is something optional, it should be an extra packet with its own UUID (not a field in an existing packet). This has the advantage that the packet parser can simply skip over this whole packet if it is unknown/unsupported. If it were a field in an existing packet we'd have to bump the "packet version" field and the rest of the fields would look different and be incompatible with clients/parsers only supporting the old packet version; the old client wouldn't be able to use/parse the whole packet anymore (the changed optional thing and the unchanged old fields as well), which is undesirable.
  • If the thing that we want to describe/transport is something that's required, it's preferable to have it be a field inside an existing packet (not being a packet by itself).

SIGROK_PKT_DEV

This is a packet type used to define a device.

This packet uses the fixed UUID 94aa863d-bb58-4d79-b944-ab9dd30eecdf.

The Data field is empty.

Example packet:

Short-UUID Reference-ID Length
uu uu tt tt tt tt 00 00 00 00

SIGROK_PKT_DEV_VENDOR

This is a packet type used to define a device vendor name.

This packet uses the fixed UUID c09c7a5c-8566-42ec-8fde-7737436b0e64.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV).
Vendor name length 2 The length, in bytes, of the vendor name (2 bytes, big-endian).
Vendor name n The vendor name (UTF-8 string), not NUL-terminated.

Example packet:

The following packet defines a vendor name "Saleae" (with the Reference-ID tt tt tt tt). The Backreference-ID bb bb bb bb references a previously defined device.

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 0c bb bb bb bb 00 06 Saleae

SIGROK_PKT_DEV_MODEL

This is a packet type used to define a device model name.

This packet uses the fixed UUID 88058d2f-225e-4ee6-b915-9fd009944464.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV).
Model name length 2 The length, in bytes, of the model name (2 bytes, big-endian).
Model name n The model name (UTF-8 string), not NUL-terminated.

Example packet:

The following packet defines a model name "Logic16" (with the Reference-ID tt tt tt tt). The Backreference-ID bb bb bb bb references a previously defined device.

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 0d bb bb bb bb 00 07 Logic16

SIGROK_PKT_DEV_VERSION

This is a packet type used to define a device version.

This packet uses the fixed UUID 1607d8f4-4eef-4d1b-b679-c37729de2b32.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV).
Device version length 2 The length, in bytes, of the device version (2 bytes, big-endian).
Device version n The device version (UTF-8 string), not NUL-terminated.

Example packet:

The following packet defines a device version "3.07" (with the Reference-ID tt tt tt tt). The Backreference-ID bb bb bb bb references a previously defined device.

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 0a bb bb bb bb 00 04 3.07

SIGROK_PKT_DEV_SERNUM

This is a packet type used to define a device serial number.

This packet uses the fixed UUID e11259d3-8214-4bd9-899d-4ba0f4aa042e.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV).
Device serial number length 2 The length, in bytes, of the device serial number (2 bytes, big-endian).
Device serial number n The device serial number (UTF-8 string), not NUL-terminated.

Example packet:

The following packet defines a device serial number "1234" (with the Reference-ID tt tt tt tt). The Backreference-ID bb bb bb bb references a previously defined device.

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 0a bb bb bb bb 00 04 1234
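SIGROK_PKT_DEV_VENDOR, SIGROK_PKT_DEV_MODEL, SIGROK_PKT_DEV_VERSION and SIGROK_PKT_DEV_SERNUM all share the same Data layout (backreference, 2-byte length, UTF-8 bytes). A minimal Python sketch of that shared layout, with hypothetical helper names, might look like this. Note that the length field counts UTF-8 bytes, not characters.

```python
import struct
from typing import Tuple


def encode_string_property(backref_id: int, text: str) -> bytes:
    """Data field: Backreference-ID (u32), length (u16), UTF-8 bytes (no NUL)."""
    raw = text.encode("utf-8")
    if len(raw) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length field")
    return struct.pack(">IH", backref_id, len(raw)) + raw


def decode_string_property(data: bytes) -> Tuple[int, str]:
    backref_id, n = struct.unpack_from(">IH", data)
    if len(data) < 6 + n:
        raise ValueError("string length exceeds the Data field")
    return backref_id, data[6:6 + n].decode("utf-8")
```

With this sketch, "Saleae", "Logic16", "3.07" and "1234" produce Data fields of 0x0c, 0x0d, 0x0a and 0x0a bytes respectively, matching the Length fields in the examples above.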

SIGROK_PKT_DEV_SAMPLERATE

This is a packet type used to define a device samplerate. This is the overall samplerate currently used in the device (across all channels and channel groups).

This packet can be used for devices where all channels and channel groups use the same samplerate (e.g. most logic analyzers). For devices where this is not the case, SIGROK_PKT_CG_SAMPLERATE can be used to override SIGROK_PKT_DEV_SAMPLERATE (if any) for a whole channel group, and SIGROK_PKT_CH_SAMPLERATE can be used to override both SIGROK_PKT_DEV_SAMPLERATE (if any) and SIGROK_PKT_CG_SAMPLERATE (if any) for a specific individual channel.
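The override order described above (channel beats channel group, which beats device) can be expressed as a simple "most specific value wins" lookup. This Python sketch assumes a reader has already decoded any DEV/CG/CH samplerate packets into integers in Hz, with None meaning the packet was absent; the formats of SIGROK_PKT_CG_SAMPLERATE and SIGROK_PKT_CH_SAMPLERATE are not yet defined, and effective_samplerate is a hypothetical name.

```python
from typing import Optional


def effective_samplerate(dev_rate: Optional[int],
                         cg_rate: Optional[int],
                         ch_rate: Optional[int]) -> Optional[int]:
    """Return the samplerate (Hz) that applies to one channel.

    The most specific value wins: channel, then channel group, then device.
    """
    for rate in (ch_rate, cg_rate, dev_rate):
        if rate is not None:
            return rate
    return None  # no samplerate known for this channel
```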

This packet uses the fixed UUID 649f0ea5-b410-460d-a4b1-6d5e45c6725f.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV).
Packet version 1 The version of the SIGROK_PKT_DEV_SAMPLERATE packet format. Current version: 0x01. This will be bumped when backwards-incompatible changes to this packet format are introduced.
Sample rate data type 1 The data type used to represent the samplerate. 0x01: uint64_t, big-endian.
Sample rate 8 (can vary) The samplerate (using the format/length specified by the previous field).

Example packet:

(Samplerate is 4 MHz == 4000000 Hz == 0x3d0900 Hz)

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 0e bb bb bb bb 01 01 00 00 00 00 00 3d 09 00
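A minimal Python sketch of the SIGROK_PKT_DEV_SAMPLERATE Data field, assuming only the currently defined packet version 0x01 and data type 0x01 (uint64_t, big-endian); the function names are hypothetical. Rejecting unknown versions or data types reflects the rule that the version is bumped on backwards-incompatible changes.

```python
import struct
from typing import Tuple

SAMPLERATE_VERSION = 0x01
RATE_TYPE_UINT64_BE = 0x01


def encode_dev_samplerate(backref_id: int, rate_hz: int) -> bytes:
    # Backreference-ID (u32), packet version (u8), data type (u8), rate (u64).
    return struct.pack(">IBBQ", backref_id, SAMPLERATE_VERSION,
                       RATE_TYPE_UINT64_BE, rate_hz)


def decode_dev_samplerate(data: bytes) -> Tuple[int, int]:
    backref_id, version, rate_type = struct.unpack_from(">IBB", data)
    if version != SAMPLERATE_VERSION or rate_type != RATE_TYPE_UINT64_BE:
        raise ValueError("unsupported samplerate packet version or data type")
    (rate_hz,) = struct.unpack_from(">Q", data, 6)
    return backref_id, rate_hz
```

For a 4 MHz device, encode_dev_samplerate(0xbbbbbbbb, 4_000_000) produces the 14 (0x0e) Data bytes shown in the example above.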

SIGROK_PKT_CH

This is a packet type used to define a channel.

This packet uses the fixed UUID 1325b595-0d5e-40a4-ac4d-36e89224dcb9.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined device (SIGROK_PKT_DEV) that the channel belongs to.

Example packet:

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 04 bb bb bb bb

SIGROK_PKT_CH_TYPE

This is a packet type used to define a channel type.

This packet uses the fixed UUID 6b12bdcc-02c8-493a-a89d-662ee9d1a34d.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined channel (SIGROK_PKT_CH).
Channel type 1 The type of the back-referenced channel. 0x01: Logic, 0x02: Analog.

Example packet:

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 05 bb bb bb bb 01

SIGROK_PKT_CH_NAME

This is a packet type used to define a channel name.

This packet uses the fixed UUID 730ba9b7-638a-4b79-94dc-b9beb0735acf.

The Data field has the following contents:

Field Length Description
Backreference-ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined channel (SIGROK_PKT_CH).
Channel name length 2 The length, in bytes, of the channel name (2 bytes, big-endian).
Channel name n The channel name (UTF-8 string), not NUL-terminated.

Example packet:

The following packet defines a channel name "CH1" (with the Reference-ID tt tt tt tt). The Backreference-ID bb bb bb bb references a previously defined channel for which this name is to apply.

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 09 bb bb bb bb 00 03 CH1
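The channel packets are tied together purely by backreferences: SIGROK_PKT_CH points to a device, while SIGROK_PKT_CH_TYPE and SIGROK_PKT_CH_NAME point to the Reference-ID of a SIGROK_PKT_CH. A reader can therefore collect the properties of each channel in a dict keyed by that Reference-ID. This Python sketch assumes the Short-UUIDs have already been resolved to the documentation names via the PKT_MAP_UUIDS table; collect_channels is a hypothetical helper.

```python
import struct
from typing import Dict, List, Tuple

CH_TYPES = {0x01: "logic", 0x02: "analog"}


def collect_channels(packets: List[Tuple[str, int, bytes]]) -> Dict[int, dict]:
    """Group channel properties by the Reference-ID of their SIGROK_PKT_CH.

    Each entry of packets is (type_name, ref_id, data).
    """
    channels: Dict[int, dict] = {}

    def channel(ch_ref: int) -> dict:
        # CH_TYPE / CH_NAME must backreference a previously defined channel.
        if ch_ref not in channels:
            raise ValueError(f"backreference to undefined channel {ch_ref:#x}")
        return channels[ch_ref]

    for name, ref_id, data in packets:
        if name == "SIGROK_PKT_CH":
            (dev_ref,) = struct.unpack_from(">I", data)
            channels[ref_id] = {"device": dev_ref}
        elif name == "SIGROK_PKT_CH_TYPE":
            ch_ref, ch_type = struct.unpack_from(">IB", data)
            channel(ch_ref)["type"] = CH_TYPES.get(ch_type, ch_type)
        elif name == "SIGROK_PKT_CH_NAME":
            ch_ref, n = struct.unpack_from(">IH", data)
            channel(ch_ref)["name"] = data[6:6 + n].decode("utf-8")
        # Any other packet type is ignored here.
    return channels
```

Because each property lives in its own packet, adding e.g. a future channel color packet would only mean adding one more branch here; readers without that branch keep working unchanged.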

SIGROK_PKT_CH_SAMPLERATE

SIGROK_PKT_CG

SIGROK_PKT_CG_NAME

SIGROK_PKT_CG_SAMPLERATE

SIGROK_PKT_FRAME

This is a packet type used to define a frame. SIGROK_PKT_LOGIC and SIGROK_PKT_ANALOG can backreference such a packet to indicate that the sample data belongs to a particular frame.

This packet uses the fixed UUID aa9c4d20-49f0-4ec4-b6ab-92daa3f81a5d.

The Data field has the following contents:

Field Length Description
Packet version 1 The version of the SIGROK_PKT_FRAME packet format. Current version: 0x01. This will be bumped when backwards-incompatible changes to this packet format are introduced.
Frame start time 8 A 64-bit unsigned integer (8 bytes, big-endian) which specifies the time during acquisition at which this frame began. It is given in samples, using the samplerate specified by SIGROK_PKT_DEV_SAMPLERATE. If the device samplerate isn't specified, this field may be ignored. If the acquisition time for this frame is unknown, this field shall be set to 0.
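A minimal Python sketch of the SIGROK_PKT_FRAME Data field and of converting its start time (given in samples) to seconds using the device samplerate. The helper names are hypothetical. Note that, per the table, a start time of 0 is also used when the acquisition time is unknown.

```python
import struct
from typing import Optional

FRAME_VERSION = 0x01


def encode_frame(start_sample: int = 0) -> bytes:
    # Packet version (u8) + frame start time in samples (u64), big-endian.
    return struct.pack(">BQ", FRAME_VERSION, start_sample)


def frame_start_seconds(data: bytes,
                        samplerate_hz: Optional[int]) -> Optional[float]:
    version, start = struct.unpack_from(">BQ", data)
    if version != FRAME_VERSION:
        raise ValueError("unsupported SIGROK_PKT_FRAME version")
    if samplerate_hz is None:
        return None  # no device samplerate: the start time may be ignored
    return start / samplerate_hz
```

For example, a frame starting at sample 8000000 on a 4 MHz device begins 2.0 seconds into the acquisition.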

SIGROK_PKT_LOGIC

This is a packet type used to store/transmit (only) digital samples, usually from a logic analyzer.

This packet uses the fixed UUID 2236202e-9ee7-4bc6-81f6-56b4e6e029ba.

The Data field has the following contents:

Field Length Description
Packet version 1 The version of the SIGROK_PKT_LOGIC packet format. Current version: 0x01. This will be bumped when backwards-incompatible changes to this packet format are introduced.
Frame backreference ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined frame (SIGROK_PKT_FRAME).
Payload format Short-UUID 2 A Short-UUID (2 bytes, big-endian) which identifies a certain payload format.
Compression scheme Short-UUID 2 A Short-UUID (2 bytes, big-endian) which identifies a certain compression scheme that is applied to the payload data.
Payload length 4 The length, in bytes, of the actual payload data in this SIGROK_PKT_LOGIC packet (4 bytes, big-endian). The length only includes the Payload field.
Payload 0..n The actual payload data, i.e. logic analyzer samples in the specified payload format, using the specified compression scheme.

Example packet:

(Packet type SIGROK_PKT_LOGIC Short-UUID 0xuuuu, Reference-ID 0xtttttttt, 0x15 bytes packet data, SIGROK_PKT_LOGIC version 0x01, frame Backreference-ID 0xbbbbbbbb, SIGROK_PAYLOAD_LOGIC_M1 payload format Short-UUID 0xvvvv, SIGROK_COMPR_NONE compression scheme Short-UUID 0xwwww, 8 bytes of logic analyzer payload (uncompressed))

Short-UUID Reference-ID Length Data
uu uu tt tt tt tt 00 00 00 15 01 bb bb bb bb vv vv ww ww 00 00 00 08 11 22 33 44 55 66 77 88
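Following the SIGROK_PKT_LOGIC field table (which includes the 4-byte frame backreference), the Data field has a fixed 13-byte prefix before the payload. This Python sketch uses arbitrary placeholder Short-UUIDs (0x1234, 0x5678) for whatever the stream's PKT_MAP_UUIDS assigned to the payload format and compression scheme; encode_logic and decode_logic are hypothetical names, and the payload is treated as opaque bytes because SIGROK_PAYLOAD_LOGIC_M1 is not yet defined.

```python
import struct
from typing import Tuple

LOGIC_VERSION = 0x01
# version (u8), frame ref (u32), payload format (u16), compression (u16),
# payload length (u32): 13 bytes before the payload, big-endian.
LOGIC_HEADER = struct.Struct(">BIHHI")


def encode_logic(frame_ref: int, payload_fmt: int, compr: int,
                 payload: bytes) -> bytes:
    """Build a SIGROK_PKT_LOGIC Data field (payload already encoded/compressed)."""
    return LOGIC_HEADER.pack(LOGIC_VERSION, frame_ref, payload_fmt, compr,
                             len(payload)) + payload


def decode_logic(data: bytes) -> Tuple[int, int, int, bytes]:
    version, frame_ref, payload_fmt, compr, n = LOGIC_HEADER.unpack_from(data)
    if version != LOGIC_VERSION:
        raise ValueError("unsupported SIGROK_PKT_LOGIC version")
    payload = data[LOGIC_HEADER.size:LOGIC_HEADER.size + n]
    if len(payload) != n:
        raise ValueError("Payload length exceeds the packet's Data field")
    return frame_ref, payload_fmt, compr, payload
```

With 8 bytes of payload the Data field is 13 + 8 = 21 (0x15) bytes. The Payload length field is redundant with the packet's Length field, so a reader can use it as a cheap consistency check, as decode_logic does.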

SA: How does the mapping to the individual channels work? Do we need to backreference a channel group here that declares e.g. which 8 logic channels make up one such logic packet and in which order they are stored?

SIGROK_PKT_ANALOG

This is a packet type used to store/transmit (only) analog samples, e.g. from a multimeter, oscilloscope, sound level meter, or any other source for analog data.

This packet uses the fixed UUID 59def330-536a-46b1-8edd-62f2195d1c95.

Details yet to be defined.

The Data field has the following contents:

Field Length Description
Packet version 1 The version of the SIGROK_PKT_ANALOG packet format. Current version: 0x01. This will be bumped when backwards-incompatible changes to this packet format are introduced.
Frame backreference ID 4 A Reference-ID (4 bytes, big-endian) referencing a previously defined frame (SIGROK_PKT_FRAME).
TBD x TBD

List of known packet types

This is a short overview of known packet types that are in use. This includes the packet types used in projects hosted at sigrok.org, as well as pointers to packet types that other (3rd-party) software is known to use.

UUID Packet type Description
94aa863d-bb58-4d79-b944-ab9dd30eecdf SIGROK_PKT_DEV See above.
c09c7a5c-8566-42ec-8fde-7737436b0e64 SIGROK_PKT_DEV_VENDOR See above.
88058d2f-225e-4ee6-b915-9fd009944464 SIGROK_PKT_DEV_MODEL See above.
1607d8f4-4eef-4d1b-b679-c37729de2b32 SIGROK_PKT_DEV_VERSION See above.
e11259d3-8214-4bd9-899d-4ba0f4aa042e SIGROK_PKT_DEV_SERNUM See above.
649f0ea5-b410-460d-a4b1-6d5e45c6725f SIGROK_PKT_DEV_SAMPLERATE See above.
1325b595-0d5e-40a4-ac4d-36e89224dcb9 SIGROK_PKT_CH See above.
6b12bdcc-02c8-493a-a89d-662ee9d1a34d SIGROK_PKT_CH_TYPE See above.
730ba9b7-638a-4b79-94dc-b9beb0735acf SIGROK_PKT_CH_NAME See above.
aa9c4d20-49f0-4ec4-b6ab-92daa3f81a5d SIGROK_PKT_FRAME See above.
2236202e-9ee7-4bc6-81f6-56b4e6e029ba SIGROK_PKT_LOGIC See above.
59def330-536a-46b1-8edd-62f2195d1c95 SIGROK_PKT_ANALOG See above.

List of known payload formats

This is a short overview of known payload formats that are in use. This includes the payload formats used in projects hosted at sigrok.org, as well as pointers to payload formats that other (3rd-party) software is known to use.

UUID Payload format Description
d2964f38-8b13-4570-9add-add5678a0394 SIGROK_PAYLOAD_LOGIC_M1 This payload format can only store digital samples from a logic analyzer (0/1 values for a certain channel/probe/pin). It is basically identical to the format that was used in the previous ZIP-based file format versions. Details are yet to be defined.
79e7cfd1-0f56-4d5e-968a-b66fdbdff624 SIGROK_PAYLOAD_ANALOG_M1 A certain type of payload format that can store (only) analog samples of a certain number of analog channels. Details are yet to be defined.

List of known compression schemes

This is a short overview of known compression schemes that are in use. This includes the schemes used in projects hosted at sigrok.org, as well as pointers to schemes that other (3rd-party) software is known to use.

UUID Compression scheme Description
ec6bd763-c879-4aa7-a97a-7edf0e68afc7 SIGROK_COMPR_NONE No compression whatsoever is used.
acd2e249-5c4d-426d-96ae-ded5b6020e6f SIGROK_COMPR_RLE_M1 A certain type of RLE-based compression is used. Details are yet to be defined.
  • JH: Do we need info about interleaving here? We could insist that all channels be de-interleaved, or add support for interleaved streams.
  • JH: We could add support for device compression schemes. It may sometimes be desirable to be able to pass the device stream straight into the file.

Further notes and ideas to consider

  • Data should be encoded in a data aware way. This would give greater compression:
    • Logic data is most efficiently stored using RLE+Huffman or Golomb coding, e.g. a clock signal may compress to one bit per edge.
      • JH: I wonder if we can do even better by XOR-ing the data with some kind of frequency tracking oscillator. This would convert a square wave into mostly continuous 0s or 1s, with occasional pulses where jitter occurs.
      • JH: This kind of thing is best prototyped with a script e.g python + the bitset library.
    • FLAC (libflac) or a FLAC-inspired codec (linear prediction) is probably as good as it gets for lossless analog data encoding.
  • If data is stored in a format-specific way, it would be best to store it as a series of stream blocks, similar to how video containers work. Would it be possible to simply leverage a multimedia container such as Ogg? IIRC this contains headers to declare metadata about each stream, then a series of timestamped stream blocks interleaved together. The timestamp is a format-specific number (for audio: the sample number, for video: the frame number), so sigrok formats can easily leverage this.
    • Similarly RTP is a rather natural protocol for sigrok network streaming.
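The RLE and XOR ideas above can be prototyped in a few lines, as JH suggests. The following Python sketch is a generic run-length encoder for a single logic channel plus an XOR against an ideal square wave; it is not SIGROK_COMPR_RLE_M1 (whose details are undefined), and it assumes a fixed, known clock period instead of a real frequency-tracking oscillator. All names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Collapse one channel's 0/1 samples into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(bits)]


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    return [value for value, count in runs for _ in range(count)]


def xor_with_reference(bits: List[int], period: int) -> List[int]:
    """XOR against an ideal square wave that is low for the first half period."""
    ref = [1 if (i % period) >= period // 2 else 0 for i in range(len(bits))]
    return [b ^ r for b, r in zip(bits, ref)]
```

For a clean clock of period 8, rle_encode yields one run per half period (i.e. one per edge), and after XOR with the matching reference the whole signal collapses into a single run of zeros. A jittered edge shows up as a short isolated pulse, which a subsequent Huffman or Golomb stage could code cheaply.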