Difference between revisions of "User:Danielk/Simple Internal Compression Proposal"

Latest revision as of 14:50, 5 July 2015

Goal

Many logic analyzers and file formats support simple stream compression; usually employing some variation of RLE. However, within sigrok logic streams are currently always uncompressed, and a driver for a device with compression has to expand the compressed stream coming from the device before passing it on.

This is not a problem for disk storage, since many of the output modules re-compress the logic stream (e.g. ZIP for the srzip format, or RLE for the VCD format). It is also not a problem if the run lengths for each sample are very short.

However, the worst case behavior is incredibly awful. For instance, with many LAs it is very easy to fill Gigabytes of RAM within seconds, simply by forgetting to turn on the signal source before starting acquisition. It makes for a very bad user experience if a simple mistake like that instantly puts your machine into swap death.

Apart from mistakes that can be made, there are also legitimate reasons to over-sample logic signals -- for instance, if accurate timing information is required.

Even if the RAM is not exhausted, processing will be very slow if streams contain Kilobytes or even Megabytes of consecutive identical samples. This applies to all stages in the processing pipeline, including decoders and display in PulseView.

Therefore, sigrok should support some means of internal stream compression. The compression scheme does not need to be particularly efficient. It should be some form of RLE, to allow easy transcoding from one format to another without having to expand the data. It is not intended to replace srzip compression. However, it would make sense to apply ZIP on top of RLE, in order to avoid having to expand the data at any point.

Proposed Format

Currently, logic data feed packets have a channel count and a unit size in bytes. For the LWLA1034, the channel count is 34 and the unit size is 5. Each sample looks like this:

Byte 0	Byte 1	Byte 2	Byte 3	Byte 4
Ch. 7..0	Ch. 15..8	Ch. 23..16	Ch. 31..24	Ch. 33..32

The 6 high bits of byte 4 are unused.

This could be extended by introducing another parameter repeat count width, specifying the number of bits to use for the repeat count. The choice of a good width would be at the discretion of the driver and may vary from one acquisition to another. However, channel count plus repeat count width must be not be greater than 8 times the unit size.

As an example, the LWLA1034 driver could specify a repeat count width of 30, which makes for a nice unit size of 8 bytes. Each sample would then look like this:

Byte 0	Byte 1	Byte 2	Byte 3	Byte 4	Byte 5	Byte 6	Byte 7
Ch. 7..0	Ch. 15..8	Ch. 23..16	Ch. 31..24	Rep. 5..0, Ch. 33..32	Rep 13..6	Rep 21..14	Rep 29..22

For ease of processing, the repeat count width should be restricted to at most 31 bits, so that it fits into a 32-bit integer while allowing some room for arithmetic without overflow. For devices that generate larger repeat counts, it is easy enough to split up longer runs on the fly. Also, the repeat count should not be allowed to span more than 4 bytes.

Instead of requiring tight packing of the channel state and repeat count bits, one could also have a repeat count offset property which specifies the bit index at which the repeat count starts. This would allow an efficient arrangement for e.g. a 28 channels device with a 31-bit repeat count stored in bytes 4 to 7, while satisfying the 4-byte span restriction and guaranteeing 32-bit alignment. The separate property should make no difference to the extraction speed.

There should be no guarantee that the RLE-encoded stream is minimal. That is, it is legitimate for a driver to produce consecutive identical samples with run lengths less than the maximum. This avoids the need for the driver to carry state from one chunk of data to the next.

The input/output modules should do what is sensible for the format. E.g. the VCD output can just re-use the incoming run lengths without intermediate expansion. The VCD input module should generate RLE-compressed data feeds. The srzip writer should apply ZIP on top on the RLE-compressed data feed.

API Thoughts

In order to make life easy for processing modules and client code, libsigrok should provide helper macros and/or inline functions to deal with RLE data feeds. In particular, there should be a function to extract the repeat count of a sample as an integer. On machines which permit unaligned 32-bit reads (and assuming the unit size is at least 4), this function could just do a single memory read (and a byte swap on BE machines) followed by a right shift.

An alternative implementation would be to have a fast path if unit size is divisible by 4 and a generic fallback path doing byte-wise extraction. This variant would never generate unaligned reads (assuming the data feed starts on an aligned address) and needs fewer conditionals.

The channel state extraction does not change from how it works now, except that care has to be taken to mask unused bits in the last channel state byte, if any. However, in the same vein as above, it would make sense to introduce inline helper functions to simplify indexed access to single channels.

Optionally, an extension to the protocol decoder Python API may allow decoders to process unexpanded RLE data. This could speed up decoding quite significantly in some situations.

@@ Line 56: / Line 56: @@
 |}
-For ease of processing, the repeat count width should be restricted to at most 31 bits, so that it fits into a signed 32-bit integer. For devices that generate larger repeat counts, it is easy enough to split up longer runs on the fly. Also, the repeat count should not be allowed to span more than 4 bytes.
+For ease of processing, the repeat count width should be restricted to at most 31 bits, so that it fits into a 32-bit integer while allowing some room for arithmetic without overflow. For devices that generate larger repeat counts, it is easy enough to split up longer runs on the fly. Also, the repeat count should not be allowed to span more than 4 bytes.
-There should be no guarantee that the RLE-encoded stream is minimal. That is, it is legitimate for a driver to consecutive identical samples with run lengths less than the maximum. This avoids the need for the driver to carry state from one chunk of data to the next.
+Instead of requiring tight packing of the channel state and repeat count bits, one could also have a ''repeat count offset'' property which specifies the bit index at which the repeat count starts. This would allow an efficient arrangement for e.g. a 28 channels device with a 31-bit repeat count stored in bytes 4 to 7, while satisfying the 4-byte span restriction and guaranteeing 32-bit alignment. The separate property should make no difference to the extraction speed.
+There should be no guarantee that the RLE-encoded stream is minimal. That is, it is legitimate for a driver to produce consecutive identical samples with run lengths less than the maximum. This avoids the need for the driver to carry state from one chunk of data to the next.
 The input/output modules should do what is sensible for the format. E.g. the VCD output can just re-use the incoming run lengths without intermediate expansion. The VCD input module should generate RLE-compressed data feeds. The srzip writer should apply ZIP on top on the RLE-compressed data feed.
@@ Line 64: / Line 66: @@
 == API Thoughts ==
-In order to make life easy for processing modules and client code, libsigrok should provide helper macros and/or inline functions to deal with RLE data feeds. In particular, there should be a function to extract the repeat count of a sample as an integer. On machines which permit unaligned 32-bit reads, this function could just do a single memory read followed by a right shift.
+In order to make life easy for processing modules and client code, libsigrok should provide helper macros and/or inline functions to deal with RLE data feeds. In particular, there should be a function to extract the repeat count of a sample as an integer. On machines which permit unaligned 32-bit reads (and assuming the ''unit size'' is at least 4), this function could just do a single memory read (and a byte swap on BE machines) followed by a right shift.
+An alternative implementation would be to have a fast path if ''unit size'' is divisible by 4 and a generic fallback path doing byte-wise extraction. This variant would never generate unaligned reads (assuming the data feed starts on an aligned address) and needs fewer conditionals.
-The channel state extraction does not change from how it works now, except that care has to be taken to mask unused bits in the last channel state byte, if any.
+The channel state extraction does not change from how it works now, except that care has to be taken to mask unused bits in the last channel state byte, if any. However, in the same vein as above, it would make sense to introduce inline helper functions to simplify indexed access to single channels.
 Optionally, an extension to the protocol decoder Python API may allow decoders to process unexpanded RLE data. This could speed up decoding quite significantly in some situations.

Difference between revisions of "User:Danielk/Simple Internal Compression Proposal"

Latest revision as of 14:50, 5 July 2015

Goal

Proposed Format

API Thoughts

Navigation menu

Search