Bug 762 - sigrok-cli performance for fx2lafw use-case worse than for PulseView
Summary: sigrok-cli performance for fx2lafw use-case worse than for PulseView
Status: RESOLVED FIXED
Alias: None
Product: sigrok-cli
Classification: Unclassified
Component: Acquisition (show other bugs)
Version: unreleased development snapshot
Hardware: All All
: Normal normal
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
: 1565 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-02-27 22:27 CET by Uwe Hermann
Modified: 2021-07-21 03:39 CEST (History)
11 users (show)



Attachments
sigrok-cli debug output (log level set to "-l 5") (22.93 KB, text/plain)
2018-11-28 10:13 CET, Matthias Weber
Details
Logfile of failed CLI attempt (86.28 KB, text/plain)
2021-07-21 03:38 CEST, Matt A
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Uwe Hermann 2016-02-27 22:27:54 CET
Using an FX2 based LA to sample, the duration you can sample for until the inevitable acquisition stop (when the first sample is lost by the hardware) seems a lot worse when using sigrok-cli than for the PulseView case.

Details:
http://article.gmane.org/gmane.comp.debugging.sigrok.devel/1971

Some measurements for reproducability:


sigrok-cli, Linux (16ch FX2, 5 runs, all aborted prematurely):

$ sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 590848 samples.
Device only sent 301056 samples.
Device only sent 4813568 samples.
Device only sent 3483810 samples.
Device only sent 3183616 samples.


PulseView, Linux (16ch FX2, 5 runs each; all runs work fine, no premature abortion):

* samplerate 1mhz,  200m samples: 1:24min, 5x same result
* samplerate 4mhz,  200m samples: 21s, 5x same result
* samplerate 8mhz,  200m samples: 10.45s, 5x same result
* samplerate 12mhz, 200m samples: 10.42s, 5x same result


sigrok-cli, Linux (8ch FX2, 5 runs each, all aborted prematurely):

$ sigrok-cli -d fx2lafw -c samplerate=1m --samples 200m -o foo.sr
Device only sent 2800640 samples.
Device only sent 2816000 samples.
Device only sent 2785280 samples.
Device only sent 2877440 samples.
Device only sent 2954240 samples.

$ sigrok-cli -d fx2lafw -c samplerate=4m --samples 200m -o foo.sr
Device only sent 1092096 samples.
Device only sent 10334464 samples.
Device only sent 10010880 samples.
Device only sent 10900736 samples.
Device only sent 11325440 samples.

$ sigrok-cli -d fx2lafw -c samplerate=8m --samples 200m -o foo.sr
Device only sent 19975424 samples.
Device only sent 19211776 samples.
Device only sent 18407936 samples.
Device only sent 2767360 samples.
Device only sent 1436160 samples.

$ sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 2888448 samples.
Device only sent 26410240 samples.
Device only sent 1143552 samples.
Device only sent 291328 samples.
Device only sent 1251584 samples.


PulseView, Linux (8ch FX2, 5 runs each; all runs work fine, no premature abortion):

* samplerate 1mhz,  200m samples: 3:20 min, 5x same result
* samplerate 4mhz,  200m samples: 42s, 5x same result
* samplerate 8mhz,  200m samples: 25s, 5x same result
* samplerate 12mhz, 200m samples: 16.6s, 5x same result
* samplerate 16mhz, 200m samples: 12.5s, 5x same result
* samplerate 24mhz, 200m samples: 8s, 5x same result


PulseView, Windows 10, 64bit (8ch FX2, 5 runs each; all runs work fine, no premature abortion):

* samplerate 1mhz,  200m samples: 3:20 min, 5x same result
* samplerate 4mhz,  200m samples: 50s, 5x same result
* samplerate 8mhz,  200m samples: 25s, 5x same result
* samplerate 12mhz, 200m samples: 16.6s, 5x same result
* samplerate 16mhz, 200m samples: 12.5s, 5x same result
* samplerate 24mhz, 200m samples: 8s, 5x same result


sigrok-cli, Windows 10, 64bit (8ch FX2, 5 runs each, almost all aborted prematurely):

$ sigrok-cli -d fx2lafw -c samplerate=1m --samples 200m -o foo.sr
Device only sent 10905600 samples.
Device only sent 10946560 samples.
Device only sent 10905600 samples.
Device only sent 10905600 samples.
Device only sent 10926080 samples.

$ sigrok-cli -d fx2lafw -c samplerate=4m --samples 200m -o foo.sr
Device only sent 41863680 samples.
Device only sent 41944576 samples.
Device only sent 42025472 samples.
Device only sent 41904128 samples.
Device only sent 42146816 samples.

$ sigrok-cli -d fx2lafw -c samplerate=8m --samples 200m -o foo.sr
Device only sent 80303616 samples.
Device only sent 79419392 samples.
Device only sent 80464384 samples.
Device only sent 79419392 samples.
Device only sent 80625152 samples.

$ sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 115868160 samples.
Device only sent 116229120 samples.
Device only sent 116229120 samples.
Device only sent 116229120 samples.
Device only sent 115266560 samples.

$ sigrok-cli -d fx2lafw -c samplerate=16m --samples 200m -o foo.sr
Device only sent 150320128 samples.
Device only sent 149518848 samples.
Device only sent 149999616 samples.
Device only sent 147756032 samples.
Device only sent 149358592 samples.

# No premature abortions for this setup.
$ sigrok-cli -d fx2lafw -c samplerate=24m --samples 200m -o foo.sr
OK
OK
OK
OK
OK


The reason is likely to be that sigrok-cli doesn't buffer any data, but rather saves off to .sr directly (which takes some time and could cause samples to be lost), while PulseView keeps all data in RAM (which requires less time). Saving to .sr happens from the data that's already in RAM at a later point in time.

This is not a Windows-only issue, it's likely happening on all OSes; might check Mac OS X or FreeBSD later.
Comment 1 SL 2017-02-25 09:09:45 CET
I think this issue has been hitting me hard. I was a bit surprised that a very decent laptop, with a fast SSD etc would not be able to cope with a 500KHz sampling for a few seconds. CPU usage is low, so there was plenty of room for sigrok-cli to use as much resources it needed. But still...

So I have an arduino that I want to monitor, I started patching my script to test whether sigrok-cli was showing a "Device only sent XXX samples" and try up to 10 attempts of the same test case. Failure rate is really high, I barely ever get a proper capture on the cli, while pulseview works perfectly fine.

But I found a (ugly) workaround, and that is why I'm adding this comment: I am now running my script from /dev/shm. I have my dump at each first attempt of any test case.

This is platform dependent, so buffering is still the way to go for a proper fix.

Thank you
--
Sylvain
Comment 2 Matthias Weber 2018-11-28 10:13:13 CET
Created attachment 495 [details]
sigrok-cli debug output (log level set to "-l 5")
Comment 3 Matthias Weber 2018-11-28 10:13:43 CET
Hey all,

we've been able to reproduce the issue with sigrok-cli 0.8.0 running under Window 10 (Version 1803). We've tried with two different CPUs (Intel Xeon E3-1240 v5 @ 3.50GHz; Intel i7 4810MQ), both equipped with 16 GBytes.

We're also using an FX2 based LA (Hantek 6022BE) to sample at 24 MHz. When using the sigrok-cli this leads to a loss of samples. When using Pulseview instead, no samples are lost (even for an acquisition time of several seconds).

Running the test several times, the sampling aborts with the exact same number of samples. We've tried two different scenarios:


Scenario A): Dumping the data to the hard disk drive

C:\Program Files (x86)\sigrok\sigrok-cli>sigrok-cli.exe -d fx2lafw -c samplerate="24 MHz" --time 10s > D:\Test.txt
Device only sent 8404480 samples.

C:\Program Files (x86)\sigrok\sigrok-cli>sigrok-cli.exe -d fx2lafw -c samplerate="24 MHz" --time 10s -o D:\Test.txt
Device only sent 7684096 samples.


Scenario B): Dumping the data to a RAM disk (created with SoftPerfect RAM Disk V3.4.8)

sigrok-cli.exe -d fx2lafw -c samplerate="24 MHz" --time 10s > A:\Test.txt
Device only sent 10565632 samples.

sigrok-cli.exe -d fx2lafw -c samplerate="24 MHz" --time 10s -o A:\Test.txt
Device only sent 42742784 samples.


It can be seen that using the RAM disk gets us slightly more samples but still leads to data loss. The process priority has been set to 'real time' mode via the Windows task manager which did not show any positive effects.

Please find attached a log file recorded with the '-l 5' log level. It shows that after some time the fx2lafw receive_transfer() function states LIBUSB_TRANSFER_CANCELLED or LIBUSB_TRANSFER_TIMED_OUT and returns with 0 bytes.

Do you have an idea how we can fix this issue or provide further help to narrow it down?

Is there a way to automatically trigger Pulseview instead of using sigrok-cli (as no data is lost when recording with Pulseview)? What's the difference in the handling of recordings between sigrok-cli and Pulseview?

Kind regards,
Matthias
Comment 4 Michael Tandy 2019-02-08 19:58:29 CET
Greetings, fellow Sigrok users!

I got the 'Device only sent ?? samples" problems when I ran:

sigrok-cli --driver=fx2lafw --config samplerate=4M --time 600000 --output-format srzip --output-file sigrok-recording-10mins.sr

But I didn't get the same problem when I used the less compressed vcd format:

sigrok-cli --driver=fx2lafw --config samplerate=4M --time 600000 --output-format vcd --output-file sigrok-recording-10mins.vcd

If you're affected by this problem, perhaps you could try capturing to VCD and see if that improves the situation?
Comment 5 Valentin Robl 2019-02-19 13:29:28 CET
(In reply to Michael Tandy from comment #4)
> Greetings, fellow Sigrok users!
> 
> I got the 'Device only sent ?? samples" problems when I ran:
> 
> sigrok-cli --driver=fx2lafw --config samplerate=4M --time 600000
> --output-format srzip --output-file sigrok-recording-10mins.sr
> 
> But I didn't get the same problem when I used the less compressed vcd format:
> 
> sigrok-cli --driver=fx2lafw --config samplerate=4M --time 600000
> --output-format vcd --output-file sigrok-recording-10mins.vcd
> 
> If you're affected by this problem, perhaps you could try capturing to VCD
> and see if that improves the situation?

Hi Michael, thanks for your solution. 
Now it's working fine, with the *.vcd Outputformat.

Best regards

Valentin
Comment 6 Marek Wodzinski 2019-02-21 16:12:11 CET
@Valentin - this is only workaround, so I'm reopening this bug.

It also works when using protocol decoders and pipe output to file.
Problem is only when saving to sr file.
And this is regression as it worked at full speed in sigrok-cli 2.0.
Comment 7 Soeren Apel 2020-06-27 12:48:02 CEST
*** Bug 1565 has been marked as a duplicate of this bug. ***
Comment 8 Dominik Wójt 2020-07-06 21:08:52 CEST
I am affected by the same issue. Thanks to Michaels suggestion I can get my registrarions done. I also tried some comparison between the two cases with even more verbose logs.

command:
LIBUSB_DEBUG=5 ~/local/sigrok/bin/sigrok-cli --driver fx2lafw --config samplerate=1000000 --time 60s --output-file terefere.sr --output-format srzip --loglevel 5 2>&1 | tee log_sr.txt
log:
https://domin144.pl/sigrok/log_sr.txt.gz

command:
LIBUSB_DEBUG=5 ~/local/sigrok/bin/sigrok-cli --driver fx2lafw --config samplerate=1000000 --time 60s --output-file terefere.vcd --output-format vcd --loglevel 5 2>&1 | tee log_vcd.txt
log:
https://domin144.pl/sigrok/log_vcd.txt.gz

The sr version broke at about 27 seconds. The vcd one finished the 60 second registration.

Comparing the logs with meld, I can see the flow up to the error point is different in the two cases, but I guess both are correct. I noticed that for about 19 seconds the timestamps of beginnings of each transfer match pretty well (diff < 20 ms). Later, however, they gradually drift away from each other. At 26 seconds the diff is already greater than 300 ms!

I don't really think the CPU is a bottle neck here, nor is disk performance. My blind guess (I haven't read code yet) would be, that the transfer times are calculated as intervals between each other, rather than at fixed time points from the beginning of the registration. If the time needed for storing the output get between the timeout intervals, it would put gradual misalignment between the sampling and transfers.

Please bear with me, if this is nonsense. I am new to both signal capturing and sigrok. Maybe these observations will at least point you to some other conclusion.
Comment 9 cand 2020-07-07 08:39:10 CEST
Dominik,
See https://sigrok.org/bugzilla/show_bug.cgi?id=1565#c3

Binary format works much better than vcd. To me the cause is clear: srzip and vcd formats are fully rewritten for each packet, while binary is only appended. The cpu usage increases quickly as the file sizes increase, especially for zipping, while for binary it stays constant.
Comment 10 Dominik Wójt 2020-07-07 11:18:56 CEST
I should have checked the other bug. Thanks!
Comment 11 Dominik Wójt 2020-07-08 18:09:30 CEST
I did some investigation. It absolutely confirms your diagnosis.

sigrok-cli does write for every packet received (10240 in my case). In contrast PulseView stores all the data to memory, and writes it on request in chunks of fixed size - 10MB.

srzip output is stored in a zip file. Each write is stored in a separate file in a zip file. As result PulseView can store the same registration as lets say 5 files, while sigrok-cli as 5000 files.

Srzip output is currently closing the output zip file after every transfer. Libzip library is used, which I found rewrites the whole zip at every modification (hopefully it does not recompress the non-modified files - haven't checked that). Appending in place is not supported. Also when multiple files are added to zip, the compression is done on file save.

One option would be to keep the zip file open, but then libzip would do all the compression at the very end. I did and experiment, it took about 6 s for a 60 second registration.

Another option is to switch to minizip, which does the compression on the go. A disadvantage is, it will break whole file it is not closed properly (with libzip one could usually the changes). I did an experiment on this[1]. The registration works nicely, with low CPU overhead and no lag at end even with 24 MHz. The output file is unusable it the registration is interrupted (CTRL-C).

Any ideas how to fix this?

[1] https://github.com/domin144/libsigrok/commits/srzip_minizip
Comment 12 Gerhard Sittig 2020-07-16 10:48:27 CEST
See bug 974 for a report with a shorter history, immediately starting at 
the output/srzip bottleneck. Don't want to mark either of the reports as 
a duplicate of the other, but wanted to link them to each other.

Also was not aware of this report, which is why I missed the comment 11 
change, and worked on the issue at a similar time independently. Yet tried 
to keep the existing ZIP operations, just reduced their call frequency by 
introducing a buffer between feed packets and the ZIP archive.

Could I get some test and feedback on my WIP branch which addresses the issue 
in the output/srzip module? Does it work for you? Is the output data correct? 
Changed execution time from 4h to 12s here for a degraded scenario. :)

https://repo.or.cz/libsigrok/gsi.git/shortlog/refs/heads/vcd-analog-v7
git://repo.or.cz/libsigrok/gsi.git vcd-analog-v7
Comment 13 Dominik Wójt 2020-07-16 11:58:29 CEST
@Gerhard: I confirm your changes fix the problem in my case. Tested with 24MHz registration. The resulting zip file is also smaller and has less files inside. Data looks correct in PulseView.

The buffering is definitely an improvement. The change is independent from the change of zip library.

As long as we stick with libzip, the write will not be incremental (on each append a new zip file will be written), but with buffering it should not be an issue until very large files are needed. E.g. if we appended 2 MB data to 500 MB file libzip will rewrite the 500 MB. I did not need to store that large files, so for me it stays a hypothetical issue. Given the data loss on interruption, switch to minizip should be dropped for now.

Summing up: I think the buffering should go in. Switch to minizip should be dropped for now and considered again, if issues with large files are hit.
Comment 14 Gerhard Sittig 2020-07-24 12:15:17 CEST
The output/srzip chunking issue got addressed in libsigrok c03aaf342c3f. The 
lack of threading for acquisition and storage currently remains in sigrok-cli. 
Feel free to either close the report or keep it open, as you see fit.
Comment 15 Dominik Wójt 2020-08-05 15:39:59 CEST
I tried the latest libsigrok master (b81cfbc349a12e28f8212110699652b7e2d7aec4). My tests pass including the 24MHz case. I consider the issue solved.
Comment 16 Andrii 2020-09-17 14:45:50 CEST
Hello
I'm using sigrok-cli to capture signals 24/7.
For this I need files at least 10min each, not less. 
But because of "Device only sent ??? samples" issue it's able to capture just ~3sec files, which is unacceptable. Using 'sigrok-cli 0.7.1' - the latest stable version available from package repository.
Using sigrok installed on Raspberry PI:

pi@raspberrypi:~ $ sudo apt show sigrok-cli
Package: sigrok-cli
Version: 0.7.1-1
Priority: optional
Section: electronics
Maintainer: Debian Electronics Packaging Team <pkg-electronics-devel@alioth-lists.debian.net>
Installed-Size: 168 kB
Depends: libc6 (>= 2.7), libglib2.0-0 (>= 2.32.0), libsigrok4, libsigrokdecode4
Breaks: sigrok (<< 0.2-1)
Replaces: sigrok (<< 0.2-1)
Homepage: http://sigrok.org/wiki/Sigrok-cli
Download-Size: 78.7 kB
APT-Manual-Installed: no
APT-Sources: http://raspbian.raspberrypi.org/raspbian buster/main armhf Packages


pi@raspberrypi:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"


Have the same issue on local Windows while using latest stable 'sigrok-cli 0.7.1'. But after installing NIGHTLY build which contains 'sigrok-cli 0.8.0-git-0e2f95c' issue is solved and I can capture 10+ min files.

Was trying to install NIGHTLY build to Raspberry PI:
1) install unstable version of sigrok as described here https://tech.borpin.co.uk/2019/12/17/install-a-package-from-the-testing-repository/
But looks like there is available only sigrok 0.2-5 (with sigrok-cli 0.7.1)
So it doesn't help

2) sigrok-cli-NIGHTLY-i686.AppImage doesn't work on Raspberry as well

3) was not able to build from source on Raspberry as well, too many dependencies and not sure if it's possible at all

So could you please advice me how to install sigrok-cli 0.8.0-git-0e2f95c + on Raspberry PI?
Is it planned to release new stable version which include the fix?

Thanks
Comment 17 Soeren Apel 2020-09-18 13:11:45 CEST
The issue discussed by this bug report has been fixed, closing the bug report again.

We currently don't provide arm binaries, so it's up to your distro how to handle updates. The component you want to be up-to-date here is libsigrok but unfortunately your report doesn't include any version information about it.

Either way, please check with your distro.
Comment 18 Matt A 2021-07-21 03:38:30 CEST
Created attachment 742 [details]
Logfile of failed CLI attempt
Comment 19 Matt A 2021-07-21 03:39:23 CEST
Hello All,

I am seeing this same issue on a Raspberry Pi with a fx2lafw based logic analyzer. 

Using the sr or srzip format I can capture data for approximately 5s before the capture fails, see the attached log(s) for details.

My software versions are as follows:

sigrok-cli configuration summary:
 - Package version................. 0.8.0-git-d4b0eec
 - Prefix.......................... /usr/local
 - Building on..................... armv7l-unknown-linux-gnueabihf
 - Building for.................... armv7l-unknown-linux-gnueabihf

Compile configuration:
 - C compiler...................... gcc
 - C compiler version.............. gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
 - C compiler flags................ -g -O2
 - Additional C compiler flags..... -std=c99
 - C compiler warnings............. -Wall -Wextra
 - Linker flags....................

Detected libraries (required):
 - glib-2.0 >= 2.32.0.............. 2.58.3
 - libsigrok >= 0.5.0.............. 0.6.0-git-b81cfbc3

Detected libraries (optional):
 - libsigrokdecode >= 0.5.0........ no

Examples:
pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 34170880 samples.
pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 26711040 samples.
pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 36096000 samples.
pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 34892800 samples.
pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m -o foo.sr
Device only sent 35975680 samples.

If I change the file format to VCD I see the following

pi@voidstar:~/sigrok/sigrok-cli $ ./sigrok-cli -d fx2lafw -c samplerate=12m --samples 200m --output-format vcd -o foo.vcd
pi@voidstar:~/sigrok/sigrok-cli $

Am I missing something with my built software packages? I built both from master and am still seeing the issues described in this thread. I've attached a log file (test.log) from a failed attempt as well.