Created attachment 157 [details] Log of running sigrok-cli 9 times with different sample rates I have a generic FX2LP-based Saleae clone logic analyzer. It works with sigrok and the fx2lafw firmware, but at higher sample rates, it will work on the first run, but not on subsequent runs. For example, if I capture at 1 MHz it works every time, but at 24 MHz it will work the first time but not capture any data the next time. Furthermore, if I capture at 24 MHz and try 1 MHz next, it will fail, but then work for later invocations. Debug output is attached. I ran sigrok-cli 9 times in a row: capture @ 1 MHz -> ok capture @ 1 MHz -> ok capture @ 24 MHz -> ok capture @ 24 MHz -> failed capture @ 24 MHz -> failed capture @ 24 MHz -> failed capture @ 1 MHz -> failed capture @ 1 MHz -> ok capture @ 1 MHz -> ok So a workaround seems to be to just run a dummy capture at a low sample rate before running the real capture at a high sample rate.
Created attachment 178 [details] log of running sigrok-cli multiple times on fx2lafw First attachment, confirming the same behaviour Jim reported
Created attachment 179 [details] log of running sigrok-cli multiple times on saleae-logic16 Second attachment - showing the exact same behavior on the Logic 16 (12.5MHz is max rate with 16 channels enabled) Although they both use the fx2, i think this indicates that the problem is not in sigrok-firmware-fx2law but further up the stack. It's also not isolated to sigrok-cli, since the same problem affects pulseview. sigrok-cli 0.6.0-git-1ef118a Using libsigrok 0.4.0-git-211cc97 (lib version 2:0:0). Using libsigrokdecode 0.4.0-git-76a4498 (lib version 2:0:0).
I see these problems, too. latest sources from the sigrok repos, fedora 22, Saleae clone and original USBeeSX
*** Bug 904 has been marked as a duplicate of this bug. ***
I spent some time looking into this, so I'll dump what I learned: 1) The bug appears to be related to the number of simulations bulk transfers created by libsigrok. At higher frequencies, more transfers are used, and that make it more likely to trigger the bug (in firmware?). As a work-around, but not a fix, I reduced the number of NUM_SIMUL_TRANSFERS, and the issue is mostly gone. Still maybe 1/20 transfers at 24mhz fail with no data. diff --git a/src/hardware/fx2lafw/protocol.h b/src/hardware/fx2lafw/protocol.h index 1403c8ab..688df617 100644 --- a/src/hardware/fx2lafw/protocol.h +++ b/src/hardware/fx2lafw/protocol.h @@ -36,8 +36,8 @@ #define NUM_TRIGGER_STAGES 4 #define MAX_RENUM_DELAY_MS 3000 -#define NUM_SIMUL_TRANSFERS 32 -#define MAX_EMPTY_TRANSFERS (NUM_SIMUL_TRANSFERS * 2) +#define NUM_SIMUL_TRANSFERS 4 +#define MAX_EMPTY_TRANSFERS (NUM_SIMUL_TRANSFERS * 4) #define NUM_CHANNELS 16 2) It appears that the high number of simulations transfers triggers a bug in sigrok-firmware-fx2lafw that leaves the device in a bad state when a transfer ends. I'm mostly sure that bug at then of of a transfer, as opposed to triggering a bug at the start of transfers. Verified by repeated testing transfers alternating between cable unplugs plus patched and unpatched libsigrok. e.g. running 1 capture with patched binary followed by 1 from unpatched binary failed at same 1/20 rate as patch did in pulseview. The opposite did not. 3) I've tired to dig through sigrok-firmware-fx2lafw, but there's a lot of datasheet to get through; here's what I ended up with: diff --git a/gpif-acquisition.c b/gpif-acquisition.c index 7d3dcb6..f63b8b8 100644 --- a/gpif-acquisition.c +++ b/gpif-acquisition.c @@ -263,7 +263,7 @@ void gpif_poll(void) SYNCDELAY(); /* Reset EP2. */ - FIFORESET = 0x02; + FIFORESET = 0x82; SYNCDELAY(); /* Return to auto mode. */ Post this match I've seen zero instances of no data. But I do still see some buffer overruns in firmware stop gpif workflow in first second or so, if it gets past the first few seconds it works great. On this patch; TRM Ref F: has an example for resetting auto-out endpoints, and it matches exactly gpif-acquisition.c before my patch. But... could it be a typo? Why would you write 0x02 which would release NAK ALL, and then two lines later release NAK ALL again? Presumably its not safe to release NAK ALL until the endpoint is back in auto-in?
Created attachment 378 [details] Don't release NACK ALL until back in AUTOIN mode. I _think_ that writing 0x02 releases the NACK ALL before putting the device back in auto mode. This could result in one or more bulk in from host stealing buffer from endpoint being reset?
Hi, as far as we can tell/test this issue should be fixed in cfeb1a3602f587eb981c4ce1f9ff368997f4e0e0, thanks a lot to Stefan Bruens, Piotr Esden-Tempski and others for investigating, testing and fixing the issue! Closing this issue for now, but please everyone test this new version on as many different PCs and laptops as possible, since previous iterations of the patches have shown issues on some boxes but worked fine on others. If this specific issue is still happening for some systems, please feel free to reopen the bug.