Bug 1551 - Rigol DS-1054Z single channel short acquisition
Summary: Rigol DS-1054Z single channel short acquisition
Status: CONFIRMED
Alias: None
Product: libsigrok
Classification: Unclassified
Component: Driver: rigol-ds (show other bugs)
Version: 0.5.2
Hardware: All Mac OS X
: Normal normal
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-30 12:17 CEST by Thomas Santana
Modified: 2021-04-30 08:52 CEST (History)
2 users (show)



Attachments
Session saved. (554 bytes, application/zip)
2020-05-30 12:17 CEST, Thomas Santana
Details
./sigrok-cli -l 5 -d rigol-ds --channels CH1 --frames 1 -o sample.s (13.18 KB, text/plain)
2020-05-31 15:33 CEST, Thomas Santana
Details
Capture After patching usb-driver (1.27 KB, application/zip)
2020-06-01 12:04 CEST, Thomas Santana
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Santana 2020-05-30 12:17:52 CEST
Created attachment 628 [details]
Session saved.

Since I'm new to Sigrok and the DS I'm assuming this could be user error. But I have been trying to capture signal with very limited success.

The best I managed was to read some pulse of the calibration signal, with 986 us from a single channel. If I hook up both channels I can see one or the other. But never both. I'm using the USB interface.

Most changes on the UI are reflected in the scope. 
If I look in the log (level 3):
 
sr: session: Starting.
pv: "Timing": "No input data"
"Timing": Error cleared
rigol-ds: Error while reading block data, aborting capture.
sr: session: Stopped.
pv: Acquisition took 0.38 s

Log at level 4 (without the srd lines):

sr: session: Using thread-default main context.
sr: session: Starting.
sr: hwdriver: rigol-ds: Starting acquisition.
sr: rigol-ds: handling channel CH1
sr: rigol-ds: handling channel CH2
sr: rigol-ds: handling channel CH3
sr: rigol-ds: handling channel CH4
sr: session: bus: Received SR_DF_HEADER packet.
sr: rigol-ds: Starting data capture for frameset 1
sr: session: bus: Received SR_DF_FRAME_BEGIN packet.
sr: rigol-ds: Starting reading data from channel 1
pv: "Timing": "No input data"
sr: rigol-ds: New block header expected
sr: rigol-ds: Received data block header: '#9000001200' -> block length 1200
sr: rigol-ds: Requesting read of 1200 bytes
sr: rigol-ds: Received 41 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (41 samples).
sr: rigol-ds: 41 of 1200 block bytes read
sr: rigol-ds: Requesting read of 1159 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 105 of 1200 block bytes read
"Timing": Error cleared
sr: rigol-ds: Requesting read of 1095 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 169 of 1200 block bytes read
sr: rigol-ds: Requesting read of 1031 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 233 of 1200 block bytes read
sr: rigol-ds: Requesting read of 967 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 297 of 1200 block bytes read
sr: rigol-ds: Requesting read of 903 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 361 of 1200 block bytes read
sr: rigol-ds: Requesting read of 839 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 425 of 1200 block bytes read
sr: rigol-ds: Requesting read of 775 bytes
sr: rigol-ds: Received 64 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (64 samples).
sr: rigol-ds: 489 of 1200 block bytes read
sr: rigol-ds: Requesting read of 711 bytes
rigol-ds: Error while reading block data, aborting capture.
sr: session: bus: Received SR_DF_FRAME_END packet.
sr: hwdriver: rigol-ds: Stopping acquisition.
sr: session: bus: Received SR_DF_END packet.
sr: session: Stopped.
pv: Acquisition took 0.37 s

This gives me a very short sample on a single channel. Also I have no sample rate dropdown. 

Before I decided to open this, I read a bunch of ticket. So it seems there may be several problem (like no Sampling rate in the driver). I haven't installed the CLI but I am willing to try. 

I tried to find where this line is emitted: 
  rigol-ds: Error while reading block data, aborting capture.

But could not find it. I was considering install sigrok-cli and maybe get a linux box up to test it. 

Version:

Versions, libraries and features:
PulseView 0.4.2
Qt 5.5.1
glibmm 2.56.0
Boost 1_67
libsigrok 0.5.2/5:1:1 (rt: 0.5.2/5:1:1)
- glib 2.56.1 (rt: 2.56.1/5601:1)
- libzip 1.5.1
- libserialport 0.1.1/1:0:1 (rt: 0.1.1/1:0:1)
- libusb-1.0 1.0.22.11312 API 0x01000106
- hidapi 0.8.0-rc1
- libftdi 1.4
- Host x86_64-apple-darwin15.6.0, little-endian
- SCPI backends TCP, RPC, serial, USBTMC
libsigrokdecode 0.5.3/6:1:2 (rt: 0.5.3/6:1:2)
- glib 2.56.1 (rt: 2.56.1/5601:1)
- Python 3.7.0 / 0x30700f0 (API 1013, ABI 3)
- Host x86_64-apple-darwin15.6.0, little-endian
Comment 1 Gerhard Sittig 2020-05-30 12:50:12 CEST
Can you retry with a recent version (nightly build)? To see if it got fixed 
in the meantime, or still affects a current development version. Thank you!

Did you check the bug tracker database for other rigol-ds related issues? 
Perhaps even previously closed items. There may have been something with 
enabling digital channels which kind of unbreaks analog channels, or so 
(don't have first hand experience myself, hence the guessing).

In the past the PC side often could not "solve" issues in Rigol firmware but 
could only deal with the situation as it is, or work around known issues.
Comment 2 Soeren Apel 2020-05-30 14:20:05 CEST
You could also try connecting via ethernet and see if that behaves better. Could be a USB issue.

https://sigrok.org/wiki/Connecting_to_a_scope_via_LXI
Comment 3 Thomas Santana 2020-05-30 16:51:11 CEST
I tried the nightly on my Mac but I think something is not ok in the build (launched it both from the DMG and terminal and I got):

Contents/MacOS/pulseview
dyld: Library not loaded: /usr/local/opt/gettext/lib/libintl.8.dylib
  Referenced from: /Volumes/PulseView NIGHTLY 1/PulseView.app/Contents/Frameworks/Python.framework/Versions/3.7/Python
  Reason: image not found
Abort trap: 6

With the stable release, I also attempted via TCPIP and things are different. It sampled 2.3 ms on both channels. But still no Sampling Rate drop down.

Log on TCP capture:

sr: session: Using thread-default main context.
sr: session: Starting.
sr: hwdriver: rigol-ds: Starting acquisition.
sr: rigol-ds: handling channel CH1
sr: rigol-ds: handling channel CH2
sr: rigol-ds: handling channel CH3
sr: rigol-ds: handling channel CH4
sr: session: bus: Received SR_DF_HEADER packet.
sr: rigol-ds: Starting data capture for frameset 1
sr: session: bus: Received SR_DF_FRAME_BEGIN packet.
pv: "Timing": "No input data"
sr: rigol-ds: Starting reading data from channel 1
sr: rigol-ds: New block header expected
sr: rigol-ds: Received data block header: '#9000001200' -> block length 1200
sr: rigol-ds: Requesting read of 1200 bytes
sr: rigol-ds: Received 1200 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (1200 samples).
sr: rigol-ds: Block has been completed
sr: rigol-ds: Starting reading data from channel 2
"Timing": Error cleared
sr: rigol-ds: New block header expected
sr: rigol-ds: Received data block header: '#9000001200' -> block length 1200
sr: rigol-ds: Requesting read of 1200 bytes
sr: rigol-ds: Received 1200 bytes.
sr: session: bus: Received SR_DF_ANALOG packet (1200 samples).
sr: rigol-ds: Block has been completed
sr: rigol-ds: Read should have been completed
sr: session: bus: Received SR_DF_FRAME_END packet.
sr: hwdriver: rigol-ds: Stopping acquisition.
sr: session: bus: Received SR_DF_END packet.
sr: session: fd_source_finalize: key 0x15
sr: session: Stopped.
pv: Acquisition took 0.00 s


From the looks of it the capture is smoother over IP. However I will need to get a better setup for IP (I have only one cable and in a poor location). USB would be better. 

I think I'll need to build the images on my machine. 

Just to confirm my assumption if the driver is working correctly I should be able to set sample rate and duration (as in https://sigrok.org/doc/pulseview/unstable/images/pv_after_startup.png)?
Comment 4 Thomas Santana 2020-05-30 22:45:49 CEST
I tried to run Nightly on a Linux VM on my Mac. For this I only tried the USB connection, but the result are the same, single channel short interval no Rate dropdown. I tried to build on the Mac, but I did not have much success, is there a specific set of instruction on how to setup my machine for building?

My impression is that I have to issues:

* Somehow I think the Rate is caused by something like https://sigrok.org/bugzilla/show_bug.cgi?id=1076 
* On USB there is a failure while reading the packages that abort the capture
* On IP I don't have the sample rate, thus can't capture longer sample sets.

Again I'm willing to try to build it. A proper linux setup and being able to test over IP may take a while to happen.
Comment 5 Thomas Santana 2020-05-31 15:33:32 CEST
Created attachment 629 [details]
./sigrok-cli -l 5 -d rigol-ds --channels CH1 --frames 1 -o sample.s
Comment 6 Thomas Santana 2020-05-31 17:52:32 CEST
I managed to get to compile both libsigrok and sigrok-cli on my Mac. And captured a few things:

/sigrok-cli  —version
sigrok-cli 0.8.0-git-0171a4a

Libraries and features:
- libsigrok 0.6.0-git-7718f3ca/4:0:0 (rt: 0.6.0-git-7718f3ca/4:0:0).
 - Libs:
  - glib 2.64.3 (rt: 2.64.3/6403:3)
  - libzip 1.6.1
  - libserialport 0.1.1/1:0:1 (rt: 0.1.1/1:0:1)
  - libusb-1.0 1.0.23.11397 API 0x01000107
  - Host: x86_64-apple-darwin19.4.0, little-endian.
  - SCPI backends: TCP, RPC, serial, USBTMC.

See what it can scan...


 ./sigrok-cli  -l 3 —scan
sr: scpi_usbtmc: USBTMC bulk in transfer error: LIBUSB_ERROR_TIMEOUT.
sr: scpi_usbtmc: USBTMC bulk in transfer error: LIBUSB_ERROR_TIMEOUT.
sr: scpi-pps: Couldn’t get IDN response.
sr: serial-libsp: Error opening port (2): No such file or directory.
sr: serial-libsp: Error opening port (2): No such file or directory.
The following devices were found:
demo - Demo device with 13 channels: D0 D1 D2 D3 D4 D5 D6 D7 A0 A1 A2 A3 A4
rigol-ds - Rigol DS1054Z 00.04.04.SP4 [S/N: DS1ZA220801351] with 4 channels: CH1 CH2 CH3 CH4
GLib (gthread-posix.c): Unexpected error from C library during ‘pthread_mutex_lock’: Invalid argument.  Aborting.
Abort trap: 6

The pthread_mutex happens in other commands. Also funny the LIBUSB_ERROR_TIMEOUT.


./sigrok-cli -d rigol-ds --show
Driver functions:
    Oscilloscope
Scan options:
    conn
    serialcomm
rigol-ds - Rigol DS1054Z 00.04.04.SP4 [S/N: DS1ZA220801351] with 4 channels: CH1 CH2 CH3 CH4
Channel groups:
    CH1: channel CH1
    CH2: channel CH2
    CH3: channel CH3
    CH4: channel CH4
Supported configuration options across all channel groups:
    limit_frames: 
    samplerate
    timebase - supported values:
      5 ns
      10 ns
      20 ns
      50 ns
      100 ns
      500 ns
      1 us
      2 us
      5 us
      10 us
      20 us
      50 us
      100 us
      200 us
      500 us
      1 ms
      2 ms
      5 ms
      10 ms (current)
      20 ms
      50 ms
      100 ms
      200 ms
      500 ms
      1 s
      2 s
      5 s
      10 s
      20 s
      50 s
    num_hdiv
    horiz_triggerpos: 
    triggersource: CH1 (current), CH2, CH3, CH4, EXT, AC Line
    triggerslope: r (current), f
    triggerlevel: 14.400000 (current)
    data_source: Live (current), Memory, Segmented
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_mutex_lock': Invalid argument.  Aborting.
Abort trap: 6


I tried a couple of captures: 

./sigrok-cli -d rigol-ds --samples 100 --channels CH1
Failed to configure sample limit.

./sigrok-cli -d rigol-ds --time 1000 
sr: hwdriver: Option 'limit_samples' not available for this device instance.
Failed to configure time-based sample limit.


I'm attaching the log output of ./sigrok-cli -l 5 -d rigol-ds --channels CH1 --frames 1 -o sample.sr

Result is the same. But now I can attempt to add more logging. Have to admit it's been year since I used gdb. Is there any way to build more instrumentation into the USB part of the code? 

I'm also ok to split the bug ticket if necessary.
Comment 7 Thomas Santana 2020-05-31 18:10:56 CEST
After adding a bunch of debug statements to the libsigrok the error seems to come from the following:

When the last scpi_usbtmc_bulkin_start is called, the following happens: 
    1. libusb_bulk_transfer returns 0 bytes read and transferred == 0
    2. Call to usbtmc_bulk_in_header_read has header bytes 0x02:0x52:0xad with message size 0x7fffffff. This seems to comply to the spec since it's > 0.
    3. Then message_size += USBTMC_BULK_HEADER_SIZE; overflows to negative
    4. The final return value is negative and thus an Error.

I tried bailing out when ret==0 but that just caused another error.

The feeling I have is that we are reading beyond the EOM. But the bulkin_attribute == 0. So I feel it's not an end of message.

The Rigol says it has 1200 samples:

rigol-ds: Received data block header: '#9000001200' -> block length 1200

But the first use message has HEADER: 0x02:0x51:0xae 0x1f4 attrib 0
So it has 500 bytes available. This kind of makes sense sense usblib seems to have 512 as default message size. 


Unfortunately I out of ideas...
Comment 8 Thomas Santana 2020-05-31 18:13:12 CEST
If I bail out when ret == 0, I get the following error:

sr: [00:02.098003] scpi_usbtmc: USBTMC bulk in transfer error: LIBUSB_ERROR_TIMEOUT.
sr: [00:02.098007] scpi_usbtmc: HERE: Enter scpi_usbtmc_bulkin_start return -1
sr: [00:02.098012] scpi: Incompletely read SCPI response.
sr: [00:02.098018] rigol-ds: Couldn't get IDN response.
sr: [00:02.098022] scpi_usbtmc: Returning local control.

So not even the basic handshake works.
Comment 9 Thomas Santana 2020-06-01 11:54:39 CEST
After a lot of poking around I can confirm that there is some level of segmentation on USB.

The following change allows me to collect multiple channel both on CLI and PulssView (which I managed to build). 

diff --git a/src/scpi/scpi_usbtmc_libusb.c b/src/scpi/scpi_usbtmc_libusb.c
index d45d1f4c..945e49eb 100644
--- a/src/scpi/scpi_usbtmc_libusb.c
+++ b/src/scpi/scpi_usbtmc_libusb.c
@@ -534,6 +534,11 @@ static int scpi_usbtmc_bulkin_start(struct scpi_usbtmc_libusb *uscpi,
                return SR_ERR;
        }
 
+       if (transferred ==0 ) {
+               ret = libusb_bulk_transfer(usb->devhdl, uscpi->bulk_in_ep, data, size,
+                                  &transferred, TRANSFER_TIMEOUT);
+       }
        if (usbtmc_bulk_in_header_read(data, msg_id, uscpi->bTag, &message_size,
                                       transfer_attributes) != SR_OK) {
                sr_err("USBTMC invalid bulk in header.");


This is a partial success since when I look at the signal there is an artefact in PulseView. I will send a .sr.
Comment 10 Thomas Santana 2020-06-01 12:04:03 CEST
Created attachment 630 [details]
Capture After patching usb-driver

This was captured using PulseView and the RS1054z using the calibration signal. Midway through the signal there is a dip that does not exist in the source. I suspect there may be some secondary header.
Comment 11 Thomas Santana 2020-06-01 13:34:49 CEST
Just checking and bulk transfer can return 0 bytes of data in a zero package. Apparently Rigol is doing that. 

https://stackoverflow.com/questions/30946858/why-does-libusb-bulk-transfer-return-0
Comment 12 Thomas Santana 2020-06-01 21:22:41 CEST
Little more debugging and trying to figure out why I have the weird value in the sample. I'm using no input so values are around 0x82 (up down a couple of bits for jitter).

However sample 689 is a suspicious 0x0a (LF):
CH1: -48.4 V

Maybe I'm over analysing, but 689 = 41 + 27 * 24. Each read get 64 bytes per block. 12 bytes TMC header, then 48 bytes of data. First message has 41 bytes of payload (64 - 12 TMC - 11 replay header).

Also with my debug messages the last read get 19 of 20 bytes:
sr: [00:01.444272] scpi_usbtmc: scpi_usbtmc_bulkin_continue: ret=0 transferred=20
sr: [00:01.444288] scpi_usbtmc: scpi_usbtmc_libusb_read_data: read_length 19

So I wonder if there is some weird break on the messages.
Comment 13 Thomas Santana 2020-06-01 23:09:37 CEST
I changed the sample on api.c to 20. This means that the full message fits into a frame. 

And the hex dum is as follow:

0000 02 51 ae 00 20 00 00 00 01 00 00 00 23 39 30 30 
0010 30 30 30 30 30 32 30 82 82 82 82 82 83 82 82 82 
0020 83 82 83 82 82 82 82 82 82 82 83 0a 

This include the #9000000020 the 20 bytes of data and 0x0a (\n). So somehow the blocks are not working correctly. 

I noticed that buffer[8] is set to 1. 
From my log:
scpi_usbtmc_bulkin_start return 32, bulkin_attributes=0x01
The total message length was 44. 12 + 11 + 20 = 43. So there is a trailing \n.

Not sure if this is expected.

Looking at this spec: http://sdpha2.ucsd.edu/Lab_Equip_Manuals/USBTMC_1_00.pdf

D1 = Which is zero in this case

1 – All of the following are true:
  • The USBTMC interface supports TermChar
  • The bmTransferAttributes. TermCharEnabled bit was set in the REQUEST_DEV_DEP_MSG_IN.
  • The last USBTMC message data byte in this transfer matches the TermChar in the REQUEST_DEV_DEP_MSG_IN.
0 – One or more of the above conditions is
not met.

D0 = Set on his message
EOM.
1 - The last USBTMC message data byte in the transfer is the last byte of the USBTMC message.
0 – The last USBTMC message data byte in the transfer is not the last byte of the
USBTMC message.

Now I can't tell if the RIGOL is incorrect or not.
Comment 14 Thomas Santana 2020-06-02 01:07:49 CEST
After way too much time with this puzzle I have narrowed it down to the following:

1) At least on the Mac Zero Packages happen over USBTMC. This is allowed but will break the capture. The fix a proposed seems to solve the issue. It's important not to finish the message so MsgID does not get bumped.

2) The glitch in the capture I think is a bug of the DS side. You can see the 0x0a  being added at the end of all stream. But this mid stream one only happens when the livesamples is 1200. I've tried 800, 1000, 1110 and all worked, capturing multiple frames with no issue. 


I think this is a bug. The value 689 comes from 700 - 11 (start of WAVeform). And it's exactly at the 700th byte in the stream that gets the LF. Now this does not make sense since the first package is 500 data bytes long and the second is 700 (and followed by a LF). I'm not sure the LF should be there, but why would it be 
in the middle of the second message? It would make sense if this came at the end of the first message. Looks likes someone at Rigol flipped the logic.  

I'm assuming the is a DS issue and will try to open a support ticket.
Comment 15 Filippo Volpe 2021-03-25 20:18:09 CET
Sir your investigation skill are on point.

I can confirm all of your findings on my setup DS1104Z and latest firmware version 00.04.04.SP4.

Latest stable pulseview toolchain still fails around sample 700 of the first channel but the git version has been patched I see (probably from you).

Will try to poke Rigol for a firmware fix.
Comment 16 Filippo Volpe 2021-04-30 08:52:30 CEST
Response from rigol:

"The DS1000Z is designed to work with the NI VISA package and it doesn't perform well with lower level USB call not on Windows especially. I have some code for pulling data via a socket connection that can be useful on Linux or Mac if that interests you."

I think being sigrok open source I have to decline any code they could send me on a license basis to avoid future repercussions Nintendo's style. 
But at least they are telling us it works in some configuration but now the question is: are they doing some manual magic tricks to recover damages packets in their windows driver? If so we need to implement those manual fiddling to compensate for possible firmware issues as well.