Bug 968 - start at line N might be off in the presence of empty lines
Summary: start at line N might be off in the presence of empty lines
Status: RESOLVED FIXED
Alias: None
Product: libsigrok
Classification: Unclassified
Component: Input: csv (show other bugs)
Version: unreleased development snapshot
Hardware: All All
: Normal normal
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-05 19:44 CEST by Gerhard Sittig
Modified: 2019-12-22 14:53 CET (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gerhard Sittig 2017-06-05 19:44:53 CEST
The CSV input module has a "startline" option, to optionally skip the first 
number of lines in the input stream.  Source code inspection suggests that 
this feature might use a wrong line number when the input stream contains 
empty lines.  Because lines are split at "any run of delimiter chars" and 
then counted.

This suspicion was not verified, no test data is available right now.
Comment 1 Gerhard Sittig 2019-10-18 20:01:34 CEST
Further code inspection and local experiments suggest that line numbers in 
diagnostics and the start-at-line option suffer from general unreliability. 
A WorkInProgress implementation which extends the feature set also fixes 
the issue reported here. A fix will soon become available.
Comment 2 Gerhard Sittig 2019-10-19 10:24:30 CEST
There appear to be differences in glib versions. The official documentation 
states that

  https://developer.gnome.org/glib/stable/glib-String-Utility-Functions.html#g-strsplit

  ...
  the result of g_strsplit (":a:bc::d:", ":", -1) is a NULL-terminated 
  vector containing the six strings "", "a", "bc", "", "d" and ""
  ...

Though local tests suggest that when the input string starts with the 
separator then the first item need _not_ be in the result set. Intermediate 
empty parts do show up here.

Which means that rare edge cases may remain after the general fix of the 
current line termination handling, where line numbers in diagnostics and 
the start-at-line feature still may be off. This can happen when the input 
file immediately starts with an empty line, or when internal chunking of 
large input files results in one chunk starting with an empty line.

It's yet to get determined how complex a workaround for glib's undocumented 
behaviour is, and how reliable it will be. Short of re-inventing string 
split logic in libsigrok code itself, which is undesirable.
Comment 3 Gerhard Sittig 2019-10-19 10:31:49 CEST
Bug 1431 attachment 555 [details] triggers the issue (here, locally, may depend on the 
specific glib version).
Comment 4 Gerhard Sittig 2019-10-19 21:39:30 CEST
I was wrong. It's not glib's fault. No version dependency exists and the 
split routine works as expected. It was an innocent looking strip call 
which swallowed the leading whitespace. A fix is on its way to mainline.
Comment 5 Uwe Hermann 2019-12-22 14:53:10 CET
Fixed in fbefa03f58e9fa6be58024e4df913602a4eb8ab5, thanks!