From: Gerhard Sittig
Date: Mon, 5 Jun 2017 16:24:52 +0000 (+0200)
Subject: input/csv: Add developer comment with TODO items
X-Git-Tag: libsigrok-0.5.0~16
X-Git-Url: https://sigrok.org/gitaction?a=commitdiff_plain;h=ccff468b5e1b841a0c4f0426f502b9dad23dd9be;p=libsigrok.git

input/csv: Add developer comment with TODO items

"Document" the current state of the implementation in the CSV input
module's source code. Discuss how text handling is non-trivial, which
approaches are available and how they have drawbacks. Mention the lack
of support for the import of analog data as well.
---

diff --git a/src/input/csv.c b/src/input/csv.c
index d58e0031..013d7a47 100644
--- a/src/input/csv.c
+++ b/src/input/csv.c
@@ -67,6 +67,45 @@
  * than 0. The default line number to start processing is 1.
  */
 
+/*
+ * TODO
+ *
+ * - Determine how the text line handling can get improved, regarding
+ *   all of robustness and flexibility and correctness.
+ *   - The current implementation splits on "any run of CR and LF". Which
+ *     translates to: Line numbers are wrong in the presence of empty
+ *     lines in the input stream.
+ *   - The current implementation insists on the presence of end-of-line
+ *     markers on _every_ line in the input stream. "Incomplete" text
+ *     files that are so typical on the Windows platform get rejected as
+ *     invalid.
+ *   - Dropping support for CR style end-of-line markers could improve
+ *     the situation a lot. Code could search for and split on LF, and
+ *     trim optional trailing CR. This would result in proper support
+ *     for CRLF (Windows) as well as LF (Unix), and allow for correct
+ *     line number counts.
+ *   - When support for CR-only line termination cannot get dropped,
+ *     then the current implementation is inappropriate. Currently the
+ *     input stream is scanned for the first occurrence of either of the
+ *     supported termination styles (which is good). For the remaining
+ *     session a consistent encoding of the text lines is assumed (which
+ *     is acceptable). Potential absence of the terminator for the last
+ *     line is orthogonal, and can get handled by a "force" flag when
+ *     the end() routine calls the process_buffer() routine.
+ *   - When line numbers need to be correct and reliable, _and_ the full
+ *     set of previously supported line termination sequences are required,
+ *     and potentially more are to get added for improved compatibility
+ *     with more platforms or generators, then the current approach of
+ *     splitting on runs of termination characters needs to get replaced,
+ *     by the more expensive approach to scan for and count the initially
+ *     determined termination sequence.
+ *
+ * - Add support for analog input data? (optional)
+ *   - Needs a syntax first for user specs which channels (columns) are
+ *     logic and which are analog. May need heuristics(?) to guess from
+ *     input data in the absence of user provided specs.
+ */
+
 /* Single column formats. */
 enum {
 	FORMAT_BIN,
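
The "split on LF, trim optional trailing CR" idea from the TODO comment, together with the "force" flag for an unterminated last line, could look roughly like the sketch below. This is not code from libsigrok. The names split_lines(), line_cb, struct line_stats and count_line() are hypothetical and only serve the illustration, and the sketch assumes that CR-only line termination has been dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-line callback, not a libsigrok type. */
typedef void (*line_cb)(const char *line, size_t len, size_t line_no, void *arg);

/*
 * Split buf[0..len) on LF and trim one optional trailing CR per line.
 * Empty lines are delivered as well, so line numbers stay correct.
 * Without 'force', text after the last LF is left unconsumed. With
 * 'force' (end of input), it is delivered as a final line.
 * Returns the number of bytes consumed from 'buf'.
 */
static size_t split_lines(const char *buf, size_t len, int force,
	size_t *line_no, line_cb cb, void *arg)
{
	size_t pos, end, next, text_len;
	const char *lf;

	assert(line_no != NULL && cb != NULL);

	pos = 0;
	while (pos < len) {
		lf = memchr(buf + pos, '\n', len - pos);
		if (lf) {
			end = (size_t)(lf - buf);
			next = end + 1;		/* Skip the LF itself. */
		} else if (force) {
			end = len;		/* Unterminated last line. */
			next = len;
		} else {
			break;			/* Wait for more input. */
		}
		text_len = end - pos;
		if (text_len > 0 && buf[end - 1] == '\r')
			text_len--;		/* CRLF: drop the CR. */
		cb(buf + pos, text_len, *line_no, arg);
		(*line_no)++;
		pos = next;
	}

	return pos;
}

/* Example consumer (hypothetical): counts lines, remembers the last one. */
struct line_stats {
	size_t count;		/* Lines delivered so far. */
	size_t empty;		/* How many of them were empty. */
	size_t last_no;		/* Line number of the last line. */
	size_t last_len;	/* Length of the last line, CR/LF excluded. */
};

static void count_line(const char *line, size_t len, size_t line_no, void *arg)
{
	struct line_stats *st = arg;

	(void)line;
	st->count++;
	if (len == 0)
		st->empty++;
	st->last_no = line_no;
	st->last_len = len;
}
```

A caller would keep the unconsumed tail of the buffer and prepend it to the next chunk, and would pass a non-zero 'force' only once, from the end-of-input path. Because every LF advances the counter, empty lines no longer shift the reported line numbers.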