Bug 1433 - LTO miscompiles libsigrok
Summary: LTO miscompiles libsigrok
Status: CONFIRMED
Alias: None
Product: libsigrok
Classification: Unclassified
Component: Build system (show other bugs)
Version: unreleased development snapshot
Hardware: All All
: Normal normal
Target Milestone: ---
Assignee: Nobody
URL:
Keywords:
: 1849 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-10-22 10:15 CEST by Jiri Slaby
Modified: 2023-11-23 18:32 CET (History)
6 users (show)



Attachments
working fix (4.92 KB, patch)
2019-10-22 10:16 CEST, Jiri Slaby
Details
Simpler fix (2.02 KB, patch)
2020-11-03 11:04 CET, Ivan Mironov
Details
conditional no_reorder linker flag application (2.00 KB, patch)
2020-11-03 18:34 CET, Gerhard Sittig
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jiri Slaby 2019-10-22 10:15:01 CEST
After fix from bug 1416 (commit 02a8c07d89ae), libsigrok is miscompiled when LTO is enabled.

It finds no devices:
$ sigrok-cli --scan
<no output>

If I disable LTO:
$ sigrok-cli --scan
The following devices were found:
demo - Demo device with 12 channels: D0 D1 D2 D3 D4 D5 D6 D7 A0 A1 A2 A3

The problem are sr_driver_list__start and sr_driver_list__stop symbols:
(gdb) p &sr_driver_list__start
$1 = (const struct sr_dev_driver *(*)[1]) 0x7ffff7dd7f80 <sr_driver_list.start>
(gdb) p &sr_driver_list__stop
$2 = (const struct sr_dev_driver *(*)[1]) 0x7ffff7dd7f88 <sr_driver_list.stop>


With LTO, the array contains only the dummy NULL entry (1 pointer/8 bytes). On the other hand, when LTO is disabled:
(gdb) p &sr_driver_list__start
$1 = (const struct sr_dev_driver *(*)[]) 0x7ffff7dd81f0 <sr_driver_list.start>
(gdb) p &sr_driver_list__stop
$2 = (const struct sr_dev_driver *(*)[]) 0x7ffff7dd86b8 <sr_driver_list.stop>
(gdb) p (0x7ffff7dd86b8-0x7ffff7dd81f0)/8
$9 = 153

Note that in both cases, __sr_driver_list is correct:

$ objdump -h -j __sr_driver_list  /usr/lib64/libsigrok.so.4

/usr/lib64/libsigrok.so.4:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 24 __sr_driver_list 000004d0  00000000001071f0  00000000001071f0  001061f0  2**3
                  CONTENTS, ALLOC, LOAD, DATA


So with LTO, the hack for sr_driver_list__start and stop does not work.
Comment 1 Jiri Slaby 2019-10-22 10:16:00 CEST
Created attachment 556 [details]
working fix

If I use this linker script instead of the hack, it works.
Comment 2 Gerhard Sittig 2019-12-17 12:30:56 CET
agreed on the idea to remove the head and tail source files, and to use 
linker provided symbols instead for start and stop -- good idea!

suggest minor adjustments (though these are low prio):
- add the driver list after .text or .rodata and before .data, because this
  is constant data
- will -Wc,-T,filename work? keeps the -T option and its argument together 
  and increases readability/awareness/maintainability

these are more severe:
- it's essential to terminate the list with a NULL item, can the linker 
  script do that? like ". += 16;" after the stop symbol and before the 
  end of the new output section? libsigrok (internal library code, and 
  bindings) do rely on the NULL termination, its absence results in 
  undetermined behaviour (may or may not work, by coincidence)
- is the linker script's collecting the list items portable across the 
  set of sigrok supported platforms? not all of them use __sr_driver_list 
  as the items' section name -- is something wider needed? like 
  "*(*__sr_driver_list)" perhaps? with a wildcard in the section name
Comment 3 Jiri Slaby 2019-12-17 12:42:30 CET
(In reply to Gerhard Sittig from comment #2)
> - will -Wc,-T,filename work? keeps the -T option and its argument together 
>   and increases readability/awareness/maintainability

I am not sure, it actually might work, I will try.

> these are more severe:
> - it's essential to terminate the list with a NULL item, can the linker 
>   script do that?

Simply adding QUAD(0) after the sections content should do the job.

> - is the linker script's collecting the list items portable across the 
>   set of sigrok supported platforms? not all of them use __sr_driver_list 
>   as the items' section name -- is something wider needed? like 
>   "*(*__sr_driver_list)" perhaps? with a wildcard in the section name

If there are other sections names, sure, wildcards would work. BTW do you have some test farm cross-compiling on all the various metal, so that you can test whether it compiles at least?
Comment 4 Jiri Slaby 2019-12-17 13:00:30 CET
(In reply to Jiri Slaby from comment #3)
> (In reply to Gerhard Sittig from comment #2)
> > - will -Wc,-T,filename work? keeps the -T option and its argument together 
> >   and increases readability/awareness/maintainability
> 
> I am not sure, it actually might work, I will try.

It works.
Comment 5 Gerhard Sittig 2019-12-18 10:04:26 CET
I'm not aware of a publicly accessible build farm (can't tell whether an OBS 
user created a configuration for the sigrok software). But the sigrok project 
certainly has a CI setup for the mainline codebase, if this was your question.

Thank you for checking the -Wc,-T,file and the section wildcards syntax, and 
for the QUAD(0) hint which is more reliable than FILL().

Have also noticed that the linker script needs a $(srcdir) prefix to unbreak 
out-of-source builds. Build dependencies may be incomplete, modifications to 
the linker script don't always trigger rebuilds or take effect here.

Have yet to check the resulting binaries, static and dynamic, and the symbols 
and terminators which they contain. Ideally your LTO motivated change would 
improve robustness of the build process for all other build configurations, 
too. And a partial link of driver sources and list items to one libdrivers.o 
file which contains the driver code and the complete list of drivers including 
the terminator and the iteration helper would be most preferrable.
Comment 6 Gerhard Sittig 2019-12-20 19:08:16 CET
I have created an isolated experiment to tinker with autotools and libtool and 
partial linking and different build configurations (LTO and non-LTO). Turns out 
to be rather non-trivial. The autotools seem to like getting in the way, but 
I cannot see a better approach to the driver list collection than this approach 
of partial linking. So I followed that road. There are several subtleties 
involved, like amending vs overriding linker scripts, order of command line 
words and turning special options on and off again, static vs dynamic libs,
the different times of compilation and linking of LTO and non-LTO setups, etc.

This implementation tried to use official .am syntax, but failed and then 
fell back to custom make(1) rules which hook up to libtool and automake use.
The idea is to isolate the intermediate step between driver source compilation 
and creating the complete libsigrok library, so that the adjustment to the 
existing sigrok build process remains least intrusive.

  git://repo.or.cz/autotool-experiments.git

The current implementation does work here for empty and non-empty sets of user 
specified drivers, with LTO enabled and disabled, on Linux. It has not yet been 
tested on other platforms and may need adjustment of filename extensions there.

@Jiri: The patch that you provided was useful to get there, but needs more 
adjustment before it will work in the sigrok project for more build configs. 
Unless you provide another version, I will port my experiment (once it has 
seen wider tests) to the libsigrok environment, and will add your Reported-By: 
for the record.
Comment 7 Ivan Mironov 2020-11-02 19:11:10 CET
Hi! Is there any news about this issue?

FYI: Fedora recently switched on LTO globally for all packages. This resulted in missing HW drivers in Sigrok in Fedora 33: https://bugzilla.redhat.com/show_bug.cgi?id=1877485.
Comment 8 Gerhard Sittig 2020-11-02 20:07:46 CET
See comment 6. When I asked for review and/or test, I got zero feedback 
as usual (got this on several occassions before). Seems like no other users 
care about this issue, or maybe they rather wait for somebody else to do 
all of what's involved. So the situation has not changed since then.

There is (independent) work of moving from autotools to cmake. Can't tell 
what the status is on that front, and how it affects the drivers list topic.
Comment 9 Ivan Mironov 2020-11-03 11:04:05 CET
Created attachment 697 [details]
Simpler fix

I think I found a simpler fix for this issue. It does not require any build system modifications at all.
Comment 10 Gerhard Sittig 2020-11-03 18:08:23 CET
I like the approach in comment 9 a lot for its simplicity. Easy to reason 
about, low risk. Unfortunately the feature appears to depend on the 
toolchain version, and causes lots of warnings on some of the platforms 
that are supported by sigrok (one message per driver registration).

  ...
  CC       src/driver_list_start.lo
./source/src/driver_list_start.c:33:3: warning: 'no_reorder' attribute directive ignored [-Wattributes]
   = { NULL /* Dummy item, as zero length arrays are not allowed by C99 */ };
   ^
  CCLD     src/libdrivers_head.la
  CC       src/driver_list_stop.lo
./source/src/driver_list_stop.c:33:3: warning: 'no_reorder' attribute directive ignored [-Wattributes]
   = { NULL /* Dummy item, as zero length arrays are not allowed by C99 */ };
   ^
  CCLD     src/libdrivers_tail.la
  CCLD     src/libdrivers.o
  ...

https://www.gnu.org/software//gcc/gcc-5/changes.html suggests the feature 
was introduced in GCC 5. Can you add a feature check (preferred) or a 
version check? That'd allow me to pick up your fix for mainline, and 
give you credit for the complete fix.
Comment 11 Gerhard Sittig 2020-11-03 18:34:22 CET
Created attachment 698 [details]
conditional no_reorder linker flag application

Found an inspiration in the above mentioned release notes. See the attached 
diff. If this works for you, I can take this into mainline and give you a 
Submitted-By. Can you agree?

[ am rushing this diff to avoid duplicate effort, hence the dirty form ]
Comment 12 Ivan Mironov 2020-11-03 18:45:13 CET
(In reply to Gerhard Sittig from comment #11)

I tested you patch, and it works. So yes, I agree.
Comment 13 Gerhard Sittig 2020-11-05 05:37:58 CET
Fixed in libsigrok 3decd3b1f0cb. Phrased such that other similar setups can 
get handled by future commits, and that developers/packagers can specify 
what _they_ need in their specific situation.
Comment 14 Dan Horák 2021-02-17 10:06:48 CET
Passing an update from our toolchain person.

https://bugzilla.redhat.com/show_bug.cgi?id=1877485#c14

Fundamentally this code from src/drivers.c is broken:

        for (const struct sr_dev_driver **drivers = sr_driver_list__start + 1;
             drivers < sr_driver_list__stop; drivers++)
                g_array_append_val(array, *drivers);

The patch referenced in this BZ may work now, but I wouldn't at all be surprised if this breaks again one day.

Depending on two distinct objects being ordered in a particular way in memory is just wrong and I'm confident if I had the time to dig through the ISO C/C++ standards that it's got undefined behavior.
Comment 15 Gerhard Sittig 2021-02-17 19:53:46 CET
Re comment 14: That's rather bold a claim, especially in these general terms.

I can agree that the location of mere C language variables or routines in 
memory is not defined and shall not be relied upon. But the code in question 
is different, and the "just wrong" comment may miss its context.

A linker script arranges for the sr_driver_list items to form a NULL 
terminated array of references to driver item descriptors. If this is 
considered unreliable, then .data copies from flash to RAM, .bss 
initialization, constructor and destructor calls, the separation of 
early discardable init code from persistent .text content, and similar 
constructs were broken, too. If some LTO configuration manages to bodge 
these essential mechanisms, then I would not want to use it. :)

Granted, future maintenance needs to remain aware of the necessity to order 
these registered items and terminate the list properly. So if LTO based 
setups ignore the linker script, some other enforcement needs to be put 
in place. That was the point of using declaration helpers for all items 
such that the list construction can remain transparent as well as reliable. 
Comment 13 suggests that we are aware, _and_ have prepared for future 
adjustment as the need arises.

Can you agree with that point of view, or am I missing something critical? 
Was comment 14 referring to a different patch than commit 3decd3b1f0cb 
perhaps?
Comment 16 Gerhard Sittig 2021-02-17 20:07:46 CET
Call for feedback: If the current implementation of the driver list item 
declaration is considered incomplete or unreliable, got suggestions how 
to improve its reliability and/or portability? That'd be very much 
appreciated. Would be better than waiting for the next breakage to get 
reported after having a user suffer from the issue again. We don't want 
that. Thank you for helping improve the implementation.
Comment 17 Gerhard Sittig 2021-03-06 14:17:03 CET
I still feel that the linker section with the list of pointers to driver 
items _is_ the most appropriate way of implementing the drivers list 
contruction. Though I can also agree that the linker section approach 
depends on the separate compiler and linker tools and how external build 
infrastructure combines these, or even adds more complexity and fragility 
to the setup. The recent trend to more aggressively optimize without much 
regard for other valid use cases beyond application development just 
reveals the fragility.

But I have yet to understand what is considered _wrong_ about the approach 
of collecting the pointers in a linker section. In the sense of "incorrect". 
Beyond the potential "rather complex" or "not intuitive" concerns, which 
do apply but don't relate to correctness, only express some expectation. 
Is there a convincing reason why the linker section of only pointers is 
_wrong_? The more I think about this, the better I like comparing the 
drivers list to constructor calls, which also happens to be implemented 
by means of a list of pointers in a linker section. What am I missing when 
I consider the drivers list and the constructors list as something similar?

IRC conversation suggests that "C language constructors" and runtime 
registration of items in the drivers list could work. It's unfortunate 
that these have to re-invent the drivers list collection that was done 
at compile time before. Cost would be a little higher, but acceptable. The 
complexity would remain at a similar level: The iteration in calling code 
remains as simple as it is now. Individual drivers don't need adjustment 
as the specifically chosen approach remains transparent. All adjustment 
is concentrated in a single spot which declares the registration helper. 
Are constructor attributes considered more reliable than linker section 
attributes? Are constructors considered "used" and will be kept and not 
dropped by aggressive optimizers?

Have updated the isolated experiment which eliminates the complexity of 
libsigrok build rules out of the picture. The recent version implements 
the constructor approach and registers driver items at runtime. A linked 
list is used to eliminate the necessity for a compile time constant for 
the maximum list size. Local experiments on a small group of platforms 
look promising so far.

  git://repo.or.cz/autotool-experiments.git
Comment 18 fenugrec 2023-05-05 14:52:13 CEST
*** Bug 1849 has been marked as a duplicate of this bug. ***
Comment 19 FJH 2023-07-07 01:00:50 CEST
This has also recently cropped up in ubuntu 23.04:
https://bugs.launchpad.net/ubuntu/+source/libsigrok/+bug/2025248

I kind of blame them for packaging the old 0.5.2 (which predates http://sigrok.org/gitweb/?p=libsigrok.git;a=commit;h=3decd3b1f0cbb3a035f72e9eade42279d0507b89) but I kind of don't because it is the last release tag.
So it might be time to tag another release?

As long as I'm asking dumb questions, was there a reason not to make each driver a separate shared library then try to dlopen all of them (ignoring ones that aren't found) instead of arcane linker tricks?
Comment 20 fenugrec 2023-11-23 18:28:25 CET
updating status based on https://bugs.launchpad.net/ubuntu/+source/libsigrok/+bug/2025248
Comment 21 fenugrec 2023-11-23 18:32:00 CET
Proposed fix (WIP), https://github.com/sigrokproject/libsigrok/pull/233

Instead of using a special section that holds a pointer to each of the
driver descriptors, use "__attribute__((constructor))" to run a certain
function at runtime just before main() entry.
That is also before glib init and possibly (unclear) before heap is
usable, so it may not be possible to directly build a GArray or GSList.
A simple linked-list provides the intermediarey step, and
sr_drivers_init() still builds the final driver list via a GArray.

This compiler attribute is well-supported by gcc and clang.

The SR_REGISTER_DEV_DRIVER_LIST() implementation is also not very
elegant, but is a drop-in replacement.

Comments & tests welcome, currently just $worksforme