Re: gdb built with musl libc segfault


 

On 02/04/2019 14:51, Lluis Campos wrote:
Hi Paul,
On 02.04.2019 14:49, Paul Barker wrote:
On 02/04/2019 12:45, Lluis Campos wrote:
Hi all,

This is my very first question on the Yocto mailing list. Very excited! Please let me know if I should use another list for this.


I am building an image using musl libc instead of GNU libc. I am not using the yocto-tiny distro; instead, I achieve this by setting the following in my local.conf:

TCLIBC = "musl"


My app (mender) segfaults immediately on startup. See the output from strace:

root@raspberrypi3:~# strace mender
execve("/usr/bin/mender", ["mender"], 0x7ee65e10 /* 13 vars */) = 0
set_tls(0x76f1bffc)                     = 0
set_tid_address(0x76f1bfa0)             = 3020
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x530bc8} ---
+++ killed by SIGSEGV +++
Segmentation fault


To be able to debug the process, I added gdb to my image by adding the following to my local.conf:

CORE_IMAGE_EXTRA_INSTALL += "packagegroup-core-buildessential packagegroup-core-tools-debug"


Then, ironically, gdb itself also segfaults:

root@raspberrypi3:~# strace gdb 2>&1 | tail
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
getdents64(3, /* 6 entries */, 2048)    = 144
getdents64(3, /* 0 entries */, 2048)    = 0
close(3)                                = 0
ioctl(0, TIOCGWINSZ, {ws_row=25, ws_col=74, ws_xpixel=0, ws_ypixel=0}) = 0
getcwd("/home/root", 4096)              = 11
access("/usr/local/bin/gdb", X_OK)      = -1 ENOENT (No such file or directory)
access("/usr/bin/gdb", X_OK)            = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7e35aff0} ---
+++ killed by SIGSEGV +++


So, what is going on here? My guess is that some recipes are being wrongly linked against GNU libc instead of musl, and so cannot run on my device.

Any ideas on how to debug the issue?
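
One way to test the linking guess is to look at the ELF interpreter (PT_INTERP) recorded in each binary. A musl-linked ARM hard-float binary normally names /lib/ld-musl-armhf.so.1, while a glibc-linked one names /lib/ld-linux-armhf.so.3. The sketch below is a minimal, illustrative reader for that field and is not part of any Yocto tooling. The names elf_interp, elf_interp_file and the ELF_INTERP_STANDALONE macro are made up for this example, and it assumes little-endian ELF files only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read an nbytes-wide little-endian integer starting at p. */
static uint64_t rd_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: copy the PT_INTERP string of an in-memory ELF image
 * into out. Returns 0 on success, -1 if the image is not a little-endian
 * ELF file or carries no interpreter (for example a static binary). */
int elf_interp(const unsigned char *img, size_t len, char *out, size_t outlen)
{
    assert(img != NULL && out != NULL);
    if (len < 64 || memcmp(img, "\177ELF", 4) != 0 || img[5] != 1)
        return -1;
    if (img[4] != 1 && img[4] != 2)        /* 1 = 32-bit, 2 = 64-bit */
        return -1;
    int is64 = (img[4] == 2);

    /* Program header table location, entry size and entry count. */
    uint64_t phoff   = is64 ? rd_le(img + 32, 8) : rd_le(img + 28, 4);
    uint64_t phentsz = rd_le(img + (is64 ? 54 : 42), 2);
    uint64_t phnum   = rd_le(img + (is64 ? 56 : 44), 2);
    if (phoff > len)
        return -1;

    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t ph = phoff + i * phentsz;
        if (ph + (is64 ? 56 : 32) > len)
            return -1;
        const unsigned char *p = img + ph;
        if (rd_le(p, 4) != 3)              /* 3 == PT_INTERP */
            continue;
        uint64_t off = is64 ? rd_le(p + 8, 8)  : rd_le(p + 4, 4);
        uint64_t sz  = is64 ? rd_le(p + 32, 8) : rd_le(p + 16, 4);
        if (sz == 0 || off > len || sz > len - off || sz > outlen)
            return -1;
        memcpy(out, img + off, sz);
        out[sz - 1] = '\0';                /* force termination */
        return 0;
    }
    return -1;
}

/* Hypothetical convenience wrapper: load a whole file and query it. */
int elf_interp_file(const char *path, char *out, size_t outlen)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    int rc = -1;
    unsigned char *img = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long n = ftell(f);
        if (n > 0 && fseek(f, 0, SEEK_SET) == 0 &&
            (img = malloc((size_t)n)) != NULL &&
            fread(img, 1, (size_t)n, f) == (size_t)n)
            rc = elf_interp(img, (size_t)n, out, outlen);
    }
    free(img);
    fclose(f);
    return rc;
}

#ifdef ELF_INTERP_STANDALONE
int main(int argc, char **argv)
{
    char interp[256];
    for (int i = 1; i < argc; i++) {
        if (elf_interp_file(argv[i], interp, sizeof interp) == 0)
            printf("%s: %s\n", argv[i], interp);
        else
            printf("%s: no interpreter found\n", argv[i]);
    }
    return 0;
}
#endif
```

Built with -DELF_INTERP_STANDALONE and given /usr/bin/gdb and /usr/bin/mender as arguments, it prints one interpreter path per file. If binutils is on the image, readelf -l reports the same field.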
Hi Lluis,

This is an issue I've seen before with runc and gdb.

In runc we saw SIGILL which I tracked down to some hideous setjmp/longjmp magic written in C. cgo is used to include this C code in with the Go code that comprises the rest of the application.
Our application is written in Go and we use CGO as well. So it sounds quite similar.
Does your application use setjmp/longjmp or C++ exceptions in the sections built with CGO?

I've dug into the runc disassembly and added extra prints, and I can see that the program counter jumps off into the weeds at calls to those functions, resulting in SIGILL. In gdb, C++ exception handling is used around gdb_main(), and strace shows that the crash happens very early in execution, so I suspect it's setjmp/longjmp again, used by C++ exception handling.
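
To check the setjmp/longjmp suspicion in isolation, a tiny standalone probe can be built with the affected toolchain and run on the target. The following is only a sketch: probe_longjmp, jump_back, PROBE_VALUE and the PROBE_STANDALONE macro are names invented for this example, and a clean run does not rule out problems in the C++ unwinder or in cgo-specific code paths.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

#define PROBE_VALUE 42

static jmp_buf probe_env;

/* Jump back to the setjmp point from one call frame deeper. */
static void jump_back(int value)
{
    assert(value != 0);   /* longjmp would turn 0 into 1 */
    longjmp(probe_env, value);
}

/* Returns 1 if longjmp delivered control and the expected value back to
 * the setjmp point, 0 otherwise. On a broken setup the process may die
 * with SIGSEGV or SIGILL instead of returning at all. */
int probe_longjmp(void)
{
    switch (setjmp(probe_env)) {
    case 0:
        jump_back(PROBE_VALUE);   /* should not return */
        return 0;
    case PROBE_VALUE:
        return 1;
    default:
        return 0;
    }
}

#ifdef PROBE_STANDALONE
int main(void)
{
    int ok = probe_longjmp();
    printf("setjmp/longjmp round trip: %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
#endif
```

Compiling with -DPROBE_STANDALONE gives a small binary for the target. If it also crashes, plain setjmp/longjmp is the likely culprit; if it passes, the fault more likely sits elsewhere in how gdb or runc use these mechanisms.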


In gdb we saw SIGSEGV which is what you've got above.

I think things are being correctly linked against musl but then there's some runtime issue in recent musl versions, possibly in conjunction with recent kernel headers.

Are you using the thud or master branch?
I am using the thud branch. I haven't actually tried with master, but I will do so later today.
I've reproduced the issue on master. I've also confirmed that the sumo branch is not affected, even if we uprev musl to the same version used on master. It's likely a gcc/musl incompatibility of some kind introduced in a recent gcc version.

I'm passing this one to a colleague for now but I'll try to have another look myself next week. It's captured in our issue tracker for Oryx here: https://gitlab.com/oryx/oryx/issues/14.

Thanks,

--
Paul Barker
Managing Director & Principal Engineer
Beta Five Ltd
