
Re: SWAT statistics for week 19

Randy MacLeod
 

On 2021-05-18 6:26 p.m., Alexandre Belloni wrote:
Hi,

On 18/05/2021 23:21:53+0100, Ross Burton wrote:
Quick idea for swatbot: a top ten list of open bugs which have the
highest number of instances.
I'm maintaining a spreadsheet that goes a bit beyond that. I'm also
tracking the frequency of the bugs in the last months and we started to
close a few of the older AB-INT issues. I'll share that publicly soon.
Hi Alexandre,

Any update on the list/spreadsheet?

Tony is getting going on valgrind and he'll start with:
   https://bugzilla.yoctoproject.org/show_bug.cgi?id=14294

   [Bug 14294] valgrind memcheck/tests/linux/timerfd-syscall ptest intermittent failure

unless there's another ptest issue that is more urgent.


../Randy


Ross

On Tue, 18 May 2021 at 23:19, Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:
Hello,

Here are the statistics for last week. Chee Yang was on SWAT duty.

160 failures were triaged:

* 119 by Chee Yang
- 38 for meson changes
- 24 for an issue in meta-arm after an upgrade of u-boot
- 11 for the btrfs-tools upgrade
- 6 for ovmf reproducibility issues
- 2 for meta-oe YP compatibility issues
- 4 new occurrences of bug 14310
- 4 new occurrences of bug 14251
- 3 new occurrences of bug 13802
- 3 new occurrences of bug 14273
- 2 new occurrences of bug 14208
- 2 new occurrences of bug 14381
- 1 new occurrence of bug 14145
- 1 new occurrence of bug 14163
- 1 new occurrence of bug 14165
- 1 new occurrence of bug 14177
- 1 new occurrence of bug 14197
- 1 new occurrence of bug 14201
- 1 new occurrence of bug 14250
- 1 new occurrence of bug 14294
- 1 new occurrence of bug 14296
- 1 new occurrence of bug 14311
- 4 occurrences of new bug 14388
- 2 occurrences of new bug 14393
- 1 occurrence of new bug 14389
- 1 occurrence of new bug 14390
- 1 occurrence of new bug 14391

* 41 by Richard
- 20 for an issue in meta-arm after an upgrade of u-boot
- 10 for issues he fixed
- 4 for the libepoxy upgrade
- 2 for YP compatibility issues in meta-AGL
- 2 for patches merged out of order
- 2 for branch names changed upstream
- 1 because gitlab was down

Regards,

--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com






--
# Randy MacLeod
# Wind River Linux


ubuntu2004-arm-1 load increase and possible instability

Michael Halstead <mhalstead@...>
 

The ubuntu2004-arm-1 worker has been unstable in the past, and we reduced the number of simultaneous builds from 3 to 1 to see if that would stop the crashes. It didn't at first, but the crashes have now stopped, perhaps due to kernel updates. I'm planning to increase the simultaneous builds back to 3 when the controller is next idle. This may cause the crashes to begin again, and I want the SWAT team to be aware of the change.

--
Michael Halstead
Linux Foundation / Yocto Project
Systems Operations Engineer


Re: Further rcu stall on autobuilder

Richard Purdie
 

On Mon, 2021-05-24 at 15:29 +0100, Richard Purdie via lists.yoctoproject.org wrote:
On Mon, 2021-05-24 at 09:21 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:56 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 12:51 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:47 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
A set of SRCREVs sounds like the best plan, I think it might be worth testing
to see if things improve or not.
I created the attached recipes. Built and booted on qemux86-64 with no
issues.

I assume you'll do the appropriate preferred version in the test
branches to make sure they are used instead of 5.10?
About the time you were writing this, I'd hacked up:

http://git.yoctoproject.org/cgit.cgi/poky/commit/?h=master-next&id=de3e2253482b6d9df1137128a9fde35dec8fd915

and put it into a build on the autobuilder. It caused meta-arm to blow up
and I suspect there may be other fallout but we'll see...

FWIW, I checked with Alexandre and it seems all the rcu failure issues
are on qemuXXX builds but not qemuXXX-alt. The former is 5.10, the latter 
5.4.

I'm starting to strongly suspect there is some issue with 5.10 as we don't
see this with dunfell or with poky-alt :/. I'd wonder why nobody else has
noticed though...
I switched to Bruce's 5.12 patches. Unfortunately even with 5.12:

https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/2118/steps/12/logs/stdio

:(

Also,
https://autobuilder.yoctoproject.org/typhoon/#/builders/110/builds/2362
and the corresponding:
https://autobuilder.yocto.io/pub/non-release/20210523-10/testresults/qemuarm-alt/2021-05-24--01-52/host_stats_1_top.txt
is interesting. That was a qemuarm-alt image (5.4 kernel) which could be a genuine load 
issue. It is getting 300% cpu though so hardly resource starved.

Ideas welcome at this point.

Cheers,

Richard


Re: Further ltp hang - kernel issue?

Bruce Ashfield <bruce.ashfield@...>
 

On Mon, May 24, 2021 at 11:31 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 07:42 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 6:36 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 11:33 +0100, Richard Purdie via lists.yoctoproject.org wrote:
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1932
Oddly enough,
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1933
on centos7-ty-4 (master build) is locked up with pretty much exactly
the same issue/ps output/tests/dmesg.

The first one above was debian10-ty-1 with master-next.

Recent kernel version bump?
I can't think of anything specific that would cause those issues, but the
Wind River guys did report some bad iommu patches that were part of
5.10.37

I've merged .38, which has the fixes, but I haven't sent the bumps yet.
It is worth trying the attached SRCREV patch, to see if there's any
change in behaviour.
Thanks for the patch, I ran with it for a number of runs. I have not seen .38
break in the way master or master-next with .37 did. I've run several and 50%
of the time .37 would hang in ltp.

Can we upgrade to .38 ASAP please? :)
Sent. I cherry-picked it from my queue and sent it individually.

I'll continue testing the rest of my updates.

Bruce

This is obviously a separate issue to the rcu stalls but I also think that
is 5.10 related.

Cheers,

Richard


--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II


Re: Further ltp hang - kernel issue?

Richard Purdie
 

On Sun, 2021-05-23 at 07:42 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 6:36 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 11:33 +0100, Richard Purdie via lists.yoctoproject.org wrote:
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1932
Oddly enough,
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1933
on centos7-ty-4 (master build) is locked up with pretty much exactly
the same issue/ps output/tests/dmesg.

The first one above was debian10-ty-1 with master-next.

Recent kernel version bump?
I can't think of anything specific that would cause those issues, but the
Wind River guys did report some bad iommu patches that were part of
5.10.37

I've merged .38, which has the fixes, but I haven't sent the bumps yet.
It is worth trying the attached SRCREV patch, to see if there's any
change in behaviour.
Thanks for the patch, I ran with it for a number of runs. I have not seen .38
break in the way master or master-next with .37 did. I've run several and 50%
of the time .37 would hang in ltp.

Can we upgrade to .38 ASAP please? :)

This is obviously a separate issue to the rcu stalls but I also think that
is 5.10 related.

Cheers,

Richard
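
As a rough sanity check on the .37 versus .38 comparison above: if .37 hangs in about half of the ltp runs, the chance that n runs in a row complete purely by luck is 0.5^n, assuming the runs are independent. The thread does not say how many .38 runs were made, so the run counts below are illustrative only.

```python
import math


def chance_all_clean(hang_rate, runs):
    """Probability that `runs` independent runs all avoid a hang."""
    return (1.0 - hang_rate) ** runs


def runs_needed(hang_rate, max_chance):
    """Smallest number of clean runs whose chance of being pure luck
    drops to max_chance or lower."""
    return math.ceil(math.log(max_chance) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    # Illustrative run counts; the real number of .38 runs is not stated.
    for n in (3, 5, 10):
        print(f"{n} clean runs by luck: {chance_all_clean(0.5, n):.4f}")
    print("runs needed for under 1%:", runs_needed(0.5, 0.01))
```

With a 50% hang rate, a handful of clean runs already makes coincidence unlikely, which fits the request to move to .38 quickly.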


Re: Further rcu stall on autobuilder

Richard Purdie
 

On Mon, 2021-05-24 at 09:21 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:56 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 12:51 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:47 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
A set of SRCREVs sounds like the best plan, I think it might be worth testing
to see if things improve or not.
I created the attached recipes. Built and booted on qemux86-64 with no
issues.

I assume you'll do the appropriate preferred version in the test
branches to make sure they are used instead of 5.10?
About the time you were writing this, I'd hacked up:

http://git.yoctoproject.org/cgit.cgi/poky/commit/?h=master-next&id=de3e2253482b6d9df1137128a9fde35dec8fd915

and put it into a build on the autobuilder. It caused meta-arm to blow up
and I suspect there may be other fallout but we'll see...

FWIW, I checked with Alexandre and it seems all the rcu failure issues
are on qemuXXX builds but not qemuXXX-alt. The former is 5.10, the latter 
5.4.

I'm starting to strongly suspect there is some issue with 5.10 as we don't
see this with dunfell or with poky-alt :/. I'd wonder why nobody else has
noticed though...

Cheers,

Richard


Re: Further rcu stall on autobuilder

Bruce Ashfield <bruce.ashfield@...>
 

On Sun, May 23, 2021 at 12:56 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 12:51 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:47 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

We've got yet another rcu stall failure on the autobuilder:

https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/2123/steps/15/logs/stdio

and looking at the dmesg in the qemu log:

[ 20.424033] Freeing unused kernel image (rodata/data gap) memory: 652K
[ 20.425229] Run /sbin/init as init process
INIT: version 2.99 booting
FBIOPUT_VSCREENINFO failed, double buffering disabledStarting udev
[ 20.547298] udevd[161]: starting version 3.2.10
[ 20.553329] udevd[162]: starting eudev-3.2.10
[ 20.751260] EXT4-fs (vda): re-mounted. Opts: (null)
[ 20.752548] ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
INIT: Entering runlevel: 5
Configuring network interfaces... RTNETLINK answers: File exists
Starting random number generator daemon.
Starting OpenBSD Secure Shell server: sshd
done.
Starting rpcbind daemon...done.
starting statd: done
Starting atd: OK
[ 21.921925] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
starting 8 nfsd kernel threads: [ 23.066283] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 23.068096] NFSD: Using legacy client tracking operations.
[ 23.069086] NFSD: starting 90-second grace period (net f0000098)
done
starting mountd: [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

This is with the kvm clock source disabled (in master-next) and with Bruce's
5.10.38 upgrade so that kind of rules out either of those two things for this
issue. It also can't be the qemu platform or cpu emulation used since we've
changed that.

What is really odd is that it never actually prints the stalled tasks. That
seems really strange. It is obviously alive enough to print a stall message
later but stalls out and is terminated after 1500s.

Really open to ideas at this point. Should we try a newer kernel version
for testing in -next, see if we can isolate this to 5.10?
If you want to switch to linux-yocto-dev, it is on 5.12.x, and I have
a local 5.13-rcX version of -dev.

We could whip together a SRCREV recipe for it, if you don't want to
use the AUTOREV.

I'm not going to do a full versioned linux-yocto for 5.12, but we can
special case this if we want to go that route.
A set of SRCREVs sounds like the best plan, I think it might be worth testing
to see if things improve or not.
I created the attached recipes. Built and booted on qemux86-64 with no
issues.

I assume you'll do the appropriate preferred version in the test
branches to make sure they are used instead of 5.10?

Bruce


What is also odd is that in that same build, another qemu instance
hung in syslinux loading bzImage. We've seen this before occasionally and
it seems to keep happening periodically. That would seem more like a qemu
bug yet we're on the latest qemu release :/.

In neither case did Randy's stall detector trigger as far as I can tell.

Cheers,

Richard



Re: SWAT Rotation

Alexandre Belloni
 

Hello Jaga,

Are you available for SWAT this week?

I'll be looking at some of the failures today.

On 22/05/2021 11:44:56+0900, 김민재 wrote:
Hi Alexandre


I am sorry, but I can't work next week because I am moving house this
weekend. I can start SWAT work on June 1st.

Can I delay my rotation by just one week?
Sure, no problem


--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: Further rcu stall on autobuilder

Richard Purdie
 

On Sun, 2021-05-23 at 12:51 -0400, Bruce Ashfield wrote:
On Sun, May 23, 2021 at 12:47 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

We've got yet another rcu stall failure on the autobuilder:

https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/2123/steps/15/logs/stdio

and looking at the dmesg in the qemu log:

[ 20.424033] Freeing unused kernel image (rodata/data gap) memory: 652K
[ 20.425229] Run /sbin/init as init process
INIT: version 2.99 booting
FBIOPUT_VSCREENINFO failed, double buffering disabledStarting udev
[ 20.547298] udevd[161]: starting version 3.2.10
[ 20.553329] udevd[162]: starting eudev-3.2.10
[ 20.751260] EXT4-fs (vda): re-mounted. Opts: (null)
[ 20.752548] ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
INIT: Entering runlevel: 5
Configuring network interfaces... RTNETLINK answers: File exists
Starting random number generator daemon.
Starting OpenBSD Secure Shell server: sshd
done.
Starting rpcbind daemon...done.
starting statd: done
Starting atd: OK
[ 21.921925] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
starting 8 nfsd kernel threads: [ 23.066283] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 23.068096] NFSD: Using legacy client tracking operations.
[ 23.069086] NFSD: starting 90-second grace period (net f0000098)
done
starting mountd: [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

This is with the kvm clock source disabled (in master-next) and with Bruce's
5.10.38 upgrade so that kind of rules out either of those two things for this
issue. It also can't be the qemu platform or cpu emulation used since we've
changed that.

What is really odd is that it never actually prints the stalled tasks. That
seems really strange. It is obviously alive enough to print a stall message
later but stalls out and is terminated after 1500s.

Really open to ideas at this point. Should we try a newer kernel version
for testing in -next, see if we can isolate this to 5.10?
If you want to switch to linux-yocto-dev, it is on 5.12.x, and I have
a local 5.13-rcX version of -dev.

We could whip together a SRCREV recipe for it, if you don't want to
use the AUTOREV.

I'm not going to do a full versioned linux-yocto for 5.12, but we can
special case this if we want to go that route.
A set of SRCREVs sounds like the best plan, I think it might be worth testing
to see if things improve or not.

What is also odd is that in that same build, another qemu instance
hung in syslinux loading bzImage. We've seen this before occasionally and
it seems to keep happening periodically. That would seem more like a qemu
bug yet we're on the latest qemu release :/.

In neither case did Randy's stall detector trigger as far as I can tell.

Cheers,

Richard


Re: Further rcu stall on autobuilder

Bruce Ashfield <bruce.ashfield@...>
 

On Sun, May 23, 2021 at 12:47 PM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

We've got yet another rcu stall failure on the autobuilder:

https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/2123/steps/15/logs/stdio

and looking at the dmesg in the qemu log:

[ 20.424033] Freeing unused kernel image (rodata/data gap) memory: 652K
[ 20.425229] Run /sbin/init as init process
INIT: version 2.99 booting
FBIOPUT_VSCREENINFO failed, double buffering disabledStarting udev
[ 20.547298] udevd[161]: starting version 3.2.10
[ 20.553329] udevd[162]: starting eudev-3.2.10
[ 20.751260] EXT4-fs (vda): re-mounted. Opts: (null)
[ 20.752548] ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
INIT: Entering runlevel: 5
Configuring network interfaces... RTNETLINK answers: File exists
Starting random number generator daemon.
Starting OpenBSD Secure Shell server: sshd
done.
Starting rpcbind daemon...done.
starting statd: done
Starting atd: OK
[ 21.921925] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
starting 8 nfsd kernel threads: [ 23.066283] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 23.068096] NFSD: Using legacy client tracking operations.
[ 23.069086] NFSD: starting 90-second grace period (net f0000098)
done
starting mountd: [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

This is with the kvm clock source disabled (in master-next) and with Bruce's
5.10.38 upgrade so that kind of rules out either of those two things for this
issue. It also can't be the qemu platform or cpu emulation used since we've
changed that.

What is really odd is that it never actually prints the stalled tasks. That
seems really strange. It is obviously alive enough to print a stall message
later but stalls out and is terminated after 1500s.

Really open to ideas at this point. Should we try a newer kernel version
for testing in -next, see if we can isolate this to 5.10?
If you want to switch to linux-yocto-dev, it is on 5.12.x, and I have
a local 5.13-rcX version of -dev.

We could whip together a SRCREV recipe for it, if you don't want to
use the AUTOREV.

I'm not going to do a full versioned linux-yocto for 5.12, but we can
special case this if we want to go that route.

Bruce



Cheers,

Richard



Further rcu stall on autobuilder

Richard Purdie
 

We've got yet another rcu stall failure on the autobuilder:

https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/2123/steps/15/logs/stdio

and looking at the dmesg in the qemu log:

[ 20.424033] Freeing unused kernel image (rodata/data gap) memory: 652K
[ 20.425229] Run /sbin/init as init process
INIT: version 2.99 booting
FBIOPUT_VSCREENINFO failed, double buffering disabledStarting udev
[ 20.547298] udevd[161]: starting version 3.2.10
[ 20.553329] udevd[162]: starting eudev-3.2.10
[ 20.751260] EXT4-fs (vda): re-mounted. Opts: (null)
[ 20.752548] ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
INIT: Entering runlevel: 5
Configuring network interfaces... RTNETLINK answers: File exists
Starting random number generator daemon.
Starting OpenBSD Secure Shell server: sshd
done.
Starting rpcbind daemon...done.
starting statd: done
Starting atd: OK
[ 21.921925] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
starting 8 nfsd kernel threads: [ 23.066283] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 23.068096] NFSD: Using legacy client tracking operations.
[ 23.069086] NFSD: starting 90-second grace period (net f0000098)
done
starting mountd: [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

This is with the kvm clock source disabled (in master-next) and with Bruce's 
5.10.38 upgrade so that kind of rules out either of those two things for this
issue. It also can't be the qemu platform or cpu emulation used since we've
changed that.

What is really odd is that it never actually prints the stalled tasks. That
seems really strange. It is obviously alive enough to print a stall message
later but stalls out and is terminated after 1500s.

Really open to ideas at this point. Should we try a newer kernel version
for testing in -next, see if we can isolate this to 5.10?

Cheers,

Richard
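
A minimal sketch of how a qemu boot log could be scanned for the stall reports shown in the dmesg above. This is not Randy's stall detector, whose implementation is not shown in this thread; the regular expression and the `find_rcu_stalls` helper are assumptions based only on the log lines quoted here.

```python
import re

# Matches kernel lines such as:
# [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
# The match may start mid-line because console output is interleaved.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+rcu: INFO: (\w+) detected stalls")


def find_rcu_stalls(log_text):
    """Return (timestamp_seconds, rcu_flavour) for each stall report."""
    return [(float(m.group(1)), m.group(2)) for m in STALL_RE.finditer(log_text)]


# Lines copied from the log excerpt above.
SAMPLE = """\
done
starting mountd: [ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
"""

if __name__ == "__main__":
    for ts, flavour in find_rcu_stalls(SAMPLE):
        print(f"{ts:10.3f}s  {flavour} stall")
```

Run over the saved qemu logs from several builds, a scan like this would show how soon after boot the first stall appears and whether the later reports ever list the stalled tasks.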


Re: Further ltp hang - kernel issue?

Bruce Ashfield <bruce.ashfield@...>
 

On Sun, May 23, 2021 at 6:36 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

On Sun, 2021-05-23 at 11:33 +0100, Richard Purdie via lists.yoctoproject.org wrote:
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1932
Oddly enough,
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1933
on centos7-ty-4 (master build) is locked up with pretty much exactly
the same issue/ps output/tests/dmesg.

The first one above was debian10-ty-1 with master-next.

Recent kernel version bump?
I can't think of anything specific that would cause those issues, but the
Wind River guys did report some bad iommu patches that were part of
5.10.37

I've merged .38, which has the fixes, but I haven't sent the bumps yet.
It is worth trying the attached SRCREV patch, to see if there's any
change in behaviour.

Bruce


Cheers,

Richard



Re: Further ltp hang - kernel issue?

Richard Purdie
 

On Sun, 2021-05-23 at 11:33 +0100, Richard Purdie via lists.yoctoproject.org wrote:
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1932
Oddly enough, 
https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1933 
on centos7-ty-4 (master build) is locked up with pretty much exactly 
the same issue/ps output/tests/dmesg.

The first one above was debian10-ty-1 with master-next.

Recent kernel version bump?

Cheers,

Richard


Further ltp hang - kernel issue?

Richard Purdie
 

https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1932

We have another ltp hang on the autobuilder. ps ax output:

9261 ? I 0:00 [kworker/2:0-rcu_gp]
9298 ? I 0:01 [kworker/1:3-events]
11312 ? S 0:00 /sbin/syslogd -n -O /var/log/messages
11315 ? S 0:00 /sbin/klogd -n
13085 ? I 0:00 [kworker/2:4-mm_percpu_wq]
16291 ? S 0:00 /usr/sbin/dropbear -r /etc/dropbear/dropbear_rsa_host_key -p 22 -B
16292 ? S 0:00 /bin/sh /opt/ltp/runltp -f commands -p -q -r /opt/ltp -l /opt/ltp/results/commands -I 1 -d /opt/ltp
16334 ? S 0:00 /opt/ltp/bin/ltp-pan -q -e -S -a 16292 -n 16292 -p -f /opt/ltp/ltp-i2b5MCkGm9/alltests -l /opt/ltp/results/commands -C /opt/ltp/output/LTP_RUN_ON-commands.failed -T /opt/ltp/output/LTP_RUN_ON-commands.t
17878 ? S< 0:00 [loop0]
17884 ? D 0:00 mount -t ext2 /dev/loop0 mntpoint
17885 ? I< 0:00 [ext4-rsv-conver]
17906 ? S< 0:00 [loop1]
17912 ? D 0:00 mount -t ext3 /dev/loop1 mntpoint
17913 ? S 0:00 [jbd2/loop1-8]
17914 ? I< 0:00 [ext4-rsv-conver]
17916 ? I 0:00 [kworker/u8:2-events_unbound]
17936 ? S< 0:00 [loop2]
17942 ? D 0:00 mount -t ext4 /dev/loop2 mntpoint
17943 ? S 0:00 [jbd2/loop2-8]
17944 ? I< 0:00 [ext4-rsv-conver]
17965 ? S< 0:00 [loop3]
17967 ? D 0:00 grep -q /dev/loop3 /proc/mounts
17988 ? S< 0:00 [loop4]
17991 ? D 0:00 mkfs.vfat /dev/loop4
18011 ? S< 0:00 [loop5]
18013 ? D 0:00 grep -q /dev/loop5 /proc/mounts
18033 ? S< 0:00 [loop6]
18035 ? D 0:00 grep -q /dev/loop6 /proc/mounts
18548 ? D 0:00 unshare --mount mount --bind dir_A dir_B
18606 ? R 0:00 /usr/sbin/dropbear -r /etc/dropbear/dropbear_rsa_host_key -p 22 -B

so we're in loopback mount tests but the small piece of kernel dmesg below 
suggests this one is a kernel issue. I only pasted a sample, the buffer is filled
with these. Added Bruce to cc:, any ideas?
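
In the ps output above, the processes stuck in the loopback tests all show state D (uninterruptible sleep). A minimal sketch of pulling those rows out of a saved `ps ax` capture follows; the `blocked_processes` helper is illustrative, and the sample is a subset of the rows above.

```python
# Sample rows copied from the ps ax output above (PID TTY STAT TIME COMMAND).
PS_SAMPLE = """\
17878 ? S< 0:00 [loop0]
17884 ? D 0:00 mount -t ext2 /dev/loop0 mntpoint
17912 ? D 0:00 mount -t ext3 /dev/loop1 mntpoint
17967 ? D 0:00 grep -q /dev/loop3 /proc/mounts
17991 ? D 0:00 mkfs.vfat /dev/loop4
18548 ? D 0:00 unshare --mount mount --bind dir_A dir_B
18606 ? R 0:00 /usr/sbin/dropbear -r /etc/dropbear/dropbear_rsa_host_key -p 22 -B
"""


def blocked_processes(ps_text):
    """Return (pid, command) for rows whose STAT field starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        # Skip wrapped or malformed rows: they lack the five columns
        # or do not start with a numeric PID.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            blocked.append((int(pid), command))
    return blocked


if __name__ == "__main__":
    for pid, command in blocked_processes(PS_SAMPLE):
        print(pid, command)
```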


[ 2131.374105] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.374650] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.376669] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.377255] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.378017] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.378813] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.379614] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.380391] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.381146] FS: 00007fd92f66b580(0000) GS:ffff88e8fec80000(0000) knlGS:0000000000000000
[ 2131.382038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.382702] CR2: 0000000000000008 CR3: 0000000035466000 CR4: 00000000001506e0
[ 2131.384305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.385060] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.423042] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.426829] #PF: supervisor read access in kernel mode
[ 2131.427630] #PF: error_code(0x0000) - not-present page
[ 2131.428337] PGD 0 P4D 0
[ 2131.428692] Oops: 0000 [#1018] PREEMPT SMP PTI
[ 2131.429302] CPU: 3 PID: 15773 Comm: grep Tainted: G D 5.10.37-yocto-standard #1
[ 2131.430453] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.431970] RIP: 0010:kernfs_sop_show_options+0x15/0x50
[ 2131.432674] Code: 10 48 c7 c0 f2 ff ff ff eb 97 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 48 02 00 00 <48> 8b 50 08 48 85 d2 48 0f 44 d0 31 c0 48 8b 72 50 48 8b 56 30 48
[ 2131.435158] RSP: 0018:ffff8eb78439fd58 EFLAGS: 00010246
[ 2131.435885] RAX: 0000000000000000 RBX: ffff88e8c1328ca0 RCX: 0000000000000000
[ 2131.436842] RDX: ffffffff814d9640 RSI: ffff88e8c147f900 RDI: ffff88e8d20369d8
[ 2131.437803] RBP: ffff8eb78439fda0 R08: 0000000000000008 R09: ffffffff82627360
[ 2131.438772] R10: 000000000000022b R11: 000000000000000c R12: ffff88e8d20369d8
[ 2131.439733] R13: ffff88e8daa3f000 R14: ffffffff8242b970 R15: ffff88e8c1328c80
[ 2131.440507] FS: 00007f1246cd2740(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.441366] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.441973] CR2: 0000000000000008 CR3: 000000003d71a000 CR4: 00000000001506e0
[ 2131.442738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.443499] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.444266] Call Trace:
[ 2131.444555] ? show_vfsmnt+0x1b1/0x200
[ 2131.444943] m_show+0x1a/0x20
[ 2131.445274] seq_read_iter+0x2c8/0x490
[ 2131.445689] new_sync_read+0x10d/0x190
[ 2131.446078] vfs_read+0x128/0x180
[ 2131.446445] ksys_read+0x67/0xe0
[ 2131.446798] __x64_sys_read+0x19/0x20
[ 2131.447199] do_syscall_64+0x38/0x50
[ 2131.447596] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.448112] RIP: 0033:0x7f1246dc0052
[ 2131.449276] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.451250] RSP: 002b:00007fff2f611858 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.452043] RAX: ffffffffffffffda RBX: 0000559259536320 RCX: 00007f1246dc0052
[ 2131.452817] RDX: 0000000000000400 RSI: 0000559259536500 RDI: 0000000000000003
[ 2131.453591] RBP: 00007f1246e8f300 R08: 0000000000000003 R09: 00007f1246e8da60
[ 2131.454348] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000000
[ 2131.455090] R13: 0000000000000d68 R14: 00007f1246e8e700 R15: 0000000000000d68
[ 2131.455972] Modules linked in: bnep
[ 2131.456367] CR2: 0000000000000008
[ 2131.456766] ---[ end trace bec3d6eec9de38c7 ]---
[ 2131.457269] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.457791] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.459769] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.460332] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.461082] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.461865] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.462654] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.463417] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.464174] FS: 00007f1246cd2740(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.465052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.465686] CR2: 0000000000000008 CR3: 000000003d71a000 CR4: 00000000001506e0
[ 2131.466477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.467258] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.473108] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.475231] #PF: supervisor read access in kernel mode
[ 2131.476231] #PF: error_code(0x0000) - not-present page
[ 2131.477057] PGD 0 P4D 0
[ 2131.477479] Oops: 0000 [#1019] PREEMPT SMP PTI
[ 2131.478197] CPU: 0 PID: 15777 Comm: mount Tainted: G D 5.10.37-yocto-standard #1
[ 2131.479593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.481374] RIP: 0010:kernfs_sop_show_path+0x1c/0x60
[ 2131.482177] Code: b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f0 48 8b 76 30 48 89 e5 48 85 f6 74 07 48 8b b6 48 02 00 00 <48> 8b 46 08 48 85 c0 48 0f 44 c6 48 8b 50 50 48 8b 42 30 48 85 c0
[ 2131.485150] RSP: 0018:ffff8eb7843b7d48 EFLAGS: 00010246
[ 2131.485938] RAX: ffffffff814d9700 RBX: ffff88e8c1328ca0 RCX: 0000000000000001
[ 2131.486660] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88e8c3efa2d0
[ 2131.487398] RBP: ffff8eb7843b7d48 R08: ffff88e8c147f900 R09: 0000000000000001
[ 2131.488123] R10: ffffffff8263d97c R11: ffff88e8dc3e1317 R12: ffff88e8c3efa2d0
[ 2131.488845] R13: ffff88e8daa3f000 R14: ffff88e8c3efa2f8 R15: ffff88e8fd7e2600
[ 2131.489567] FS: 00007f659ae24580(0000) GS:ffff88e8fec00000(0000) knlGS:0000000000000000
[ 2131.490386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.490971] CR2: 0000000000000008 CR3: 000000001210e000 CR4: 00000000001506f0
[ 2131.491698] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.492420] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 2131.493140] Call Trace:
[ 2131.493401] show_mountinfo+0x8b/0x330
[ 2131.493799] m_show+0x1a/0x20
[ 2131.494108] seq_read_iter+0x2c8/0x490
[ 2131.494494] new_sync_read+0x10d/0x190
[ 2131.494880] vfs_read+0x128/0x180
[ 2131.495228] ksys_read+0x67/0xe0
[ 2131.495562] __x64_sys_read+0x19/0x20
[ 2131.495941] do_syscall_64+0x38/0x50
[ 2131.496312] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.496826] RIP: 0033:0x7f659af8d052
[ 2131.497209] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.499081] RSP: 002b:00007ffcce6434b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.499854] RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f659af8d052
[ 2131.500575] RDX: 0000000000004000 RSI: 0000563edb40d310 RDI: 0000000000000003
[ 2131.501296] RBP: 00007ffcce643540 R08: 0000563edb40d310 R09: 00007f659b05aa60
[ 2131.502017] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000003
[ 2131.502747] R13: 0000000000000000 R14: 00007f659ae24500 R15: 0000563edb40d310
[ 2131.503477] Modules linked in: bnep
[ 2131.503841] CR2: 0000000000000008
[ 2131.504206] ---[ end trace bec3d6eec9de38c8 ]---
[ 2131.504685] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.505169] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.507080] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.507630] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.508362] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.509089] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.509816] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.510549] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.511304] FS: 00007f659ae24580(0000) GS:ffff88e8fec00000(0000) knlGS:0000000000000000
[ 2131.512133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.512727] CR2: 0000000000000008 CR3: 000000001210e000 CR4: 00000000001506f0
[ 2131.514232] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.514962] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 2131.551760] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.553085] #PF: supervisor read access in kernel mode
[ 2131.554013] #PF: error_code(0x0000) - not-present page
[ 2131.554944] PGD 0 P4D 0
[ 2131.555432] Oops: 0000 [#1020] PREEMPT SMP PTI
[ 2131.556229] CPU: 3 PID: 15802 Comm: grep Tainted: G D 5.10.37-yocto-standard #1
[ 2131.557750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.559732] RIP: 0010:kernfs_sop_show_options+0x15/0x50
[ 2131.560667] Code: 10 48 c7 c0 f2 ff ff ff eb 97 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 48 02 00 00 <48> 8b 50 08 48 85 d2 48 0f 44 d0 31 c0 48 8b 72 50 48 8b 56 30 48
[ 2131.563701] RSP: 0018:ffff8eb784427d58 EFLAGS: 00010246
[ 2131.564254] RAX: 0000000000000000 RBX: ffff88e8c1328ca0 RCX: 0000000000000000
[ 2131.565000] RDX: ffffffff814d9640 RSI: ffff88e8c147f900 RDI: ffff88e8d20369d8
[ 2131.565765] RBP: ffff8eb784427da0 R08: 0000000000000008 R09: ffffffff82627360
[ 2131.566514] R10: 000000000000022b R11: 000000000000000c R12: ffff88e8d20369d8
[ 2131.567264] R13: ffff88e8daa3f000 R14: ffffffff8242b970 R15: ffff88e8c1328c80
[ 2131.568013] FS: 00007f28130cc740(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.568873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.569486] CR2: 0000000000000008 CR3: 0000000004708000 CR4: 00000000001506e0
[ 2131.570243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.571002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.571782] Call Trace:
[ 2131.572062] ? show_vfsmnt+0x1b1/0x200
[ 2131.572466] m_show+0x1a/0x20
[ 2131.572799] seq_read_iter+0x2c8/0x490
[ 2131.573204] new_sync_read+0x10d/0x190
[ 2131.573627] vfs_read+0x128/0x180
[ 2131.573971] ksys_read+0x67/0xe0
[ 2131.574327] __x64_sys_read+0x19/0x20
[ 2131.574726] do_syscall_64+0x38/0x50
[ 2131.575098] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.575773] RIP: 0033:0x7f28131ba052
[ 2131.576143] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.578113] RSP: 002b:00007fffc6474718 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.579699] RAX: ffffffffffffffda RBX: 00005629054f3320 RCX: 00007f28131ba052
[ 2131.580455] RDX: 0000000000000400 RSI: 00005629054f3500 RDI: 0000000000000003
[ 2131.581214] RBP: 00007f2813289300 R08: 0000000000000003 R09: 00007f2813287a60
[ 2131.581962] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000000
[ 2131.582729] R13: 0000000000000d68 R14: 00007f2813288700 R15: 0000000000000d68
[ 2131.583489] Modules linked in: bnep
[ 2131.583869] CR2: 0000000000000008
[ 2131.584254] ---[ end trace bec3d6eec9de38c9 ]---
[ 2131.584752] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.585275] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.587273] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.587845] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.588620] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.589385] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.590137] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.590909] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.591706] FS: 00007f28130cc740(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.592577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.593210] CR2: 0000000000000008 CR3: 0000000004708000 CR4: 00000000001506e0
[ 2131.593989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.594785] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.600460] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.602355] #PF: supervisor read access in kernel mode
[ 2131.603527] #PF: error_code(0x0000) - not-present page
[ 2131.604353] PGD 0 P4D 0
[ 2131.604782] Oops: 0000 [#1021] PREEMPT SMP PTI
[ 2131.605619] CPU: 0 PID: 15806 Comm: mount Tainted: G D 5.10.37-yocto-standard #1
[ 2131.607001] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.608803] RIP: 0010:kernfs_sop_show_path+0x1c/0x60
[ 2131.609603] Code: b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f0 48 8b 76 30 48 89 e5 48 85 f6 74 07 48 8b b6 48 02 00 00 <48> 8b 46 08 48 85 c0 48 0f 44 c6 48 8b 50 50 48 8b 42 30 48 85 c0
[ 2131.612590] RSP: 0018:ffff8eb78443fd48 EFLAGS: 00010246
[ 2131.613432] RAX: ffffffff814d9700 RBX: ffff88e8c1328ca0 RCX: 0000000000000001
[ 2131.614578] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88e8c3efa2d0
[ 2131.615575] RBP: ffff8eb78443fd48 R08: ffff88e8c147f900 R09: 0000000000000001
[ 2131.616319] R10: ffffffff8263d97c R11: ffff88e8dc3e1317 R12: ffff88e8c3efa2d0
[ 2131.617042] R13: ffff88e8daa3f000 R14: ffff88e8c3efa2f8 R15: ffff88e8fd7e2600
[ 2131.617768] FS: 00007f5eb7d5c580(0000) GS:ffff88e8fec00000(0000) knlGS:0000000000000000
[ 2131.618589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.619207] CR2: 0000000000000008 CR3: 000000001131a000 CR4: 00000000001506f0
[ 2131.619955] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.620679] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 2131.621404] Call Trace:
[ 2131.621666] show_mountinfo+0x8b/0x330
[ 2131.622056] m_show+0x1a/0x20
[ 2131.622366] seq_read_iter+0x2c8/0x490
[ 2131.622755] new_sync_read+0x10d/0x190
[ 2131.623149] vfs_read+0x128/0x180
[ 2131.623540] ksys_read+0x67/0xe0
[ 2131.623875] __x64_sys_read+0x19/0x20
[ 2131.624258] do_syscall_64+0x38/0x50
[ 2131.624630] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.625147] RIP: 0033:0x7f5eb7ec5052
[ 2131.625519] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.627433] RSP: 002b:00007ffdd589ed58 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.628220] RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f5eb7ec5052
[ 2131.628947] RDX: 0000000000004000 RSI: 00005621afedf310 RDI: 0000000000000003
[ 2131.629672] RBP: 00007ffdd589ede0 R08: 00005621afedf310 R09: 00007f5eb7f92a60
[ 2131.630399] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000003
[ 2131.631122] R13: 0000000000000000 R14: 00007f5eb7d5c500 R15: 00005621afedf310
[ 2131.631890] Modules linked in: bnep
[ 2131.632255] CR2: 0000000000000008
[ 2131.632632] ---[ end trace bec3d6eec9de38ca ]---
[ 2131.633120] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.633613] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.635650] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.636194] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.636924] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.637661] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.638406] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.639136] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.639919] FS: 00007f5eb7d5c580(0000) GS:ffff88e8fec00000(0000) knlGS:0000000000000000
[ 2131.640750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.641353] CR2: 0000000000000008 CR3: 000000001131a000 CR4: 00000000001506f0
[ 2131.642090] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.642824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
[ 2131.678894] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.683398] #PF: supervisor read access in kernel mode
[ 2131.683942] #PF: error_code(0x0000) - not-present page
[ 2131.684492] PGD 0 P4D 0
[ 2131.684774] Oops: 0000 [#1022] PREEMPT SMP PTI
[ 2131.685247] CPU: 2 PID: 15831 Comm: grep Tainted: G D 5.10.37-yocto-standard #1
[ 2131.686238] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.687474] RIP: 0010:kernfs_sop_show_options+0x15/0x50
[ 2131.688025] Code: 10 48 c7 c0 f2 ff ff ff eb 97 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 48 8b 46 30 48 85 c0 74 07 48 8b 80 48 02 00 00 <48> 8b 50 08 48 85 d2 48 0f 44 d0 31 c0 48 8b 72 50 48 8b 56 30 48
[ 2131.689986] RSP: 0018:ffff8eb784497d58 EFLAGS: 00010246
[ 2131.690559] RAX: 0000000000000000 RBX: ffff88e8c1328ca0 RCX: 0000000000000000
[ 2131.691304] RDX: ffffffff814d9640 RSI: ffff88e8c147f900 RDI: ffff88e8c3e080f0
[ 2131.692045] RBP: ffff8eb784497da0 R08: 0000000000000008 R09: ffffffff82627360
[ 2131.692944] R10: 000000000000022b R11: 000000000000000c R12: ffff88e8c3e080f0
[ 2131.693799] R13: ffff88e8daa3f000 R14: ffffffff8242b970 R15: ffff88e8c1328c80
[ 2131.694580] FS: 00007fd2e715d740(0000) GS:ffff88e8fed00000(0000) knlGS:0000000000000000
[ 2131.695576] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.696176] CR2: 0000000000000008 CR3: 0000000014322000 CR4: 00000000001506e0
[ 2131.696937] DR0: 0000558661313290 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.697707] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 2131.698722] Call Trace:
[ 2131.699033] ? show_vfsmnt+0x1b1/0x200
[ 2131.699454] m_show+0x1a/0x20
[ 2131.699792] seq_read_iter+0x2c8/0x490
[ 2131.700180] new_sync_read+0x10d/0x190
[ 2131.700608] vfs_read+0x128/0x180
[ 2131.700954] ksys_read+0x67/0xe0
[ 2131.701312] __x64_sys_read+0x19/0x20
[ 2131.701713] do_syscall_64+0x38/0x50
[ 2131.702084] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.702645] RIP: 0033:0x7fd2e724b052
[ 2131.703015] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.705018] RSP: 002b:00007ffe0fbf7368 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.705826] RAX: ffffffffffffffda RBX: 00005633a5489320 RCX: 00007fd2e724b052
[ 2131.706656] RDX: 0000000000000400 RSI: 00005633a5489500 RDI: 0000000000000003
[ 2131.707696] RBP: 00007fd2e731a300 R08: 0000000000000003 R09: 00007fd2e7318a60
[ 2131.708451] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000000
[ 2131.709983] R13: 0000000000000d68 R14: 00007fd2e7319700 R15: 0000000000000d68
[ 2131.710763] Modules linked in: bnep
[ 2131.711127] CR2: 0000000000000008
[ 2131.711551] ---[ end trace bec3d6eec9de38cb ]---
[ 2131.712048] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.712553] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.714535] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.715097] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.715874] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.716655] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.717412] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.718169] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.718940] FS: 00007fd2e715d740(0000) GS:ffff88e8fed00000(0000) knlGS:0000000000000000
[ 2131.719820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.720438] CR2: 0000000000000008 CR3: 0000000014322000 CR4: 00000000001506e0
[ 2131.721210] DR0: 0000558661313290 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.721962] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 2131.728339] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 2131.730096] #PF: supervisor read access in kernel mode
[ 2131.731361] #PF: error_code(0x0000) - not-present page
[ 2131.732385] PGD 0 P4D 0
[ 2131.732907] Oops: 0000 [#1023] PREEMPT SMP PTI
[ 2131.733802] CPU: 3 PID: 15835 Comm: mount Tainted: G D 5.10.37-yocto-standard #1
[ 2131.735537] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2131.737738] RIP: 0010:kernfs_sop_show_path+0x1c/0x60
[ 2131.738721] Code: b6 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f0 48 8b 76 30 48 89 e5 48 85 f6 74 07 48 8b b6 48 02 00 00 <48> 8b 46 08 48 85 c0 48 0f 44 c6 48 8b 50 50 48 8b 42 30 48 85 c0
[ 2131.742383] RSP: 0018:ffff8eb78449fd48 EFLAGS: 00010246
[ 2131.743437] RAX: ffffffff814d9700 RBX: ffff88e8c1328ca0 RCX: 0000000000000001
[ 2131.744301] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88e8d20369d8
[ 2131.745041] RBP: ffff8eb78449fd48 R08: ffff88e8c147f900 R09: 0000000000000001
[ 2131.745803] R10: ffffffff8263d97c R11: ffff88e8f572f317 R12: ffff88e8d20369d8
[ 2131.746567] R13: ffff88e8daa3f000 R14: ffff88e8d2036a00 R15: ffff88e8ce719c00
[ 2131.747316] FS: 00007f4d5c6b7580(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.748152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.748776] CR2: 0000000000000008 CR3: 00000000122e0000 CR4: 00000000001506e0
[ 2131.749540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.750281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2131.751023] Call Trace:
[ 2131.751305] show_mountinfo+0x8b/0x330
[ 2131.751715] m_show+0x1a/0x20
[ 2131.752024] seq_read_iter+0x2c8/0x490
[ 2131.752434] new_sync_read+0x10d/0x190
[ 2131.752837] vfs_read+0x128/0x180
[ 2131.753180] ksys_read+0x67/0xe0
[ 2131.753555] __x64_sys_read+0x19/0x20
[ 2131.753933] do_syscall_64+0x38/0x50
[ 2131.754322] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2131.754860] RIP: 0033:0x7f4d5c820052
[ 2131.755251] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 c5 d3 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2131.757310] RSP: 002b:00007ffda90edce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2131.758096] RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f4d5c820052
[ 2131.758856] RDX: 0000000000004000 RSI: 0000561bd49c6310 RDI: 0000000000000003
[ 2131.759624] RBP: 00007ffda90edd70 R08: 0000561bd49c6310 R09: 00007f4d5c8eda60
[ 2131.760369] R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000003
[ 2131.761109] R13: 0000000000000000 R14: 00007f4d5c6b7500 R15: 0000561bd49c6310
[ 2131.761871] Modules linked in: bnep
[ 2131.762251] CR2: 0000000000000008
[ 2131.762652] ---[ end trace bec3d6eec9de38cc ]---
[ 2131.763146] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2131.763657] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2131.765645] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2131.766251] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2131.767005] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2131.767784] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2131.768569] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2131.769320] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2131.770077] FS: 00007f4d5c6b7580(0000) GS:ffff88e8fed80000(0000) knlGS:0000000000000000
[ 2131.770953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2131.771597] CR2: 0000000000000008 CR3: 00000000122e0000 CR4: 00000000001506e0
[ 2131.772352] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2131.773108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2136.795714] BUG: kernel NULL pointer dereference, address: 000000000000000d
[ 2136.798484] #PF: supervisor read access in kernel mode
[ 2136.799299] #PF: error_code(0x0000) - not-present page
[ 2136.799845] PGD 0 P4D 0
[ 2136.800129] Oops: 0000 [#1024] PREEMPT SMP PTI
[ 2136.800604] CPU: 1 PID: 15843 Comm: rm Tainted: G D 5.10.37-yocto-standard #1
[ 2136.801508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2136.802706] RIP: 0010:security_inode_getattr+0xd/0x50
[ 2136.803272] Code: 4d 85 f6 75 e3 31 c0 5b 41 5c 41 5d 41 5e 5d c3 31 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 08 48 8b 40 30 <f6> 40 0d 02 75 35 55 48 89 e5 41 54 49 89 fc 53 48 8b 1d dc 91 fe
[ 2136.805275] RSP: 0018:ffff8eb7844d7e20 EFLAGS: 00010246
[ 2136.805910] RAX: 0000000000000000 RBX: ffff8eb7844d7e80 RCX: 000000002a1ce201
[ 2136.806665] RDX: 000000002a1ce1c1 RSI: ffffffff8144b20c RDI: ffff8eb7844d7e30
[ 2136.807432] RBP: ffff8eb7844d7e70 R08: 0000000000000064 R09: 0000000000000000
[ 2136.808197] R10: ffff88e8c1481f00 R11: 000070756f726763 R12: 0000000000000000
[ 2136.808951] R13: 0000000000000005 R14: 0000564ec0f757e8 R15: 0000000000000900
[ 2136.809714] FS: 00007f4bdb1965c0(0000) GS:ffff88e8fec80000(0000) knlGS:0000000000000000
[ 2136.810575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2136.811204] CR2: 000000000000000d CR3: 000000002cc44000 CR4: 00000000001506e0
[ 2136.812083] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2136.812843] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2136.813615] Call Trace:
[ 2136.813896] ? vfs_statx+0x87/0x120
[ 2136.814289] __do_sys_newfstatat+0x36/0x70
[ 2136.814738] ? fsnotify_find_mark+0x16/0x80
[ 2136.815210] ? iterate_dir+0x121/0x1c0
[ 2136.815634] ? fput+0x13/0x20
[ 2136.815970] ? filp_close+0x60/0x70
[ 2136.816362] __x64_sys_newfstatat+0x1c/0x20
[ 2136.816824] do_syscall_64+0x38/0x50
[ 2136.817225] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2136.817785] RIP: 0033:0x7f4bdb0bb70e
[ 2136.818188] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b 15 29
[ 2136.820303] RSP: 002b:00007fffa2df5358 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[ 2136.821123] RAX: ffffffffffffffda RBX: 0000564ec0f756e0 RCX: 00007f4bdb0bb70e
[ 2136.821897] RDX: 0000564ec0f75758 RSI: 0000564ec0f757e8 RDI: 0000000000000005
[ 2136.822667] RBP: 0000564ec0f742f0 R08: 0000000000000000 R09: 00007f4bdb189a60
[ 2136.823445] R10: 0000000000000100 R11: 0000000000000246 R12: 0000564ec0f75758
[ 2136.824234] R13: 0000000000000000 R14: 0000000000000000 R15: 0000564ec0f7d870
[ 2136.825004] Modules linked in: bnep
[ 2136.825393] CR2: 000000000000000d
[ 2136.826208] ---[ end trace bec3d6eec9de38cd ]---
[ 2136.826731] RIP: 0010:d_alloc_parallel+0xd5/0x570
[ 2136.827320] Code: 00 48 89 83 a0 00 00 00 e8 28 9e 9c 00 4c 89 65 80 65 48 8b 04 25 00 6d 01 00 48 89 85 70 ff ff ff e8 3f f4 e7 ff 48 8b 43 30 <44> 8b a0 28 02 00 00 44 8b 3d ed 1c 3b 01 41 f6 c7 01 74 0f f3 90
[ 2136.829415] RSP: 0018:ffff8eb785587c50 EFLAGS: 00010202
[ 2136.829986] RAX: 0000000000000000 RBX: ffff88e8c147f900 RCX: 0000000000000000
[ 2136.830836] RDX: ffff88e8c147f9a0 RSI: ffffffff82433240 RDI: ffff88e8c147f958
[ 2136.831689] RBP: ffff8eb785587cf0 R08: 00000000000000c0 R09: ffff8eb785587dc0
[ 2136.832512] R10: ffff88e8c147f900 R11: ffffff8b919a989e R12: ffff88e8c147f900
[ 2136.833351] R13: 000000009655cd16 R14: ffffffff82bff370 R15: ffff88e8c147f958
[ 2136.834121] FS: 00007f4bdb1965c0(0000) GS:ffff88e8fec80000(0000) knlGS:0000000000000000
[ 2136.835062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2136.835757] CR2: 000000000000000d CR3: 000000002cc44000 CR4: 00000000001506e0
[ 2136.836574] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2136.837397] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2157.360870] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
[ 2157.363369] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
[ 2157.364829] ext2 filesystem being mounted at /opt/ltp/ltp-i2b5MCkGm9/LTP_df01.8NtMZ3LBLa/mntpoint supports timestamps until 2038 (0x7fffffff)
[ 2457.550406] EXT4-fs (loop1): mounting ext3 file system using the ext4 subsystem
[ 2457.562617] EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
[ 2457.563892] ext3 filesystem being mounted at /opt/ltp/ltp-i2b5MCkGm9/LTP_df01.JKoAa3Ab5v/mntpoint supports timestamps until 2038 (0x7fffffff)
[ 2757.421401] EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: (null)
[ 2757.422417] ext4 filesystem being mounted at /opt/ltp/ltp-i2b5MCkGm9/LTP_df01.d8q4Q6tnah/mntpoint supports timestamps until 2038 (0x7fffffff)


How to provide info for a hung ltp build

Richard Purdie
 

Hi All,

When we encounter a hung ltp build, I wanted to document what we need to do
as best practice for debugging it. What we need to do is:

a) ssh to the worker where the build is hanging

b) Look at the output of "ps ax" or similar and identify the process which
is hanging. You can filter with "ps ax | grep /qemuarm64-ltp/"
since the path for an ltp build will contain its name (changing to x86 where
appropriate).

c) From the qemu process commandline, spot its IP address. Often it is 192.168.7.2
but the last digit can/will vary.

d) "ssh root@192.168.7.2" to attempt to log in to the qemu VM. You may need to handle
host cert mismatches as normal for ssh.

e) Within the vm, spot where it is hanging. Often, "top" will show nothing actively
using the cpu. The output of "ps" is key, where we can attempt to spot which ltp
test is/was running. "cgroup_xattr" and "proc01" are two examples of test names 
which we've seen hang and have now disabled. If you can't see what is hanging,
save the ps output into the bug and ping me+Alexandre for further analysis.

f) Another tip, if we know the process that is hanging, is to run
"ls -la /proc/<pid>/fd" which will list the files the test has open.

I appreciate not everyone has worker ssh access so if you do not, please let 
someone who does (Alexandre, Ross, Micheal, Armin, Saul, myself) know if
you spot one of these.
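
Steps b) and c) above can be partly scripted. The following is only a minimal sketch, not something taken from the autobuilder tooling: the helper name find_guest_ips, the sample "ps ax" text, and the assumption that the guest address shows up on the qemu command line as an "ip=192.168.7.N" kernel argument are all illustrative and may not match a given worker.

```python
import re

# Assumption: the guest address appears on the qemu command line as an
# "ip=192.168.7.N" kernel argument; adjust the pattern if your setup differs.
GUEST_IP = re.compile(r"\bip=(192\.168\.7\.\d{1,3})\b")


def find_guest_ips(ps_output, build_name="/qemuarm64-ltp/"):
    """Return (pid, guest_ip) pairs for qemu processes of the given build."""
    found = []
    for line in ps_output.splitlines():
        # Keep only qemu processes whose path contains the build name.
        if build_name not in line or "qemu-system" not in line:
            continue
        match = GUEST_IP.search(line)
        if match:
            pid = line.split(None, 1)[0]  # first column of "ps ax" is the PID
            found.append((pid, match.group(1)))
    return found


# Hypothetical, shortened "ps ax" output used only for illustration.
sample = (
    "  412 ?  Ss  0:01 /usr/sbin/sshd -D\n"
    " 9931 ?  Sl 45:09 /home/pokybuild/yocto-worker/qemuarm64-ltp/build/"
    "qemu-system-aarch64 -append 'root=/dev/vda ip=192.168.7.4::192.168.7.3:255.255.255.0'\n"
)

for pid, ip in find_guest_ips(sample):
    print(f"qemu pid {pid}: try 'ssh root@{ip}'")
```

On a real worker the text would come from "ps ax" itself, and build_name would change to the x86 builder name where appropriate.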

Cheers,

Richard


arm worker pthread lockup state dump

Richard Purdie
 

Dumping state on this failure/lockup:

https://autobuilder.yoctoproject.org/typhoon/#/builders/97/builds/2753

Was locked up in eSDK testing of core-image-minimal in execution of
"devtool sdk-install meta-extsdk-toolchain".

python was sitting in pthread_cond_wait()

no other processes, no active fds that were interesting.

In the end I broke the lock in gdb which resulted in:

NOTE: Starting bitbake server...
Loading cache...done.
Loaded 0 entries from dependency cache.
Parsing recipes...done.
Parsing of 825 .bb files complete (0 cached, 825 parsed). 1459 targets, 53 skipped, 0 masked, 0 errors.
INFO: meta-extsdk-toolchain is already installed
Fatal Python error: drop_gil: PyCOND_WAIT(gil->switch_cond) failed
Python runtime state: finalizing (tstate=0xaaab08055150)
Current thread 0x0000ffffb6bae010 (most recent call first):
File "/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/lib/python3.9/_weakrefset.py", line 39 in _remove
Aborted (core dumped)


i.e. it was sitting waiting on the python GIL.

Looks like a pthread bug.

pthreads being used were:

/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libpthread.so.0

i.e. *our* pthreads (and our python3.9).

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello 민재,

I'm sorry for the late e-mail. SWAT duty will rotate from Jon to you at
the end of this Friday.

Please reply to let me know whether you will be able to work on this
task next week.

--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Another qemu hang

Richard Purdie
 

https://autobuilder.yoctoproject.org/typhoon/#/builders/45/builds/3463

which fails during an on-target compile during testimage:

pokybuild@debian8-ty-1:~/yocto-worker/musl-qemux86-64/build/build-renamed/tmp/work/qemux86_64-poky-linux-musl/core-image-sato-sdk/1.0-r0/testimage$ tail qemu_boot_log.20210521005453 -n 30
matchbox: Cant find a keycode for keysym 269025056
matchbox: ignoring key shortcut XF86Calendar=!$contacts

matchbox: Cant find a keycode for keysym 2809
matchbox: ignoring key shortcut telephone=!$dates

matchbox: Cant find a keycode for keysym 269025050
matchbox: ignoring key shortcut XF86Start=!matchbox-remote -desktop

[settings daemon] Forking. run with -n to prevent fork
dbus-daemon[987]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=0 pid=978 comm="connman-applet ")
matchbox-wm: X error warning (0xa00003): BadWindow (invalid Window parameter) (opcode: 12)
dbus-daemon[987]: Successfully activated service 'org.a11y.atspi.Registry'
SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry

** (matchbox-desktop:1019): WARNING **: 00:56:44.266: Error loading icon: Icon 'net-48d24' not present in theme Sato

** (matchbox-desktop:1019): WARNING **: 00:56:44.267: Error loading icon: Icon 'terminal' not present in theme Sato
[ 114.624630] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 21140 jiffies s: 221 root: 0x2/.
[ 119.752630] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 119.752636] rcu: 1-...0: (790 ticks this GP) idle=1ba/1/0x4000000000000000 softirq=11006/11008 fqs=5237
[ 119.752639] (detected by 0, t=21002 jiffies, g=24737, q=3)
[ 124.557907] rcu: blocking rcu_node structures:
[ 124.558693] Task dump for CPU 1:
[ 124.562791] task:cc1 state:R running task stack: 0 pid:10807 ppid: 10806 flags:0x00000008
[ 124.565007] Call Trace:
[ 124.565489] ? do_user_addr_fault+0x1dd/0x3d0
[ 124.566257] ? exc_page_fault+0x6a/0x150
[ 124.566983] ? asm_exc_page_fault+0x8/0x30
[ 124.567676] ? asm_exc_page_fault+0x1e/0x30

which is interesting in that it does say task:cc1

Couldn't find any further helpful debug info.

Cheers,

Richard


Some qemu hang debugging

Richard Purdie
 

I had a look at:

https://autobuilder.yoctoproject.org/typhoon/#/builders/44/builds/3486

which has the kernel failure:

https://www.rpsys.net/wp/rp/qemu_boot_log.20210521003454

I also looked for corresponding logs from Randy+team's work:

https://autobuilder.yocto.io/pub/non-release/20210520-16/testresults/multilib/2021-05-21--05-49/host_stats_6_top.txt

top - 00:50:02 up 6 days, 3:24, 1 user, load average: 53.63, 55.09, 51.52
Tasks: 798 total, 37 running, 467 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.8 us, 2.3 sy, 12.2 ni, 79.5 id, 4.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13192559+total, 45415784 free, 4200920 used, 82308888 buff/cache
KiB Swap: 8388604 total, 8261144 free, 127460 used. 12631652+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54259 pokybui+ 15 -5 2378860 186996 15956 S 457.9 0.1 45:09.11 /home/pokybuild/yocto-worker/multilib/build/build/tmp/work/x86_64-linux/qemu-helper-native/1.0-r1/recipe-sysroot-native/usr/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02 -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -drive file=/home/pokybuild/tmp/core-image-sato-qemux86-64.ext4.54107,if=virtio,format=raw -usb -devi+
10809 pokybui+ 20 0 522052 307100 10756 R 115.8 0.2 6:08.96 bitbake-server /home/pokybuild/yocto-worker/no-x11/build/bitbake/bin/bitbake-server decafbad 3 5 /home/pokybuild/yocto-worker/no-x11/build/build/bitbake-cookerdaemon.log /home/pokybuild/yocto-worker/no-x11/build/build/bitbake.lock /home/pokybuild/yocto-worker/no-x11/build/build/bitbake.sock 0 None 0
5661 pokybui+ 25 5 73212 46356 20324 R 100.0 0.0 0:00.26 /home/pokybuild/yocto-worker/no-x11/build/build/tmp/work/core2-64-poky-linux/gstreamer1.0-plugins-bad/1.18.4-r0/recipe-sysroot-native/usr/bin/x86_64-poky-linux/../../libexec/x86_64-poky-linux/gcc/x86_64-poky-linux/11.1.0/cc1 -quiet -I tests/check/elements_pnm.p -I tests/check -I ../gst-plugins-bad-1.18.4/tests/check -I . -I ../gst-plugins-bad-1.18.4 -I gst-libs -I ../gst-plugins-bad-1.18.4/gst-libs -I gst-libs/gst/interfaces -I /home/poky+


The jumps in time in the kernel log are odd, as are the incomplete rcu 
stall detection traces.

The image started at about 00:35 and was timed out at 01:00, so at the time
of the top output above (00:50) it had been running for 15 mins, had used
45 mins of cpu time and was using 457% cpu.

The io detector didn't trigger most of the time this was running, the 
system is 80% idle and a load average of 50 on a 56 core system isn't bad.
This is not looking like a load problem...
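
The arithmetic behind those numbers can be checked quickly. This is just a sketch using the values quoted from the top output above; the helper name parse_top_time is made up for illustration, and it assumes TIME+ is in the minutes:seconds.hundredths form shown there.

```python
def parse_top_time(time_plus):
    """Convert a top TIME+ value such as '45:09.11' into seconds of CPU time."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


cpu_seconds = parse_top_time("45:09.11")  # qemu TIME+ from the top output
wall_seconds = 15 * 60                    # started at about 00:35, sampled at 00:50
avg_cores = cpu_seconds / wall_seconds    # average cores kept busy by qemu

load_1min = 53.63                         # 1-minute load average from top
cores = 56                                # core count quoted in the mail
load_per_core = load_1min / cores

print(f"average cores used by qemu: {avg_cores:.2f}")
print(f"load per core: {load_per_core:.2f}")
```

So qemu had averaged roughly three cores' worth of cpu over its lifetime, against the 457% seen at the moment of the sample, and the host load works out to just under one runnable task per core.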

I have a suggestion to try turning off kvm-clock which may have caused the time
jumps so that is the next thing I can try changing.

Cheers,

Richard


Re: SWAT statistics for week 19

Ross Burton
 

On Tue, 18 May 2021 at 23:26, Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:
I'm maintaining a spreadsheet that goes a bit beyond that. I'm also
tracking the frequency of the bugs in the last months and we started to
close few of the older AB-INT issues. I'll share that publicly soon.
Awesome, great.

Ross
