Further rcu stall on autobuilder
Richard Purdie
We've got yet another rcu stall failure on the autobuilder:
https://autobuilder.yoctoproject.org/typhoon/#/builders/80/builds/2123/steps/15/logs/stdio

and looking at the dmesg in the qemu log:

[ 20.424033] Freeing unused kernel image (rodata/data gap) memory: 652K
[ 20.425229] Run /sbin/init as init process
INIT: version 2.99 booting
FBIOPUT_VSCREENINFO failed, double buffering disabled
Starting udev
[ 20.547298] udevd[161]: starting version 3.2.10
[ 20.553329] udevd[162]: starting eudev-3.2.10
[ 20.751260] EXT4-fs (vda): re-mounted. Opts: (null)
[ 20.752548] ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
INIT: Entering runlevel: 5
Configuring network interfaces... RTNETLINK answers: File exists
Starting random number generator daemon.
Starting OpenBSD Secure Shell server: sshd done.
Starting rpcbind daemon...done.
starting statd: done
Starting atd: OK
[ 21.921925] Installing knfsd (copyright (C) 1996 okir@...).
starting 8 nfsd kernel threads:
[ 23.066283] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 23.068096] NFSD: Using legacy client tracking operations.
[ 23.069086] NFSD: starting 90-second grace period (net f0000098)
done
starting mountd:
[ 45.272151] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 45.273423] rcu: 1-...0: (10 ticks this GP) idle=7ba/1/0x4000000000000000 softirq=598/612 fqs=5249
[ 45.274951] (detected by 2, t=21002 jiffies, g=-195, q=13)
[ 138.202149] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 332.762209] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

This is with the kvm clock source disabled (in master-next) and with Bruce's 5.10.38 upgrade, so that more or less rules out either of those as the cause of this issue. It also can't be the qemu platform or CPU emulation, since we've changed both of those.

What is really odd is that it never actually prints the stalled tasks. It is obviously alive enough to print further stall messages later, but it then stalls out and is terminated after 1500s.
Really open to ideas at this point. Should we try a newer kernel version for testing in -next to see if we can isolate this to 5.10?

Cheers,

Richard