Some qemu hang debugging


Richard Purdie
 

I had a look at:

https://autobuilder.yoctoproject.org/typhoon/#/builders/44/builds/3486

which has the kernel failure:

https://www.rpsys.net/wp/rp/qemu_boot_log.20210521003454

I also looked for corresponding logs from Randy+team's work:

https://autobuilder.yocto.io/pub/non-release/20210520-16/testresults/multilib/2021-05-21--05-49/host_stats_6_top.txt

top - 00:50:02 up 6 days, 3:24, 1 user, load average: 53.63, 55.09, 51.52
Tasks: 798 total, 37 running, 467 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.8 us, 2.3 sy, 12.2 ni, 79.5 id, 4.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13192559+total, 45415784 free, 4200920 used, 82308888 buff/cache
KiB Swap: 8388604 total, 8261144 free, 127460 used. 12631652+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54259 pokybui+ 15 -5 2378860 186996 15956 S 457.9 0.1 45:09.11 /home/pokybuild/yocto-worker/multilib/build/build/tmp/work/x86_64-linux/qemu-helper-native/1.0-r1/recipe-sysroot-native/usr/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02 -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -drive file=/home/pokybuild/tmp/core-image-sato-qemux86-64.ext4.54107,if=virtio,format=raw -usb -devi+
10809 pokybui+ 20 0 522052 307100 10756 R 115.8 0.2 6:08.96 bitbake-server /home/pokybuild/yocto-worker/no-x11/build/bitbake/bin/bitbake-server decafbad 3 5 /home/pokybuild/yocto-worker/no-x11/build/build/bitbake-cookerdaemon.log /home/pokybuild/yocto-worker/no-x11/build/build/bitbake.lock /home/pokybuild/yocto-worker/no-x11/build/build/bitbake.sock 0 None 0
5661 pokybui+ 25 5 73212 46356 20324 R 100.0 0.0 0:00.26 /home/pokybuild/yocto-worker/no-x11/build/build/tmp/work/core2-64-poky-linux/gstreamer1.0-plugins-bad/1.18.4-r0/recipe-sysroot-native/usr/bin/x86_64-poky-linux/../../libexec/x86_64-poky-linux/gcc/x86_64-poky-linux/11.1.0/cc1 -quiet -I tests/check/elements_pnm.p -I tests/check -I ../gst-plugins-bad-1.18.4/tests/check -I . -I ../gst-plugins-bad-1.18.4 -I gst-libs -I ../gst-plugins-bad-1.18.4/gst-libs -I gst-libs/gst/interfaces -I /home/poky+


The time jumps in the kernel log are odd, as are the incomplete RCU
stall detection traces.

The image started at about 00:35 and was timed out at 01:00, so at the
time of the top snapshot above it had been running for 15 minutes, had
accumulated 45 minutes of CPU time and was using 457% CPU.
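
A rough sketch of that arithmetic follows. It divides the TIME+ value from the top line for qemu-system-x86_64 by the wall-clock time since the image started. The exact 00:35:00 start time is an assumption, since the log only says "about 00:35". The helper names are illustrative.

```python
# Sketch: compare the qemu process's accumulated CPU time (top's TIME+ column)
# against the wall-clock time it had been running, to get an average CPU%.
from datetime import datetime


def parse_time_plus(value: str) -> float:
    """Convert top's TIME+ field ("MM:SS.hh") to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def average_cpu_percent(cpu_seconds: float, wall_seconds: float) -> float:
    """Average CPU usage over the interval, in top's per-core percent (100% = one core)."""
    return 100.0 * cpu_seconds / wall_seconds


# Assumption: the image started at exactly 00:35:00 ("about 00:35" in the text).
start = datetime.strptime("00:35:00", "%H:%M:%S")
sampled = datetime.strptime("00:50:02", "%H:%M:%S")  # timestamp in the top header
wall = (sampled - start).total_seconds()             # 902 s
cpu = parse_time_plus("45:09.11")                    # 2709.11 s

print(f"wall {wall:.0f}s, cpu {cpu:.2f}s, average {average_cpu_percent(cpu, wall):.0f}%")
# prints: wall 902s, cpu 2709.11s, average 300%
```

Averaged over its lifetime, the process works out to roughly 300%, which is about three cores kept busy the whole time. top reported an instantaneous figure of 457.9% for the same process.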

The IO detector didn't trigger for most of the time this was running, the
system is 80% idle, and a load average of around 50 on a 56-core system
isn't bad. This is not looking like a load problem...
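
The load figures can be normalised by core count as a quick check, using the three load averages from the top header and the 56 cores mentioned above. This is a sketch only. Linux load averages count tasks in uninterruptible sleep as well as runnable ones, so the per-core value is an approximation of CPU pressure.

```python
# Sketch: normalise the load averages from the top header by core count.
CORES = 56  # core count as stated in the text

load_averages = {"1min": 53.63, "5min": 55.09, "15min": 51.52}


def load_per_core(load: float, cores: int = CORES) -> float:
    """Load average divided by core count; below 1.0 means cores are not oversubscribed on average."""
    return load / cores


for window, load in load_averages.items():
    print(f"{window}: {load_per_core(load):.2f} tasks per core")
# prints:
# 1min: 0.96 tasks per core
# 5min: 0.98 tasks per core
# 15min: 0.92 tasks per core
```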

I have a suggestion to try turning off kvm-clock, which may have caused the
time jumps, so that is the next thing I can try changing.

Cheers,

Richard