Re: ltp failures on autobuilder

Richard Purdie

On Thu, 2021-06-10 at 14:52 -0400, Randy MacLeod wrote:
On 2021-06-10 1:02 p.m., Richard Purdie wrote:
Noting down what we know about the ltp issue:

We've seen intermittent issues on the autobuilder where some ltp tests fail or
hang. I've been trying to figure out how to reproduce the issue and narrow down
the cause.

I was able to isolate a patch which reproduces the issue for me:

with master-next, setting:

IMAGE_INSTALL_append = ' ltp'
TEST_SUITES = 'ping ssh ltp'
It became clear there is also an:

IMAGE_CLASSES += "testimage"

but the INHERIT people are using shouldn't make any real difference to 
the test.


bitbake core-image-sato; bitbake core-image-sato -c testimage

where the issue shows up as a kernel "BUG:" in the logs in WORKDIR/testimage/qemu_*
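
A minimal sketch of that log check, assuming the logs are named qemu_* as above. The helper name find_kernel_bugs is hypothetical, and the default directory is only the qemux86-64 core-image-sato path used in this thread, so adjust it for your WORKDIR:

```shell
#!/bin/sh
# Sketch: list the testimage qemu logs that contain a kernel "BUG:" line.
# find_kernel_bugs is a hypothetical helper, not part of bitbake or oe-core.
find_kernel_bugs() {
    # Assumed default; pass the real WORKDIR/testimage directory as $1.
    logdir="${1:-tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage}"
    # -l prints only the names of matching files; -F matches "BUG:" literally.
    grep -lF 'BUG:' "$logdir"/qemu_* 2>/dev/null
}
```

Calling find_kernel_bugs with no argument from the build directory prints one file name per affected log and exits non-zero when nothing matches.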

The above patch runs the minimal set of ltp tests I could find that reproduces the issue.

I've reproduced this on 5.10.1 -> 5.10.42, 5.4.123 and 5.13-rc5.
(and we've ruled out linux-yocto by testing with plain kernels)
Also reproduced on both qemu 6.0.0 and 5.2.0.

My build machine is an Ubuntu 20.04.2 LTS with:
Linux version 5.4.0-74-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021
I tried to reproduce this on an Ubuntu 18.04.3 system with:
   Linux ala-lpggp3 5.4.0-72-generic #80~18.04.1-Ubuntu SMP Mon Apr 12 23:26:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Using poky-contrib:

$ git status
On branch rpurdie/t222
Your branch is up to date with 'origin/rpurdie/t222'.

nothing to commit, working tree clean

$ git log --oneline -3
d7d65aae10 (HEAD -> rpurdie/t222, origin/rpurdie/t222) ltp: Simplify for kernel crash reproducer
e175e2855d linx-yocto/5.10: re-import aufs to v5.10
753ae7dcd5 linux-yocto: test-only. override LINUX_VERSION for qemux86-64


My local.conf was generated, then I added:

IMAGE_INSTALL_append = ' ltp'
TEST_SUITES = 'ping ssh ltp'
INHERIT += "testimage"

and with 11 runs of:

$ bitbake core-image-sato -c testimage

I did not see the error in any of the qemu logs.
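
For an intermittent failure like this, the repeated runs can be scripted. The following is only a sketch: repeat_until_bug is a hypothetical helper, and it assumes each run leaves its qemu_* log in the directory passed as the second argument.

```shell
#!/bin/sh
# Sketch: rerun a command up to max_runs times and stop at the first run
# after which any qemu log in logdir contains "BUG:".
# repeat_until_bug is a hypothetical helper, not part of bitbake or oe-core.
repeat_until_bug() {
    max_runs="$1"; logdir="$2"; shift 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Run the test command; a failing run is still checked for "BUG:".
        "$@" || echo "run $i: command exited non-zero" >&2
        if grep -qF 'BUG:' "$logdir"/qemu_* 2>/dev/null; then
            echo "BUG: seen after run $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no BUG: in $max_runs runs"
    return 1
}
```

For the setup described here, the call would look like: repeat_until_bug 11 tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage bitbake core-image-sato -c testimage. Logs from earlier runs are also scanned, so clear the directory first if it already holds a failing log.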

$ cd tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/; ls
qemu_boot_log.20210610155947 qemu_boot_log.20210610170535
qemu_boot_log.20210610172026 qemu_boot_log.20210610173508
qemu_boot_log.20210610174959 qemu_boot_log.20210610180444
qemu_boot_log.20210610165758 qemu_boot_log.20210610171302
qemu_boot_log.20210610172743 qemu_boot_log.20210610174235

$ rgrep BUG:

There is an OOM, as RP sees as well, and the typical lead-up to it is:

[ 225.248350] hrtimer: interrupt took 4935186 ns
I've never seen that hrtimer message...

[ 249.250283] option changes via remount are deprecated (pid=3001 comm=mount)
[ 250.200586] option changes via remount are deprecated (pid=3019 comm=mount)
[ 283.695208] memcg_test_1 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 283.702108] CPU: 1 PID: 3798 Comm: memcg_test_1 Not tainted 5.10.42-yocto-standard #1
but I do see the above.

Note that I am running this on a server without X11 forwarding.
Is your testing done on a local machine, Richard? I doubt it matters,
but I just want to be sure we understand how you are testing.
Mine is a local machine and I am using X11 forwarding over ssh to my laptop.

I am going to try on another server running ubu-21.04.
You mean 20.04?

Paul can't reproduce either which makes me wonder what detail we're missing...

In the interests of full disclosure, my local.conf also has:

BB_DISKMON_DIRS (set to default I think)
BUILD_REPRODUCIBLE_BINARIES_pn-nativesdk-python3 = '0'
DISTRO ?= "poky"
EXTRA_IMAGE_FEATURES ?= "debug-tweaks"
IMAGE_INSTALL_append = " ssh-pregen-hostkeys"
PACKAGE_CLASSES = "package_rpm package_ipk package_deb"
PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"
PACKAGECONFIG_append_pn-qemu-system-native = " gtk+"
PACKAGECONFIG_append_pn-qemu-system-native = " sdl"
QEMU_USE_KVM_qemux86-64 = "True"
QEMU_USE_KVM_qemux86 = "True"
USER_CLASSES ?= "buildstats image-prelink"

and I have SSTATE_DIR and DL_DIR set.
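
Since the missing detail may be a configuration difference, one way to compare setups is to diff the active lines of two local.conf files. This is a rough sketch: conf_settings and conf_diff are hypothetical helper names, and it compares the text of the files only, not the values bitbake finally computes.

```shell
#!/bin/sh
# Sketch: show settings that differ between two local.conf files,
# ignoring comments, blank lines and ordering.
# conf_settings and conf_diff are hypothetical helper names.
conf_settings() {
    # Drop blank lines and comment lines, then sort for a stable comparison.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort
}

conf_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    conf_settings "$1" > "$a"
    conf_settings "$2" > "$b"
    # diff exits 0 when the settings match and 1 when they differ.
    diff "$a" "$b" && status=0 || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Running conf_diff on two people's conf/local.conf files would, for example, surface an INHERIT versus IMAGE_CLASSES difference like the one mentioned earlier in the thread.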


