ltp failures on autobuilder
Richard Purdie
Noting down what we know about the ltp issue:
We've seen intermittent issues on the autobuilder where some ltp tests fail or
hang. I've been trying to figure out how to reproduce the issue and narrow down
the cause.
I was able to isolate a patch which reproduces the issue for me:
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t222&id=d7d65aae104caa03afc28837b0abe0b486d5a8b8
with master-next, setting:
IMAGE_INSTALL_append = ' ltp'
TEST_SUITES = 'ping ssh ltp'
then
bitbake core-image-sato; bitbake core-image-sato -c testimage
where the issue shows up as a kernel "BUG:" in the logs in WORKDIR/testimage/qemu_*
The above patch runs the minimum of ltp tests I could find which replicate the issue.
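A minimal Python sketch of the log check described above, for anyone who wants to script it: it scans every `qemu_*` file in a testimage directory and reports the first line containing "BUG:". The helper name `find_kernel_bugs` is hypothetical and is not part of the reproducer patch; it assumes the logs sit directly in WORKDIR/testimage, as the qemu_boot_log.<timestamp> names later in this thread suggest.

```python
from pathlib import Path


def find_kernel_bugs(testimage_dir):
    """Map each qemu_* log in testimage_dir to its first line containing "BUG:".

    Hypothetical helper; assumes the logs sit directly in testimage_dir,
    e.g. WORKDIR/testimage/qemu_boot_log.<timestamp>.
    """
    hits = {}
    for log in sorted(Path(testimage_dir).glob("qemu_*")):
        if not log.is_file():
            continue
        # errors="replace": a stray non-UTF-8 byte in a serial log should not abort the scan
        with log.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if "BUG:" in line:
                    hits[log.name] = line.rstrip("\n")
                    break  # like grep -m1: the first match per file is enough
    return hits
```

`len(find_kernel_bugs(path))` counts logs with at least one "BUG:" line, which is what a `grep -m1 BUG:` over the same files piped to `wc -l` counts, modulo any extra files the broader `qemu_*` glob picks up.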
I've reproduced this on 5.10.1 -> 5.10.42, 5.4.123 and 5.13-rc5.
(and we've ruled out linux-yocto itself, since plain kernels show it too)
Also reproduced on both qemu 6.0.0 and 5.2.0.
My build machine is an Ubuntu 20.04.2 LTS with:
Linux version 5.4.0-74-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021
Cheers,
Richard
On 2021-06-10 1:02 p.m., Richard Purdie wrote:
I tried to reproduce this on an Ubuntu-18.04.3 system with:
Linux ala-lpggp3 5.4.0-72-generic #80~18.04.1-Ubuntu SMP Mon Apr 12 23:26:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Using poky-contrib:
$ git status
On branch rpurdie/t222
Your branch is up to date with 'origin/rpurdie/t222'.
nothing to commit, working tree clean
$ git log --oneline -3
d7d65aae10 (HEAD -> rpurdie/t222, origin/rpurdie/t222) ltp: Simplify for kernel crash reproducer
e175e2855d linx-yocto/5.10: re-import aufs to v5.10
753ae7dcd5 linux-yocto: test-only. override LINUX_VERSION for qemux86-64
---
My local.conf was generated, then I added:
IMAGE_INSTALL_append = ' ltp'
TEST_SUITES = 'ping ssh ltp'
INHERIT += "testimage"
and with 11 runs of:
$ bitbake core-image-sato -c testimage
I did not see the error in any of the qemu logs.
(cd tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/; ls qemu_boot_log.202106101*)
qemu_boot_log.20210610155947 qemu_boot_log.20210610170535 qemu_boot_log.20210610172026 qemu_boot_log.20210610173508 qemu_boot_log.20210610174959 qemu_boot_log.20210610180444
qemu_boot_log.20210610165758 qemu_boot_log.20210610171302 qemu_boot_log.20210610172743 qemu_boot_log.20210610174235 qemu_boot_log.20210610175720
$ rgrep BUG: tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/*
There is an OOM during the run, which RP sees as well, and the typical lead-up to it is:
[ 225.248350] hrtimer: interrupt took 4935186 ns
[ 249.250283] option changes via remount are deprecated (pid=3001 comm=mount)
[ 250.200586] option changes via remount are deprecated (pid=3019 comm=mount)
[ 283.695208] memcg_test_1 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 283.702108] CPU: 1 PID: 3798 Comm: memcg_test_1 Not tainted 5.10.42-yocto-standard #1
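If the OOM lead-up needs to be found across many boot logs, the "[ seconds] message" prefix makes it easy to parse. A minimal Python sketch, assuming the default printk timestamp format seen in the excerpt above; `oom_events` is a hypothetical helper, not an existing tool.

```python
import re

# Matches e.g. "[ 283.695208] memcg_test_1 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), ..."
OOM_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\S+) invoked oom-killer:")


def oom_events(lines):
    """Yield (seconds_since_boot, command) for every oom-killer invocation."""
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            yield float(m.group(1)), m.group(2)
```

Run over the excerpt above, it yields a single event, (283.695208, "memcg_test_1"); an open log file works as input too, since iterating a file yields its lines.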
Note that I am running this on a server without X11 forwarding.
Is your testing done on a local machine, Richard? I doubt it matters,
but I just want to be sure we understand how you are testing.
I am going to try on another server running ubu-21.04.
../Randy
--
# Randy MacLeod
# Wind River Linux
Richard Purdie
On Thu, 2021-06-10 at 14:52 -0400, Randy MacLeod wrote:
It became clear there is also an:
IMAGE_CLASSES += "testimage"
but the INHERIT people are using shouldn't make any real difference to
the test.
I've never seen that hrtimer message... but I do see the rest of the lead-up you quote.
Mine is a local machine and I am using X11 forwarding over ssh to my laptop.
Re the other server running ubu-21.04: you mean 20.04?
Paul can't reproduce either, which makes me wonder what detail we're missing...
In the interests of full disclosure, my local.conf also has:
BB_DISKMON_DIRS (set to default I think)
BUILD_REPRODUCIBLE_BINARIES_pn-nativesdk-python3 = '0'
CONF_VERSION = "1"
DISTRO ?= "poky"
EXTRA_IMAGE_FEATURES ?= "debug-tweaks"
IMAGE_INSTALL_append = " ssh-pregen-hostkeys"
PACKAGE_CLASSES = "package_rpm package_ipk package_deb"
PACKAGECONFIG_append_pn-nativesdk-qemu = " sdl"
PACKAGECONFIG_append_pn-qemu-system-native = " gtk+"
PACKAGECONFIG_append_pn-qemu-system-native = " sdl"
PATCHRESOLVE = "noop"
QEMU_USE_KVM_qemux86-64 = "True"
QEMU_USE_KVM_qemux86 = "True"
SANITY_TESTED_DISTROS = ""
USER_CLASSES ?= "buildstats image-prelink"
and I have SSTATE_DIR and DL_DIR set.
Cheers,
Richard
Richard Purdie
On Thu, 2021-06-10 at 18:02 +0100, Richard Purdie via lists.yoctoproject.org wrote:
IMAGE_CLASSES += "testimage"
also:
QEMU_USE_KVM_qemux86-64 = "True"
Good news (for me) is that Randy and Paul can now reproduce this with the above
additional key pieces of config.
We have confirmed that the issue is present:
* with gcc 11.1.1 and 10.3
* in hardknott
* if QB_SMP is disabled (i.e. in a single processor qemu)
* on 18.04, 20.04 and 21.04 Ubuntu host distros which have varying 5.4 and 5.11
host kernels
I was not able to make the bug appear in gatesgarth as yet
(gcc 10.2, 5.8 kernel, qemu 5.1.0) (I had to hack -b /dev/null into the ltp command line).
I did backport the qemu platform, smp and qemu commandline changes back to
gatesgarth and it still doesn't crash.
I also found that setting CONFIG_DEBUG_KERNEL makes the issue 'go away'.
Since that is a large hammer, I tried:
CONFIG_DEBUG_KERNEL=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set
as a .cfg to the kernel and that still reproduced the crash. However:
CONFIG_DEBUG_KERNEL=y
CONFIG_CGROUP_DEBUG=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_PREEMPT=y
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set
doesn't seem to want to reproduce the crash so something about
those three options seems to make things 'work'.
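The two fragments above differ in exactly three options, and that comparison can be mechanised. A minimal Python sketch; `enabled_options` and `changed_options` are hypothetical helpers that understand only the `CONFIG_FOO=value` and `# CONFIG_FOO is not set` forms used in these fragments, and treat an option missing from a fragment as unset.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(\S+)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def enabled_options(fragment):
    """Return {option: value} for a kernel .cfg fragment.

    "# CONFIG_FOO is not set" lines are recorded as "n" so they compare
    cleanly against fragments that enable the same option.
    """
    opts = {}
    for raw in fragment.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def changed_options(old, new):
    """Return {option: (old_value, new_value)} for options that differ.

    An option absent from a fragment is treated as "n".
    """
    a, b = enabled_options(old), enabled_options(new)
    return {
        k: (a.get(k, "n"), b.get(k, "n"))
        for k in sorted(a.keys() | b.keys())
        if a.get(k, "n") != b.get(k, "n")
    }
```

Fed the crashing fragment and then the working one, it should report CONFIG_CGROUP_DEBUG, CONFIG_DEBUG_PREEMPT and CONFIG_SCHED_DEBUG, each going from "n" to "y".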
What does that all mean? No idea.
Cheers,
Richard
Richard Purdie
On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org wrote:
Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes"
the crash. I can enable all the above apart from that and we can reproduce
it.
Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
is one of the CVE fixes. Continuing to try and isolate.
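Narrowing "one of the 27 CVE patches" down to a single patch is a bisection over the series. A minimal Python sketch, assuming a hypothetical `crashes(prefix)` callback that rebuilds qemu with only that prefix of the series applied and runs the reproducer. It tests prefixes so the patches are always applied in order, and needs at most five probes for 27 patches. It assumes a single culprit and a deterministic verdict; with an intermittent crash like this one, each probe may need several runs before "no crash" means much.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_patch(patches: Sequence[T],
                    crashes: Callable[[Sequence[T]], bool]) -> T:
    """Return the patch whose addition first makes `crashes` true.

    Tests prefixes patches[:k] so the series is always applied in order.
    Assumes crashes([]) is False, crashes(patches) is True, and that once
    a prefix crashes, every longer prefix also crashes.
    """
    if not patches:
        raise ValueError("empty patch series")
    lo, hi = 0, len(patches)  # invariant: prefix lo is good, prefix hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(patches[:mid]):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]
```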
Cheers,
Richard
Paul Gortmaker <paul.gortmaker@...>
[Re: [swat] ltp failures on autobuilder] On 11/06/2021 (Fri 14:19) Richard Purdie wrote:
For the mail archive trail, and for those not following the ongoing
research on IRC, we are hopeful that this fixes it.
https://lore.kernel.org/lkml/20210616125157.438837-1-paul.gortmaker@windriver.com/
Paul.
--
Richard Purdie
On Wed, 2021-06-16 at 08:56 -0400, Paul Gortmaker wrote:
Awesome work in tracking that down, much appreciated, thanks!
Curious what upstream will make of it now...
Cheers,
Richard
On 2021-06-16 10:17 a.m., Richard Purdie wrote:
Dropped the wider list, but as I said to Richard, I wanted to
confirm that this bug is really gone.
I ran 100 tests overnight and it seems to be dead, Jim.
$ bitbake core-image-sato && \
for i in `seq 100`; do \
echo "--- " $i " ---"; \
timeout --kill-after=2m 10m bitbake core-image-sato -c testimage \
&& echo GOOD || echo BAD; \
done
$ ls -l tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/qemu_boot_log.202* | wc -l
100
$ grep -m1 BUG: tmp/work/qemux86_64-poky-linux/core-image-sato/1.0-r0/testimage/qemu_boot_log.2021* | wc -l
0
All the log files are there and roughly the same size,
with similar but not identical contents due to differing timestamps and
non-deterministic ordering of some of the output.
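The same repeated-run loop, sketched in Python so the outcomes are tallied rather than just echoed. This is an approximation, not what Randy ran: `subprocess.run(..., timeout=...)` kills the direct child as soon as the limit expires, so there is no equivalent of `--kill-after=2m`, and processes that bitbake spawned may need separate cleanup. Unlike the shell loop, which reports a timeout as BAD, it counts timeouts separately. `repeat_runs` is a hypothetical name.

```python
import subprocess
from collections import Counter


def repeat_runs(cmd, runs, timeout_s):
    """Run cmd `runs` times; count GOOD (exit 0), BAD (non-zero) and TIMEOUT."""
    tally = Counter()
    for i in range(1, runs + 1):
        print(f"--- {i} ---", flush=True)
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the direct child here
            tally["TIMEOUT"] += 1
            continue
        tally["GOOD" if result.returncode == 0 else "BAD"] += 1
    return tally
```

After an initial `bitbake core-image-sato`, the overnight run would correspond roughly to `repeat_runs(["bitbake", "core-image-sato", "-c", "testimage"], 100, 600)`.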
../Randy
# Randy MacLeod
# Wind River Linux