Re: qemux86-64 ltp kernel panic


Bruce Ashfield <bruce.ashfield@...>
 

Interesting.

The last two stalls/traps that you sent were both x86-64. Are we only
seeing this on x86?

I'll have another look at the patches we carry, but there really
shouldn't be anything that could trigger something like this (famous
last words).

If these are happening regularly enough, we could change the
qemux86-64 branch to v5.10/base temporarily. That would be a pure
upstream -stable, and would help rule out the yocto patches if the
issues still occur.

Bruce

On Thu, Jun 3, 2021 at 7:03 AM Richard Purdie
<richard.purdie@...> wrote:

https://autobuilder.yoctoproject.org/typhoon/#/builders/95/builds/1969

died with the following kernel panic:

[ 2650.301002] option changes via remount are deprecated (pid=15940 comm=mount)
[ 2652.496768] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 2652.508974] BUG: unable to handle page fault for address: ffff937f3d8de6b8
[ 2652.510710] #PF: supervisor instruction fetch in kernel mode
[ 2652.512028] #PF: error_code(0x0011) - permissions violation
[ 2652.513312] PGD 6001067 P4D 6001067 PUD 6002067 PMD 800000003d8000e3
[ 2652.514819] Oops: 0011 [#1] PREEMPT SMP PTI
[ 2652.515810] CPU: 3 PID: 16301 Comm: cat Not tainted 5.10.41-yocto-standard #1
[ 2652.517484] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 2652.520045] RIP: 0010:0xffff937f3d8de6b8
[ 2652.520951] Code: ff ff 88 e6 8d 3d 7f 93 ff ff 48 a5 f4 10 7f 93 ff ff 90 df 8d 3d 7f 93 ff ff a8 e6 8d 3d 7f 93 ff ff a8 e6 8d 3d 7f 93 ff ff <70> 57 39 02 7f 93 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 2652.526109] RSP: 0018:ffff9c9100120f28 EFLAGS: 00010282
[ 2652.527415] RAX: ffff937f3d8de6b8 RBX: ffff937f3edaacc0 RCX: ffff937f02395770
[ 2652.529173] RDX: 0000000000000000 RSI: ffff9c9100120f38 RDI: ffff937f02395770
[ 2652.530946] RBP: ffff9c9100120f80 R08: 0000026b359c768e R09: 7fffffffffffffff
[ 2652.532696] R10: 000002698b120200 R11: ffffffffb56060c0 R12: 000000000000000a
[ 2652.534440] R13: ffff9c9100120f38 R14: 0000000000000000 R15: ffff937f3edaad30
[ 2652.536221] FS: 00007fec565b55c0(0000) GS:ffff937f3ed80000(0000) knlGS:0000000000000000
[ 2652.538214] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2652.539621] CR2: ffff937f3d8de6b8 CR3: 0000000010ed6000 CR4: 00000000001506e0
[ 2652.541434] Call Trace:
[ 2652.542026] <IRQ>
[ 2652.542527] ? rcu_core+0x235/0x570
[ 2652.543383] rcu_core_si+0xe/0x10
[ 2652.544200] __do_softirq+0xd2/0x2bc
[ 2652.545096] asm_call_irq_on_stack+0x12/0x20
[ 2652.546123] </IRQ>
[ 2652.546674] do_softirq_own_stack+0x3d/0x50
[ 2652.547724] irq_exit_rcu+0x8f/0xc0
[ 2652.548597] sysvec_apic_timer_interrupt+0x35/0x80
[ 2652.549760] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2652.550944] RIP: 0010:syscall_enter_from_user_mode+0xd/0x30
[ 2652.552246] Code: f0 00 74 12 65 81 05 3e c1 3f 4b 00 00 ef ff e8 39 fd ff ff 5d c3 0f 0b 0f 1f 44 00 00 48 89 f0 fb 65 48 8b 14 25 00 6d 01 00 <48> 8b 32 f7 c6 c1 01 00 10 75 01 c3 55 48 89 e5 e8 be 54 4c ff 5d
[ 2652.556557] RSP: 0018:ffff9c910392ff38 EFLAGS: 00000246
[ 2652.557777] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2652.559440] RDX: ffff937f2eeb2880 RSI: 0000000000000000 RDI: ffff9c910392ff58
[ 2652.561096] RBP: ffff9c910392ff48 R08: 0000000000000000 R09: 0000000000000000
[ 2652.562780] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c910392ff58
[ 2652.564451] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2652.566116] ? do_syscall_64+0x14/0x50
[ 2652.567014] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2652.568201] RIP: 0033:0x7fec564db072
[ 2652.569062] Code: c0 e9 d2 fe ff ff 48 8d 3d 6b e1 09 00 50 e8 a5 d4 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 2652.573522] RSP: 002b:00007ffcd5e88a88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2652.575307] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fec564db072
[ 2652.577020] RDX: 0000000000020000 RSI: 00007fec563cd000 RDI: 0000000000000003
[ 2652.578738] RBP: 00007fec563cd000 R08: 00007fec563cc010 R09: 0000000000000000
[ 2652.580468] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020000
[ 2652.582097] R13: 0000000000000003 R14: 0000000000000002 R15: 0000000000020000
[ 2652.583725] Modules linked in: bnep
[ 2652.584555] CR2: ffff937f3d8de6b8
[ 2652.585360] ---[ end trace 9a47e1a3be6020d4 ]---
[ 2652.586434] RIP: 0010:0xffff937f3d8de6b8
[ 2652.587342] Code: ff ff 88 e6 8d 3d 7f 93 ff ff 48 a5 f4 10 7f 93 ff ff 90 df 8d 3d 7f 93 ff ff a8 e6 8d 3d 7f 93 ff ff a8 e6 8d 3d 7f 93 ff ff <70> 57 39 02 7f 93 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 2652.592461] RSP: 0018:ffff9c9100120f28 EFLAGS: 00010282
[ 2652.593716] RAX: ffff937f3d8de6b8 RBX: ffff937f3edaacc0 RCX: ffff937f02395770
[ 2652.595448] RDX: 0000000000000000 RSI: ffff9c9100120f38 RDI: ffff937f02395770
[ 2652.597180] RBP: ffff9c9100120f80 R08: 0000026b359c768e R09: 7fffffffffffffff
[ 2652.598888] R10: 000002698b120200 R11: ffffffffb56060c0 R12: 000000000000000a
[ 2652.600692] R13: ffff9c9100120f38 R14: 0000000000000000 R15: ffff937f3edaad30
[ 2652.602337] FS: 00007fec565b55c0(0000) GS:ffff937f3ed80000(0000) knlGS:0000000000000000
[ 2652.604223] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2652.605556] CR2: ffff937f3d8de6b8 CR3: 0000000010ed6000 CR4: 00000000001506e0
[ 2652.607091] Kernel panic - not syncing: Fatal exception in interrupt
[ 2652.608216] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2652.609892] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
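
For anyone reading the trace above, here is a minimal sketch of two bits of arithmetic it involves: decoding the `#PF` error code (0x0011) and removing the reported KASLR "Kernel Offset" from a runtime address before looking it up in System.map. The function names and the table are illustrative only, not kernel API. The bit meanings are the architectural x86 page-fault error-code bits. Using R11 as the example address is an assumption: it merely falls inside the relocation range the panic message prints.

```python
# Sketch only: helper names are hypothetical, not taken from the kernel or any tool.

# (mask, meaning when set, meaning when clear) for the low x86 #PF error-code bits.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
    (0x20, "protection-key violation", None),
]


def decode_pf_error_code(code):
    """Return human-readable descriptions for a page-fault error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def unslide(addr, kaslr_offset):
    """Subtract the KASLR offset to get the link-time address of a kernel symbol."""
    return addr - kaslr_offset


if __name__ == "__main__":
    # error_code(0x0011) from the oops
    print(decode_pf_error_code(0x0011))
    # R11 from the oops, minus "Kernel Offset: 0x33000000"
    print(hex(unslide(0xFFFFFFFFB56060C0, 0x33000000)))
```

Decoding 0x0011 this way gives a protection violation on a supervisor-mode instruction fetch, which matches the two `#PF:` lines in the log. Note that RIP, RAX and CR2 all hold the same address in the direct-map region rather than kernel text, so the un-slide step only applies to text addresses such as the one in R11.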

ltp test log looks to be testimage/ltp_log.20210602224847/controllers-raw.log:

Checking for required user/group ids

'nobody' user id and group found.
'bin' user id and group found.
'daemon' user id and group found.
Users group found.
Sys group found.
Required users/groups exist.
no big block device was specified on commandline.
Tests which require a big block device are disabled.
You can specify it with option -z
INFO: Test start time: Wed Jun 2 23:32:06 UTC 2021
COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 18685 -n 18685 -p -f /opt/ltp/ltp-quKlbyFeTO/alltests -l /opt/ltp/results/controllers -C /opt/ltp/output/LTP_RUN_ON-controllers.failed -T /opt/ltp/output/LTP_RUN_ON-controllers.tconf
LOG File: /opt/ltp/results/controllers
FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-controllers.failed
TCONF COMMAND File: /opt/ltp/output/LTP_RUN_ON-controllers.tconf
Running tests.......
cgroup_regression_test 1 TINFO: timeout per run is 0h 5m 0s
cgroup_regression_test 1 TPASS: no kernel bug was found
cgroup_regression_test 2 TPASS: notify_on_release is inherited
cgroup_regression_test 3 TCONF: CONFIG_SCHED_DEBUG is not enabled
cgroup_regression_test 4 TCONF: CONFIG_LOCKDEP is not enabled
cgroup_regression_test 5 TPASS: no kernel bug was found
cgroup_regression_test 6 TCONF: CONFIG_CGROUP_NS is NOT supported in Kernels >= 3.0
mount: /opt/ltp/ltp-quKlbyFeTO/LTP_cgroup_regression_test.XlkWzxbviv/cgroup: xxx already mounted or mount point busy.
cgroup_regression_test 7 TFAIL: failed to mount pids
cgroup_regression_test 7 TPASS: no kernel bug was found for test 1
cgroup_regression_test 7 TCONF: skip rest of testing due possible oops triggered by reading /proc/sched_debug
cgroup_regression_test 7 TPASS: no kernel bug was found for test 2
cgroup_regression_test 8 TPASS: no kernel bug was found
client_loop: send disconnect: Broken pipe

--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II
