We’ve discovered that (quite frequently) the kernel that we deploy doesn’t match the unstripped one that we’re saving for debug symbols. I’ve traced the issue to a combination of an sstate miss for the kernel do_deploy step combined with
an sstate hit for do_package_write_rpm. (side note: we know we have issues with sstate reuse/stamps including things they shouldn’t which is why we hit this so much. We’re working on that too)
The result is that when our debug rootfs is created (where we added the kernel symbols), it’s got the version of the kernel from the sstate cached rpm files, but since do_deploy had an sstate miss, the entire kernel gets rebuilt to satisfy
that dependency chain. Since the kernel doesn’t have reproducible builds working, the resulting pair of kernels don’t match each other for debug purposes.
So, I have two questions to start:
- What is the recommended way to be getting debug symbols for the kernel, since do_deploy doesn’t seem to have a debug counterpart (which is why we originally just set things up to add
the rpm to the generated debug rootfs)
- Does this seem like a bug that should be fixed? If so, what would be the recommended solution (more thoughts below)?
Even if there’s a task somewhere that does what I’m looking for, this seems like a bit of a bug. I generally feel like we want to be able to trust sstate, so the fact that forking dependencies that each generate their own sstate objects
can be out of sync is a bit scary.
I’ve thought of several ways around this, but I can’t say I like any of them.
- (extremely gross hack) Create a new task to use instead of do_deploy that depends on do_packagegroup_write_rpm. Unpack the restored (or built) RPMs and use those blobs to deploy the
kernel and symbols to the image directory.
- (gross hack with painful effects on build time) Disable sstate for do_package_write_rpm and do_deploy. Possibly replace with sstate logic for the kernel’s do_install step (side question
– why doesn’t do_install generate sstate? It seems like it should be able to, since the point is to drop everything into the image directory)
- (possibly better, but sounds hard) Change the sstate logic so that if anything downstream of a do_compile task needs to be rerun,
everything downstream of it needs to be rerun and sstate reuse for that recipe is not allowed (basically all or nothing sstate). Maybe with a flag that’s allowed in the bitbake file to indicate that a recipe
does have reproducible builds and that different pieces are allowed to come from sstate in that case.
- (fix the symptoms but not the problem) Figure out how to get linux-yocto building in a reproducible fashion and pretend the problem doesn’t exist.
If you’re interested, this is quite easy to reproduce – these are my repro steps
- Check out a clean copy of zeus (22.0.2)
- Add kernel-image to core-image-minimal in whatever fashion you choose (I just dumped it in the RDEPENDS for packagegroup-core-boot for testing)
- bitbake core-image-minimal
- bitbake -c clean core-image-minimal linux-yocto (or just wipe your whole build dir, since everything should come from sstate now)
- Delete the sstate object(s) for linux-yocto’s deploy task.
- bitbake core-image-minimal
- Compare the BuildID hashes for the kernel in the two locations using file (you’ll need to use the kernel’s extract-vmlinux script to get it out of the bzImage)
- file tmp/work/qemux86_64-poky-linux/core-image-minimal/1.0-r0/rootfs/boot/vmlinux-5.2.28-yocto-standard
- ./tmp/work-shared/qemux86-64/kernel-source/scripts/extract-vmlinux tmp/deploy/images/qemux86-64/bzImage > vmlinux-deploy && file vmlinux-deploy
Anyone have thoughts or suggestions?