Re: sstate causing stripped kernel vs symbols mismatch

Sean McKay

The simplest check I've found is comparing the BuildID that GCC embeds in the ELF file after I force a recompile, e.g.:
$ file tmp/work/qemux86_64-poky-linux/linux-yocto/5.2.28+gitAUTOINC+dd6019025c_992280855e-r0/linux-qemux86_64-standard-build/vmlinux | egrep -o "BuildID\[sha1\]=[0-9a-f]*"

Is that what you were asking for?
Presumably we could also hash the vmlinux file itself for comparison at the do_compile stage, but since I was originally comparing stripped vs. unstripped binaries, I had to go by the BuildID.

Side note: from the kernel documentation, it looks like there are four main things that could affect reproducibility:
timestamps, the build directory, the user account name, and the hostname.
I assume they'd be easiest to tackle sequentially in that order.
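A rough sketch of how the first three might be pinned before invoking the kernel build (variable names are the ones the kernel's reproducible-builds documentation describes; the values here are arbitrary placeholders, and the GCC flag for the build directory is an assumption about the toolchain):

```shell
# Pin the timestamp, user, and hostname that the kernel bakes into its
# version banner; placeholder values, pick whatever suits your build.
export KBUILD_BUILD_TIMESTAMP='Thu Jan  1 00:00:00 UTC 2015'
export KBUILD_BUILD_USER=oe-user
export KBUILD_BUILD_HOST=oe-host

# The build directory leaks in via debug info; -fdebug-prefix-map is
# the usual GCC mechanism for mapping it to a fixed string.
export KCFLAGS="-fdebug-prefix-map=${PWD}=."
```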

This is the documentation I've been referencing:


-----Original Message-----
From: Bruce Ashfield <bruce.ashfield@...>
Sent: Thursday, April 9, 2020 10:52 AM
To: McKay, Sean <sean.mckay@...>
Cc: Joshua Watt <jpewhacker@...>; yocto@...
Subject: Re: [yocto] sstate causing stripped kernel vs symbols mismatch

On Thu, Apr 9, 2020 at 1:21 PM Sean McKay <sean.mckay@...> wrote:

> I don’t know offhand, but the kernel documentation seems relatively straightforward.
>
> I can start investigating in that direction and see how complex it looks like it’s going to be.
I can tweak linux-yocto in the direction of reproducibility without much trouble (for the build part). But I'm a bit out of my normal flow for testing that it really is reproducible. So if anyone can point me at what they are running to currently test that, I can do the build part.


> When you say that reproducible builds are turned on by default, is there a flag somewhere that can be used to turn that off that I need to gate these changes behind? Or can they be made globally so that the reproducibility can’t be turned off (easily)?
>
> Do we expect to generally be okay with letting this sort of race condition remain in sstate? I concede that it’s probably okay, since I think the kernel is the only thing with this kind of forking task tree behavior after do_compile, and if we get 100% reproducible builds working, it’s not overly relevant… but it seems like it probably deserves a warning somewhere in the documentation.
>
> I can also bring this question to the next technical meeting (I know I just missed one) if it seems the sort of thing we need to get consensus.



From: Joshua Watt <jpewhacker@...>
Sent: Thursday, April 9, 2020 10:00 AM
To: McKay, Sean <sean.mckay@...>; yocto@...
Subject: Re: [yocto] sstate causing stripped kernel vs symbols mismatch

On 4/9/20 11:42 AM, Sean McKay wrote:

> Anyone have any thoughts or guidance on this?
>
> It seems like a pretty major bug to me.
>
> We’re willing to put the work in to fix it, and if it’s not something the upstream community is interested in, I’ll just pick a solution for us and go with it.
>
> But if it’s something that we’d like me to upstream, I’d like some feedback on which path I should start walking down before I start taking things apart.

We have had a recent push for reproducible builds (and they are now enabled by default). Do you have any idea how much effort it would take to make the kernel build reproducibly? It's something we probably want anyway, and can add to the automated testing infrastructure to ensure it doesn't regress.
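For reference, the switch in question is a BitBake configuration fragment along these lines (class and variable names as they existed in oe-core around this era, so treat them as assumptions that may differ between releases):

```
INHERIT += "reproducible_build"
BUILD_REPRODUCIBLE_BINARIES = "1"
```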



From: yocto@... <yocto@...> On
Behalf Of Sean McKay
Sent: Tuesday, April 7, 2020 12:03 PM
To: yocto@...
Subject: [yocto] sstate causing stripped kernel vs symbols mismatch

Hi all,

We’ve discovered that (quite frequently) the kernel that we deploy
doesn’t match the unstripped one that we’re saving for debug symbols.
I’ve traced the issue to the combination of an sstate miss for the
kernel do_deploy step and an sstate hit for do_package_write_rpm.
(Side note: we know we have issues with sstate reuse/stamps including
things they shouldn’t, which is why we hit this so much. We’re working
on that too.)

The result is that when our debug rootfs is created (where we added the kernel symbols), it’s got the version of the kernel from the sstate cached rpm files, but since do_deploy had an sstate miss, the entire kernel gets rebuilt to satisfy that dependency chain. Since the kernel doesn’t have reproducible builds working, the resulting pair of kernels don’t match each other for debug purposes.

So, I have two questions to start:

1. What is the recommended way to get debug symbols for the kernel, since do_deploy doesn’t seem to have a debug counterpart? (That’s why we originally just set things up to add the RPM to the generated debug rootfs.)
2. Does this seem like a bug that should be fixed? If so, what would be the recommended solution (more thoughts below)?

Even if there’s a task somewhere that does what I’m looking for, this seems like a bit of a bug. I generally feel like we want to be able to trust sstate, so the fact that forking dependencies that each generate their own sstate objects can be out of sync is a bit scary.

I’ve thought of several ways around this, but I can’t say I like any of them.

1. (Extremely gross hack) Create a new task to use instead of do_deploy that depends on do_package_write_rpm. Unpack the restored (or built) RPMs and use those blobs to deploy the kernel and symbols to the image directory.
2. (Gross hack with painful effects on build time) Disable sstate for do_package_write_rpm and do_deploy. Possibly replace with sstate logic for the kernel’s do_install step. (Side question: why doesn’t do_install generate sstate? It seems like it should be able to, since the point is to drop everything into the image directory.)
3. (Possibly better, but sounds hard) Change the sstate logic so that if anything downstream of a do_compile task needs to be rerun, everything downstream of it must be rerun and sstate reuse for that recipe is not allowed (basically all-or-nothing sstate). Maybe with a flag allowed in the bitbake file to indicate that a recipe does have reproducible builds and that different pieces are allowed to come from sstate in that case.
4. (Fix the symptoms but not the problem) Figure out how to get linux-yocto building in a reproducible fashion and pretend the problem doesn’t exist.

If you’re interested, this is quite easy to reproduce – these are my
repro steps

1. Check out a clean copy of zeus (22.0.2).
2. Add kernel-image to core-image-minimal in whatever fashion you choose (I just dumped it in the RDEPENDS for packagegroup-core-boot for testing).
3. bitbake core-image-minimal
4. bitbake -c clean core-image-minimal linux-yocto (or just wipe your whole build dir, since everything should come from sstate now)
5. Delete the sstate object(s) for linux-yocto’s deploy task.
6. bitbake core-image-minimal
7. Compare the BuildID hashes for the kernel in the two locations using file (you’ll need to use the kernel’s extract-vmlinux script to get it out of the bzImage):

$ ./scripts/extract-vmlinux tmp/deploy/images/qemux86-64/bzImage > vmlinux-deploy && file vmlinux-deploy

Anyone have thoughts or suggestions?


-Sean McKay

- Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
- "Use the force Harry" - Gandalf, Star Trek II
