Re: Strange sporadic build issues (incremental builds in docker container)


Matthias Klein
 

Hi Trevor,

thank you very much for the detailed answer.

Yes, you are right, it is mostly the same recipes that fail. But they also change from time to time.
Today it happened to me even without Jenkins and Docker, in an ordinary build from the console, with the recipe keymaps_1.0.bb.

For the nightly builds via Jenkins, my current workaround is to delete build/tmp beforehand.
So far, the problem has not occurred again.
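
Roughly, the Jenkins job now does this before the build (paths are from my setup; the image name here is just an example):

    cd /var/jenkins_home/workspace/yocto-raspberrypi
    rm -rf build/tmp
    source yocto/poky/oe-init-build-env build
    bitbake core-image-minimal    # whichever image the job actually builds

With a warm sstate-cache the rebuild after wiping tmp does not take much longer.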

Best regards,
Matthias

-----Original Message-----
From: Trevor Woerner <twoerner@...>
Sent: Tuesday, March 29, 2022 18:23
To: Alexander Kanavin <alex.kanavin@...>
Cc: Matthias Klein <matthias.klein@...>; yocto@...
Subject: Re: [yocto] Strange sporadic build issues (incremental builds in docker container)

On Thu 2022-03-24 @ 09:31:25 AM, Alexander Kanavin wrote:
I don't. You need to inspect the build tree to find clues as to why the
patch was applied twice. Or simply wipe tmp/ before builds; if your
sstate works properly, that won't make the builds longer.
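
To see what quilt thinks is applied, you can look in the failing recipe's work directory; with the keymaps paths from the log below, something along these lines:

    cd /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31
    cat patches/series          # what do_patch intends to apply
    cat .pc/applied-patches     # what quilt has already recorded as applied

If GPLv2.patch is already listed in .pc/applied-patches before do_patch runs, it was left applied by an earlier run.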

Alex

On Thu, 24 Mar 2022 at 07:43, Matthias Klein <matthias.klein@...> wrote:

Hello Alex,

it occurred again:

NOTE: recipe gawk-5.1.1-r0: task do_patch: Succeeded
NOTE: Running task 1673 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/python/python3-six_1.16.0.bb:do_patch)
NOTE: recipe firstboot-1.0-r0: task do_populate_sysroot: Started
NOTE: recipe keymaps-1.0-r31: task do_patch: Started
NOTE: recipe python3-six-1.16.0-r0: task do_patch: Started
NOTE: recipe python3-six-1.16.0-r0: task do_patch: Succeeded
NOTE: Running task 1676 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/perl/perl_5.34.1.bb:do_patch)
NOTE: recipe e2fsprogs-1.46.5-r0: task do_patch: Succeeded
ERROR: keymaps-1.0-r31 do_patch: Applying patch 'GPLv2.patch' on target directory '/var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31'
CmdError('quilt --quiltrc /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/recipe-sysroot-native/etc/quiltrc push', 0, 'stdout:
stderr: File series fully applied, ends at patch GPLv2.patch
')
ERROR: Logfile of failure stored in: /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/temp/log.do_patch.353982
NOTE: recipe keymaps-1.0-r31: task do_patch: Failed
NOTE: Running task 1679 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/alsa-state/alsa-state.bb:do_patch)
ERROR: Task (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/keymaps/keymaps_1.0.bb:do_patch) failed with exit code '1'

Do you have an idea?

Best regards,
Matthias

-----Original Message-----
From: Alexander Kanavin <alex.kanavin@...>
Sent: Tuesday, March 22, 2022 10:26
To: Matthias Klein <matthias.klein@...>
Cc: yocto@...
Subject: Re: [yocto] Strange sporadic build issues (incremental builds in docker container)

It's hard to say without the full error message, and the build directory of the affected recipe. The easy way out is to simply wipe tmp/ before each build.

Alex

On Tue, 22 Mar 2022 at 09:51, Matthias Klein <matthias.klein@...> wrote:

Hello everyone,

I am building various kirkstone/master yoctos every night via Jenkins inside a Debian Bullseye Docker container.
These are incremental builds, reusing the build directory and sstate-cache of the previous build. The different yoctos are built in order. Each time, a new Docker container is launched.
(The same environment builds dunfell yoctos without any problems).
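
Roughly, each nightly build is started like this (the container image name and host path are illustrative, not the exact setup):

    docker run --rm \
        -v /srv/jenkins/workspace/yocto-raspberrypi:/var/jenkins_home/workspace/yocto-raspberrypi \
        builder-bullseye \
        bash -c 'cd /var/jenkins_home/workspace/yocto-raspberrypi && source yocto/poky/oe-init-build-env build && bitbake core-image-minimal'

So the container is fresh every time, while build/ (including tmp/ and the sstate-cache) persists on the host.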

Now it happens sporadically that one of the builds aborts with the following message:

stderr: The series file no longer matches the applied patches. Please run 'quilt pop -a'.

It is usually different packages where the patch step fails, and different Yocto builds are affected, but it is always the above message.
If I then restart the failed build, it usually builds cleanly.

Does anyone have an idea in which direction the problem goes?

Yes, I've been seeing exactly these issues as well.

I'm not using any sort of virtualization, I'm using Jenkins to do nightly builds directly on my host. My host machine is openSUSE 15.3. These problems started on Feb 21 for me.

Each of my builds starts by doing a "git pull" on each of the repositories, then kicks off a build if any of the repositories changed. A fresh build will always succeed. Doing a "clean" and rebuilding will (I believe) always succeed. My gut feeling is that it somehow has something to do with having an existing build, refreshing the repositories, then rebuilding.
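
In outline, each job does something like this (the layer and image names here are placeholders, not my actual setup):

    changed=0
    for repo in poky meta-openembedded meta-myboard; do
        old=$(git -C "$repo" rev-parse HEAD)
        git -C "$repo" pull --ff-only
        [ "$old" != "$(git -C "$repo" rev-parse HEAD)" ] && changed=1
    done
    if [ "$changed" -eq 1 ]; then
        source poky/oe-init-build-env build
        bitbake my-image
    fi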

I spent weeks trying to find a reproducer. I wrote a script to check out one version of the repositories (before), build, check out a newer version of the repositories (after), and rebuild. Even when I used the exact same hashes that had failed in my Jenkins build and repeated the process 20 times, in some cases I wasn't able to reproduce the error. I was able to find one reproducer involving a build for an imx28evk MACHINE, but even then, out of 20 iterations, 13 were bad and 7 were good. I repeated that set of 20 builds many times and it was never 100% bad.

My investigations led me to believe that it might be related to rm_work and/or BB_NUMBER_THREADS/PARALLEL_MAKE. In my Jenkins builds I enable 'INHERIT += "rm_work"', and I also limit BB_NUMBER_THREADS and set PARALLEL_MAKE. On the cmdline I was able to reduce the number of failures (sometimes to none) by removing the rm_work and THREADS/PARALLEL settings, but never completely eliminate them.
In Jenkins the build failures still felt as random as they were without the change, so I can't say it's having much effect in Jenkins, but it does seem to have some effect on the cmdline.
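
For reference, the settings in question look like this in local.conf (the numbers are just examples, not my exact values):

    INHERIT += "rm_work"
    BB_NUMBER_THREADS = "8"
    PARALLEL_MAKE = "-j 8"

Commenting out those three lines is what reduced (but never fully eliminated) the failures on the cmdline.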

I can say this with certainty: Matthias says it seems that the specific recipe that fails is random, but it's not. In every case the recipe that fails is a recipe whose source files are contained in the meta layer itself. For me the failing recipes were always:
modutils-initscripts
initscripts

If you look at the recipes for those packages, they do not have a SRC_URI that fetches code from some remote location and then uses quilt to apply patches.
In both cases all of the "source" code exists in the layer itself, and somehow quilt is still involved in placing it in the build area.
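
The shape of such a recipe is roughly this (a simplified sketch, not the literal contents of those recipes):

    SRC_URI = "file://GPLv2.patch \
               file://some-init-script.sh"
    S = "${WORKDIR}"

i.e. there is no remote tarball or git fetch at all, yet do_patch still runs quilt to apply the file:// .patch entries into the work directory.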

I have dozens and dozens of these failures recorded, and it is always with a recipe that follows that pattern. But roughly 99% of the failures are with the two packages I listed above.

The failures aren't related to days when those packages change. The failures are just... sporadic.

So the issue is related to:
- recipes with in-layer sources
- quilt (being run twice?)
- updating layers, and rebuilding in a build area with an existing build
- Feb 21 2022 (or thereabouts)

The issue might be related to:
- jenkins?
- my build host?
- rm_work?
- BB_NUMBER_THREADS?
- PARALLEL_MAKE?
