Re: Strange sporadic build issues (incremental builds in docker container)

Trevor Woerner

On Thu 2022-03-24 @ 09:31:25 AM, Alexander Kanavin wrote:
I don't. You need to inspect the build tree to find clues why the
patch was applied twice. Or simply wipe tmp/ before builds; if your
sstate works properly, that won't make the builds longer.


On Thu, 24 Mar 2022 at 07:43, Matthias Klein <matthias.klein@...> wrote:

Hello Alex,

it occurred again:

NOTE: recipe gawk-5.1.1-r0: task do_patch: Succeeded
NOTE: Running task 1673 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/python/
NOTE: recipe firstboot-1.0-r0: task do_populate_sysroot: Started
NOTE: recipe keymaps-1.0-r31: task do_patch: Started
NOTE: recipe python3-six-1.16.0-r0: task do_patch: Started
NOTE: recipe python3-six-1.16.0-r0: task do_patch: Succeeded
NOTE: Running task 1676 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/perl/
NOTE: recipe e2fsprogs-1.46.5-r0: task do_patch: Succeeded
ERROR: keymaps-1.0-r31 do_patch: Applying patch 'GPLv2.patch' on target directory '/var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31'
CmdError('quilt --quiltrc /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/recipe-sysroot-native/etc/quiltrc push', 0, 'stdout:
stderr: File series fully applied, ends at patch GPLv2.patch
ERROR: Logfile of failure stored in: /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/temp/log.do_patch.353982
NOTE: recipe keymaps-1.0-r31: task do_patch: Failed
NOTE: Running task 1679 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/alsa-state/
ERROR: Task (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/keymaps/ failed with exit code '1'

Do you have an idea?

Best regards,

-----Original Message-----
From: Alexander Kanavin <alex.kanavin@...>
Sent: Tuesday, 22 March 2022 10:26
To: Matthias Klein <matthias.klein@...>
Cc: yocto@...
Subject: Re: [yocto] Strange sporadic build issues (incremental builds in docker container)

It's hard to say without the full error message, and the build directory of the affected recipe. The easy way out is to simply wipe tmp/ before each build.


On Tue, 22 Mar 2022 at 09:51, Matthias Klein <matthias.klein@...> wrote:

Hello all,

I am building various kirkstone/master yoctos every night via Jenkins inside a Debian Bullseye Docker container.
These are incremental builds, reusing the build directory and sstate-cache of the previous build. The different yoctos are built in order. Each time, a new Docker container is launched.
(The same environment builds dunfell yoctos without any problems).

Now it happens sporadically that one of the builds aborts with the following message:

stderr: The series file no longer matches the applied patches. Please run 'quilt pop -a'.

The package whose patch step fails usually varies from run to run, and different yoctos are affected. But it is always the above message.
If I then restart the failed build it usually builds cleanly.

Does anyone have an idea where the problem might lie?

Yes, I've been seeing exactly these issues as well.

I'm not using any sort of virtualization, I'm using Jenkins to do nightly
builds directly on my host. My host machine is openSUSE 15.3. These problems
started on Feb 21 for me.

Each of my builds starts by doing a "git pull" on each of the repositories,
then kicks off a build if any of the repositories changed. A fresh build will
always succeed. Doing a "clean" and rebuilding will (I believe) always
succeed. My gut feeling is that it somehow has something to do with having an
existing build, refreshing the repositories, then rebuilding.

I spent weeks trying to find a reproducer. I wrote a script to check out one
version of the repositories (before), build, check out a newer version of the
repositories (after), and rebuild. Even when I used the exact same hashes that
had failed on my Jenkins build and repeated the cycle 20 times, in some cases
I wasn't able to reproduce the error. I was able to find 1 reproducer
involving a build for an imx28evk MACHINE, but even then, out of 20 iterations,
13 were bad and 7 were good. I repeated that set of 20 builds many times and
it was never 100% bad.
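
A loop of that shape could be sketched roughly as follows. This is only an
illustration, not the actual script described above: the repository paths,
hashes, and build command are placeholders, and the names iterate_once, tally,
and runner are made up for the example. It assumes each repository is a git
checkout and that the build command (for example, a wrapper that sets up the
build environment and calls bitbake) exits non-zero on failure.

```python
import subprocess
from typing import Callable, Dict, Sequence, Tuple

# A runner takes (command, working directory) and returns True on success.
Runner = Callable[[Sequence[str], str], bool]


def run(cmd: Sequence[str], cwd: str) -> bool:
    """Run cmd in cwd; True if it exited with status 0."""
    return subprocess.run(list(cmd), cwd=cwd).returncode == 0


def iterate_once(repos: Dict[str, Tuple[str, str]],
                 build_cmd: Sequence[str],
                 build_dir: str,
                 runner: Runner = run) -> bool:
    """Build at the 'before' hashes, then rebuild at the 'after' hashes.

    repos maps a repository path to its (before, after) hashes.
    Returns True only if every checkout and both builds succeed.
    """
    for index in (0, 1):  # 0 = before, 1 = after
        for path, hashes in repos.items():
            if not runner(["git", "checkout", hashes[index]], path):
                return False
        if not runner(build_cmd, build_dir):
            return False
    return True


def tally(repos: Dict[str, Tuple[str, str]],
          build_cmd: Sequence[str],
          build_dir: str,
          iterations: int = 20,
          runner: Runner = run) -> Tuple[int, int]:
    """Repeat the before/after cycle and return (bad, good) counts."""
    good = sum(iterate_once(repos, build_cmd, build_dir, runner)
               for _ in range(iterations))
    return iterations - good, good
```

With the default of 20 iterations, tally() returns a (bad, good) pair that can
be compared directly against a split like the 13 bad / 7 good one above.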

My investigations led me to believe that it might be related to rm_work and/or
BB_NUMBER_THREADS/PARALLEL_MAKE. In my Jenkins builds I enable 'INHERIT +=
"rm_work"' and I also limit the BB_NUMBER_THREADS and set PARALLEL_MAKE. On
the cmdline I was able to reduce the number of failures (sometimes to none) by
removing the rm_work and THREADS/PARALLEL, but never completely eliminate it.
In Jenkins the build failures still felt as random as they were without the
change, so I can't say that it's having much effect in Jenkins, but it seems
to have some effect on the cmdline.

I can say this with certainty: Matthias says it seems that the specific
recipe that fails is random, but it's not. In every case the recipe that fails
is a recipe whose source files are contained in the meta layer itself. For me
the failing recipes were always:

If you look at the recipes for those packages they do not have a SRC_URI that
fetches code from some remote location then uses quilt to apply some patches.
In both cases all of the "source" code exists in the layer itself, and somehow
quilt is involved in placing them in the build area.
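
One rough way to pick out recipes of this kind, assuming that "all source in
the layer" shows up as a SRC_URI made up only of file:// entries. That reading
is an assumption, as is the helper name only_in_layer_sources. The SRC_URI
string would have to come from the expanded recipe, for example from the
output of "bitbake -e <recipe>".

```python
def only_in_layer_sources(src_uri: str) -> bool:
    """True if SRC_URI is non-empty and every entry uses the file:// scheme.

    Entries are whitespace-separated. Anything that does not start with
    file:// (http, git, an unexpanded ${VAR}, ...) counts as not in-layer.
    """
    entries = src_uri.split()
    return bool(entries) and all(e.startswith("file://") for e in entries)


# Hypothetical SRC_URI values, for illustration only.
print(only_in_layer_sources("file://script.sh file://fix.patch"))
print(only_in_layer_sources("https://example.com/src.tar.gz file://fix.patch"))
```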

I have dozens and dozens of these failures recorded and it is always with a
recipe that follows that pattern. But 99%-ish of the failures are with
the two packages I listed above.
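
For anyone who wants to tally their own failures the same way, here is a
minimal sketch. It assumes the saved console output contains lines shaped like
the "NOTE: recipe keymaps-1.0-r31: task do_patch: Failed" line in Matthias's
log; the function name count_patch_failures is made up for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches console lines such as:
#   NOTE: recipe keymaps-1.0-r31: task do_patch: Failed
FAILED = re.compile(r"^NOTE: recipe (?P<recipe>\S+): task do_patch: Failed\s*$")


def count_patch_failures(lines: Iterable[str]) -> Counter:
    """Count do_patch failures per recipe (name-version-revision) in log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED.match(line)
        if match:
            counts[match.group("recipe")] += 1
    return counts
```

Feeding it the lines of every saved build log (for example by opening each
file and passing the file object) gives a per-recipe count, which makes it
easy to see whether the same few recipes account for nearly all failures.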

The failures aren't related to days when those packages change. The failures
are just... sporadic.

So the issue is related to:
- recipes with in-layer sources
- quilt (being run twice (?))
- updating layers, and rebuilding in a build area with an existing build
- Feb 21 2022 (or thereabouts)

The issue might be related to:
- jenkins?
- my build host?
- rm_work?
