Strange sporadic build issues (incremental builds in docker container)
Matthias Klein
Hello everyone,

I am building various kirkstone/master Yocto builds every night via Jenkins inside a Debian Bullseye Docker container. These are incremental builds, reusing the build directory and sstate-cache of the previous build. The different Yocto builds run in sequence, and a new Docker container is launched each time. (The same environment builds dunfell without any problems.)

Now one of the builds sporadically aborts with the following message:

  stderr: The series file no longer matches the applied patches. Please run 'quilt pop -a'.

The recipe whose patch step fails with this message usually varies from build to build, and different Yocto builds are affected, but it is always the same message. If I restart the failed build, it usually completes cleanly.

Does anyone have an idea which direction to look in?

Best regards,
Matthias
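For reference, a nightly job of the kind described above might look roughly like the sketch below. The image name, bind-mount path, and build target are placeholders for illustration, not Matthias' actual Jenkins setup; it only assumes that the poky checkout, build/ directory, and sstate-cache live on the host and are mounted into a fresh container each run.

  # Hypothetical nightly step: a fresh Debian Bullseye container each run,
  # with the build directory and sstate-cache reused via a bind mount.
  docker run --rm \
      -v /srv/yocto:/work \
      debian-bullseye-yocto \
      bash -c 'cd /work && . poky/oe-init-build-env build && bitbake core-image-minimal'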
Alexander Kanavin
It's hard to say without the full error message and the build directory of the affected recipe. The easy way out is to simply wipe tmp/ before each build.

Alex
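A minimal sketch of that workaround, assuming the usual layout where tmp/ sits inside the build directory while sstate-cache/ and downloads/ are kept between runs (the paths and the image target are illustrative):

  # Remove only tmp/; with a warm sstate-cache most tasks should be
  # restored from sstate rather than rebuilt from scratch.
  rm -rf /work/build/tmp
  . /work/poky/oe-init-build-env /work/build
  bitbake core-image-minimal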
Matthias Klein
Hello Alex,
it occurred again:

  NOTE: recipe gawk-5.1.1-r0: task do_patch: Succeeded
  NOTE: Running task 1673 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/python/python3-six_1.16.0.bb:do_patch)
  NOTE: recipe firstboot-1.0-r0: task do_populate_sysroot: Started
  NOTE: recipe keymaps-1.0-r31: task do_patch: Started
  NOTE: recipe python3-six-1.16.0-r0: task do_patch: Started
  NOTE: recipe python3-six-1.16.0-r0: task do_patch: Succeeded
  NOTE: Running task 1676 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-devtools/perl/perl_5.34.1.bb:do_patch)
  NOTE: recipe e2fsprogs-1.46.5-r0: task do_patch: Succeeded
  ERROR: keymaps-1.0-r31 do_patch: Applying patch 'GPLv2.patch' on target directory '/var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31'
  CmdError('quilt --quiltrc /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/recipe-sysroot-native/etc/quiltrc push', 0, 'stdout:
  stderr: File series fully applied, ends at patch GPLv2.patch
  ')
  ERROR: Logfile of failure stored in: /var/jenkins_home/workspace/yocto-raspberrypi/build/tmp/work/raspberrypi3_64-poky-linux/keymaps/1.0-r31/temp/log.do_patch.353982
  NOTE: recipe keymaps-1.0-r31: task do_patch: Failed
  NOTE: Running task 1679 of 4524 (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/alsa-state/alsa-state.bb:do_patch)
  ERROR: Task (/var/jenkins_home/workspace/yocto-raspberrypi/yocto/poky/meta/recipes-bsp/keymaps/keymaps_1.0.bb:do_patch) failed with exit code '1'

Do you have an idea?

Best regards,
Matthias
Alexander Kanavin
I don't. You need to inspect the build tree to find clues why the patch was applied twice. Or simply wipe tmp/ before builds; if your sstate works properly, that won't make the builds longer.

Alex
Trevor Woerner
On Thu 2022-03-24 @ 09:31:25 AM, Alexander Kanavin wrote:
> I don't. You need to inspect the build tree to find clues why the

Yes, I've been seeing exactly these issues as well. I'm not using any sort of virtualization; I'm using Jenkins to do nightly builds directly on my host. My host machine is openSUSE 15.3. These problems started on Feb 21 for me.

Each of my builds starts by doing a "git pull" on each of the repositories, then kicks off a build if any of the repositories changed. A fresh build will always succeed. Doing a "clean" and rebuilding will (I believe) always succeed. My gut feeling is that it somehow has something to do with having an existing build, refreshing the repositories, then rebuilding.

I spent weeks trying to find a reproducer. I wrote a script to check out one version of the repositories (before), build, check out a newer version of the repositories (after), and rebuild. Even in cases where I used the exact same hashes that had failed on my Jenkins build and repeated 20 times, I sometimes wasn't able to reproduce the error. I was able to find one reproducer involving a build for an imx28evk MACHINE, but even then, out of 20 iterations, 13 were bad and 7 were good. I repeated that set of 20 builds many times and it was never 100% bad.

My investigations led me to believe that it might be related to rm_work and/or BB_NUMBER_THREADS/PARALLEL_MAKE. In my Jenkins builds I enable 'INHERIT += "rm_work"' and I also limit BB_NUMBER_THREADS and set PARALLEL_MAKE (roughly the settings sketched below). On the cmdline I was able to reduce the number of failures (sometimes to none) by removing the rm_work and THREADS/PARALLEL settings, but never completely eliminate them. In Jenkins the build failures still felt as random as they were without the change, so I can't say it's having much effect in Jenkins, but it seems to have some effect on the cmdline.

I can say this with certainty: Matthias says it seems that the specific recipe that fails is random, but it's not. In every case the recipe that fails is one whose source files are contained in the meta layer itself. For me the failing recipes were always:

  modutils-initscripts
  initscripts

If you look at the recipes for those packages, they do not have a SRC_URI that fetches code from some remote location and then uses quilt to apply patches. In both cases all of the "source" code exists in the layer itself, and somehow quilt is involved in placing it in the build area. I have dozens and dozens of these failures recorded and it is always with a recipe that follows that pattern; 99%-ish of the failures are with the two packages listed above. The failures aren't related to days when those packages change. The failures are just... sporadic.

So the issue is related to:
- recipes with in-layer sources
- quilt (being run twice (?))
- updating layers, and rebuilding in a build area with an existing build
- Feb 21 2022 (or thereabouts)

The issue might be related to:
- jenkins?
- my build host?
- rm_work?
- BB_NUMBER_THREADS?
- PARALLEL_MAKE?
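For context, the kind of local.conf settings Trevor describes would look roughly like the snippet below; the thread counts are placeholders, not his actual values.

  # Hypothetical excerpt appended to conf/local.conf for the nightly job
  cat >> conf/local.conf << 'EOF'
  INHERIT += "rm_work"
  BB_NUMBER_THREADS = "8"
  PARALLEL_MAKE = "-j 8"
  EOF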
Matthias Klein
Hi Trevor,
thank you very much for the detailed answer. Yes, you are right, it is mostly the same recipes that fail, but they also change from time to time. Today it happened to me even without Jenkins and Docker, in a normal console build, with the recipe keymaps_1.0.bb.

For the nightly builds in Jenkins I am currently working around it by deleting build/tmp beforehand. So far, the problem has not occurred again.

Best regards,
Matthias
Trevor Woerner
Hi Matthias,
On Wed 2022-03-30 @ 06:32:00 AM, Matthias Klein wrote:
> Yes, you are right, it is mostly the same recipes that fail, but they
> also change from time to time.

And keymaps follows the exact same pattern as modutils-initscripts and initscripts; namely, its sources are entirely contained in-tree:

  keymaps/
  ├── files
  │   ├── GPLv2.patch
  │   └── keymap.sh
  └── keymaps_1.0.bb

  keymaps/keymaps_1.0.bb:
  SRC_URI = "file://keymap.sh \
             file://GPLv2.patch"

Any recipe that follows this pattern is susceptible; it's probably just a coincidence that most of my failures happened to be with the two recipes I mentioned.

This issue has revealed a bug, and fixing that bug would be great. However, the thing is, keymap.sh is a shell program written 12 years ago which hasn't changed since. The GPL/COPYING file is only there for "reasons". The license file doesn't *need* to be moved into the build area for this recipe to get its job done (namely installing keymap.sh into the image's sysvinit).

Best regards,
Trevor
Richard Purdie
On Wed, 2022-03-30 at 09:40 -0400, Trevor Woerner wrote:
> Hi Matthias,

The "good" news is I did work out how to reproduce this:

  bitbake keymaps -c clean
  bitbake keymaps
  bitbake keymaps -c unpack -f
  bitbake keymaps -c patch
  bitbake keymaps -c unpack -f
  bitbake keymaps -c patch

I haven't looked at why, but hopefully that helps us move forward with looking at the issue.

The complications with S == WORKDIR were one of the reasons I did start work on patches to make it work better and maybe move fetching into a dedicated directory rather than WORKDIR and then symlink things. I never got that patch to work well enough to submit, though (and it is too late for a major change like that in this release).

Cheers,
Richard
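Because the original failure is sporadic, a small wrapper like the one below could be used to run Richard's sequence repeatedly and count how often do_patch fails. This is only a sketch and assumes it is run from an already-initialized build directory; the iteration count is arbitrary.

  #!/bin/sh
  # Run the reproducer several times and report how often do_patch fails.
  fails=0
  for i in $(seq 1 20); do
      bitbake keymaps -c clean
      bitbake keymaps
      bitbake keymaps -c unpack -f
      bitbake keymaps -c patch
      bitbake keymaps -c unpack -f
      bitbake keymaps -c patch || fails=$((fails + 1))
  done
  echo "do_patch failed $fails out of 20 runs"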
Trevor Woerner
On Wed 2022-03-30 @ 04:08:31 PM, Richard Purdie wrote:
> The "good" news is I did work out how to reproduce this.

Awesome! That is a very simple and quick reproducer!

> I haven't looked at why, but hopefully that helps us move forward with
> looking at the issue.

As per our conversation I quickly tried the following (not that I expected this to be a final solution, but just a poking-around kind of thing):

  diff --git a/meta/classes/base.bbclass b/meta/classes/base.bbclass
  index cc81461473..503da61b3d 100644
  --- a/meta/classes/base.bbclass
  +++ b/meta/classes/base.bbclass
  @@ -170,6 +170,7 @@ do_unpack[dirs] = "${WORKDIR}"
   do_unpack[cleandirs] = "${@d.getVar('S') if os.path.normpath(d.getVar('S')) != os.path.normpath(d.getVar('WORKDIR')) else os.path.join('${S}', 'patches')}"
  
   python base_do_unpack() {
  +    bb.utils.remove(d.getVar('B') + "/.pc", recurse=True)
       src_uri = (d.getVar('SRC_URI') or "").split()
       if not src_uri:
           return

And it changed the error message from:

  $ bitbake keymaps -c patch
  ...
  ERROR: keymaps-1.0-r31 do_patch: Applying patch 'GPLv2.patch' on target directory '/z/build-master/quilt-fix/qemux86/nodistro/build/tmp-glibc/work/qemux86-oe-linux/keymaps/1.0-r31'
  CmdError('quilt --quiltrc /z/build-master/quilt-fix/qemux86/nodistro/build/tmp-glibc/work/qemux86-oe-linux/keymaps/1.0-r31/recipe-sysroot-native/etc/quiltrc push', 0, 'stdout:
  stderr: File series fully applied, ends at patch GPLv2.patch
  ')

to:

  $ bitbake keymaps -c patch
  ...
  ERROR: keymaps-1.0-r31 do_patch: Applying patch 'GPLv2.patch' on target directory '/z/build-master/quilt-fix/qemux86/nodistro/build/tmp-glibc/work/qemux86-oe-linux/keymaps/1.0-r31'
  CmdError('quilt --quiltrc /z/build-master/quilt-fix/qemux86/nodistro/build/tmp-glibc/work/qemux86-oe-linux/keymaps/1.0-r31/recipe-sysroot-native/etc/quiltrc push', 0, 'stdout: Applying patch GPLv2.patch
  The next patch would create the file COPYING,
  which already exists!  Applying it anyway.
  patching file COPYING
  Hunk #1 FAILED at 1.
  1 out of 1 hunk FAILED -- rejects in file COPYING
  Patch GPLv2.patch can be reverse-applied
  stderr:
  ')

Progress? https://www.reddit.com/r/ProgrammerHumor/comments/8j5qim/progress/
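For anyone digging into why the patch ends up applied twice, one way to examine the quilt state left behind in a failing workdir is to compare quilt's .pc/ record against the series file. The snippet below is only a sketch; the workdir path is illustrative (modelled on the error output above) and will differ between setups.

  # Hypothetical inspection of the keymaps workdir after a failed do_patch
  WORKDIR=tmp/work/qemux86-oe-linux/keymaps/1.0-r31
  ls "$WORKDIR/.pc"                 # quilt's record of already-applied patches
  cat "$WORKDIR/patches/series"     # the series file quilt checks against
  (cd "$WORKDIR" && quilt --quiltrc recipe-sysroot-native/etc/quiltrc applied)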