do_package unknown user build failure
Richard Purdie
Hi All,
There are a number of failures on the autobuilder in do_package with odd unknown user issues. My guess is that it's related to the new buildtools tarball I configured in the helper. I'm going to guess we're missing a glibc syscall with the new glibc. I'll look into it as a priority. Cheers, Richard |
|
Re: Autobuilder reproducibility target changes
Joshua Watt <JPEWhacker@...>
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote: OK, I read through the code and unfortunately found a bug: when attempting to make sure the "B" build doesn't use sstate, I misspelled SSTATE_MIRRORS, which means that the B build could have been pulling from the sstate mirror when it was not supposed to. This has a few implications: 1) It might explain why some of the reproducible results seem intermittent. 2) It might explain why there is such a time disparity between the tests. Unfortunately, while fixing this will probably help the intermittent results, it probably also means that the tests taking 9 hours is what is "supposed" to happen, and they happen to be shorter sometimes because the B build is pulling from sstate when it's not supposed to.
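A misspelled variable name is silently accepted as a brand new variable, which is how this kind of bug goes unnoticed. As a minimal sketch (not the actual selftest code), a check along these lines could flag names that look like typos of expected ones. `KNOWN_VARS` and `find_suspect_names` are hypothetical names invented for illustration, and the misspelling in the usage note is made up, not the one from the real test.

```python
import difflib

# Hypothetical set of variable names the test is expected to set.
KNOWN_VARS = ("SSTATE_MIRRORS", "SSTATE_DIR", "TMPDIR")


def find_suspect_names(assigned, known=KNOWN_VARS, cutoff=0.8):
    """Return {name: closest_known_name} for names that look like typos."""
    suspects = {}
    for name in assigned:
        if name in known:
            continue  # spelled exactly right, nothing to report
        match = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
        if match:
            suspects[name] = match[0]
    return suspects
```

For example, `find_suspect_names(["SSTATE_MIRROS", "TMPDIR"])` reports the first name as a likely typo of `SSTATE_MIRRORS` and leaves the second alone.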
|
|
Re: Autobuilder reproducibility target changes
Joshua Watt <JPEWhacker@...>
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote: I'm not sure that diffoscope is the culprit here. If you look at the logs, you can see that there is only about 30 seconds between the "Running diffoscope" log message and the end of the test. I suspect something else is going wrong here. I can try to write up a patch to add more logging so we can more accurately pinpoint where it's taking so long.
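The extra logging mentioned here could be as simple as timing each phase of the test. The following is only a sketch under that assumption, not the patch itself; the logger name `repro-timing` and the helper `timed` are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("repro-timing")


@contextmanager
def timed(step, log=logger):
    """Log how long the wrapped block took, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("%s took %.1f seconds", step, time.monotonic() - start)
```

Wrapping each phase, e.g. `with timed("build B"): ...` and `with timed("diffoscope"): ...`, would show which step accounts for the hours.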
|
|
Re: Autobuilder reproducibility target changes
Alexander Kanavin <alex.kanavin@...>
On Sun, 14 Feb 2021 at 16:17, Richard Purdie <richard.purdie@...> wrote: On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote: I did a bit of work on the go reproducibility, that has been preserved here: I got there, but it took quite a bit of time (debugging go build process is extremely painful) and I'm not at all happy with the hacky/brittle things in the patch, so it's on hold for now - but anyone is welcome to take it and make it better, especially if they're go specialists. Alex |
|
Re: Autobuilder reproducibility target changes
Richard Purdie
On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
Cheers :) If there's something else I could help with, tell me.
I'm going to try and get diffs of the remaining big package differences and see where we stand. The two big ones I know of are go as a language for reproducibility and perf. Perf will just be a sheer pain to fix, hopefully the kernel will take patches.
One item is getting to the bottom of why it takes diffoscope beyond...
That would be really helpful to get to the bottom of. The vim-common difference is actually really simple: https://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20210213-0djxo1sn/packages/diff-html/ so I don't know why it took 5 hours to compute that. It suggests something really silly/stupid is going on. diffoscope should be amenable to fixes so it would be worth talking to them too... (I have a fix for the locale problem in vim brewing, it needs a new buildtools-extended-tarball) Cheers, Richard |
|
Re: Autobuilder reproducibility target changes
Alexander Kanavin <alex.kanavin@...>
Cheers :) If there's something else I could help with, tell me. One item is getting to the bottom of why it takes diffoscope beyond the heat death of the universe to render its verdict on some items. Alex On Sun, 14 Feb 2021 at 13:19, Richard Purdie <richard.purdie@...> wrote: Regular users of the autobuilder will note that I've split the |
|
Autobuilder reproducibility target changes
Richard Purdie
Regular users of the autobuilder will note that I've split the reproducible builds test out of the main oe-selftest build and into its own target build. This is because that test tends to run for a lot longer and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to follow, that should be straightforward with a change to the branch in autobuilder-helper. Obviously we should ensure this is working ok with master first but so far so good.

It has already highlighted the difference between two successful runs:
https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)
and two failing runs:
https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)
the time difference being the system trying to run diffoscope on vim-common :/.

I'm aware I removed some recipes from the exclusions list after seeing multiple passing builds for all distros and we're now seeing test failures. My mistake was not waiting for the date to change and for builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode and despite trying for 5 hours, diffoscope hasn't given any data on why vim-common differs.

I should have fixes in for quilt, valgrind, kernel-devsrc and cwautomacros. The umask fix may fix other issues too.

Alex has improved the reporting so we can spot cases where exclusion is no longer needed. Cheers, Richard |
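To illustrate why a worker with a different umask can break reproducibility, here is a small sketch, assuming a POSIX system and an ordinary temporary directory. It is not autobuilder code, and `mode_under_umask` is a hypothetical helper: the same file creation request ends up with different permission bits depending on the umask, and those bits can end up in the packaged output.

```python
import os
import stat
import tempfile


def mode_under_umask(mask):
    """Create a file requesting mode 0o666 under `mask`; return its permission bits."""
    old = os.umask(mask)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "f")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask
```

Calling this with `0o022` and then with `0o002` gives a file that is group-writable only in the second case, which is enough for two otherwise identical builds to differ.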
|
SWAT Rotation
Alexandre Belloni
Hello Jagadheesan,
You are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from Minjae to you at EOD 2021-02-12. Please reply to let me know whether you will be able to work on this task. I'll be available to walk you through the process on Monday, don't hesitate to contact me by email or on IRC. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
SWAT statistics for week 05
Alexandre Belloni
Hello,
Here are the statistics for last week. Jon Mason was on SWAT duty.

413 failures were reported,
* 145 were triaged by Jon
  - 6 were not to be triaged
  - 7 were due to out of disk space issues
  - 65 for a binutils issue Richard already sent an email for
  - 21 were already fixed at the time they were triaged
  - 11 for 3 issues for which emails were sent
  - 8 were recurrences of bugs 13802, 14002, 14170, 14181, 14200
  - 10 occurrences of new bug 14210
  - 2 occurrences of new bug 14212
  - 3 occurrences of new bug 14221
  - 2 occurrences of new bug 14222
  - 3 occurrences of new bug 14223
  - 2 occurrences of new bug 14224
  - 1 for new bug 14225
  - 4 occurrences of new bug 14226
* 154 were triaged by Richard
  - 148 because of a broken glibc upgrade patch
  - 5 for a bitbake multiconfig change
  - 1 was added to bug 14201
* 113 were triaged by me
  - 106 were not to be triaged
  - 2 were meta-oe build failures due to the autoconf update
  - 2 were cancelled with no other errors
  - 1 was added to bug 14029
  - 1 new bug opened: 14213
  - 1 was due to the qmp patch and being handled by Saul

Again, the raw number of failures is not representative of the work that has been done. Swatbot is now filtering the failures that are not to be triaged or the cancellations without any other issues so we won't have those anymore.

Regards, -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
Controller out of space
Richard Purdie
Hi,
The autobuilder controller (typhoon) ran out of disk space and I ended up having to restart buildbot after clearing some space with a temporary fix until Michael can look at it. That meant the running builds were interrupted and didn't restart automatically. Cheers, Richard |
|
Switch to swatbot
Richard Purdie
Hi All!
I just wanted to let people know that we've officially switched from using the wiki BuildLog over to "SwatBot". The most useful URL for SWAT is: https://swatbot.yoctoproject.org/mainindex/swat/ which lists all the "Pending" build issues which need triage. Anyone with an account can go in and edit these into the correct states; the idea of SWAT is to handle them and ensure we take actions against failures/warnings. The app has gone through quite a few improvements and tweaks over the last couple of weeks and should be more useful now. We're happy to take other improvement ideas too, both to the app and the process. Alexandre should be able to help with any questions and give out accounts as needed to SWAT members. Cheers, Richard |
|
SWAT Rotation
Alexandre Belloni
Hello Minjae,
This time, you are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from Jon to you at EOD 2021-02-05. Please reply to let me know whether you will be able to work on this task. I'll be available to walk you through the process on Monday from 9am to 11am CET, that should be 5pm to 7pm for you. If that doesn't work, I'll have more availability on Tuesday. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
SWAT statistics for week 04
Alexandre Belloni
Hello,
I've been looking at what happened this week. I'm pretty sure the number of triaged failures is not a proper metric because of the number of repetitions of a single issue, so I tried to dig a bit deeper:

953 failures were reported,
* 347 were triaged by Richard
  - 62 because of oeqa priority patches breaking things, Richard sent a mail
  - 285 for 3 cancelled builds he handled himself
* 606 were triaged by me
  - 96 were not to be triaged (swatbot doesn't filter those yet)
  - 100 were cancelled builds without any other issue (also, swatbot doesn't filter those yet)
  - 2 for misconfigured builds
  - 2 were new bugs: 14198 and 14208
  - 89 were added to existing bugs as new occurrences of 13802, 13935, 14028, 14029, 14181
  - 12 were handled by sending 7 replies to the mailing lists, 4 from Richard, 3 from me
  - 45 were fixed by 2 commits already in tree at the time they were triaged
  - 11 were due to the qmp patch and being handled by Saul
  - 55 were due to the autoconf upgrade and are/were handled by Richard
  - 14 were a warning for fuzz on the at patches solving the autoconf upgrade
  - 1 was handled in meta-oe
  - 3 were an issue with the sudo archive being 0 byte on the mirror, this has been handled by Michael Halstead
  - 176 were issues already being handled by Richard: mesa dependencies, multiple rdepends default value issues, ...
    -> in those I found a warning in meta-intel and I sent a patch to fix it

Regards, -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
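As a quick sanity check, the week 04 breakdown above can be tallied in a few lines. This is only a sketch: the numbers are copied from the message, while the dictionary layout and the `tally` helper are made up for illustration.

```python
# Figures copied from the week 04 breakdown; the structure is hypothetical.
reported = 953
triaged = {
    "Richard": [62, 285],
    "Alexandre": [96, 100, 2, 2, 89, 12, 45, 11, 55, 14, 1, 3, 176],
}


def tally(breakdown):
    """Sum the per-category counts for each person."""
    return {who: sum(counts) for who, counts in breakdown.items()}


totals = tally(triaged)
accounted_for = sum(totals.values())
# Share of all reported failures caused by the 3 cancelled builds alone.
cancelled_share = 285 / reported
```

The per-person sums match the 347 and 606 quoted in the message and together account for all 953 reported failures. The 285 failures from just 3 cancelled builds are about 30% of the total, which supports the point that the raw count says little about the number of distinct issues.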
|
Re: SWAT Rotation
Jon Mason
Hello Alexandre,
I have received this email and will take over tomorrow morning. Thanks, Jon On Thu, Jan 28, 2021 at 7:07 PM Alexandre Belloni <alexandre.belloni@...> wrote:
|
|
SWAT Rotation
Alexandre Belloni
Jon,
Once again, you are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from me to you at EOD 01/29/2021. Please reply to let me know whether you will be able to work on this task. Also, please update me if, during the week, you can't spend enough time on the topic. I know Ross will walk you through the process but don't hesitate to contact me if you have any questions. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
Re: SWAT Rotation
Alexandre Belloni
On 22/01/2021 06:57:13-0500, Jon Mason wrote:
Hello Alexandre,
Indeed, I will take this rotation and you'll get the next one. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
Re: SWAT Rotation
Jon Mason
Hello Alexandre,
I believe it was already discussed with you, but I'll take the next rotation. Thanks, Jon On Thu, Jan 21, 2021 at 5:38 PM Alexandre Belloni <alexandre.belloni@...> wrote:
|
|
Introducing "SwatBot"
Richard Purdie
Hi All,
I have some good news: the SWAT Django app I've talked about is nearing completion and will soon be ready to use. We now have a live instance and are checking how it works with live data from the autobuilder.

https://swatbot.yoctoproject.org/

It is a bit ugly and rough around the edges but it should be able to display a list of the things SWAT needs to attend to and allow them to be marked as resolved, tracking the resolutions. This should mean that the process is easier for the person doing SWAT and for us to make sure nothing gets "lost" during handovers etc.

Source code is available:
http://git.yoctoproject.org/cgit.cgi/swatbot
and the notifier plugin that connects to buildbot:
http://git.yoctoproject.org/cgit.cgi/yocto-autobuilder2

I'm expecting that where it isn't doing what we need, we should be able to get it into shape in relatively short order.

Alexandre and Bootlin are now coming up to speed and will be taking over leading the SWAT process and as part of that, helping get the app working for everyone. I think Alexandre will issue accounts to people on SWAT as we rotate through the list.

Over the last week I've been worried some issues haven't been logged/handled so I made a note of the ones we need to check have been handled:

meta-arm edk2 intermittent failure needs bug:
https://autobuilder.yoctoproject.org/typhoon/#/builders/113/builds/629

update ltp bug:
https://autobuilder.yoctoproject.org/typhoon/#/builders/96/builds/1403

need a bug for slow boot failure intermittent issue?: (was it slow, have we more logs, may be too late now?)
https://autobuilder.yoctoproject.org/typhoon/#/builders/109/builds/1873

intermittent ptest warnings, need updates to bugs so we can track frequency:
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1435
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1716
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1443
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1723
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1439
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1719

bitbake timeout bug needs updating, adding this?
https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/1739

should be a multiprocess bug somewhere, add or reopen?
https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/1729

qemuarm64 shutdown bug
https://autobuilder.yoctoproject.org/typhoon/#/builders/42/builds/2943

ping test fail:
https://autobuilder.yoctoproject.org/typhoon/#/builders/74/builds/2945

mips backtrace:
https://autobuilder.yoctoproject.org/typhoon/#/builders/74/builds/2942

There were a ton more failures but I think I've responded/handled the others. I'll admit to focusing on getting the app done since I think it should improve this process so much and help everyone. Cheers, Richard |
|
SWAT Rotation
Alexandre Belloni
Jon,
You are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT will rotate from Lee Chee to you at EOD 01/22/2021. Please reply to let me know whether you will be able to work on this task. Also, please update me if, during the week, you can't spend enough time on the topic. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com |
|
Build and Integration help from Bootlin
Richard Purdie
Hi All,
I'm pleased to be able to announce that Alexandre Belloni and some of his colleagues from Bootlin are going to be helping us with the process for testing and integrating patches within the project. There are multiple reasons for this, not least that it helps not to have a single point of failure and that I could do with the help and a break! :) As such, they are going to be leading the SWAT process and you'll therefore be seeing them giving initial responses on patches and helping with the autobuilder failures. I'd like to welcome them on board and hope everyone will support them as we figure out how the finer details of this will work. This is being made possible by support from the Yocto Project members which is very much appreciated, thanks! I will be continuing to do the final patch review and merge and am not going anywhere; this is some very much welcomed support to try and ensure we continue to scale and maintain our quality and testing. Cheers, Richard |
|