Re: Autobuilder reproducibility target changes
Richard Purdie
On Sun, 2021-02-14 at 13:17 -0600, Joshua Watt wrote:
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
The "good" news is that this didn't affect the autobuilder as it sets SSTATE_DIR to a common directory and doesn't use SSTATE_MIRRORS.
Unfortunately, while it probably will help the intermittent results,
I don't think we're to the bottom of this. If it's not spending the time in diffoscope, something seems to cause builds with differences to take much longer...
Cheers, Richard
|
|
Re: Autobuilder reproducibility target changes
Joshua Watt <JPEWhacker@...>
On Mon, Feb 15, 2021 at 12:21 AM Alexander Kanavin
<alex.kanavin@...> wrote: I forgot to mention that I did run diffoscope locally with the offending vim packages and it took about 30 seconds (same as the AB logs showed)
|
|
SWAT Rotation
Alexandre Belloni
Hello Ross,
Since I didn't get any reply from Jagadheesan and you were the next one on the list (and we discussed that on IRC), you will be on SWAT duty this week. I know this is short notice and I'll try to assist as much as possible if you need so. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|
Re: Autobuilder reproducibility target changes
Alexander Kanavin <alex.kanavin@...>
I’ve definitely seen diffoscope process take hours and hours and hours in local builds. Trying it with these vim packages locally should still be done. Alex
On Sun 14. Feb 2021 at 20.18, Joshua Watt <jpewhacker@...> wrote: On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
|
|
do_package unknown user build failure
Richard Purdie
Hi All,
There are a number of failures on the autobuilder in do_package with odd unknown user issues. My guess is that it's related to the new buildtools tarball I configured in the helper. I'm going to guess we're missing a glibc syscall with the new glibc. I'll look into it as a priority. Cheers, Richard
|
|
Re: Autobuilder reproducibility target changes
Joshua Watt <JPEWhacker@...>
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote:
OK, I read through the code and unfortunately found a bug: when attempting to make sure the "B" build doesn't use sstate, I misspelled SSTATE_MIRRORS, which means that the B build could have been pulling from the sstate mirror when it was not supposed to. This has a few implications:
1) It might explain why some of the reproducible results seem intermittent
2) It might explain why there is such a time disparity between the tests
Unfortunately, while it probably will help the intermittent results, it probably means that the tests taking 9 hours is what is "supposed" to happen, and they happen to be shorter sometimes because the B build is pulling from sstate when it's not supposed to.
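A misspelled variable name fails silently in this situation, because BitBake accepts an assignment to an unknown name as a new variable. The following is a minimal sketch of a guard that flags likely typos in a configuration fragment before a build starts. It is not the fix that went into the selftest. The helper name `find_suspect_assignments`, the list of expected names, and the misspelling `SSTATE_MIRROR` are illustrative assumptions, since the thread does not say how the variable was misspelled.

```python
import difflib
import re

# Matches "NAME = ...", "NAME ?= ...", "NAME += ..." and similar at line start.
ASSIGNMENT = re.compile(
    r"^([A-Za-z_][A-Za-z0-9_]*)\s*(?:\?\?=|\?=|:=|\+=|=\+|\.=|=\.|=)",
    re.MULTILINE,
)


def find_suspect_assignments(config_text, expected_names):
    """Return {assigned_name: closest_expected_name} for likely typos."""
    suspects = {}
    for name in ASSIGNMENT.findall(config_text):
        if name in expected_names:
            continue  # spelled exactly as expected
        # A near miss of an expected name is probably a typo.
        close = difflib.get_close_matches(name, expected_names, n=1, cutoff=0.8)
        if close:
            suspects[name] = close[0]
    return suspects


# Hypothetical fragment meant to stop the "B" build from reusing sstate.
fragment = 'SSTATE_DIR = "${TMPDIR}/sstate"\nSSTATE_MIRROR = ""\n'
expected = ["SSTATE_DIR", "SSTATE_MIRRORS"]
print(find_suspect_assignments(fragment, expected))
```

A check like this only catches names that are close to an expected one, so it is a safety net for test configuration rather than a general validator.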
|
|
Re: Autobuilder reproducibility target changes
Joshua Watt <JPEWhacker@...>
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote:
I'm not sure that diffoscope is the culprit here. If you look at the logs, you can see that there is only about 30 seconds between the "Running diffoscope" log message and the end of the test. I suspect something else is going wrong here. I can try to write up a patch to add more logging so we can more accurately pinpoint where it's taking so long.
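As an illustration of the kind of extra logging proposed here, the sketch below wraps each phase of a long test in a context manager that logs and records its elapsed time. This is not the patch discussed in the thread. The names `timed_step` and `durations` and the phase labels are assumptions chosen for the example.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("repro-timing")


@contextmanager
def timed_step(name, durations):
    """Log when a phase starts and ends, and record its duration in seconds."""
    start = time.monotonic()
    logger.info("Starting %s", name)
    try:
        yield
    finally:
        # Recorded even if the phase raises, so failures are timed too.
        elapsed = time.monotonic() - start
        durations[name] = elapsed
        logger.info("Finished %s in %.1f s", name, elapsed)


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
durations = {}

# Placeholder phases; a real test would wrap the builds and the comparison.
with timed_step("build B", durations):
    time.sleep(0.05)
with timed_step("compare packages", durations):
    time.sleep(0.01)

slowest = max(durations, key=durations.get)
logger.info("Slowest phase: %s", slowest)
```

With timestamps on every phase boundary, a 9 hour run can be attributed to a specific phase instead of being inferred from the gap between two unrelated log messages.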
|
|
Re: Autobuilder reproducibility target changes
Alexander Kanavin <alex.kanavin@...>
On Sun, 14 Feb 2021 at 16:17, Richard Purdie <richard.purdie@...> wrote: On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote: I did a bit of work on the go reproducibility, that has been preserved here: I got there, but it took quite a bit of time (debugging go build process is extremely painful) and I'm not at all happy with the hacky/brittle things in the patch, so it's on hold for now - but anyone is welcome to take it and make it better, especially if they're go specialists. Alex
|
|
Re: Autobuilder reproducibility target changes
Richard Purdie
On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
Cheers :) If there's something else I could help, tell.
I'm going to try and get diffs of the remaining big package differences and see where we stand. The two big ones I know of are go as a language for reproducibility and perf. Perf will just be a sheer pain to fix, hopefully the kernel will take patches.
One item is getting to the bottom of why it takes diffoscope beyond
That would be really helpful to get to the bottom of. The vim-common difference is actually really simple:
https://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20210213-0djxo1sn/packages/diff-html/
so I don't know why it took 5 hours to compute that. It suggests something really silly/stupid is going on. diffoscope should be amenable to fixes so it would be worth talking to them too...
(I have a fix for the locale problem in vim brewing, it needs a new buildtools-extended-tarball)
Cheers, Richard
|
|
Re: Autobuilder reproducibility target changes
Alexander Kanavin <alex.kanavin@...>
Cheers :) If there's something else I could help, tell. One item is getting to the bottom of why it takes diffoscope beyond the heat death of the universe to render its verdict on some items. Alex
On Sun, 14 Feb 2021 at 13:19, Richard Purdie <richard.purdie@...> wrote: Regular users of the autobuilder will note that I've split the
|
|
Autobuilder reproducibility target changes
Richard Purdie
Regular users of the autobuilder will note that I've split the reproducible builds test out of the main oe-selftest build and into its own target build. This is because that test tends to run for a lot longer and it helps to see the result separately. I've only done this for master. If gatesgarth and dunfell want to follow, that should be straightforward with a change to the branch in autobuilder-helper. Obviously we should ensure this is working ok with master first but so far so good.
It has already highlighted the difference between a successful run:
https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)
and two failing runs:
https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)
the time difference being the system trying to run diffoscope on vim-common :/.
I'm aware I removed some recipes from the exclusions list after seeing multiple passing builds for all distros and we're now seeing test failures. My mistake was not waiting for the date to change and for builds to run on an autobuilder worker with a different umask.
Meson is failing with a pyc file mismatch which diffoscope can't decode and despite trying for 5 hours, diffoscope hasn't given any data on why vim-common differs. I should have fixes in for quilt, valgrind, kernel-devsrc and cwautomacros. The umask fix may fix other issues too.
Alex has improved the reporting so we can spot cases where exclusion is no longer needed.
Cheers, Richard
|
|
SWAT Rotation
Alexandre Belloni
Hello Jagadheesan,
You are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from Minjae to you at EOD 2021-02-12. Please reply to let me know whether you will be able to work on this task. I'll be available to walk you through the process on Monday, don't hesitate to contact me by email or on IRC. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|
SWAT statistics for week 05
Alexandre Belloni
Hello,
Here are the statistics for last week. Jon Mason was on SWAT duty.
413 failures were reported,
* 145 were triaged by Jon
 - 6 were not to be triaged
 - 7 were due to out of disk space issues
 - 65 for a binutils issue Richard already sent an email for
 - 21 were already fixed at the time they were triaged
 - 11 for 3 issues for which emails were sent
 - 8 were reoccurrences of bugs 13802, 14002, 14170, 14181, 14200
 - 10 occurrences of new bug 14210
 - 2 occurrences of new bug 14212
 - 3 occurrences of new bug 14221
 - 2 occurrences of new bug 14222
 - 3 occurrences of new bug 14223
 - 2 occurrences of new bug 14224
 - 1 for new bug 14225
 - 4 occurrences of new bug 14226
* 154 were triaged by Richard
 - 148 because of a broken glibc upgrade patch
 - 5 for a bitbake multiconfig change
 - 1 was added to bug 14201
* 113 were triaged by me
 - 106 were not to be triaged
 - 2 were meta-oe build failures due to the autoconf update
 - 2 were cancelled with no other errors
 - 1 was added to bug 14029
 - 1 new bug opened: 14213
 - 1 was due to the qmp patch and being handled by Saul
Again, the raw number of failures is not representative of the work that has been done. Swatbot is now filtering the failures that are not to be triaged or the cancellations without any other issues so we won't have those anymore.
Regards,
-- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|
Controller out of space
Richard Purdie
Hi,
The autobuilder controller (typhoon) ran out of disk space and I ended up having to restart buildbot after clearing some space with a temporary fix until Michael can look at it. That meant the running builds were interrupted and didn't restart automatically. Cheers, Richard
|
|
Switch to swatbot
Richard Purdie
Hi All!
I just wanted to let people know that we've officially switched from using the wiki BuildLog over to "SwatBot". The most useful URL for SWAT is: https://swatbot.yoctoproject.org/mainindex/swat/ which lists all the "Pending" build issues which need triage. Anyone with an account can go in and edit these into the correct states, the idea of swat is to handle them and ensure we take actions against failures/warnings. The app has gone through quite a few improvements and tweaks over the last couple of weeks and should be more useful now. We're happy to take other improvement ideas too both to the app and process. Alexandre should be able to help with any questions and give out accounts as needed to SWAT members. Cheers, Richard
|
|
SWAT Rotation
Alexandre Belloni
Hello Minjae,
This time, you are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from Jon to you at EOD 2021-02-05. Please reply to let me know whether you will be able to work on this task. I'll be available to walk you through the process on Monday from 9am to 11am CET, that should be 5pm to 7pm for you. If that doesn't work, I'll have more availability on Tuesday. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|
SWAT statistics for week 04
Alexandre Belloni
Hello,
I've been looking at what happened this week. I'm pretty sure the number of triaged failures is not a proper metric because of the number of repetitions of a single issue, so I tried to dig a bit deeper:
953 failures were reported,
* 347 were triaged by Richard
 - 62 because of oeqa priority patches breaking things, Richard sent a mail
 - 285 for 3 cancelled builds he handled himself
* 606 were triaged by me
 - 96 were not to be triaged (swatbot doesn't filter those yet)
 - 100 were cancelled builds without any other issue (also, swatbot doesn't filter those yet)
 - 2 for misconfigured builds
 - 2 were new bugs: 14198 and 14208
 - 89 were added to existing bugs as new occurrences of 13802, 13935, 14028, 14029, 14181
 - 12 were handled by sending 7 replies to the mailing lists, 4 from Richard, 3 from me
 - 45 were fixed by 2 commits already in tree at the time they were triaged
 - 11 were due to the qmp patch and being handled by Saul
 - 55 were due to the autoconf upgrade and are/were handled by Richard
 - 14 were a warning for fuzz on the at patches solving the autoconf upgrade
 - 1 was handled in meta-oe
 - 3 were an issue with the sudo archive being 0 byte on the mirror, this has been handled by Michael Halstead
 - 176 were issues already being handled by Richard: mesa dependencies, multiple rdepends default value issues, ...
   -> in those I found a warning in meta-intel and I sent a patch to fix it
Regards,
-- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
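The week 04 breakdown can be checked mechanically. The sketch below copies the figures from this message into plain dictionaries and confirms that the categories add up to the reported totals. The dictionary keys are shorthand labels chosen for this example, not swatbot fields.

```python
# Figures copied from the week 04 message; labels are shorthand only.
richard = {
    "oeqa priority patches": 62,
    "cancelled builds": 285,
}
alexandre = {
    "not to be triaged": 96,
    "cancelled without other issue": 100,
    "misconfigured builds": 2,
    "new bugs": 2,
    "added to existing bugs": 89,
    "handled by mailing list replies": 12,
    "fixed by commits already in tree": 45,
    "qmp patch": 11,
    "autoconf upgrade": 55,
    "fuzz warning on at patches": 14,
    "handled in meta-oe": 1,
    "sudo archive 0 byte on mirror": 3,
    "already handled by Richard": 176,
}
reported_total = 953

richard_total = sum(richard.values())
alexandre_total = sum(alexandre.values())

# The per-person subtotals and the overall total match the message.
assert richard_total == 347
assert alexandre_total == 606
assert richard_total + alexandre_total == reported_total

# Failures swatbot could have filtered out in advance (first two categories).
filterable = alexandre["not to be triaged"] + alexandre["cancelled without other issue"]
print(f"{filterable} of {reported_total} failures needed no real triage")
```

Separating the filterable entries from the rest gives a rough sense of why the raw failure count overstates the number of distinct issues.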
|
|
Re: SWAT Rotation
Jon Mason <jdmason@...>
Hello Alexandre,
I have received this email and will take over tomorrow morning. Thanks, Jon On Thu, Jan 28, 2021 at 7:07 PM Alexandre Belloni <alexandre.belloni@...> wrote:
|
|
SWAT Rotation
Alexandre Belloni
Jon,
Once again, you are the next one on the list (https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members) and SWAT duty will rotate from me to you at EOD 01/29/2021. Please reply to let me know whether you will be able to work on this task. Also, please update me if during the week, you can't spend enough time on the topic. I know Ross will walk you through the process but don't hesitate to contact me if you have any question. Thanks! -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|
Re: SWAT Rotation
Alexandre Belloni
On 22/01/2021 06:57:13-0500, Jon Mason wrote:
Hello Alexandre,
Indeed, I will take this rotation and you'll get the next one.
-- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
|
|