
do_package unknown user build failure

Richard Purdie
 

Hi All,

There are a number of failures on the autobuilder in do_package with
odd unknown user issues. My guess is that it's related to the new
buildtools tarball I configured in the helper. I'm going to guess we're
missing a glibc syscall with the new glibc.

I'll look into it as a priority.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote:

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a lot
longer time period and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

OK, I read through the code and unfortunately found a bug: when
attempting to make sure the "B" build doesn't use sstate, I misspelled
SSTATE_MIRRORS, which means that the B build could have been
pulling from the sstate mirror when it was not supposed to. This has a
few implications:

1) It might explain why some of the reproducible results seem intermittent
2) It might explain why there is such a time disparity between the tests

Unfortunately, while it probably will help the intermittent results,
it probably means that the tests taking 9 hours is what is "supposed"
to happen, and they happen to be shorter sometimes because the B build
is pulling from sstate when it's not supposed to.
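A misspelled variable name like this is easy to miss because the assignment is still accepted and simply has no effect. As a minimal sketch (not the actual fix), a check along these lines could flag assigned names that look like near misses of known variables. The allow-list, the function name and the misspelling used in the usage note are all hypothetical; the real typo is not shown in this thread.

```python
import difflib
import re

# Hypothetical allow-list for illustration only; a real check would need
# the full set of variables the build system understands.
KNOWN_VARIABLES = {"SSTATE_MIRRORS", "SSTATE_DIR", "DL_DIR", "TMPDIR"}

# Matches the variable name at the start of a simple assignment such as
# NAME = "...", NAME ?= "..." or NAME += "...".
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[?:+.]{0,2}=")


def find_suspect_assignments(config_text, known=KNOWN_VARIABLES):
    """Return (name, suggestion) pairs for assigned names that are not in
    the known set but closely resemble one of its entries."""
    suspects = []
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name = match.group(1)
        if name in known:
            continue
        close = difflib.get_close_matches(name, sorted(known), n=1, cutoff=0.8)
        if close:
            suspects.append((name, close[0]))
    return suspects
```

For example, calling `find_suspect_assignments('SSTATE_MIRORS = ""')` with that made-up misspelling reports it alongside the suggestion `SSTATE_MIRRORS`.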


the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote:

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a lot
longer time period and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.
I'm not sure that diffoscope is the culprit here. If you look at the
logs, you can see that there is only about 30 seconds between the
"Running diffoscope" log message and the end of the test. I suspect
something else is going wrong here. I can try to write up a patch to try
and add more logging so we can more accurately pinpoint where it's
taking so long.
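As a rough idea of what such logging could look like (a sketch only, not the patch itself), each phase of the test could be wrapped in a small timing helper. The helper name, logger name and phase labels are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("repro-timing")  # hypothetical logger name


@contextmanager
def timed_phase(name, results=None):
    """Log the start and end of a phase and how long it took.

    If a dict is passed as `results`, the elapsed time in seconds is also
    stored there under the phase name.
    """
    start = time.monotonic()
    logger.info("Starting %s", name)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        logger.info("Finished %s in %.1f s", name, elapsed)
        if results is not None:
            results[name] = elapsed
```

Wrapping steps such as the A build, the B build, the package comparison and the diffoscope run in `with timed_phase("..."):` blocks would show which one accounts for the extra hours.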



I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Alexander Kanavin <alex.kanavin@...>
 

On Sun, 14 Feb 2021 at 16:17, Richard Purdie <richard.purdie@...> wrote:
On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
> Cheers :) If there's something else I could help, tell.

I'm going to try and get diffs of the remaining big package differences
and see where we stand. The two big ones I know of are go as a language
for reproducibility and perf. Perf will just be a sheer pain to fix,
hopefully the kernel will take patches.

I did a bit of work on the go reproducibility, that has been preserved here:

I got there, but it took quite a bit of time (debugging the go build process is extremely painful) and I'm not at all happy with the hacky/brittle things in the patch, so it's on hold for now - but anyone is welcome to take it and make it better, especially if they're go specialists.

Alex


Re: Autobuilder reproducibility target changes

Richard Purdie
 

On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
Cheers :) If there's something else I could help, tell.
I'm going to try and get diffs of the remaining big package differences
and see where we stand. The two big ones I know of are go as a language
for reproducibility and perf. Perf will just be a sheer pain to fix,
hopefully the kernel will take patches.

One item is getting to the bottom of why it takes diffoscope beyond
the heat death of the universe to render its verdict on some items.
That would be really helpful to get to the bottom of. The vim-common
difference is actually really simple:

https://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20210213-0djxo1sn/packages/diff-html/

so I don't know why it took 5 hours to compute that. It suggests
something really silly/stupid is going on. diffoscope should be
amenable to fixes so it would be worth talking to them too...

(I have a fix for the locale problem in vim brewing, it needs a new
buildtools-extended-tarball)

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Alexander Kanavin <alex.kanavin@...>
 

Cheers :) If there's something else I could help, tell.

One item is getting to the bottom of why it takes diffoscope beyond the heat death of the universe to render its verdict on some items.

Alex


On Sun, 14 Feb 2021 at 13:19, Richard Purdie <richard.purdie@...> wrote:
Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a lot
longer time period and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Autobuilder reproducibility target changes

Richard Purdie
 

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a lot
longer time period and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.
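To illustrate why the worker's umask matters, here is a small sketch showing that the same file creation request yields different permission bits under different umasks. If those bits end up in a package, two otherwise identical builds will differ. The two umask values are examples only, not the actual worker settings.

```python
import os
import stat
import tempfile


def mode_under_umask(mask):
    """Create a file requesting mode 0o666 under the given umask and
    return the permission bits the file actually ends up with."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "example")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the caller's umask.
        os.umask(old)


for mask in (0o022, 0o002):
    print(f"umask {mask:03o} -> mode {mode_under_umask(mask):03o}")
```

On a typical Linux system the group write bit is present in one case and absent in the other, which is the kind of difference that only shows up once a build lands on a worker configured differently.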

Meson is failing with a pyc file mismatch which diffoscope can't decode
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello Jagadheesan,

You are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from Minjae to you at EOD 2021-02-12.

Please reply to let me know whether you will be able to work on this
task.

I'll be available to walk you through the process on Monday; don't
hesitate to contact me by email or on IRC.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT statistics for week 05

Alexandre Belloni
 

Hello,

Here are the statistics for last week. Jon Mason was on SWAT duty.

413 failures were reported,
* 145 were triaged by Jon
- 6 were not to be triaged
- 7 were due to out of disk space issues
- 65 for a binutils issue Richard already sent an email for
- 21 were already fixed at the time they were triaged
- 11 for 3 issues for which emails were sent
- 8 were reoccurrences of bugs 13802, 14002, 14170, 14181, 14200
- 10 occurrences of new bug 14210
- 2 occurrences of new bug 14212
- 3 occurrences of new bug 14221
- 2 occurrences of new bug 14222
- 3 occurrences of new bug 14223
- 2 occurrences of new bug 14224
- 1 for new bug 14225
- 4 occurrences of new bug 14226

* 154 were triaged by Richard
- 148 because of a broken glibc upgrade patch
- 5 for a bitbake multiconfig change
- 1 was added to bug 14201

* 113 were triaged by me
- 106 were not to be triaged
- 2 were meta-oe build failures due to the autoconf update
- 2 were cancelled with no other errors
- 1 was added to bug 14029
- 1 new bug opened: 14213
- 1 was due to the qmp patch and being handled by Saul

Again, the raw number of failures is not representative of the work that
has been done. Swatbot is now filtering the failures that are not to be
triaged or the cancellations without any other issues so we won't have
those anymore.
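For anyone wanting to cross-check the breakdown, a short tally of the figures listed above (the variable names are only for illustration):

```python
# Per-category counts copied from the week 05 list above.
reported_total = 413
triaged = {
    "Jon": [6, 7, 65, 21, 11, 8, 10, 2, 3, 2, 3, 2, 1, 4],
    "Richard": [148, 5, 1],
    "Alexandre": [106, 2, 2, 1, 1, 1],
}
stated = {"Jon": 145, "Richard": 154, "Alexandre": 113}

per_person = {name: sum(counts) for name, counts in triaged.items()}
grand_total = sum(per_person.values())

for name, count in per_person.items():
    print(f"{name}: {count} (stated {stated[name]})")
print(f"sum of per-person totals: {grand_total} (reported {reported_total})")
```

Each per-person subtotal matches its stated figure, while the three subtotals add up to 412, one fewer than the 413 reported; the list does not say where the remaining failure belongs.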

Regards,

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Controller out of space

Richard Purdie
 

Hi,

The autobuilder controller (typhoon) ran out of disk space and I ended
up having to restart buildbot after clearing some space with a
temporary fix until Michael can look at it. That meant the running
builds were interrupted and didn't restart automatically.

Cheers,

Richard


Switch to swatbot

Richard Purdie
 

Hi All!

I just wanted to let people know that we've officially switched from
using the wiki BuildLog over to "SwatBot". The most useful URL for SWAT
is:

https://swatbot.yoctoproject.org/mainindex/swat/

which lists all the "Pending" build issues which need triage. Anyone
with an account can go in and edit these into the correct states; the
idea of SWAT is to handle them and ensure we take actions against
failures/warnings.

The app has gone through quite a few improvements and tweaks over the
last couple of weeks and should be more useful now. We're happy to take
other improvement ideas too both to the app and process.

Alexandre should be able to help with any questions and give out
accounts as needed to SWAT members.

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello Minjae,

This time, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from Jon to you at EOD 2021-02-05.

Please reply to let me know whether you will be able to work on this
task.

I'll be available to walk you through the process on Monday from 9am to
11am CET, that should be 5pm to 7pm for you. If that doesn't work, I'll
have more availability on Tuesday.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT statistics for week 04

Alexandre Belloni
 

Hello,

I've been looking at what happened this week. I'm pretty sure the number
of triaged failures is not a proper metric because a single issue is
often repeated many times, so I tried to dig a bit deeper:

953 failures were reported,
* 347 were triaged by Richard
- 62 because of oeqa priority patches breaking things, Richard sent
a mail
- 285 for 3 cancelled builds he handled himself

* 606 were triaged by me
- 96 were not to be triaged (swatbot doesn't filter those yet)
- 100 were cancelled builds without any other issue (also, swatbot
doesn't filter those yet)
- 2 for misconfigured builds
- 2 were new bugs: 14198 and 14208
- 89 were added to existing bugs as new occurrences of 13802, 13935, 14028, 14029, 14181
- 12 were handled by sending 7 replies to the mailing lists, 4 from
Richard, 3 from me
- 45 were fixed by 2 commits already in tree at the time they were triaged
- 11 were due to the qmp patch and being handled by Saul
- 55 were due to the autoconf upgrade and are/were handled by
Richard
- 14 were a warning for fuzz on the at patches solving the autoconf upgrade
- 1 was handled in meta-oe
- 3 were an issue with the sudo archive being 0 byte on the mirror,
this has been handled by Michael Halstead
- 176 were issues already being handled by Richard: mesa
dependencies, multiple rdepends default value issues, ...
-> in those I found a warning in meta-intel and I sent a patch to fix it
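To give a feel for how much of the raw count is noise, here is a small sketch that works out the share of my 606 entries that swatbot filtering would remove. The category labels are shortened from the list above and the choice of which two count as filterable follows the notes in that list.

```python
# Counts copied from the "triaged by me" breakdown above.
alexandre_week04 = {
    "not to be triaged": 96,
    "cancelled builds without any other issue": 100,
    "misconfigured builds": 2,
    "new bugs (14198, 14208)": 2,
    "added to existing bugs": 89,
    "handled by mailing list replies": 12,
    "fixed by commits already in tree": 45,
    "qmp patch": 11,
    "autoconf upgrade": 55,
    "fuzz warnings on the at patches": 14,
    "handled in meta-oe": 1,
    "0 byte sudo archive on the mirror": 3,
    "already being handled by Richard": 176,
}

# The two categories noted above as not yet filtered by swatbot.
FILTERABLE = ("not to be triaged", "cancelled builds without any other issue")

total = sum(alexandre_week04.values())
filterable = sum(alexandre_week04[key] for key in FILTERABLE)
needing_attention = total - filterable

print(f"{filterable} of {total} entries ({filterable / total:.0%}) could be filtered")
print(f"{needing_attention} entries needed a real decision")
```

That is 196 of the 606 entries, roughly a third, before even accounting for the many repeats of the same underlying issue.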

Regards,

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Jon Mason
 

Hello Alexandre,
I have received this email and will take over tomorrow morning.

Thanks,
Jon

On Thu, Jan 28, 2021 at 7:07 PM Alexandre Belloni
<alexandre.belloni@...> wrote:

Jon,

Once again, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from me to you at EOD 01/29/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if during the week, you can't spend enough
time on the topic.

I know Ross will walk you through the process but don't hesitate to
contact me if you have any questions.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT Rotation

Alexandre Belloni
 

Jon,

Once again, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from me to you at EOD 01/29/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if during the week, you can't spend enough
time on the topic.

I know Ross will walk you through the process but don't hesitate to
contact me if you have any questions.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Alexandre Belloni
 

On 22/01/2021 06:57:13-0500, Jon Mason wrote:
Hello Alexandre,
I believe it was already discussed with you, but I'll take the next rotation.

Indeed, I will take this rotation and you'll get the next one.

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Jon Mason
 

Hello Alexandre,
I believe it was already discussed with you, but I'll take the next rotation.

Thanks,
Jon

On Thu, Jan 21, 2021 at 5:38 PM Alexandre Belloni
<alexandre.belloni@...> wrote:

Jon,

You are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT will rotate from Lee Chee to you at EOD 01/22/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if during the week, you can't spend enough
time on the topic.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Introducing "SwatBot"

Richard Purdie
 

Hi All,

I have some good news: the SWAT Django app I've talked about is nearing
completion and will soon be ready to use. We now have a live instance
and are checking how it works with live data from the autobuilder.

https://swatbot.yoctoproject.org/

It is a bit ugly and rough around the edges but it should be able to
display a list of the things SWAT need to attend to and allow them to
be marked as resolved, tracking the resolutions. This should mean that
the process is easier for the person doing SWAT and for us to make sure
nothing gets "lost" during handovers etc.

Source code is available:
http://git.yoctoproject.org/cgit.cgi/swatbot
and the notifier plugin that connects to buildbot:
http://git.yoctoproject.org/cgit.cgi/yocto-autobuilder2

I'm expecting that where it isn't doing what we need, we should be able
to get it into shape in relatively short order.

Alexandre and Bootlin are now coming up to speed and will be taking
over leading the SWAT process and as part of that, helping get the app
working for everyone. I think Alexandre will issue accounts to people
on SWAT as we rotate through the list.

Over the last week I've been worried some issues haven't been
logged/handled so I made a note of the ones we need to check have been
handled:

meta-arm edk2 intermittent failure needs bug:
https://autobuilder.yoctoproject.org/typhoon/#/builders/113/builds/629

update ltp bug:
https://autobuilder.yoctoproject.org/typhoon/#/builders/96/builds/1403

need a bug for slow boot failure intermittent issue?:
(was it slow, have we more logs, may be too late now?)
https://autobuilder.yoctoproject.org/typhoon/#/builders/109/builds/1873


intermittent ptest warnings, need updates to bugs so we can track
frequency:
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1435
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1716
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1443
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1723
https://autobuilder.yoctoproject.org/typhoon/#/builders/82/builds/1439
https://autobuilder.yoctoproject.org/typhoon/#/builders/81/builds/1719

bitbake timeout bug needs updating, adding this?
https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/1739

should be a multiprocess bug somewhere, add or reopen?
https://autobuilder.yoctoproject.org/typhoon/#/builders/86/builds/1729

qemuarm64 shutdown bug
https://autobuilder.yoctoproject.org/typhoon/#/builders/42/builds/2943

ping test fail:
https://autobuilder.yoctoproject.org/typhoon/#/builders/74/builds/2945

mips backtrace:
https://autobuilder.yoctoproject.org/typhoon/#/builders/74/builds/2942

There were a ton more failures but I think I've responded/handled the
others. I'll admit to focusing on getting the app done since I think it
should improve this process so much and help everyone.

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Jon,

You are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT will rotate from Lee Chee to you at EOD 01/22/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if during the week, you can't spend enough
time on the topic.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Build and Integration help from Bootlin

Richard Purdie
 

Hi All,

I'm pleased to be able to announce that Alexandre Belloni and some of
his colleagues from Bootlin are going to be helping us with the process
for testing and integrating patches within the project.

There are multiple reasons for this, not least that it helps not to
have a single point of failure and that I could do with the help and a
break! :)

As such, they are going to be leading the SWAT process and you'll
therefore be seeing them giving initial responses on patches and
helping with the autobuilder failures.

I'd like to welcome them on board and hope everyone will support them
as we figure out how the finer details of this will work.

This is being made possible by support from the Yocto Project members
which is very much appreciated, thanks!

I will be continuing to do the final patch review and merge and am not
going anywhere, this is some very much welcomed support to try and
ensure we continue to scale and maintain our quality and testing.

Cheers,

Richard