Re: Autobuilder reproducibility target changes

Joshua Watt <JPEWhacker@...>
 

On Mon, Feb 15, 2021 at 12:21 AM Alexander Kanavin
<alex.kanavin@gmail.com> wrote:

I’ve definitely seen the diffoscope process take hours and hours and hours in local builds. It would still be worth trying it locally with these vim packages.
I forgot to mention that I did run diffoscope locally with the
offending vim packages and it took about 30 seconds (same as the AB
logs showed).


Alex

On Sun 14. Feb 2021 at 20.18, Joshua Watt <jpewhacker@gmail.com> wrote:

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a much
longer time, and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)
OK, I read through the code and unfortunately found a bug: when
attempting to make sure the "B" build doesn't use sstate, I misspelled
the SSTATE_MIRRORS variable name, which means that the B build could
have been pulling from the sstate mirror when it was not supposed to.
This has a few implications:

1) It might explain why some of the reproducible results seem intermittent
2) It might explain why there is such a time disparity between the tests

Unfortunately, while the fix will probably help with the intermittent
results, it probably also means that the tests taking 9 hours is what
is "supposed" to happen, and that some runs have only been shorter
because the B build was pulling from sstate when it shouldn't have been.


the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode,
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello Ross,

Since I didn't get any reply from Jagadheesan and you were the next one
on the list (and we discussed that on IRC), you will be on SWAT duty
this week.

I know this is short notice, and I'll try to assist as much as possible
if you need it.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: Autobuilder reproducibility target changes

Alexander Kanavin <alex.kanavin@...>
 

I’ve definitely seen the diffoscope process take hours and hours and hours in local builds. It would still be worth trying it locally with these vim packages.

Alex

On Sun 14. Feb 2021 at 20.18, Joshua Watt <jpewhacker@...> wrote:
On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@...> wrote:
>
> Regular users of the autobuilder will note that I've split the
> reproducible builds test out of the main oe-selftest build and into its
> own target build. This is because that test tends to run for a much
> longer time, and it helps to see the result separately.
>
> I've only done this for master. If gatesgarth and dunfell want to
> follow, that should be straightforward with a change to the branch in
> autobuilder-helper. Obviously we should ensure this is working ok with
> master first but so far so good.
>
> It has already highlighted the difference between a successful run:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
> https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
> (took 3-4 hours)
>
> and two failing runs:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
> https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
> (took 9 hours)

OK, I read through the code and unfortunately found a bug: when
attempting to make sure the "B" build doesn't use sstate, I misspelled
the SSTATE_MIRRORS variable name, which means that the B build could
have been pulling from the sstate mirror when it was not supposed to.
This has a few implications:

 1) It might explain why some of the reproducible results seem intermittent
 2) It might explain why there is such a time disparity between the tests

Unfortunately, while the fix will probably help with the intermittent
results, it probably also means that the tests taking 9 hours is what
is "supposed" to happen, and that some runs have only been shorter
because the B build was pulling from sstate when it shouldn't have been.

>
> the time difference being the system trying to run diffoscope on vim-
> common :/.
>
> I'm aware I removed some recipes from the exclusions list after seeing
> multiple passing builds for all distros and we're now seeing test
> failures. My mistake was not waiting for the date to change and for
> builds to run on an autobuilder worker with a different umask.
>
> Meson is failing with a pyc file mismatch which diffoscope can't decode,
> and despite trying for 5 hours, diffoscope hasn't given any data on why
> vim-common differs. I should have fixes in for quilt, valgrind, kernel-
> devsrc and cwautomacros. The umask fix may fix other issues too. Alex
> has improved the reporting so we can spot cases where exclusion is no
> longer needed.
>
> Cheers,
>
> Richard
>


do_package unknown user build failure

Richard Purdie
 

Hi All,

There are a number of failures on the autobuilder in do_package with
odd unknown user issues. My guess is that it's related to the new
buildtools tarball I configured in the helper. I'm going to guess we're
missing a glibc syscall with the new glibc.

I'll look into it as a priority.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a much
longer time, and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)
OK, I read through the code and unfortunately found a bug: when
attempting to make sure the "B" build doesn't use sstate, I misspelled
the SSTATE_MIRRORS variable name, which means that the B build could
have been pulling from the sstate mirror when it was not supposed to.
This has a few implications:

1) It might explain why some of the reproducible results seem intermittent
2) It might explain why there is such a time disparity between the tests

Unfortunately, while the fix will probably help with the intermittent
results, it probably also means that the tests taking 9 hours is what
is "supposed" to happen, and that some runs have only been shorter
because the B build was pulling from sstate when it shouldn't have been.
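
For what it's worth, here is a minimal sketch of the kind of extra
configuration the B build needs (not the actual oe-selftest code; the
helper function and paths are made up, only SSTATE_DIR and SSTATE_MIRRORS
are real variables). It also shows why the typo went unnoticed: bitbake
happily accepts an unknown variable name, so the site-wide mirror setting
stays in effect.

    # Sketch only: append a local.conf fragment for the B build.
    import textwrap

    def write_b_build_config(conf_path, sstate_dir_b):
        fragment = textwrap.dedent("""\
            # Give the B build its own empty sstate dir and disable the
            # shared mirror so every package is genuinely rebuilt.
            SSTATE_DIR = "%s"
            SSTATE_MIRRORS = ""
            # A misspelling such as SSTATE_MIRROS = "" is accepted silently,
            # leaving the real mirror in use.
            """ % sstate_dir_b)
        with open(conf_path, "a") as f:
            f.write(fragment)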


the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode,
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a much
longer time, and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.
I'm not sure that diffoscope is the culprit here. If you look at the
logs, you can see that there is only about 30 seconds between the
"Running diffoscope" log message and the end of the test. I suspect
something else is going wrong here. I can try to write up a patch to
add more logging so we can more accurately pinpoint where it's taking
so long.
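
Something along these lines might do (a rough sketch only; the phase
names, logger and the commented usage are illustrative, not the
selftest's real structure):

    import logging
    import time
    from contextlib import contextmanager

    logger = logging.getLogger("reproducible")

    @contextmanager
    def timed_phase(name):
        # Log the wall-clock time spent in each phase of the test.
        start = time.monotonic()
        logger.info("Starting %s", name)
        try:
            yield
        finally:
            logger.info("Finished %s in %.1f s", name, time.monotonic() - start)

    # Hypothetical usage:
    #   with timed_phase("build B (no sstate)"):
    #       run_build_b()
    #   with timed_phase("diffoscope on differing packages"):
    #       run_diffoscope()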



I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode,
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Alexander Kanavin <alex.kanavin@...>
 

On Sun, 14 Feb 2021 at 16:17, Richard Purdie <richard.purdie@...> wrote:
On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
> Cheers :) If there's anything else I can help with, just tell me.

I'm going to try and get diffs of the remaining big package differences
and see where we stand. The two big ones I know of are go as a language
for reproducibility and perf. Perf will just be a sheer pain to fix,
hopefully the kernel will take patches.

I did a bit of work on go reproducibility; that has been preserved here:

I got there, but it took quite a bit of time (debugging the go build process is extremely painful) and I'm not at all happy with the hacky/brittle things in the patch, so it's on hold for now - but anyone is welcome to take it and make it better, especially if they're go specialists.

Alex


Re: Autobuilder reproducibility target changes

Richard Purdie
 

On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
> Cheers :) If there's anything else I can help with, just tell me.

I'm going to try and get diffs of the remaining big package differences
and see where we stand. The two big ones I know of are go as a language
for reproducibility and perf. Perf will just be a sheer pain to fix,
hopefully the kernel will take patches.

> One item is getting to the bottom of why it takes diffoscope beyond
> the heat death of the universe to render its verdict on some items.

That would be really helpful to get to the bottom of. The vim-common
difference is actually really simple:

https://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20210213-0djxo1sn/packages/diff-html/

so I don't know why it took 5 hours to compute that. It suggests
something really silly/stupid is going on. diffoscope should be
amenable to fixes, so it would be worth talking to its developers too...
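
In the meantime, one option might be to put a hard wall-clock limit around
each diffoscope call so a pathological comparison gets reported instead of
eating five hours. A minimal sketch, not the autobuilder's actual
invocation; the package paths and timeout are placeholders, and --html-dir
is presumably the same output mode as the diff-html report above:

    import subprocess

    def run_diffoscope(pkg_a, pkg_b, html_dir, timeout_s=600):
        cmd = ["diffoscope", "--html-dir", html_dir, pkg_a, pkg_b]
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout_s)
            # diffoscope exits 0 when the inputs match, non-zero when they differ
            return result.returncode
        except subprocess.TimeoutExpired:
            print("diffoscope gave up after %ds on %s vs %s"
                  % (timeout_s, pkg_a, pkg_b))
            return None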

(I have a fix for the locale problem in vim brewing; it needs a new
buildtools-extended-tarball)

Cheers,

Richard


Re: Autobuilder reproducibility target changes

Alexander Kanavin <alex.kanavin@...>
 

Cheers :) If there's anything else I can help with, just tell me.

One item is getting to the bottom of why it takes diffoscope beyond the heat death of the universe to render its verdict on some items.

Alex


On Sun, 14 Feb 2021 at 13:19, Richard Purdie <richard.purdie@...> wrote:
Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a much
longer time, and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode,
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.

Cheers,

Richard


Autobuilder reproducibility target changes

Richard Purdie
 

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for a much
longer time, and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working ok with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on vim-
common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.
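
As a tiny illustration of the umask point (a sketch, not the actual fix):
the same install step produces different file modes under different worker
umasks, so packaged permissions differ unless the build normalises them.

    import os
    import stat
    import tempfile

    def mode_under_umask(mask):
        # Create a file the way a build step typically would and report its mode.
        old = os.umask(mask)
        try:
            path = os.path.join(tempfile.mkdtemp(), "probe")
            with open(path, "w") as f:
                f.write("x")
            return oct(stat.S_IMODE(os.stat(path).st_mode))
        finally:
            os.umask(old)

    print(mode_under_umask(0o022))  # typically 0o644
    print(mode_under_umask(0o002))  # typically 0o664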

Meson is failing with a pyc file mismatch which diffoscope can't decode,
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind, kernel-
devsrc and cwautomacros. The umask fix may fix other issues too. Alex
has improved the reporting so we can spot cases where exclusion is no
longer needed.
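
On the pyc side, the mismatch is often just the 16-byte header, which
embeds either the source mtime or a source hash. A small sketch for
inspecting it (assumes the CPython 3.7+ header layout; pass .pyc paths on
the command line):

    import struct
    import sys
    import time

    def describe_pyc_header(path):
        # CPython 3.7+: 4-byte magic, 4-byte flags, then either
        # (source mtime, source size) or a 64-bit source hash.
        with open(path, "rb") as f:
            magic, flags, field1, field2 = struct.unpack("<4sIII", f.read(16))
        if flags & 0x1:
            print("%s: hash-based pyc (magic %s, hash words %08x %08x)"
                  % (path, magic.hex(), field1, field2))
        else:
            print("%s: timestamp-based pyc (magic %s, mtime %s, source size %d)"
                  % (path, magic.hex(), time.ctime(field1), field2))

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            describe_pyc_header(p)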

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello Jagadheesan,

You are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from Minjae to you at EOD 2021-02-12.

Please reply to let me know whether you will be able to work on this
task.

I'll be available to walk you through the process on Monday; don't
hesitate to contact me by email or on IRC.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT statistics for week 05

Alexandre Belloni
 

Hello,

Here are the statistics for last week. Jon Mason was on SWAT duty.

413 failures were reported,
* 145 were triaged by Jon
- 6 were not to be triaged
- 7 were due to out of disk space issues
- 65 for a binutils issue Richard already sent an email for
- 21 were already fixed at the time they were triaged
- 11 for 3 issues for which emails were sent
- 8 were recurrences of bugs 13802, 14002, 14170, 14181, 14200
- 10 occurrences of new bug 14210
- 2 occurrences of new bug 14212
- 3 occurrences of new bug 14221
- 2 occurrences of new bug 14222
- 3 occurrences of new bug 14223
- 2 occurrences of new bug 14224
- 1 for new bug 14225
- 4 occurrences of new bug 14226

* 154 were triaged by Richard
- 148 because of a broken glibc upgrade patch
- 5 for a bitbake multiconfig change
- 1 was added to bug 14201

* 113 were triaged by me
- 106 were not to be triaged
- 2 were meta-oe build failures due to the autoconf update
- 2 were cancelled with no other errors
- 1 was added to bug 14029
- 1 new bug opened: 14213
- 1 was due to the qmp patch and being handled by Saul

Again, the raw number of failures is not representative of the work that
has been done. Swatbot is now filtering the failures that are not to be
triaged or the cancellations without any other issues so we won't have
those anymore.

Regards,

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Controller out of space

Richard Purdie
 

Hi,

The autobuilder controller (typhoon) ran out of disk space and I ended
up having to restart buildbot after clearing some space with a
temporary fix until Michael can look at it. That meant the running
builds were interrupted and didn't restart automatically.

Cheers,

Richard


Switch to swatbot

Richard Purdie
 

Hi All!

I just wanted to let people know that we've officially switched from
using the wiki BuildLog over to "SwatBot". The most useful URL for SWAT
is:

https://swatbot.yoctoproject.org/mainindex/swat/

which lists all the "Pending" build issues which need triage. Anyone
with an account can go in and edit these into the correct states; the
idea of SWAT is to handle them and ensure we take action on
failures/warnings.

The app has gone through quite a few improvements and tweaks over the
last couple of weeks and should be more useful now. We're happy to take
other improvement ideas too both to the app and process.

Alexandre should be able to help with any questions and give out
accounts as needed to SWAT members.

Cheers,

Richard


SWAT Rotation

Alexandre Belloni
 

Hello Minjae,

This time, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from Jon to you at EOD 2021-02-05.

Please reply to let me know whether you will be able to work on this
task.

I'll be available to walk you through the process on Monday from 9am to
11am CET, which should be 5pm to 7pm for you. If that doesn't work, I'll
have more availability on Tuesday.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT statistics for week 04

Alexandre Belloni
 

Hello,

I've been looking at what happened this week. I'm pretty sure the number
of triaged failures is not a proper metric because of the number of
repetitions of a single issue, so I tried to dig a bit deeper:

953 failures were reported,
* 347 were triaged by Richard
- 62 because of oeqa priority patches breaking things, Richard sent
a mail
- 285 for 3 cancelled builds he handled himself

* 606 were triaged by me
- 96 were not to be triaged (swatbot doesn't filter those yet)
- 100 were cancelled builds without any other issue (also, swatbot
doesn't filter those yet)
- 2 for misconfigured builds
- 2 were new bugs: 14198 and 14208
- 89 were added to existing bugs as new occurrences of 13802, 13935, 14028, 14029, 14181
- 12 were handled by sending 7 replies to the mailing lists, 4 from
Richard, 3 from me
- 45 were fixed by 2 commits already in tree at the time they were triaged
- 11 were due to the qmp patch and being handled by Saul
- 55 were due to the autoconf upgrade and are/were handled by
Richard
- 14 were warnings for fuzz on the at patches solving the autoconf upgrade
- 1 was handled in meta-oe
- 3 were an issue with the sudo archive being 0 bytes on the mirror;
this has been handled by Michael Halstead
- 176 were issues already being handled by Richard: mesa
dependencies, multiple rdepends default value issues, ...
-> among those I found a warning in meta-intel and sent a patch to fix it

Regards,

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Jon Mason <jdmason@...>
 

Hello Alexandre,
I have received this email and will take over tomorrow morning.

Thanks,
Jon

On Thu, Jan 28, 2021 at 7:07 PM Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:

Jon,

Once again, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from me to you at EOD 01/29/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if, during the week, you can't spend enough
time on the topic.

I know Ross will walk you through the process but don't hesitate to
contact me if you have any questions.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT Rotation

Alexandre Belloni
 

Jon,

Once again, you are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from me to you at EOD 01/29/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if, during the week, you can't spend enough
time on the topic.

I know Ross will walk you through the process but don't hesitate to
contact me if you have any questions.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Alexandre Belloni
 

On 22/01/2021 06:57:13-0500, Jon Mason wrote:
> Hello Alexandre,
> I believe it was already discussed with you, but I'll take the next rotation.

Indeed, I will take this rotation and you'll get the next one.

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation

Jon Mason <jdmason@...>
 

Hello Alexandre,
I believe it was already discussed with you, but I'll take the next rotation.

Thanks,
Jon

On Thu, Jan 21, 2021 at 5:38 PM Alexandre Belloni
<alexandre.belloni@bootlin.com> wrote:

Jon,

You are the next one on the list
(https://wiki.yoctoproject.org/wiki/Yocto_Build_Failure_Swat_Team#Members)
and SWAT duty will rotate from Lee Chee to you at EOD 01/22/2021.

Please reply to let me know whether you will be able to work on this
task. Also, please update me if, during the week, you can't spend enough
time on the topic.

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
