Autobuilder reproducibility target changes


Richard Purdie
 

Regular users of the autobuilder will note that I've split the
reproducible builds test out of the main oe-selftest build and into its
own target build. This is because that test tends to run for much
longer and it helps to see the result separately.

I've only done this for master. If gatesgarth and dunfell want to
follow, that should be straightforward with a change to the branch in
autobuilder-helper. Obviously we should ensure this is working OK with
master first but so far so good.

It has already highlighted the difference between a successful run:

https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
(took 3-4 hours)

and two failing runs:

https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
(took 9 hours)

the time difference being the system trying to run diffoscope on
vim-common :/.

I'm aware I removed some recipes from the exclusions list after seeing
multiple passing builds for all distros and we're now seeing test
failures. My mistake was not waiting for the date to change and for
builds to run on an autobuilder worker with a different umask.

Meson is failing with a pyc file mismatch which diffoscope can't decode
and despite trying for 5 hours, diffoscope hasn't given any data on why
vim-common differs. I should have fixes in for quilt, valgrind,
kernel-devsrc and cwautomacros. The umask fix may fix other issues too.
Alex has improved the reporting so we can spot cases where exclusion is
no longer needed.

Cheers,

Richard
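
For readers unfamiliar with why the worker's umask matters here: the same
build step can leave different permission bits on the files it creates,
and any packaging step that records those bits then produces differing
output. The following is only a minimal sketch of that effect using plain
Python on a POSIX system. The file name "artifact" and the two umask
values are assumptions for illustration, not taken from the autobuilder
configuration.

```python
import os
import stat
import tempfile


def mode_under_umask(mask: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "artifact")  # hypothetical file name
            # Request 0o666; the bits set in the umask are cleared on creation.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the caller's umask.
        os.umask(old)


# Two workers with different umasks (values assumed for illustration).
mode_a = mode_under_umask(0o022)
mode_b = mode_under_umask(0o002)
print(oct(mode_a), oct(mode_b))
```

If the two modes differ, a package that captures file modes without
normalising them would differ between the two workers too.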


Alexander Kanavin <alex.kanavin@...>
 

Cheers :) If there's something else I could help with, let me know.

One item is getting to the bottom of why it takes diffoscope beyond the heat death of the universe to render its verdict on some items.

Alex




Richard Purdie
 

On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
> Cheers :) If there's something else I could help with, let me know.

I'm going to try and get diffs of the remaining big package differences
and see where we stand. The two big ones I know of are go as a language
for reproducibility and perf. Perf will just be a sheer pain to fix,
hopefully the kernel will take patches.

> One item is getting to the bottom of why it takes diffoscope beyond
> the heat death of the universe to render its verdict on some items.

That would be really helpful to get to the bottom of. The vim-common
difference is actually really simple:

https://autobuilder.yocto.io/pub/repro-fail/oe-reproducible-20210213-0djxo1sn/packages/diff-html/

so I don't know why it took 5 hours to compute that. It suggests
something really silly/stupid is going on. diffoscope should be
amenable to fixes so it would be worth talking to them too...

(I have a fix for the locale problem in vim brewing, it needs a new
buildtools-extended-tarball)

Cheers,

Richard


Alexander Kanavin <alex.kanavin@...>
 

On Sun, 14 Feb 2021 at 16:17, Richard Purdie <richard.purdie@...> wrote:
> On Sun, 2021-02-14 at 16:04 +0100, Alexander Kanavin wrote:
> > Cheers :) If there's something else I could help with, let me know.
>
> I'm going to try and get diffs of the remaining big package differences
> and see where we stand. The two big ones I know of are go as a language
> for reproducibility and perf. Perf will just be a sheer pain to fix,
> hopefully the kernel will take patches.

I did a bit of work on the go reproducibility, that has been preserved here:

I got there, but it took quite a bit of time (debugging the go build process is extremely painful) and I'm not at all happy with the hacky/brittle things in the patch, so it's on hold for now - but anyone is welcome to take it and make it better, especially if they're go specialists.

Alex


Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

> the time difference being the system trying to run diffoscope on
> vim-common :/.

I'm not sure that diffoscope is the culprit here. If you look at the
logs, you can see that there is only about 30 seconds between the
"Running diffoscope" log message and the end of the test. I suspect
something else is going wrong here. I can try to write up a patch to
add more logging so we can more accurately pinpoint where it's
taking so long.
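
As a rough illustration of the kind of extra logging described above,
here is a minimal sketch that wraps each phase of a long-running test in
a timer and logs when it starts and how long it took. This is not the
actual oe-selftest code: the phase names, the `timed` helper and the
`run_test` function are hypothetical, and the `time.sleep` calls stand in
for the real work.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("repro-timing")  # hypothetical logger name
timings = {}  # phase name -> elapsed seconds


@contextmanager
def timed(phase):
    """Log the start and end of a phase and record how long it took."""
    start = time.monotonic()
    logger.info("start %s", phase)
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        timings[phase] = elapsed
        logger.info("end %s after %.1fs", phase, elapsed)


def run_test():
    """Run placeholder phases and return the name of the slowest one."""
    # Phase names are assumptions; sleep() stands in for the real work.
    for phase in ("build A", "build B", "compare packages", "diffoscope"):
        with timed(phase):
            time.sleep(0.01)
    return max(timings, key=timings.get)


slowest = run_test()
```

With per-phase timings in the log, a 9 hour run would show directly
whether the time went into a build, the comparison, or diffoscope.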





Joshua Watt <JPEWhacker@...>
 

On Sun, Feb 14, 2021 at 6:19 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:

> It has already highlighted the difference between a successful run:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/115/builds/2
> https://autobuilder.yoctoproject.org/typhoon/#/builders/119/builds/2
> (took 3-4 hours)
>
> and two failing runs:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/116/builds/2
> https://autobuilder.yoctoproject.org/typhoon/#/builders/118/builds/2
> (took 9 hours)

OK, I read through the code and unfortunately found a bug: when
attempting to make sure the "B" build doesn't use sstate, I misspelled
the SSTATE_MIRRORS, which means that the B build could have been
pulling from the sstate mirror when it was not supposed to. This has a
few implications:

1) It might explain why some of the reproducible results seem intermittent
2) It might explain why there is such a time disparity between the tests

Unfortunately, while it probably will help the intermittent results,
it probably means that the tests taking 9 hours is what is "supposed"
to happen, and they happen to be shorter sometimes because the B build
is pulling from sstate when it's not supposed to.
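
To make the failure mode concrete: a misspelled variable name in an
override is silently accepted as a new, unused variable, so the real
setting keeps its old value. The sketch below models that with plain
Python dictionaries. It is not BitBake and not the actual test code; the
exact misspelling is not given in this thread, so "SSTATE_MIRORS", the
paths, the mirror URL and the helper names are all made up for
illustration.

```python
# Variables the test intends to control (an assumed, simplified list).
KNOWN_VARS = {"SSTATE_DIR", "SSTATE_MIRRORS"}


def apply_overrides(base, overrides):
    """Return a copy of base with overrides applied, like appended config."""
    merged = dict(base)
    merged.update(overrides)
    return merged


def unknown_vars(overrides):
    """Return override names that are not in the known set (likely typos)."""
    return sorted(set(overrides) - KNOWN_VARS)


# Inherited configuration (values are placeholders).
base = {
    "SSTATE_DIR": "/srv/sstate",
    "SSTATE_MIRRORS": "file://.* http://example.invalid/sstate/PATH",
}

# Intended: point build B at an empty sstate dir and clear the mirrors.
# The second key is deliberately misspelled (hypothetical spelling).
overrides_b = {"SSTATE_DIR": "/tmp/build-b-sstate", "SSTATE_MIRORS": ""}

config_b = apply_overrides(base, overrides_b)
typos = unknown_vars(overrides_b)
print(config_b["SSTATE_MIRRORS"])  # still the inherited mirror
print(typos)
```

A check like `unknown_vars` is one way such a typo could be caught early;
whether it fits the real test harness is an open question.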




Alexander Kanavin <alex.kanavin@...>
 

I’ve definitely seen the diffoscope process take hours and hours in local builds. Trying it with these vim packages locally should still be done.

Alex



Joshua Watt <JPEWhacker@...>
 

On Mon, Feb 15, 2021 at 12:21 AM Alexander Kanavin
<alex.kanavin@gmail.com> wrote:

> I’ve definitely seen diffoscope process take hours and hours and hours in local builds. Trying it with these vim packages locally should still be done.

I forgot to mention that I did run diffoscope locally with the
offending vim packages and it took about 30 seconds (the same as the
AB logs showed).




Richard Purdie
 

On Sun, 2021-02-14 at 13:17 -0600, Joshua Watt wrote:
> OK, I read through the code and unfortunately found a bug: when
> attempting to make sure the "B" build doesn't use sstate, I misspelled
> the SSTATE_MIRRORS, which means that the B build could have been
> pulling from the sstate mirror when it was not supposed to. This has a
> few implications:
>
>  1) It might explain why some of the reproducible results seem intermittent
>  2) It might explain why there is such a time disparity between the tests

The "good" news is that this didn't affect the autobuilder as it sets
SSTATE_DIR to a common directory and doesn't use SSTATE_MIRRORS.

> Unfortunately, while it probably will help the intermittent results,
> it probably means that the tests taking 9 hours is what is "supposed"
> to happen, and they happen to be shorter sometimes because the B build
> is pulling from sstate when it's not supposed to.

I don't think we've got to the bottom of this. If it's not spending the
time in diffoscope, something seems to cause builds with differences to
take much longer...

Cheers,

Richard