
Re: check-layer-nightly failure

Bruce Ashfield <bruce.ashfield@...>
 

On Thu, Jul 7, 2022 at 9:14 AM Alexandre Belloni
<alexandre.belloni@...> wrote:

Hello,

Since python3-colorama got upgraded to 0.4.5 in meta-python,
check-layer-nightly is failing for meta-virtualization:

AssertionError: Adding layer meta-virtualization changed signatures.
23 signatures changed, initial differences (first hash before, second after):
python3-colorama:do_fetch: 0ac42383a6557a119738c7fda6a53a08837ffe2686b34f8fa5c2db83213bfe39 -> e2cfb137b2ed19476cbfba1402316e3433f7f03d1065fc8c74837ff631ff89c4
bitbake-diffsigs --task python3-colorama do_fetch --signature 0ac42383a6557a119738c7fda6a53a08837ffe2686b34f8fa5c2db83213bfe39 e2cfb137b2ed19476cbfba1402316e3433f7f03d1065fc8c74837ff631ff89c4
NOTE: Starting bitbake server...
Task dependencies changed from:
['BPN', 'PN', 'PV', 'PYPI_ARCHIVE_NAME', 'PYPI_PACKAGE', 'PYPI_PACKAGE_EXT', 'PYPI_SRC_URI', 'SPECIAL_PKGSUFFIX', 'SRCREV', 'SRC_URI', 'SRC_URI[sha256sum]', 'base_do_fetch', 'do_fetch[network]', 'pypi_package', 'pypi_src_uri']
to:
['BPN', 'PN', 'PV', 'PYPI_ARCHIVE_NAME', 'PYPI_PACKAGE', 'PYPI_PACKAGE_EXT', 'PYPI_SRC_URI', 'SPECIAL_PKGSUFFIX', 'SRCREV', 'SRC_URI', 'SRC_URI[md5sum]', 'SRC_URI[sha256sum]', 'base_do_fetch', 'do_fetch[network]', 'pypi_package', 'pypi_src_uri']
basehash changed from e3e476f44ff3905e371e1451a0e6e301baf5bdfb2f39c4140b75925fb331cd57 to 998a2a96caa4512c9c77acab928aa7246b2e1cecf44e59babf5fa94bcce4a413
List of dependencies for variable SRC_URI changed from '{'SRC_URI[sha256sum]', 'PYPI_SRC_URI'}' to '{'SRC_URI[md5sum]', 'SRC_URI[sha256sum]', 'PYPI_SRC_URI'}'
changed items: {'SRC_URI[md5sum]'}
Dependency on variable SRC_URI[md5sum] was added
Variable PV value changed from '0.4.5' to '0.4.4'
Variable SRC_URI[sha256sum] value changed from 'e6c6b4334fc50988a639d9b98aa429a0b57da6e17b9a44f0451f930b6967b7a4' to '5941b2b48a20143d2267e95b1c2a7603ce057ee39fd88e7329b0c292aa16869b'
docker-compose has a history of being sensitive to changes in this package.

Once again, I'll put on the record that we continue to ignore the lack
of an elegant way to deal with language layers, and the insistence that
the whole world of dependencies can somehow align on a single version of
a package.

I'll do a version bump this time, since it does look OK, but I won't
be dropping the recipe from the layer, and I may have to just fork it
into a version-specific package and handle the upgrade cadence in
meta-virtualization.
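
For anyone hitting this in the meantime, the usual way to hold a recipe back from a distro or local configuration is a PREFERRED_VERSION pin (a sketch only; the version-specific fork Bruce mentions would instead ship something like a python3-colorama_0.4.4.bb in meta-virtualization):

```conf
# Sketch for local.conf or a distro .conf: prefer the 0.4.4 recipe while
# consumers such as docker-compose still need it ("%" matches any
# trailing revision).
PREFERRED_VERSION_python3-colorama = "0.4.4%"
```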

Bruce



--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II


check-layer-nightly failure

Alexandre Belloni
 

Hello,

Since python3-colorama got upgraded to 0.4.5 in meta-python,
check-layer-nightly is failing for meta-virtualization:

AssertionError: Adding layer meta-virtualization changed signatures.
23 signatures changed, initial differences (first hash before, second after):
python3-colorama:do_fetch: 0ac42383a6557a119738c7fda6a53a08837ffe2686b34f8fa5c2db83213bfe39 -> e2cfb137b2ed19476cbfba1402316e3433f7f03d1065fc8c74837ff631ff89c4
bitbake-diffsigs --task python3-colorama do_fetch --signature 0ac42383a6557a119738c7fda6a53a08837ffe2686b34f8fa5c2db83213bfe39 e2cfb137b2ed19476cbfba1402316e3433f7f03d1065fc8c74837ff631ff89c4
NOTE: Starting bitbake server...
Task dependencies changed from:
['BPN', 'PN', 'PV', 'PYPI_ARCHIVE_NAME', 'PYPI_PACKAGE', 'PYPI_PACKAGE_EXT', 'PYPI_SRC_URI', 'SPECIAL_PKGSUFFIX', 'SRCREV', 'SRC_URI', 'SRC_URI[sha256sum]', 'base_do_fetch', 'do_fetch[network]', 'pypi_package', 'pypi_src_uri']
to:
['BPN', 'PN', 'PV', 'PYPI_ARCHIVE_NAME', 'PYPI_PACKAGE', 'PYPI_PACKAGE_EXT', 'PYPI_SRC_URI', 'SPECIAL_PKGSUFFIX', 'SRCREV', 'SRC_URI', 'SRC_URI[md5sum]', 'SRC_URI[sha256sum]', 'base_do_fetch', 'do_fetch[network]', 'pypi_package', 'pypi_src_uri']
basehash changed from e3e476f44ff3905e371e1451a0e6e301baf5bdfb2f39c4140b75925fb331cd57 to 998a2a96caa4512c9c77acab928aa7246b2e1cecf44e59babf5fa94bcce4a413
List of dependencies for variable SRC_URI changed from '{'SRC_URI[sha256sum]', 'PYPI_SRC_URI'}' to '{'SRC_URI[md5sum]', 'SRC_URI[sha256sum]', 'PYPI_SRC_URI'}'
changed items: {'SRC_URI[md5sum]'}
Dependency on variable SRC_URI[md5sum] was added
Variable PV value changed from '0.4.5' to '0.4.4'
Variable SRC_URI[sha256sum] value changed from 'e6c6b4334fc50988a639d9b98aa429a0b57da6e17b9a44f0451f930b6967b7a4' to '5941b2b48a20143d2267e95b1c2a7603ce057ee39fd88e7329b0c292aa16869b'


--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [OE-core] perl makefile race - any make experts who can help?

Richard Purdie
 

On Sun, 2022-06-05 at 04:26 +0200, Jacob Kroon wrote:
On Sat, 4 Jun 2022, 19:40 Richard Purdie,
<richard.purdie@...> wrote:
On Sat, 2022-06-04 at 17:12 +0200, Jacob Kroon wrote:
On 6/4/22 16:55, Khem Raj wrote:


On Sat, Jun 4, 2022 at 6:23 AM Richard Purdie
<richard.purdie@...> wrote:

     On Sat, 2022-06-04 at 13:36 +0100, Richard Purdie via
     lists.yoctoproject.org wrote:
      > On Sat, 2022-06-04 at 13:51 +0200, Alexander Kanavin wrote:
      > > Here's something I didn't think of before. Has this occurred
      > > anywhere else except Ubuntu 18.04?
      >
      > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14096
      >
      > I'm struggling to get the data out from the old builds, one mentions
      > ubuntu1604, there is an ubuntu1804 on both x86 and arm hosts.
      >
      > It is possible this is an ubuntu specific make issue or a make bug.

     Ubuntu 18.04 uses make 4.1 which is old (Oct 2014).

     I noticed these patches from 2016:

     https://git.savannah.gnu.org/cgit/make.git/commit/?id=9bb994e8319c2b153cd3d6d61e2c2882895e7c3a
     https://git.savannah.gnu.org/cgit/make.git/commit/?id=4762480ae9cb8df4878286411f178d32db14eff0

     I think we may want to mandate a modern make for both this class of
     issues and also perhaps for better loadavg support to keep load under
     control on the autobuilders.

     I'm torn, on the one hand we need to test the distros people use, on
     the other we do need to remove sources of intermittent issues. I think
     this bug must be some issue with make itself.

     Adding a make-native dependency to perl would "hurt" people on modern
     distros...


Make perhaps does not have many complex dependency needs so it might not
be as bad
My master build is already building make-native due to a dependency from
glibc, since 2018:

https://git.openembedded.org/openembedded-core/commit/?id=0cd89e4af625941f8ab8c033f72f900a2979b304

Don't know if that dependency is still valid though.
It is a fair point. We may as well add it to perl/perl-native. Centos7
still has make 3.82 but I think we now already require buildtools
tarball there so we could probably drop the glibc dependency on
make-native now.

Would it be a bad idea to add make-native to DEPENDS depending on
whether the host version of make is new enough or not? Would it
break sstate cache reuse in some way?
We can't have a conditional dependency like that, the task checksums as
implemented today wouldn't work and it would break sstate reuse.
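
The reason is worth spelling out: task signatures are computed from the recipe metadata itself, so anything host-dependent in DEPENDS makes the hash host-dependent too. A minimal sketch of the idea (a plain sha256 over an illustrative metadata string, not bitbake's actual signature code; the DEPENDS values here are made up):

```shell
# Same recipe, but one host adds make-native to DEPENDS: the hashed
# metadata differs, so the two hosts compute different task signatures
# and could never share the resulting sstate objects.
hash_meta() { printf '%s' "$1" | sha256sum | cut -d ' ' -f 1; }

new_make_host=$(hash_meta "perl do_compile DEPENDS=zlib")
old_make_host=$(hash_meta "perl do_compile DEPENDS=zlib make-native")

[ "$new_make_host" != "$old_make_host" ] && echo "signatures diverge"
```

Real signatures also cover the task code and every variable the task depends on, which is exactly what the bitbake-diffsigs output tracks.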

Cheers,

Richard


Re: [OE-core] perl makefile race - any make experts who can help?

Richard Purdie
 

On Sat, 2022-06-04 at 17:12 +0200, Jacob Kroon wrote:
On 6/4/22 16:55, Khem Raj wrote:


On Sat, Jun 4, 2022 at 6:23 AM Richard Purdie
<richard.purdie@...> wrote:

On Sat, 2022-06-04 at 13:36 +0100, Richard Purdie via
lists.yoctoproject.org wrote:
> On Sat, 2022-06-04 at 13:51 +0200, Alexander Kanavin wrote:
> > Here's something I didn't think of before. Has this occurred anywhere
> > else except Ubuntu 18.04?
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14096
>
> I'm struggling to get the data out from the old builds, one mentions
> ubuntu1604, there is an ubuntu1804 on both x86 and arm hosts.
>
> It is possible this is an ubuntu specific make issue or a make bug.

Ubuntu 18.04 uses make 4.1 which is old (Oct 2014).

I noticed these patches from 2016:

https://git.savannah.gnu.org/cgit/make.git/commit/?id=9bb994e8319c2b153cd3d6d61e2c2882895e7c3a
https://git.savannah.gnu.org/cgit/make.git/commit/?id=4762480ae9cb8df4878286411f178d32db14eff0

I think we may want to mandate a modern make for both this class of
issues and also perhaps for better loadavg support to keep load under
control on the autobuilders.

I'm torn, on the one hand we need to test the distros people use, on
the other we do need to remove sources of intermittent issues. I think
this bug must be some issue with make itself.

Adding a make-native dependency to perl would "hurt" people on modern
distros...


Make perhaps does not have many complex dependency needs so it might not
be as bad
My master build is already building make-native due to a dependency from
glibc, since 2018:

https://git.openembedded.org/openembedded-core/commit/?id=0cd89e4af625941f8ab8c033f72f900a2979b304

Don't know if that dependency is still valid though.
It is a fair point. We may as well add it to perl/perl-native. Centos7
still has make 3.82 but I think we now already require buildtools
tarball there so we could probably drop the glibc dependency on
make-native now.

Cheers,

Richard


Re: [OE-core] perl makefile race - any make experts who can help?

Khem Raj <raj.khem@...>
 



On Sat, Jun 4, 2022 at 6:23 AM Richard Purdie <richard.purdie@...> wrote:
On Sat, 2022-06-04 at 13:36 +0100, Richard Purdie via
lists.yoctoproject.org wrote:
> On Sat, 2022-06-04 at 13:51 +0200, Alexander Kanavin wrote:
> > Here's something I didn't think of before. Has this occurred anywhere
> > else except Ubuntu 18.04?
>
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14096
>
> I'm struggling to get the data out from the old builds, one mentions
> ubuntu1604, there is an ubuntu1804 on both x86 and arm hosts.
>
> It is possible this is an ubuntu specific make issue or a make bug.

Ubuntu 18.04 uses make 4.1 which is old (Oct 2014).

I noticed these patches from 2016:

https://git.savannah.gnu.org/cgit/make.git/commit/?id=9bb994e8319c2b153cd3d6d61e2c2882895e7c3a
https://git.savannah.gnu.org/cgit/make.git/commit/?id=4762480ae9cb8df4878286411f178d32db14eff0

I think we may want to mandate a modern make for both this class of
issues and also perhaps for better loadavg support to keep load under
control on the autobuilders.

I'm torn, on the one hand we need to test the distros people use, on
the other we do need to remove sources of intermittent issues. I think
this bug must be some issue with make itself.

Adding a make-native dependency to perl would "hurt" people on modern
distros...

Make perhaps does not have many complex dependency needs so it might not be as bad 

A second option is to mandate buildtools on this distro and add make to
the tarball.


Cheers,

Richard





Re: [OE-core] perl makefile race - any make experts who can help?

Richard Purdie
 

On Sat, 2022-06-04 at 13:36 +0100, Richard Purdie via
lists.yoctoproject.org wrote:
On Sat, 2022-06-04 at 13:51 +0200, Alexander Kanavin wrote:
Here's something I didn't think of before. Has this occurred anywhere
else except Ubuntu 18.04?
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14096

I'm struggling to get the data out from the old builds, one mentions
ubuntu1604, there is an ubuntu1804 on both x86 and arm hosts.

It is possible this is an ubuntu specific make issue or a make bug.
Ubuntu 18.04 uses make 4.1 which is old (Oct 2014).

I noticed these patches from 2016:

https://git.savannah.gnu.org/cgit/make.git/commit/?id=9bb994e8319c2b153cd3d6d61e2c2882895e7c3a
https://git.savannah.gnu.org/cgit/make.git/commit/?id=4762480ae9cb8df4878286411f178d32db14eff0

I think we may want to mandate a modern make for both this class of
issues and also perhaps for better loadavg support to keep load under
control on the autobuilders.

I'm torn, on the one hand we need to test the distros people use, on
the other we do need to remove sources of intermittent issues. I think
this bug must be some issue with make itself.

Adding a make-native dependency to perl would "hurt" people on modern
distros...

Cheers,

Richard


Re: [OE-core] perl makefile race - any make experts who can help?

Richard Purdie
 

On Sat, 2022-06-04 at 13:51 +0200, Alexander Kanavin wrote:
Here's something I didn't think of before. Has this occurred anywhere
else except Ubuntu 18.04?
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14096

I'm struggling to get the data out from the old builds, one mentions
ubuntu1604, there is an ubuntu1804 on both x86 and arm hosts.

It is possible this is an ubuntu specific make issue or a make bug.

Cheers,

Richard


Re: [OE-core] perl makefile race - any make experts who can help?

Alexander Kanavin <alex.kanavin@...>
 

Here's something I didn't think of before. Has this occurred anywhere
else except Ubuntu 18.04?

Alex

On Sat, 4 Jun 2022 at 13:07, Richard Purdie
<richard.purdie@...> wrote:

Hi,

Occasionally we see perl do_install failures on the autobuilder. It
looks like some kind of make race. With the last failure I saved off
the build directory and have spent a lot of time staring at it but I
don't understand how what the logs say happened actually happened.

The build failure is here:

https://autobuilder.yoctoproject.org/typhoon/#/builders/37/builds/5315/steps/11/logs/stdio

and you can see it failed in do_install. The first error is:

| Couldn't copy cpan/podlators/blib/script/pod2text to /home/pokybuild/yocto-worker/genericx86-64/build/build/tmp/work/core2-64-poky-linux/perl/5.34.1-r0/image/usr/bin/pod2text: No such file or directory
| Couldn't chmod 755 /home/pokybuild/yocto-worker/genericx86-64/build/build/tmp/work/core2-64-poky-linux/perl/5.34.1-r0/image/usr/bin/pod2text: No such file or directory

then

installman: Can't open cpan/podlators/blib/script/pod2text: No such file or directory
| ABORTED

Looking at the build directory, cpan/podlators/blib/script/pod2text
isn't there, cpan/podlators/script/pod2text is. I went digging in the
compile log which didn't fail but is interesting in what it doesn't say
and the ordering. The full log:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/log.do_compile.25823

or my edited down version:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/compile-cutdown.log

and note it never actually builds cpan/podlators/blib/script/pod2text.

If I go into the failed build dir, it does build it:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/manual-compile.log

I tried a few variations of deleting and rebuilding files there just to show behaviour.

The makefiles from the podlators directory:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/Makefile
https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/Makefile.PL


What really puzzles me is the "Manifying 2 pod documents" message
*before* it then generates scripts/pod2text from scripts/pod2text.PL in
the do_compile log. It appears to run the scripts/pod2text rule late and
never runs the $(INST_SCRIPT)/pod2text rule at all (hence no cp message
and the missing file).

The Makefile has default targets of pure_all and manifypods, that
message comes from manifypods but manifypods depends on pure_all.

pure_all is a double colon rule and I'm on less certain ground with how
those behave.

Are there any make experts out there who can spot the race in this
makefile?

Cheers,

Richard




perl makefile race - any make experts who can help?

Richard Purdie
 

Hi,

Occasionally we see perl do_install failures on the autobuilder. It
looks like some kind of make race. With the last failure I saved off
the build directory and have spent a lot of time staring at it but I
don't understand how what the logs say happened actually happened.

The build failure is here:

https://autobuilder.yoctoproject.org/typhoon/#/builders/37/builds/5315/steps/11/logs/stdio

and you can see it failed in do_install. The first error is:

| Couldn't copy cpan/podlators/blib/script/pod2text to /home/pokybuild/yocto-worker/genericx86-64/build/build/tmp/work/core2-64-poky-linux/perl/5.34.1-r0/image/usr/bin/pod2text: No such file or directory
| Couldn't chmod 755 /home/pokybuild/yocto-worker/genericx86-64/build/build/tmp/work/core2-64-poky-linux/perl/5.34.1-r0/image/usr/bin/pod2text: No such file or directory

then

installman: Can't open cpan/podlators/blib/script/pod2text: No such file or directory
| ABORTED

Looking at the build directory, cpan/podlators/blib/script/pod2text
isn't there, cpan/podlators/script/pod2text is. I went digging in the
compile log which didn't fail but is interesting in what it doesn't say
and the ordering. The full log:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/log.do_compile.25823

or my edited down version:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/compile-cutdown.log

and note it never actually builds cpan/podlators/blib/script/pod2text.

If I go into the failed build dir, it does build it:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/manual-compile.log

I tried a few variations of deleting and rebuilding files there just to show behaviour.

The makefiles from the podlators directory:

https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/Makefile
https://autobuilder.yocto.io/pub/failed-builds-data/perl-race/Makefile.PL


What really puzzles me is the "Manifying 2 pod documents" message
*before* it then generates scripts/pod2text from scripts/pod2text.PL in
the do_compile log. It appears to run the scripts/pod2text rule late and
never runs the $(INST_SCRIPT)/pod2text rule at all (hence no cp message
and the missing file).

The Makefile has default targets of pure_all and manifypods, that
message comes from manifypods but manifypods depends on pure_all.

pure_all is a double colon rule and I'm on less certain ground with how
those behave.
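
For anyone following along: with double-colon rules, each "target :: prereqs" line is an independent rule with its own recipe and its own prerequisite list, and make evaluates each one separately, so ordering is only guaranteed within each rule's own dependency chain (one plausible ingredient in a parallel-make race). A minimal illustration, nothing to do with perl's actual Makefile:

```shell
# Two independent double-colon rules for the same target: make runs both
# recipes, each after only its own prerequisite is up to date.
mk=$(mktemp)
printf 'all:: dep-a\n\t@echo ran-first\nall:: dep-b\n\t@echo ran-second\ndep-a:\n\t@echo made-a\ndep-b:\n\t@echo made-b\n' > "$mk"
make -f "$mk"
rm -f "$mk"
```

Run serially this builds dep-a, runs the first recipe, then dep-b and the second recipe; under -j the two rules may be scheduled independently of each other.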

Are there any make experts out there who can spot the race in this
makefile?

Cheers,

Richard


Re: Investigating a hung job on the autobuilder - a HOWTO

Ross Burton
 

A little script I have locally which is useful for finding bitbake instances:

 

for PID in $(pgrep '^Cooker$'); do
    pstree -p -l $PID
    echo
done

 

Output from a build I just fired:

 

Cooker(2380890)──Cooker(2380899)
                └─Worker(2381485)──Worker(2402174)
                                  ├─attr:patch(2401804)───sh(2402170)
                                  ├─bash-completion(2401738)
                                  ├─bzip2:patch(2401753)
                                  ├─cracklib:patch(2401737)
                                  ├─curl-native:unp(2401716)───sh(2401909)──tar(2401930)
                                  │                                        └─xz(2401929)
                                  ├─cve-update-db-n(2390133)
                                  ├─edk2-firmware:u(2401360)
                                  ├─flex:patch(2401736)
                                  ├─gcc-source-12.1(2401703)───sh(2401764)──tar(2401778)
                                  │                                        └─xz(2401777)
                                  ├─glib-2.0-native(2401763)───sh(2402164)
                                  ├─glibc:patch(2401705)───sh(2402159)───quilt(2402175)
                                  ├─gnutls-native:u(2401717)───sh(2401938)──tar(2402023)
                                  │                                        └─xz(2402020)
                                  ├─initscripts:pat(2401783)───sh(2402167)
                                  ├─libcap-ng:patch(2401757)───sh(2402173)
                                  ├─libffi:patch(2401755)───sh(2402160)───xargs(2402177)───sed(2402181)
                                  ├─libpam:unpack(2401745)
                                  ├─libtirpc:patch(2401750)
                                  ├─libtool-cross:p(2401719)
                                  ├─libxcrypt:patch(2401725)
                                  ├─libxml2-native:(2401734)
                                  ├─linux-libc-head(2401701)───sh(2401818)──tar(2401824)
                                  │                                        └─xz(2401823)
                                  ├─linux-yocto:ker(2400552)───run.do_kernel_c(2400801)
                                  ├─lz4-native:comp(2400716)───run.do_compile.(2400804)───make(2400842)───make(2400866)───gcc-10(2400925)──as(2401679)
                                  │                                                                                                       └─cc1(2401678)
                                  ├─ncurses:patch(2401729)───sh(2402156)───quilt(2402178)
                                  ├─openssl:patch(2401816)
                                  ├─opkg-utils:patc(2401721)
                                  ├─python3-native:(2401712)───sh(2402155)──tar(2402180)
                                  │                                        └─xz(2402179)
                                  ├─update-rc.d:pat(2401914)
                                  ├─util-linux-libu(2401747)
                                  ├─xz:patch(2401774)───sh(2402165)
                                  ├─zlib:unpack(2401723)
                                  └─{Worker}(2381486)

 

Ross

 

From: richard.purdie@... <richard.purdie@...>
Date: Thursday, 19 May 2022 at 13:43
To: swat <swat@...>, openembedded-core <openembedded-core@...>
Cc: Luca Ceresoli <luca.ceresoli@...>, Alexandre Belloni <alexandre.belloni@...>, Ross Burton <Ross.Burton@...>
Subject: Investigating a hung job on the autobuilder - a HOWTO

Investigating a hung job on the autobuilder

We've yet another hung job on the autobuilder. I thought I'd write down
what I did to investigate it so others can learn how to do it and so I
can remember next time I need to do it too.

The problem was this job:

https://autobuilder.yoctoproject.org/typhoon/#/builders/97/builds/4563

qemuarm64-armhost running on an aarch64 worker, hung in do_testsdkext
for core-image-minimal. To get further you need ssh access. We do give
out ssh access to people who need it.

ssh to ubuntu1804-arm-1.yocto.io and then "sudo -iu pokybuild" lets us
have a look at what is going on. Easy first step is:

$ ps ax | grep armhost
28429 ?        S      0:00 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/run-config qemuarm64-armhost /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build lucaceresoli/master-next ssh://git@.../poky-contrib --sstateprefix  --buildappsrcrev  --publish-dir  --build-type full --workername ubuntu1804-arm-1 --build-url https://autobuilder.yoctoproject.org/typhoon/#builders/97/builds/4563 --results-dir /srv/autobuilder/autobuilder.yocto.io/pub/non-release/20220517-8/testresults --quietlogging --stepname test-targets --phase 2
28430 ?        S      0:00 /bin/sh -c . ./oe-init-build-env; /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/checkvnc; DISPLAY=:1 bitbake core-image-minimal:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage core-image-sato:do_testsdk  core-image-minimal:do_testsdkext core-image-sato:do_testsdkext -k
28460 ?        Sl     0:32 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake core-image-minimal:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage core-image-sato:do_testsdk core-image-minimal:do_testsdkext core-image-sato:do_testsdkext -k
28462 ?        Sl     6:43 bitbake-server /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-server decafbad 3 5 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake-cookerdaemon.log /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake.lock /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake.sock 0 None 0
28504 ?        Sl     0:17 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker decafbad
28515 ?        SNs    0:00 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker decafbad
29659 pts/1    S+     0:00 grep armhost
38713 ?        SN     0:00 /bin/sh -c . /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/environment-setup-cortexa57-poky-linux > /dev/null; devtool sdk-install meta-extsdk-toolchain
38719 ?        SN     0:01 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/bin/python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/sysroots/aarch64-pokysdk-linux/usr/bin/devtool sdk-install meta-extsdk-toolchain

i.e. looking for processes that are running in that build directory.
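
That grep can be wrapped into a tiny helper for reuse (a sketch; "pgrep -af <pattern>" achieves much the same):

```shell
# Print pid and command line of every process mentioning the given
# pattern (e.g. a build directory name), excluding the grep itself.
find_build_procs() {
    ps ax -o pid=,args= | grep -F "$1" | grep -v grep
}
# On a machine without this build running, this simply prints nothing.
find_build_procs qemuarm64-armhost || true
```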

The relevant part of the process tree from "ps axjf":

 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1  3501  3500  3500 ?           -1 Sl    6000   2:29 /usr/bin/python3 /usr/local/bin/buildbot-worker start /home/pokybuild/yocto-worker/
 3501 28429 28429  3500 ?           -1 S     6000   0:00  \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/
28429 28430 28429  3500 ?           -1 S     6000   0:00      \_ /bin/sh -c . ./oe-init-build-env; /home/pokybuild/yocto-worker/qemuarm64-armhost/yoc
28430 28460 28429  3500 ?           -1 Sl    6000   0:32          \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake
    1 28462 28461 28461 ?           -1 Sl    6000   6:43 bitbake-server /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-serv
28462 28504 28461 28461 ?           -1 Sl    6000   0:17  \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker
28504 28515 28515 28515 ?           -1 SNs   6000   0:00      \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-wor
28515 38713 28515 28515 ?           -1 SN    6000   0:00          \_ /bin/sh -c . /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work
38713 38719 28515 28515 ?           -1 SN    6000   0:01              \_ /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm6

so the hung process is 38719.

I did have a look at the log files in the build directory but in this
case I didn't find anything. The bitbake-cookerdaemon.log in particular
can sometimes help. I had a look at the open files for the process:

$ ls /proc/38719/fd -la
total 0
dr-x------ 2 pokybuild pokybuild  0 May 19 12:08 .
dr-xr-xr-x 9 pokybuild pokybuild  0 May 19 11:59 ..
lr-x------ 1 pokybuild pokybuild 64 May 19 12:08 0 -> /dev/null
l-wx------ 1 pokybuild pokybuild 64 May 19 12:08 1 -> 'pipe:[122175861]'
l-wx------ 1 pokybuild pokybuild 64 May 19 12:08 2 -> 'pipe:[122175861]'
lr-x------ 1 pokybuild pokybuild 64 May 19 12:08 4 -> 'pipe:[122140435]'

but nothing interesting there.

I couldn't use strace as the permissions weren't setup for that without
root. We don't normally give out that access but to debug this I went
ahead and looked with strace:

strace -p 38719
strace: Process 38719 attached
futex(0xffff9c2da2bc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff

so it is stuck in a futex. I then went ahead and tried gdb:

gdb -p 38719

BFD: warning: /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/bin/python3.10.real: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
BFD: warning: /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
Reading symbols from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6...(no debugging symbols found)...done.

(gdb) bt
Python Exception <class 'gdb.error'> Call Frame Instruction op 45 in vendor extension space is not handled on this architecture.:
#0  0x0000ffff9bd1e16c in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
Call Frame Instruction op 45 in vendor extension space is not handled on this architecture.

I've abbreviated the initial output. We can see that it is using Python
3.10 from within the buildtools tarball within an eSDK and we have no
debug symbols. The lack of a backtrace was worrying; after consulting
some ARM experts it was concluded the version of gdb was too old for the
newer glibc in buildtools.

After much thinking, I decided to enable and build a gdb-native. I did
something horrible and moved the lock and sock files out of the way and
then ran "bitbake gdb-native" in the same build directory. We don't have
a gdb-native recipe, however adding a BBCLASSEXTEND = "native", a
PACKAGECONFIG to enable python in the native case, and disabling
LTTng-UST for native made one possible. A "bitbake gdb-native -c
addto_recipe_sysroot" made it available in the gdb-native WORKDIR where
I could then run it from:

/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/aarch64-linux/gdb-native/12.1-r0/recipe-sysroot-native/usr/bin/gdb -p 38719

I did try using gdb-cross-aarch64 but that wouldn't attach to local
processes.

Sadly there were no symbols so the backtrace output was still not
useful other than seeing we were in some libpthread function
(unsurprisingly). I realised I could go into the python3-nativesdk
workdir, find the nativesdk-python3-dbg package (I used the deb) and
then copy the .debug files from that into the
testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/lib/ directory
for the python libs.

With that, I got some more interesting output:

(gdb) bt
#0  0x0000ffff9bd1e16c in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#1  0x0000ffff9bd20d88 in pthread_cond_wait ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#2  0x0000ffff9c0db3a8 in drop_gil (ceval=ceval@entry=0xffff9c2da208 <_PyRuntime+344>, ceval2=ceval2@entry=0xaaab09af3f20,
    tstate=tstate@entry=0xaaab09ab9ba0) at ../Python-3.10.4/Python/ceval_gil.h:182
#3  0x0000ffff9bfc8708 in eval_frame_handle_pending (tstate=0xaaab09ab9ba0) at ../Python-3.10.4/Python/ceval.c:1185
#4  _PyEval_EvalFrameDefault (tstate=0xaaab09ab9ba0, f=0xffff9445edd0, throwflag=<optimized out>) at ../Python-3.10.4/Python/ceval.c:1775
#5  0x0000ffff9c0de0a8 in _PyEval_EvalFrame (throwflag=0, f=0xffff9445edd0, tstate=0xaaab09ab9ba0)
    at ../Python-3.10.4/Include/internal/pycore_ceval.h:46
#6  _PyEval_Vector (tstate=0xaaab09ab9ba0, con=0xffff95a20050, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>,
    kwnames=<optimized out>) at ../Python-3.10.4/Python/ceval.c:5065
#7  0x0000ffff9c13a35c in _PyObject_VectorcallTstate (nargsf=9223372036854775809, kwnames=0x0, args=0xffffd3ebd780, callable=0xffff95a20040,
    tstate=0xaaab09ab9ba0) at ../Python-3.10.4/Include/cpython/abstract.h:114
#8  PyObject_CallOneArg (arg=0xffff95ac8130, func=0xffff95a20040) at ../Python-3.10.4/Include/cpython/abstract.h:184
#9  handle_weakrefs (old=0xaaab09af4190, unreachable=0xffffd3ebd748) at ../Python-3.10.4/Modules/gcmodule.c:887
#10 gc_collect_main (tstate=tstate@entry=0xaaab09ab9ba0, generation=generation@entry=2, n_collected=n_collected@entry=0xffffd3ebd800,
    n_uncollectable=n_uncollectable@entry=0xffffd3ebd808, nofail=nofail@entry=0) at ../Python-3.10.4/Modules/gcmodule.c:1281
#11 0x0000ffff9c13abfc in gc_collect_with_callback (tstate=tstate@entry=0xaaab09ab9ba0, generation=generation@entry=2)
    at ../Python-3.10.4/Modules/gcmodule.c:1413
#12 0x0000ffff9c13b1e4 in PyGC_Collect () at ../Python-3.10.4/Modules/gcmodule.c:2099
#13 0x0000ffff9c117538 in Py_FinalizeEx () at ../Python-3.10.4/Python/pylifecycle.c:1781
#14 Py_FinalizeEx () at ../Python-3.10.4/Python/pylifecycle.c:1703
#15 0x0000ffff9c1181e8 in Py_Exit (sts=0) at ../Python-3.10.4/Python/pylifecycle.c:2858
#16 0x0000ffff9c11cbc0 in handle_system_exit () at ../Python-3.10.4/Python/pythonrun.c:775
#17 _PyErr_PrintEx (tstate=0xaaab09ab9ba0, set_sys_last_vars=set_sys_last_vars@entry=1) at ../Python-3.10.4/Python/pythonrun.c:785
#18 0x0000ffff9c11cbfc in PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at ../Python-3.10.4/Python/pythonrun.c:880
#19 0x0000ffff9c11cc0c in PyErr_Print () at ../Python-3.10.4/Python/pythonrun.c:886
#20 0x0000ffff9c11d280 in _PyRun_SimpleFileObject (fp=fp@entry=0xaaab09ab2540, filename=filename@entry=0xffff95cbe6b0, closeit=closeit@entry=1,
    flags=flags@entry=0xffffd3ebda48) at ../Python-3.10.4/Python/pythonrun.c:462
#21 0x0000ffff9c11d50c in _PyRun_AnyFileObject (fp=fp@entry=0xaaab09ab2540, filename=filename@entry=0xffff95cbe6b0, closeit=closeit@entry=1,
    flags=flags@entry=0xffffd3ebda48) at ../Python-3.10.4/Python/pythonrun.c:90
#22 0x0000ffff9c1389c0 in pymain_run_file_obj (skip_source_first_line=0, filename=0xffff95cbe6b0, program_name=0xffff95dce3d0)
    at ../Python-3.10.4/Modules/main.c:353
#23 pymain_run_file (config=0xaaab09af4290) at ../Python-3.10.4/Modules/main.c:372
#24 pymain_run_python (exitcode=0xffffd3ebda44) at ../Python-3.10.4/Modules/main.c:587
#25 Py_RunMain () at ../Python-3.10.4/Modules/main.c:666
#26 0x0000ffff9c138e6c in pymain_main (args=0xffffd3ebdb20) at ../Python-3.10.4/Modules/main.c:696
#27 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Python-3.10.4/Modules/main.c:720
#28 0x0000ffff9bccb234 in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#29 0x0000ffff9bccb30c in __libc_start_main ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#30 0x0000aaaaca8cd9b0 in _start ()

I did try "py-bt" but that wasn't present. This can be fixed by
ensuring the source for python3-native is present ("bitbake
python3-native -c patch"), then

source build/tmp/work/aarch64-linux/python3-native/3.10.4-r0/Python-3.10.4/Tools/gdb/libpython.py

within gdb will add the python extensions. We can then get:

(gdb) py-bt
Traceback (most recent call first):
  File "/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/lib/python3.10/_weakrefset.py", line 39, in _remove
    def _remove(item, selfref=ref(self)):
  Garbage-collecting

i.e. it is in Python garbage collection. This probably means a
multiprocessing.Lock() or threading.Lock() was left in the locked state
by some other process/thread which exited, and this has hung the build.

I still haven't worked out how to tell *which* lock this is, or how to
debug any further, but I was quite pleased to be able to get this far.

If anyone does have any further tips, the process is currently still
running in case we can figure out which lock it is/was.

Cheers,

Richard



Investigating a hung job on the autobuilder - a HOWTO

Richard Purdie
 

Investigating a hung job on the autobuilder

We've yet another hung job on the autobuilder. I thought I'd write down
what I did to investigate it so others can learn how to do it and so I
can remember next time I need to do it too.

The problem was this job:

https://autobuilder.yoctoproject.org/typhoon/#/builders/97/builds/4563

qemuarm64-armhost running on an aarch64 worker, hung in do_testsdkext
for core-image-minimal. To get further you need ssh access. We do give
out ssh access to people who need it.

ssh to ubuntu1804-arm-1.yocto.io and then "sudo -iu pokybuild" lets us
have a look at what is going on. Easy first step is:

$ ps ax | grep armhost
28429 ? S 0:00 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/run-config qemuarm64-armhost /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build lucaceresoli/master-next ssh://git@.../poky-contrib --sstateprefix --buildappsrcrev --publish-dir --build-type full --workername ubuntu1804-arm-1 --build-url https://autobuilder.yoctoproject.org/typhoon/#builders/97/builds/4563 --results-dir /srv/autobuilder/autobuilder.yocto.io/pub/non-release/20220517-8/testresults --quietlogging --stepname test-targets --phase 2
28430 ? S 0:00 /bin/sh -c . ./oe-init-build-env; /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/checkvnc; DISPLAY=:1 bitbake core-image-minimal:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage core-image-sato:do_testsdk core-image-minimal:do_testsdkext core-image-sato:do_testsdkext -k
28460 ? Sl 0:32 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake core-image-minimal:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage core-image-sato:do_testsdk core-image-minimal:do_testsdkext core-image-sato:do_testsdkext -k
28462 ? Sl 6:43 bitbake-server /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-server decafbad 3 5 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake-cookerdaemon.log /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake.lock /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/bitbake.sock 0 None 0
28504 ? Sl 0:17 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker decafbad
28515 ? SNs 0:00 python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker decafbad
29659 pts/1 S+ 0:00 grep armhost
38713 ? SN 0:00 /bin/sh -c . /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/environment-setup-cortexa57-poky-linux > /dev/null; devtool sdk-install meta-extsdk-toolchain
38719 ? SN 0:01 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/bin/python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/sysroots/aarch64-pokysdk-linux/usr/bin/devtool sdk-install meta-extsdk-toolchain

i.e. looking for processes that are running in that build directory.
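
The "ps ax | grep" scan above can also be done directly from /proc. A
minimal Python sketch (not from the original mail; procs_matching is a
hypothetical helper, and we search for "python" rather than the
worker-specific "armhost"):

```python
import os

def procs_matching(substring):
    """Scan /proc for processes whose command line contains `substring`."""
    matches = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                # argv is NUL-separated in /proc/<pid>/cmdline
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited while we were scanning
        if substring in cmdline:
            matches.append((int(pid), cmdline.strip()))
    return matches

for pid, cmd in procs_matching("python"):
    print(pid, cmd)
```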

The relevant part of the process tree from "ps axjf":

PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 3501 3500 3500 ? -1 Sl 6000 2:29 /usr/bin/python3 /usr/local/bin/buildbot-worker start /home/pokybuild/yocto-worker/
3501 28429 28429 3500 ? -1 S 6000 0:00 \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/yocto-autobuilder-helper/scripts/
28429 28430 28429 3500 ? -1 S 6000 0:00 \_ /bin/sh -c . ./oe-init-build-env; /home/pokybuild/yocto-worker/qemuarm64-armhost/yoc
28430 28460 28429 3500 ? -1 Sl 6000 0:32 \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake
1 28462 28461 28461 ? -1 Sl 6000 6:43 bitbake-server /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-serv
28462 28504 28461 28461 ? -1 Sl 6000 0:17 \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-worker
28504 28515 28515 28515 ? -1 SNs 6000 0:00 \_ python3 /home/pokybuild/yocto-worker/qemuarm64-armhost/build/bitbake/bin/bitbake-wor
28515 38713 28515 28515 ? -1 SN 6000 0:00 \_ /bin/sh -c . /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work
38713 38719 28515 28515 ? -1 SN 6000 0:01 \_ /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm6

so the hung process is 38719.

I did have a look at the log files in the build directory but in this
case I didn't find anything. The bitbake-cookerdaemon.log in particular
can sometimes help. I had a look at the open files for the process:

$ ls /proc/38719/fd -la
total 0
dr-x------ 2 pokybuild pokybuild 0 May 19 12:08 .
dr-xr-xr-x 9 pokybuild pokybuild 0 May 19 11:59 ..
lr-x------ 1 pokybuild pokybuild 64 May 19 12:08 0 -> /dev/null
l-wx------ 1 pokybuild pokybuild 64 May 19 12:08 1 -> 'pipe:[122175861]'
l-wx------ 1 pokybuild pokybuild 64 May 19 12:08 2 -> 'pipe:[122175861]'
lr-x------ 1 pokybuild pokybuild 64 May 19 12:08 4 -> 'pipe:[122140435]'

but nothing interesting there.
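
Besides the fd list, a couple of other /proc entries can hint at what a
stuck process is waiting on before reaching for strace or gdb. A hedged
sketch (inspecting our own process as a stand-in for the hung PID):

```python
import os

pid = "self"  # stand-in for the hung PID (38719 in this investigation)
fd_dir = f"/proc/{pid}/fd"

# Equivalent of "ls -la /proc/<pid>/fd": list descriptors and their targets.
for fd in sorted(os.listdir(fd_dir), key=int):
    try:
        print(fd, "->", os.readlink(os.path.join(fd_dir, fd)))
    except OSError:
        pass  # a descriptor can vanish between listdir() and readlink()

# /proc/<pid>/wchan names the kernel function the task is sleeping in;
# for a futex wait you would expect a futex-related function name here.
with open(f"/proc/{pid}/wchan") as f:
    print("wchan:", f.read())
```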

I couldn't use strace as the permissions weren't set up for that without
root. We don't normally give out that access, but to debug this I went
ahead and looked with strace:

strace -p 38719
strace: Process 38719 attached
futex(0xffff9c2da2bc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff

so it is stuck in a futex. I then went ahead and tried gdb:

gdb -p 38719

BFD: warning: /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/bin/python3.10.real: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
BFD: warning: /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
Reading symbols from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6...(no debugging symbols found)...done.

(gdb) bt
Python Exception <class 'gdb.error'> Call Frame Instruction op 45 in vendor extension space is not handled on this architecture.:
#0  0x0000ffff9bd1e16c in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
Call Frame Instruction op 45 in vendor extension space is not handled on this architecture.

I've abbreviated the initial output. We can see that it is using Python
3.10 from within the buildtools tarball within an eSDK and we have no
debug symbols. The lack of a backtrace was worrying; after consulting
some ARM experts, it was concluded that the version of gdb was too old
for the newer glibc in buildtools.

After much thinking, I decided to enable and build a gdb-native. I did
something horrible and moved the lock and sock files out of the way and
then ran a "bitbake gdb-native" in the same build directory. We don't
have a gdb-native recipe, so I added a BBCLASSEXTEND = "native", a
PACKAGECONFIG to enable python in the native case, and disabled
LTTng-UST for native. A "bitbake gdb-native -c addto_recipe_sysroot"
made it available in the gdb-native WORKDIR, where I could then run it
from:

/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/aarch64-linux/gdb-native/12.1-r0/recipe-sysroot-native/usr/bin/gdb -p 3871

I did try using gdb-cross-aarch64 but that wouldn't attach to local
processes.

Sadly there were no symbols so the backtrace output was still not
useful other than seeing we were in some libpthread function
(unsurprisingly). I realised I could go into the python3-nativesdk
workdir, find the nativesdk-python3-dbg package (I used the deb) and
then copy the .debug files from that into the
testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/lib/ directory
for the python libs.

With that, I got some more interesting output:

(gdb) bt
#0  0x0000ffff9bd1e16c in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#1  0x0000ffff9bd20d88 in pthread_cond_wait ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#2  0x0000ffff9c0db3a8 in drop_gil (ceval=ceval@entry=0xffff9c2da208 <_PyRuntime+344>, ceval2=ceval2@entry=0xaaab09af3f20,
    tstate=tstate@entry=0xaaab09ab9ba0) at ../Python-3.10.4/Python/ceval_gil.h:182
#3  0x0000ffff9bfc8708 in eval_frame_handle_pending (tstate=0xaaab09ab9ba0) at ../Python-3.10.4/Python/ceval.c:1185
#4  _PyEval_EvalFrameDefault (tstate=0xaaab09ab9ba0, f=0xffff9445edd0, throwflag=<optimized out>) at ../Python-3.10.4/Python/ceval.c:1775
#5  0x0000ffff9c0de0a8 in _PyEval_EvalFrame (throwflag=0, f=0xffff9445edd0, tstate=0xaaab09ab9ba0)
    at ../Python-3.10.4/Include/internal/pycore_ceval.h:46
#6  _PyEval_Vector (tstate=0xaaab09ab9ba0, con=0xffff95a20050, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>,
    kwnames=<optimized out>) at ../Python-3.10.4/Python/ceval.c:5065
#7  0x0000ffff9c13a35c in _PyObject_VectorcallTstate (nargsf=9223372036854775809, kwnames=0x0, args=0xffffd3ebd780, callable=0xffff95a20040,
    tstate=0xaaab09ab9ba0) at ../Python-3.10.4/Include/cpython/abstract.h:114
#8  PyObject_CallOneArg (arg=0xffff95ac8130, func=0xffff95a20040) at ../Python-3.10.4/Include/cpython/abstract.h:184
#9  handle_weakrefs (old=0xaaab09af4190, unreachable=0xffffd3ebd748) at ../Python-3.10.4/Modules/gcmodule.c:887
#10 gc_collect_main (tstate=tstate@entry=0xaaab09ab9ba0, generation=generation@entry=2, n_collected=n_collected@entry=0xffffd3ebd800,
    n_uncollectable=n_uncollectable@entry=0xffffd3ebd808, nofail=nofail@entry=0) at ../Python-3.10.4/Modules/gcmodule.c:1281
#11 0x0000ffff9c13abfc in gc_collect_with_callback (tstate=tstate@entry=0xaaab09ab9ba0, generation=generation@entry=2)
    at ../Python-3.10.4/Modules/gcmodule.c:1413
#12 0x0000ffff9c13b1e4 in PyGC_Collect () at ../Python-3.10.4/Modules/gcmodule.c:2099
#13 0x0000ffff9c117538 in Py_FinalizeEx () at ../Python-3.10.4/Python/pylifecycle.c:1781
#14 Py_FinalizeEx () at ../Python-3.10.4/Python/pylifecycle.c:1703
#15 0x0000ffff9c1181e8 in Py_Exit (sts=0) at ../Python-3.10.4/Python/pylifecycle.c:2858
#16 0x0000ffff9c11cbc0 in handle_system_exit () at ../Python-3.10.4/Python/pythonrun.c:775
#17 _PyErr_PrintEx (tstate=0xaaab09ab9ba0, set_sys_last_vars=set_sys_last_vars@entry=1) at ../Python-3.10.4/Python/pythonrun.c:785
#18 0x0000ffff9c11cbfc in PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at ../Python-3.10.4/Python/pythonrun.c:880
#19 0x0000ffff9c11cc0c in PyErr_Print () at ../Python-3.10.4/Python/pythonrun.c:886
#20 0x0000ffff9c11d280 in _PyRun_SimpleFileObject (fp=fp@entry=0xaaab09ab2540, filename=filename@entry=0xffff95cbe6b0, closeit=closeit@entry=1,
    flags=flags@entry=0xffffd3ebda48) at ../Python-3.10.4/Python/pythonrun.c:462
#21 0x0000ffff9c11d50c in _PyRun_AnyFileObject (fp=fp@entry=0xaaab09ab2540, filename=filename@entry=0xffff95cbe6b0, closeit=closeit@entry=1,
    flags=flags@entry=0xffffd3ebda48) at ../Python-3.10.4/Python/pythonrun.c:90
#22 0x0000ffff9c1389c0 in pymain_run_file_obj (skip_source_first_line=0, filename=0xffff95cbe6b0, program_name=0xffff95dce3d0)
    at ../Python-3.10.4/Modules/main.c:353
#23 pymain_run_file (config=0xaaab09af4290) at ../Python-3.10.4/Modules/main.c:372
#24 pymain_run_python (exitcode=0xffffd3ebda44) at ../Python-3.10.4/Modules/main.c:587
#25 Py_RunMain () at ../Python-3.10.4/Modules/main.c:666
#26 0x0000ffff9c138e6c in pymain_main (args=0xffffd3ebdb20) at ../Python-3.10.4/Modules/main.c:696
#27 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Python-3.10.4/Modules/main.c:720
#28 0x0000ffff9bccb234 in ?? ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#29 0x0000ffff9bccb30c in __libc_start_main ()
   from /home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/lib/libc.so.6
#30 0x0000aaaaca8cd9b0 in _start ()

I did try "py-bt" but that wasn't present. This can be fixed by
ensuring the source for python3-native is present ("bitbake
python3-native -c patch"), then

source build/tmp/work/aarch64-linux/python3-native/3.10.4-r0/Python-3.10.4/Tools/gdb/libpython.py

within gdb will add the python extensions. We can then get:

(gdb) py-bt
Traceback (most recent call first):
  File "/home/pokybuild/yocto-worker/qemuarm64-armhost/build/build/tmp/work/qemuarm64-poky-linux/core-image-minimal/1.0-r0/testsdkext/buildtools/sysroots/aarch64-pokysdk-linux/usr/lib/python3.10/_weakrefset.py", line 39, in _remove
    def _remove(item, selfref=ref(self)):
  Garbage-collecting

i.e. it is in Python garbage collection. This probably means a
multiprocessing.Lock() or threading.Lock() was left in the locked state
by some other process/thread which exited, and this has hung the build.
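
To see how a lock can end up permanently held, here is a minimal
illustration (not the autobuilder code): a thread acquires a
threading.Lock and exits without releasing it, leaving the lock stuck in
the locked state for everyone else.

```python
import threading

lock = threading.Lock()

def worker():
    lock.acquire()  # acquire and exit without ever releasing

t = threading.Thread(target=worker)
t.start()
t.join()  # the thread is gone, but the lock is still held

# A blocking acquire here would hang forever, exactly like the stuck
# process above; a non-blocking attempt shows the state without hanging.
print(lock.acquire(blocking=False))  # False: the lock is stuck locked
```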

I still haven't worked out how to tell *which* lock this is, or how to
debug any further, but I was quite pleased to be able to get this far.

If anyone does have any further tips, the process is currently still
running in case we can figure out which lock it is/was.

Cheers,

Richard


SWAT Rotation schedule

Alexandre Belloni
 

Hello everyone,

We are currently at the end of the previous rotation; here is the new
schedule:

┌───────────────────────────────┬──────┬────────────┐
│                               │ Week │   Start    │
├───────────────────────────────┼──────┼────────────┤
│ Anibal Limon                  │   17 │ 29/04/2022 │
│ Saul Wold                     │   18 │ 06/05/2022 │
│ Alejandro Hernandez Samaniego │   19 │ 13/05/2022 │
│ Lee Chee Yang                 │   20 │ 20/05/2022 │
│ Paul Eggleton                 │   21 │ 27/05/2022 │
│ Christopher Larson            │   22 │ 03/06/2022 │
│ Jon Mason                     │   23 │ 10/06/2022 │
│ Naveen Saini                  │   24 │ 17/06/2022 │
│ Minjae Kim                    │   25 │ 24/06/2022 │
│ Jaga                          │   26 │ 01/07/2022 │
│ Leo Sandoval                  │   27 │ 08/07/2022 │
│ Ross Burton                   │   28 │ 15/07/2022 │
└───────────────────────────────┴──────┴────────────┘

Anibal, let me know whether you are available this week. Otherwise,
I'll reschedule.

When filling in bugs or bug comments, please also add the worker name
and whether it has an SSD. Currently, we have:

┌──────────────────┬─────────┐
│ alma8-ty-1       │ SSD     │
│ alma8-ty-2       │ SSD     │
│ debian11-ty-3    │ SSD     │
│ fedora35-ty-1    │ SSD     │
│ fedora35-ty-2    │ SSD     │
│ opensuse153-ty-1 │ SSD     │
│ stream8-ty-1     │ SSD     │
│ ubuntu2004-ty-1  │ SSD     │
│ ubuntu2110-ty-2  │ SSD     │
│ centos7-ty-4     │ Non-SSD │
│ debian11-ty-1    │ Non-SSD │
│ fedora34-ty-1    │ Non-SSD │
│ perf-centos7     │ Non-SSD │
│ perf-debian11    │ Non-SSD │
│ perf-ubuntu1604  │ Non-SSD │
│ tumbleweed-ty-3  │ Non-SSD │
│ ubuntu1604-ty-1  │ Non-SSD │
│ ubuntu1804-arm-1 │ Non-SSD │
│ ubuntu1804-ty-3  │ Non-SSD │
│ ubuntu2004-arm-1 │ Non-SSD │
│ centos8-ty-1     │ Non-SSD │
│ centos8-ty-2     │ Non-SSD │
│ debian10-ty-1    │ Non-SSD │
│ debian11-ty-2    │ Non-SSD │
│ debian9-ty-2     │ Non-SSD │
│ opensuse154-ty-1 │ Non-SSD │
└──────────────────┴─────────┘

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation schedule

Leonardo Sandoval
 

Sure Alex & team, thanks.

Leo


On Tue, 19 Apr 2022 at 16:14, Alexandre Belloni <alexandre.belloni@...> wrote:
Hello Leo,

This is a reminder that you are on SWAT duty this week:

┌───────────────────────────────┬──────┬────────────┐
│                               │ Week │   Start    │
├───────────────────────────────┼──────┼────────────┤
│ Anibal Limon                  │    4 │ 28/01/2022 │
│ Saul Wold                     │    5 │ 04/02/2022 │
│ Alejandro Hernandez Samaniego │    6 │ 11/02/2022 │
│ Oleksiy Obitotskyy            │    7 │ 18/02/2022 │
│ Naveen Saini                  │    8 │ 25/02/2022 │
│ Paul Eggleton                 │    9 │ 04/03/2022 │
│ Christopher Larson            │   10 │ 11/03/2022 │
│ Jon Mason                     │   11 │ 18/03/2022 │
│ Lee Chee Yang                 │   12 │ 25/03/2022 │
│ Minjae Kim                    │   13 │ 01/04/2022 │
│ Jaga                          │   14 │ 08/04/2022 │
│ Leo Sandoval                  │   15 │ 15/04/2022 │
│ Ross Burton                   │   16 │ 22/04/2022 │
└───────────────────────────────┴──────┴────────────┘

When filling in bugs or bug comments, please also add the worker name
and whether it has an SSD. Currently, we have:

┌──────────────────┬─────────┐
│ alma8-ty-1       │ SSD     │
│ alma8-ty-2       │ SSD     │
│ debian11-ty-3    │ SSD     │
│ fedora35-ty-1    │ SSD     │
│ fedora35-ty-2    │ SSD     │
│ opensuse153-ty-1 │ SSD     │
│ stream8-ty-1     │ SSD     │
│ ubuntu2004-ty-1  │ SSD     │
│ ubuntu2110-ty-2  │ SSD     │
│ centos7-ty-4     │ Non-SSD │
│ debian11-ty-1    │ Non-SSD │
│ fedora34-ty-1    │ Non-SSD │
│ perf-centos7     │ Non-SSD │
│ perf-debian11    │ Non-SSD │
│ perf-ubuntu1604  │ Non-SSD │
│ tumbleweed-ty-3  │ Non-SSD │
│ ubuntu1604-ty-1  │ Non-SSD │
│ ubuntu1804-arm-1 │ Non-SSD │
│ ubuntu1804-ty-3  │ Non-SSD │
│ ubuntu2004-arm-1 │ Non-SSD │
│ centos8-ty-1     │ Non-SSD │
│ centos8-ty-2     │ Non-SSD │
│ debian10-ty-1    │ Non-SSD │
│ debian11-ty-2    │ Non-SSD │
│ debian9-ty-2     │ Non-SSD │
│ opensuse154-ty-1 │ Non-SSD │
└──────────────────┴─────────┘

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT Rotation schedule

Alexandre Belloni
 

Hello Leo,

This is a reminder that you are on SWAT duty this week:

┌───────────────────────────────┬──────┬────────────┐
│                               │ Week │   Start    │
├───────────────────────────────┼──────┼────────────┤
│ Anibal Limon                  │    4 │ 28/01/2022 │
│ Saul Wold                     │    5 │ 04/02/2022 │
│ Alejandro Hernandez Samaniego │    6 │ 11/02/2022 │
│ Oleksiy Obitotskyy            │    7 │ 18/02/2022 │
│ Naveen Saini                  │    8 │ 25/02/2022 │
│ Paul Eggleton                 │    9 │ 04/03/2022 │
│ Christopher Larson            │   10 │ 11/03/2022 │
│ Jon Mason                     │   11 │ 18/03/2022 │
│ Lee Chee Yang                 │   12 │ 25/03/2022 │
│ Minjae Kim                    │   13 │ 01/04/2022 │
│ Jaga                          │   14 │ 08/04/2022 │
│ Leo Sandoval                  │   15 │ 15/04/2022 │
│ Ross Burton                   │   16 │ 22/04/2022 │
└───────────────────────────────┴──────┴────────────┘

When filling in bugs or bug comments, please also add the worker name
and whether it has an SSD. Currently, we have:

┌──────────────────┬─────────┐
│ alma8-ty-1       │ SSD     │
│ alma8-ty-2       │ SSD     │
│ debian11-ty-3    │ SSD     │
│ fedora35-ty-1    │ SSD     │
│ fedora35-ty-2    │ SSD     │
│ opensuse153-ty-1 │ SSD     │
│ stream8-ty-1     │ SSD     │
│ ubuntu2004-ty-1  │ SSD     │
│ ubuntu2110-ty-2  │ SSD     │
│ centos7-ty-4     │ Non-SSD │
│ debian11-ty-1    │ Non-SSD │
│ fedora34-ty-1    │ Non-SSD │
│ perf-centos7     │ Non-SSD │
│ perf-debian11    │ Non-SSD │
│ perf-ubuntu1604  │ Non-SSD │
│ tumbleweed-ty-3  │ Non-SSD │
│ ubuntu1604-ty-1  │ Non-SSD │
│ ubuntu1804-arm-1 │ Non-SSD │
│ ubuntu1804-ty-3  │ Non-SSD │
│ ubuntu2004-arm-1 │ Non-SSD │
│ centos8-ty-1     │ Non-SSD │
│ centos8-ty-2     │ Non-SSD │
│ debian10-ty-1    │ Non-SSD │
│ debian11-ty-2    │ Non-SSD │
│ debian9-ty-2     │ Non-SSD │
│ opensuse154-ty-1 │ Non-SSD │
└──────────────────┴─────────┘

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: SWAT Rotation schedule

Duraisamy, Jagadheesan
 

Hi Alexandre,

Yes, I'm on it.

Thanks
Jaga

-----Original Message-----
From: Alexandre Belloni <alexandre.belloni@...>
Sent: Monday, April 11, 2022 4:12 PM
To: swat@...; Duraisamy, Jagadheesan <Jagadheesan_Duraisamy@...>
Subject: [EXTERNAL] [swat] SWAT Rotation schedule

Hello Jaga,

This is a reminder that you are on SWAT duty this week:

┌───────────────────────────────┬──────┬────────────┐
│                               │ Week │   Start    │
├───────────────────────────────┼──────┼────────────┤
│ Anibal Limon                  │    4 │ 28/01/2022 │
│ Saul Wold                     │    5 │ 04/02/2022 │
│ Alejandro Hernandez Samaniego │    6 │ 11/02/2022 │
│ Oleksiy Obitotskyy            │    7 │ 18/02/2022 │
│ Naveen Saini                  │    8 │ 25/02/2022 │
│ Paul Eggleton                 │    9 │ 04/03/2022 │
│ Christopher Larson            │   10 │ 11/03/2022 │
│ Jon Mason                     │   11 │ 18/03/2022 │
│ Lee Chee Yang                 │   12 │ 25/03/2022 │
│ Minjae Kim                    │   13 │ 01/04/2022 │
│ Jaga                          │   14 │ 08/04/2022 │
│ Leo Sandoval                  │   15 │ 15/04/2022 │
│ Ross Burton                   │   16 │ 22/04/2022 │
└───────────────────────────────┴──────┴────────────┘

When filling in bugs or bug comments, please also add the worker name and whether it has an SSD. Currently, we have:

┌──────────────────┬─────────┐
│ alma8-ty-1       │ SSD     │
│ alma8-ty-2       │ SSD     │
│ debian11-ty-3    │ SSD     │
│ fedora35-ty-1    │ SSD     │
│ fedora35-ty-2    │ SSD     │
│ opensuse153-ty-1 │ SSD     │
│ stream8-ty-1     │ SSD     │
│ ubuntu2004-ty-1  │ SSD     │
│ ubuntu2110-ty-2  │ SSD     │
│ centos7-ty-4     │ Non-SSD │
│ debian11-ty-1    │ Non-SSD │
│ fedora34-ty-1    │ Non-SSD │
│ perf-centos7     │ Non-SSD │
│ perf-debian11    │ Non-SSD │
│ perf-ubuntu1604  │ Non-SSD │
│ tumbleweed-ty-3  │ Non-SSD │
│ ubuntu1604-ty-1  │ Non-SSD │
│ ubuntu1804-arm-1 │ Non-SSD │
│ ubuntu1804-ty-3  │ Non-SSD │
│ ubuntu2004-arm-1 │ Non-SSD │
│ centos8-ty-1     │ Non-SSD │
│ centos8-ty-2     │ Non-SSD │
│ debian10-ty-1    │ Non-SSD │
│ debian11-ty-2    │ Non-SSD │
│ debian9-ty-2     │ Non-SSD │
│ opensuse154-ty-1 │ Non-SSD │
└──────────────────┴─────────┘

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


SWAT Rotation schedule

Alexandre Belloni
 

Hello Jaga,

This is a reminder that you are on SWAT duty this week:

┌───────────────────────────────┬──────┬────────────┐
│                               │ Week │   Start    │
├───────────────────────────────┼──────┼────────────┤
│ Anibal Limon                  │    4 │ 28/01/2022 │
│ Saul Wold                     │    5 │ 04/02/2022 │
│ Alejandro Hernandez Samaniego │    6 │ 11/02/2022 │
│ Oleksiy Obitotskyy            │    7 │ 18/02/2022 │
│ Naveen Saini                  │    8 │ 25/02/2022 │
│ Paul Eggleton                 │    9 │ 04/03/2022 │
│ Christopher Larson            │   10 │ 11/03/2022 │
│ Jon Mason                     │   11 │ 18/03/2022 │
│ Lee Chee Yang                 │   12 │ 25/03/2022 │
│ Minjae Kim                    │   13 │ 01/04/2022 │
│ Jaga                          │   14 │ 08/04/2022 │
│ Leo Sandoval                  │   15 │ 15/04/2022 │
│ Ross Burton                   │   16 │ 22/04/2022 │
└───────────────────────────────┴──────┴────────────┘

When filling in bugs or bug comments, please also add the worker name
and whether it has an SSD. Currently, we have:

┌──────────────────┬─────────┐
│ alma8-ty-1       │ SSD     │
│ alma8-ty-2       │ SSD     │
│ debian11-ty-3    │ SSD     │
│ fedora35-ty-1    │ SSD     │
│ fedora35-ty-2    │ SSD     │
│ opensuse153-ty-1 │ SSD     │
│ stream8-ty-1     │ SSD     │
│ ubuntu2004-ty-1  │ SSD     │
│ ubuntu2110-ty-2  │ SSD     │
│ centos7-ty-4     │ Non-SSD │
│ debian11-ty-1    │ Non-SSD │
│ fedora34-ty-1    │ Non-SSD │
│ perf-centos7     │ Non-SSD │
│ perf-debian11    │ Non-SSD │
│ perf-ubuntu1604  │ Non-SSD │
│ tumbleweed-ty-3  │ Non-SSD │
│ ubuntu1604-ty-1  │ Non-SSD │
│ ubuntu1804-arm-1 │ Non-SSD │
│ ubuntu1804-ty-3  │ Non-SSD │
│ ubuntu2004-arm-1 │ Non-SSD │
│ centos8-ty-1     │ Non-SSD │
│ centos8-ty-2     │ Non-SSD │
│ debian10-ty-1    │ Non-SSD │
│ debian11-ty-2    │ Non-SSD │
│ debian9-ty-2     │ Non-SSD │
│ opensuse154-ty-1 │ Non-SSD │
└──────────────────┴─────────┘

Thanks!

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific

Richard Purdie
 

On Fri, 2022-04-08 at 12:55 +0200, Jan-Simon Moeller wrote:
If that fixes a lot of intermittent issues - of course.

Just thinking out loud:
What about making main.conf's 'Blacklist' changeable by a config
variable we can set?
The autobuilder could have this as 'eth0', and we could then easily
ignore a different interface (or none).
I'm not sure eth0 should be in the list by default for everyone.

E.g. we set
meta-agl/meta-agl-core/recipes-connectivity/connman/files/main.conf:NetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb,meth

In one case we grep or sed ...
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:if
! grep '^NetworkInterfaceBlacklist=' ${CONNMAN_CONF} | grep -q $1;
then
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:
sed -i "s/^\(NetworkInterfaceBlacklist=.*\)/\1,$1/" ${CONNMAN_CONF}
meta-agl-demo/recipes-connectivity/connman/connman_agldemo.inc:
sed -i 's/^\(NetworkInterfaceBlacklist=.*\)/\1,eth1/'
${D}${sysconfdir}/connman/main.conf
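
The grep guard plus sed substitution above can be sketched as one small
function; the main.conf content below is a hypothetical example, not
the real AGL file:

```python
import re

# Hypothetical main.conf snippet; the real file lives under ${sysconfdir}/connman.
conf = "[General]\nNetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb\n"

def blacklist_iface(text, iface):
    # Guard, like the grep: skip if the interface is already listed.
    if re.search(rf"^NetworkInterfaceBlacklist=.*\b{iface}\b", text, re.M):
        return text
    # Mirrors: sed -i "s/^\(NetworkInterfaceBlacklist=.*\)/\1,IFACE/"
    return re.sub(r"^(NetworkInterfaceBlacklist=.*)$", rf"\1,{iface}",
                  text, flags=re.M)

print(blacklist_iface(conf, "eth1"))
```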

@Scott, what's your take?
The AGL file[1] appears to have another entry in it for FallbackTimeservers. It
is also complicated by oe-core only doing this for qemu* machines. I'm not sure
a single variable is going to work well.

It may be easier for AGL to change to tweaking connman-conf and then you can
just override the OE-Core file?

[1]
https://git.automotivelinux.org/AGL/meta-agl/tree/meta-agl-core/recipes-connectivity/connman/files/main.conf?h=next

Cheers,

Richard


Re: [OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific

Jan Simon Moeller
 

BTW, in our network-booted LAVA tests, we use this in the initrd even:
the initrd mounts an nbd device as root, so any network blip (down/up)
will make things fail.
A little ugly, but it works ...

# workaround for connman (avoid bringing down the network interface
# used for booting, disable DNS proxy)
if [[ -f /lib/systemd/system/connman.service ]]; then
    newopts="-r -n"
    iface=$(find_active_interface)
    [[ -n "$iface" ]] && newopts="$newopts -I $iface"

    log_info "Adjusting Connman command line. Will be: 'connmand $newopts'"
    sed -i "s|connmand -n\$|connmand $newopts|g" /lib/systemd/system/connman.service
fi


Best regards,
Jan-Simon

------
Jan-Simon Möller
AGL Release Manager
The Linux Foundation

Visit us at:
www.automotivegradelinux.org
lists.automotivelinux.org
www.linuxfoundation.org

On Fri, Apr 8, 2022 at 12:55 PM Jan Simon Moeller via
lists.yoctoproject.org
<jsmoeller=linuxfoundation.org@...> wrote:


If that fixes a lot of intermittent issues - of course.

Just thinking out loud:
What about making main.conf's 'Blacklist' changeable by a config
variable we can set?
The autobuilder could have this as 'eth0', and we could then easily
ignore a different interface (or none).
I'm not sure eth0 should be in the list by default for everyone.

E.g. we set
meta-agl/meta-agl-core/recipes-connectivity/connman/files/main.conf:NetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb,meth

In one case we grep or sed ...
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:
    if ! grep '^NetworkInterfaceBlacklist=' ${CONNMAN_CONF} | grep -q $1; then
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:
    sed -i "s/^\(NetworkInterfaceBlacklist=.*\)/\1,$1/" ${CONNMAN_CONF}
meta-agl-demo/recipes-connectivity/connman/connman_agldemo.inc:
    sed -i 's/^\(NetworkInterfaceBlacklist=.*\)/\1,eth1/' ${D}${sysconfdir}/connman/main.conf
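Put together, the grep-or-sed approach works out to something like this minimal sketch; the file path, seed contents, and interface name here are made up for illustration, not taken from the AGL scripts:

```shell
#!/bin/sh
# Hypothetical standalone version of the append-to-blacklist idiom:
# add an interface to NetworkInterfaceBlacklist only if it is not
# already listed.
CONNMAN_CONF=./main.conf
printf '[General]\nNetworkInterfaceBlacklist=vmnet,vboxnet\n' > "$CONNMAN_CONF"
iface=eth1

# -w matches the interface as a whole word, so 'eth1' does not
# false-positive on an existing 'eth10' entry.
if ! grep '^NetworkInterfaceBlacklist=' "$CONNMAN_CONF" | grep -qw "$iface"; then
    sed -i "s/^\(NetworkInterfaceBlacklist=.*\)/\1,$iface/" "$CONNMAN_CONF"
fi
cat "$CONNMAN_CONF"
```

After running, the blacklist line reads `NetworkInterfaceBlacklist=vmnet,vboxnet,eth1`; running it a second time changes nothing, which is what makes the guard worthwhile.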

@Scott, what's your take?

Best regards,
Jan-Simon

On Fri, Apr 8, 2022 at 9:09 AM Richard Purdie
<richard.purdie@...> wrote:

Hi Jan-Simon/Scott,

The attached change is going to break meta-agl:

https://autobuilder.yoctoproject.org/typhoon/#/builders/120/builds/1032

I think what OE-Core plans to do with a config file isn't unreasonable, and it
does fix intermittent autobuilder failures that are driving us slightly crazy,
so I'd really like to merge it ASAP.

I think there should be options for sorting this on the AGL side. We probably do
need to make this change in kirkstone too.

Cheers,

Richard



---------- Forwarded message ----------
From: Alexander Kanavin <alex.kanavin@...>
To: openembedded-core@...
Cc: Alexander Kanavin <alex@...>
Bcc:
Date: Thu, 7 Apr 2022 19:00:13 +0200
Subject: [OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific
Signed-off-by: Alexander Kanavin <alex@...>
---
meta/recipes-connectivity/connman/connman-conf.bb | 7 +++++--
meta/recipes-connectivity/connman/connman-conf/main.conf | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)
create mode 100644 meta/recipes-connectivity/connman/connman-conf/main.conf

diff --git a/meta/recipes-connectivity/connman/connman-conf.bb b/meta/recipes-connectivity/connman/connman-conf.bb
index 6b9207c4cb..7959ed8e50 100644
--- a/meta/recipes-connectivity/connman/connman-conf.bb
+++ b/meta/recipes-connectivity/connman/connman-conf.bb
@@ -6,6 +6,9 @@ LIC_FILES_CHKSUM = "file://${COREBASE}/meta/files/common-licenses/GPL-2.0-only;m

PR = "r2"

+SRC_URI = "file://main.conf \
+ "
+
S = "${WORKDIR}"

PACKAGE_ARCH = "${MACHINE_ARCH}"
@@ -14,6 +17,6 @@ FILES:${PN} = "${sysconfdir}/*"

# Kernel IP-Config is perfectly capable of setting up networking passed in via ip=
do_install:append:qemuall() {
- mkdir -p ${D}${sysconfdir}/default
- echo "export EXTRA_PARAM=\"-I eth0\"" > ${D}${sysconfdir}/default/connman
+ mkdir -p ${D}${sysconfdir}/connman
+ cp ${S}/main.conf ${D}${sysconfdir}/connman/main.conf
}
diff --git a/meta/recipes-connectivity/connman/connman-conf/main.conf b/meta/recipes-connectivity/connman/connman-conf/main.conf
new file mode 100644
index 0000000000..a394e8f25b
--- /dev/null
+++ b/meta/recipes-connectivity/connman/connman-conf/main.conf
@@ -0,0 +1,2 @@
+[General]
+NetworkInterfaceBlacklist = eth0
--
2.30.2
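One possible (untested) way for a layer such as meta-agl to adapt to this patch would be a bbappend that ships its own main.conf; the bbappend name below is an assumption for illustration, not part of the patch:

```bitbake
# connman-conf_%.bbappend (hypothetical, in the adapting layer)
# Prepending to FILESEXTRAPATHS makes BitBake find this layer's
# files/main.conf before the one introduced by the patch above,
# so the layer keeps control of its own NetworkInterfaceBlacklist.
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
```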







Re: [OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific

Jan Simon Moeller
 

If that fixes a lot of intermittent issues - of course.

Just thinking out loud:
what about making main.conf's 'Blacklist' changeable via a config
variable we can set?
The autobuilder could have this as 'eth0', and then we could easily
ignore a different interface (or none).
Not sure eth0 should be in the list by default for everyone?

E.g. we set
meta-agl/meta-agl-core/recipes-connectivity/connman/files/main.conf:NetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb,meth

In one case we grep or sed ...
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:
    if ! grep '^NetworkInterfaceBlacklist=' ${CONNMAN_CONF} | grep -q $1; then
meta-agl-demo/recipes-config/cluster-demo-network-config/files/cluster-demo-network-conf.sh:
    sed -i "s/^\(NetworkInterfaceBlacklist=.*\)/\1,$1/" ${CONNMAN_CONF}
meta-agl-demo/recipes-connectivity/connman/connman_agldemo.inc:
    sed -i 's/^\(NetworkInterfaceBlacklist=.*\)/\1,eth1/' ${D}${sysconfdir}/connman/main.conf

@Scott, what's your take?

Best regards,
Jan-Simon

On Fri, Apr 8, 2022 at 9:09 AM Richard Purdie
<richard.purdie@...> wrote:

Hi Jan-Simon/Scott,

The attached change is going to break meta-agl:

https://autobuilder.yoctoproject.org/typhoon/#/builders/120/builds/1032

I think what OE-Core plans to do with a config file isn't unreasonable, and it
does fix intermittent autobuilder failures that are driving us slightly crazy,
so I'd really like to merge it ASAP.

I think there should be options for sorting this on the AGL side. We probably do
need to make this change in kirkstone too.

Cheers,

Richard



---------- Forwarded message ----------
From: Alexander Kanavin <alex.kanavin@...>
To: openembedded-core@...
Cc: Alexander Kanavin <alex@...>
Bcc:
Date: Thu, 7 Apr 2022 19:00:13 +0200
Subject: [OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific
Signed-off-by: Alexander Kanavin <alex@...>
---
meta/recipes-connectivity/connman/connman-conf.bb | 7 +++++--
meta/recipes-connectivity/connman/connman-conf/main.conf | 2 ++
2 files changed, 7 insertions(+), 2 deletions(-)
create mode 100644 meta/recipes-connectivity/connman/connman-conf/main.conf

diff --git a/meta/recipes-connectivity/connman/connman-conf.bb b/meta/recipes-connectivity/connman/connman-conf.bb
index 6b9207c4cb..7959ed8e50 100644
--- a/meta/recipes-connectivity/connman/connman-conf.bb
+++ b/meta/recipes-connectivity/connman/connman-conf.bb
@@ -6,6 +6,9 @@ LIC_FILES_CHKSUM = "file://${COREBASE}/meta/files/common-licenses/GPL-2.0-only;m

PR = "r2"

+SRC_URI = "file://main.conf \
+ "
+
S = "${WORKDIR}"

PACKAGE_ARCH = "${MACHINE_ARCH}"
@@ -14,6 +17,6 @@ FILES:${PN} = "${sysconfdir}/*"

# Kernel IP-Config is perfectly capable of setting up networking passed in via ip=
do_install:append:qemuall() {
- mkdir -p ${D}${sysconfdir}/default
- echo "export EXTRA_PARAM=\"-I eth0\"" > ${D}${sysconfdir}/default/connman
+ mkdir -p ${D}${sysconfdir}/connman
+ cp ${S}/main.conf ${D}${sysconfdir}/connman/main.conf
}
diff --git a/meta/recipes-connectivity/connman/connman-conf/main.conf b/meta/recipes-connectivity/connman/connman-conf/main.conf
new file mode 100644
index 0000000000..a394e8f25b
--- /dev/null
+++ b/meta/recipes-connectivity/connman/connman-conf/main.conf
@@ -0,0 +1,2 @@
+[General]
+NetworkInterfaceBlacklist = eth0
--
2.30.2




[OE-core] [PATCH 01/17] connman-conf: ignore eth0 in qemu in a way that is not sysvinit-specific

Richard Purdie
 

Hi Jan-Simon/Scott,

The attached change is going to break meta-agl:

https://autobuilder.yoctoproject.org/typhoon/#/builders/120/builds/1032

I think what OE-Core plans to do with a config file isn't unreasonable, and it
does fix intermittent autobuilder failures that are driving us slightly crazy,
so I'd really like to merge it ASAP.

I think there should be options for sorting this on the AGL side. We probably do
need to make this change in kirkstone too.

Cheers,

Richard
