Re: Intermittent network issue on centos7-ty-4


Richard Purdie
 

On Tue, 2021-10-19 at 12:43 +0100, Richard Purdie via lists.yoctoproject.org
wrote:
Yesterday there was some talk in irc about a breif networking failure on
centos7-ty-4. The build was:

https://autobuilder.yoctoproject.org/typhoon/#/builders/104/builds/3083

Showing these warnings:

stdio: WARNING: gtk+3-3.24.30-r0 do_package: Error contacting Hash Equivalence Server typhoon.yocto.io:8686: [Errno -2] Name or service not known
stdio: WARNING: gtk+3-3.24.30-r0 do_packagedata: Error contacting Hash Equivalence Server typhoon.yocto.io:8686: [Errno -2] Name or service not known

The worry being why was the DNS suddenly not working. Looking at the log:

/home/pokybuild/yocto-worker/genericx86-64-alt/build/build-renamed/tmp/work/core2-64-poky-linux/gtk+3/3.24.30-r0/temp/log.do_package.41319

The warning is near the end of that and the log timestamp is Oct 18 17:11 (i.e.
when it was last written to).

We didn't spot anything yesterday but I realised the systemd journalctl
timestamps are an hour different that the timestamp I displayed from the
filesystem thanks for daylight savings. Looking at journalctl an hour different,
I see:

Oct 18 16:00:03 centos7-ty-4.yocto.io systemd[1]: Removed slice User Slice of root.
Oct 18 16:01:03 centos7-ty-4.yocto.io systemd[1]: Created slice User Slice of root.
Oct 18 16:01:03 centos7-ty-4.yocto.io systemd[1]: Started Session 483 of user root.
Oct 18 16:01:03 centos7-ty-4.yocto.io CROND[25699]: (root) CMD (run-parts /etc/cron.hourly)
Oct 18 16:01:04 centos7-ty-4.yocto.io run-parts(/etc/cron.hourly)[25822]: starting 0anacron
Oct 18 16:01:04 centos7-ty-4.yocto.io run-parts(/etc/cron.hourly)[25923]: finished 0anacron
Oct 18 16:01:04 centos7-ty-4.yocto.io systemd[1]: Removed slice User Slice of root.
Oct 18 16:05:17 centos7-ty-4.yocto.io dhclient[3905]: DHCPREQUEST on enp4s0f0 to 172.29.10.1 port 67 (xid=0x1579bf79)
Oct 18 16:05:17 centos7-ty-4.yocto.io dhclient[3905]: DHCPACK from 172.29.10.1 (xid=0x1579bf79)
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2367] dhcp4 (enp4s0f0): address 172.29.10.2
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): plen 24 (255.255.255.0)
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): gateway 172.29.10.1
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): lease time 7200
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): hostname 'yct-ab02'
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): nameserver '172.29.10.1'
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp4 (enp4s0f0): domain name 'yocto.io'
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2373] dhcp (enp4s0f0): domain search 'pdx.yoctoproject.org.'
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2374] dhcp (enp4s0f0): domain search 'yocto.io.'
Oct 18 16:05:18 centos7-ty-4.yocto.io NetworkManager[1436]: <info> [1634573118.2374] dhcp4 (enp4s0f0): state changed bound -> bound
Oct 18 16:05:18 centos7-ty-4.yocto.io dhclient[3905]: bound to 172.29.10.2 -- renewal in 3024 seconds.
Oct 18 16:05:18 centos7-ty-4.yocto.io dbus[1414]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-d
Oct 18 16:05:18 centos7-ty-4.yocto.io systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 18 16:05:19 centos7-ty-4.yocto.io dbus[1414]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 18 16:05:19 centos7-ty-4.yocto.io systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 18 16:05:19 centos7-ty-4.yocto.io nm-dispatcher[24137]: req:1 'dhcp4-change' [enp4s0f0]: new request (4 scripts)
Oct 18 16:05:19 centos7-ty-4.yocto.io nm-dispatcher[24137]: req:1 'dhcp4-change' [enp4s0f0]: start running ordered scripts...
Oct 18 16:10:05 centos7-ty-4.yocto.io systemd[1]: Created slice User Slice of root.
Oct 18 16:10:05 centos7-ty-4.yocto.io systemd[1]: Started Session 484 of user root.
Oct 18 16:10:05 centos7-ty-4.yocto.io CROND[26785]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Oct 18 16:10:08 centos7-ty-4.yocto.io systemd[1]: Removed slice User Slice of root.

i.e. it renewed the DHCP lease about that time and ran network manager scripts. 
This looks a bit like a smoking gun to me.

Not sure what to do about it but I think we may know where the networking
glitches come from.
Checking this again, I think the timestamps are not an hour different and I was
misreading some tests I made. The do_install task finished at 17:07 and the
do_package finished by 17:11 so we have a 4 minute window which does not match
the last dhcp refresh before the error at 16:55. It was a nice idea :/.

Cheers,

Richard

Join swat@lists.yoctoproject.org to automatically receive all group messages.