Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: July 22, 2021 -- resend


Randy MacLeod
 

Resend:

The list:
Yocto discussion list <yocto@...>
didn't work and may be dead so switch to:
yocto@... <yocto@...>
../Randy



YP AB Intermittent failures meeting
===================================
July 22, 2021, 9 AM ET
https://windriver.zoom.us/j/3696693975

Attendees: Tony, Richard, Trevor, Randy, Tony


Summary:
========

ptest failures, somewhat improved AGAIN but still seeing
problems particularly with LTT-NG.


If anyone wants to help, we could use more eyes on the logs,
particularly the summary logs and understanding iostat #
when the dd test times out.



Plans for the week:
===================

All: Wait and see if the ptest failure rate continues to be lower
than previous weeks.

Richard:
Alex:
Sakib: hook more responsive load average in to latency test.
Trevor: patch to set PARALLEL_MAKE : -l 50
-> dunfell, gatesgarth, hardknott
Tony:
Saul: on vacation
Randy: Look at performance data


Meeting Notes:
==============

1, runqemu (same as last week so I'll drop this comment next week)

Tony having trouble with runqemu on some Wind River machine.
Richard has a fix for a race in runqemu in master-next.
These might be related but if not Tony should debug the
issue/ collect logs.

2. job server

- ninja could be patched with make's more responsive algorithm
next or is this good enough?

- Richard suggested that we extract make's code for measuring the load
average to a separate binary and run it in the periodic io latency
test. Also can we translate it to python?
- Trevor is working on this.

- performance build servers
- fewer failures there so no point in limiting jobs and
it would be disruptive to performance monitoring.


3. AB status

ptest cases are improving, we may be close to done!
Let's wait a week to see how things go.


4. Nothing new on this item this week:
Richard reported
- something really flaky going on with serial ports.
- particularly bad on qemuppc but also x86.
- related to Saul's QMP data dump?
- Juy 22: We didn't talk about this issue this week.

5. Sakib's improvements to the logging are merged.
We think Michael needs to update the script that generates the
web page. Randy/Sakib to talk with Michael.

6. (From July 8)
Richard says that we may need to redesign the data collection system
that Sakib's AB INT tests are based on.



Still relevant parts of
Previous Meeting Notes:
=======================


4. bitbake server timeout.

"Timeout while waiting for a reply from the bitbake server (60s)"

Randy mentioned that the bitbake server timeouts seen in the
Wind River build cluster have gone away after upgrading to
a newer version of docker.

Old: Docker Version: Docker version 18.09.4, build d14af54266
New: Docker Version: Docker version 20.10.7, build f0df350


Clearly the YP ABs aren't running in docker but what
about firmware and kernel tunings.

Michael,

Is the BIOS/firmware kept up to date on most nodes?
- July 22: This was done.


For the performance builder trend see:

https://autobuilder.yocto.io/pub/non-release/20210721-9/testresults/buildperf-centos7/perf-centos7.yoctoproject.org_master_20210721150057_1ad79313a5.html


https://autobuilder.yocto.io/pub/non-release/20210721-14/testresults/buildperf-ubuntu1604/perf-ubuntu1604_master_20210721210034_1ad79313a5.html


Summary,
- CentOS-7 seems to take less time (~ 1 min),
- Ubuntu-16.04 seems to take more (~ 5 min)
That's a bit surprising!
Randy to look at 62659 commit number in poky.


5. io stalls

Richard said that it would make sense to write an ftrace utility
/ script to monitor io latency and we could install it with sudo
Ch^W mentioned ftrace on IRC.
Sakib and Randy will work on that but not for a week or two.





../Randy

Join yocto@lists.yoctoproject.org to automatically receive all group messages.