Yocto Autobuilder: Latency Monitor and AB-INT - Meeting notes: June 17, 2021
Join Zoom Meeting - 9 AM ET
https://windriver.zoom.us/j/3696693975

Attendees: Alex, Richard, Saul, Randy, Tony, Trevor, Sakib

Summary: Things are improving somewhat on the autobuilder; RCU stalls are the top problem now.

1. LTP kernel BUG: Many thanks to Paul Gortmaker for his work on this!

2. The most common problem now is the qemu RCU hang. For example, this build:
   https://autobuilder.yoctoproject.org/typhoon/#/builders/73/builds/3541/steps/13/logs/stdio

   Richard's links on RCU stall detection and tuning parameters:
   https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
   https://lwn.net/Articles/777214/

   Next:
   - Ask around for advice on qemu debugging.
   - RP thinks that the underlying system has a problem: CPU or other overload.
     We do see two qemus using lots of CPU in the log linked above.
     Richard says that the likely activity is:
     - core-image-sato-sdk: compiler tests
     - core-image-sato: lighter general tests
     Alex thinks that the particular workload is not significant.
   - Run two qemus in a controlled environment, with stress-ng.
   - iostat will help - Sakib.

3. Valgrind ptest results are getting better.

4. ptest issues are coming along, with util-linux likely to be the next thing merged, today.

5. We seem to see issues on the ubuntu-18.04 builders and we don't know why; it may only be that we have more of those workers...
   Alex, could you possibly get failures-per-worker statistics?

6. Discussed Sakib's summary script. It's coming along.
   TO DO:
   - special activities: rm (of trash), tar, qemu*
   - report all zombies (the current horde is due to Paul Barker's patch)

7. make job server: the FIFO was being re-created by the wrapper on each call, so Trevor will fix that.

8. From last week, I don't think we've increased the timeouts:
   - qemu-runner? timeout increase 120 -> 240
   - ptest timeouts 300 -> 450?

9. Plans for the week:
   Richard: RCU stall
   Alex:
   Sakib: task summary
   Trevor: make job server
   Tony: ptests and work with upstream valgrind on fixing bugs.
   Saul: (1 week) have QMP deal with SIGUSR1 to close the QMP socket
   Randy: coffee, herd cats!!

../Randy
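
Appendix sketch for the "report all zombies" TO DO: one possible way to list zombie processes on a Linux worker by reading /proc/<pid>/stat. This is not Sakib's summary script; the function names (parse_stat, find_zombies) and the output format are made up for illustration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    fields = stat_text[rparen + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, ppid, comm) for processes in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the file was unreadable
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies


if __name__ == "__main__":
    # Hypothetical report format: one line per zombie, with its parent PID.
    for pid, ppid, comm in find_zombies():
        print(f"zombie pid={pid} ppid={ppid} comm={comm}")
```

Printing the parent PID is deliberate: a zombie stays until its parent reaps it, so the parent is usually the process to look at.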
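
Appendix sketch for the make job server item: a minimal illustration of creating a FIFO only when it does not already exist, so that repeated wrapper calls reuse the same one. The actual wrapper is not shown in these notes, so the language, the function name ensure_fifo, and the path used below are all assumptions, not Trevor's fix.

```python
import os
import stat


def ensure_fifo(path, mode=0o600):
    """Create a FIFO at path unless one is already there.

    Returns True if the FIFO was created, False if an existing one is reused.
    Raises FileExistsError if path exists but is not a FIFO.
    """
    try:
        os.mkfifo(path, mode)
        return True
    except FileExistsError:
        if not stat.S_ISFIFO(os.stat(path).st_mode):
            raise  # something else is at this path; do not silently reuse it
        return False


if __name__ == "__main__":
    # Hypothetical location; the real wrapper's FIFO path is not given here.
    fifo_path = "/tmp/example-jobserver.fifo"
    print("created" if ensure_fifo(fifo_path) else "reused", fifo_path)
```

Trying mkfifo first and handling FileExistsError avoids a check-then-create race when several wrapper calls start at the same time.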