[PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template
Randy MacLeod
Summary:
Build: 20210420-1 had 23 triggers and isn't really worth analyzing. Stay
tuned for what is hopefully better data with a higher timeout threshold.
There is a nice looking pair of graphs attached though! :)

../Randy

On 2021-04-15 4:48 p.m., Randy MacLeod wrote:
> On 2021-04-15 1:55 p.m., Randy MacLeod wrote:
>> Should we increase the interval to 30, 60, or more seconds?

I've bumped the interval to 60 seconds.

I spent some time looking at the first bit of data along with Sakib and
Saul from time to time.

> General conclusions:
Still true.

> 2. xz might be a problem but we're not sure yet.
Added to graph attached.

> 3. We need more data and tools and time to think about it.

> To Do:
Still to do, maybe. It's not clear that it's useful yet.

> 2. sometimes we see:
With a 60 second interval, this does NOT happen.

> 3. tail the cooker console in addition to top. Present that before top.
Still to do. This can be done via the cooker log. We can parse the file
and do:
1. list recent tasks regardless of whether they have completed: tail 20?
2. list tasks that have started but not completed.

This email is about:
  https://autobuilder.yocto.io/pub/non-release/20210420-1/

It was a 'quick' build. There was only 1 log file produced. It used the
master branch so the timeout was still 5 seconds.

There were 23 (!!) times that the dd time exceeded the 15 (ACTUALLY 5)
second limit out of a total of 1504 invocations, and those triggers were
captured by 1 log file:
  testresults/qemux86-world/20210420-0343/host_stats_0_top.txt

Splitting each top output into a separate file as before, we now have 23
log files. How big are these files, i.e. how many process/kernel threads
were running when top ran?
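For the "list tasks that have started but not completed" to-do, something
like the following could work. This is only a rough sketch: it assumes
bitbake-style "recipe <name>: task <task>: Started/Succeeded" lines in the
cooker console log, and the sample log and its file name are made up.

```shell
# Made-up sample of cooker console output (format assumed, not verified).
cat > cooker-sample.log <<'EOF'
NOTE: recipe zlib-1.2.11-r0: task do_configure: Started
NOTE: recipe zlib-1.2.11-r0: task do_configure: Succeeded
NOTE: recipe xz-5.2.5-r0: task do_compile: Started
NOTE: recipe gcc-10.2.0-r0: task do_compile: Started
EOF

# Remember every task that Started, forget it again when it finishes;
# whatever is left at EOF has started but not completed.
unfinished=$(awk -F': ' '
    /: Started$/            { s[$2 ": " $3] = 1 }
    /: (Succeeded|Failed)$/ { delete s[$2 ": " $3] }
    END { for (t in s) print t }' cooker-sample.log | sort)
echo "$unfinished"
```

The same awk body could be fed from `tail -f` on the live cooker log to
present the in-flight tasks ahead of the top output.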
$ wc -l host-stats-0* | sort -n
  699 host-stats-03:42:40-23.log
  704 host-stats-02:20:41-5.log
  720 host-stats-03:33:41-21.log
  733 host-stats-02:19:40-4.log
  743 host-stats-03:32:41-20.log
  752 host-stats-02:25:45-6.log
  753 host-stats-02:02:46-3.log
  760 host-stats-03:36:07-22.log
  784 host-stats-02:01:48-2.log
  784 host-stats-03:30:43-18.log
  802 host-stats-02:45:55-10.log
  807 host-stats-03:31:42-19.log
  816 host-stats-03:29:49-17.log
  829 host-stats-03:19:50-15.log
  845 host-stats-02:41:40-8.log
  851 host-stats-02:00:57-1.log
  899 host-stats-02:32:47-7.log
  906 host-stats-03:18:41-14.log
  925 host-stats-03:28:24-16.log
  947 host-stats-02:42:41-9.log
 1084 host-stats-03:16:50-13.log
 1204 host-stats-02:54:43-11.log
 1314 host-stats-03:03:41-12.log
19661 total

I noticed that several but not all log files were running xz:

$ for i in `ls host-stats-*`; do echo -n $i ": "; grep "xz " $i | wc -l; done
host-stats-02:00:57-1.log : 0
host-stats-02:01:48-2.log : 0
host-stats-02:02:46-3.log : 0
host-stats-02:19:40-4.log : 2
host-stats-02:20:41-5.log : 0
host-stats-02:25:45-6.log : 2
host-stats-02:32:47-7.log : 0
host-stats-02:41:40-8.log : 1
host-stats-02:42:41-9.log : 2
host-stats-02:45:55-10.log : 1
host-stats-02:54:43-11.log : 0
host-stats-03:03:41-12.log : 0
host-stats-03:16:50-13.log : 47
host-stats-03:18:41-14.log : 0
host-stats-03:19:50-15.log : 9
host-stats-03:28:24-16.log : 15
host-stats-03:29:49-17.log : 0
host-stats-03:30:43-18.log : 0
host-stats-03:31:42-19.log : 7
host-stats-03:32:41-20.log : 7
host-stats-03:33:41-21.log : 11
host-stats-03:36:07-22.log : 15
host-stats-03:42:40-23.log : 9

I've plotted the number of xz processes along with the load average and
the dd time in the attached graph.
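Extracting the plotted values can also be scripted rather than done by
hand. A minimal sketch, assuming a procps-style top header line; the
sample file and its contents below are made up:

```shell
# Made-up one-snapshot top log (hypothetical file name and contents).
cat > host-stats-sample.log <<'EOF'
top - 02:19:40 up 10 days,  3:04, 12 users,  load average: 61.50, 48.80, 40.20
Tasks: 804 total,   9 running
12345 pokybuild  20   0  100m  10m  1m R  99  0.1  0:42 xz -9
12346 pokybuild  20   0  100m  10m  1m R  98  0.1  0:41 xz -9
EOF

# One TSV row per snapshot: time, 1-minute load average, xz process count.
for f in host-stats-sample.log; do
    t=$(head -1 "$f" | awk '{print $3}')                       # snapshot time
    load=$(head -1 "$f" | sed 's/.*load average: //; s/,.*//') # 1-min load
    nxz=$(grep -c "xz " "$f")                                  # xz processes
    row=$(printf '%s\t%s\t%s' "$t" "$load" "$nxz")
    echo "$row"
done
```

Run over the real host-stats logs, the resulting TSV can be fed straight
into gnuplot or a spreadsheet.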
All of the top output logs seem to be running oe-selftest:

$ for i in host-stats-0*; do grep -H -c "DISPLAY.*oe-selftest " $i; done
host-stats-02:00:57-1.log:1
host-stats-02:01:48-2.log:1
host-stats-02:02:46-3.log:1
host-stats-02:19:40-4.log:1
host-stats-02:20:41-5.log:1
host-stats-02:25:45-6.log:1
host-stats-02:32:47-7.log:1
host-stats-02:41:40-8.log:2
host-stats-02:42:41-9.log:2
host-stats-02:45:55-10.log:2
host-stats-02:54:43-11.log:2
host-stats-03:03:41-12.log:2
host-stats-03:16:50-13.log:2
host-stats-03:18:41-14.log:2
host-stats-03:19:50-15.log:2
host-stats-03:28:24-16.log:2
host-stats-03:29:49-17.log:2
host-stats-03:30:43-18.log:2
host-stats-03:31:42-19.log:2
host-stats-03:32:41-20.log:2
host-stats-03:33:41-21.log:2
host-stats-03:36:07-22.log:2
host-stats-03:42:40-23.log:2

The logs DO seem to be clustered: there are several adjacent 1-minute
intervals where the dd time exceeded the threshold. The data is below,
but you can also see this in the attached graph.

$ for i in host-stats-*; do echo -n $i ": "; head -1 $i | cut -c 1-15; done
host-stats-02:00:57-1.log : top - 02:00:57
host-stats-02:01:48-2.log : top - 02:01:48
host-stats-02:02:46-3.log : top - 02:02:46
host-stats-02:19:40-4.log : top - 02:19:40
host-stats-02:20:41-5.log : top - 02:20:41
host-stats-02:25:45-6.log : top - 02:25:45
host-stats-02:32:47-7.log : top - 02:32:47
host-stats-02:41:40-8.log : top - 02:41:40
host-stats-02:42:41-9.log : top - 02:42:41
host-stats-02:45:55-10.log : top - 02:45:55
host-stats-02:54:43-11.log : top - 02:54:43
host-stats-03:03:41-12.log : top - 03:03:41
host-stats-03:16:50-13.log : top - 03:16:50
host-stats-03:18:41-14.log : top - 03:18:41
host-stats-03:19:50-15.log : top - 03:19:50
host-stats-03:28:24-16.log : top - 03:28:24
host-stats-03:29:49-17.log : top - 03:29:49
host-stats-03:30:43-18.log : top - 03:30:43
host-stats-03:31:42-19.log : top - 03:31:42
host-stats-03:32:41-20.log : top - 03:32:41
host-stats-03:33:41-21.log : top - 03:33:41
host-stats-03:36:07-22.log : top - 03:36:07
host-stats-03:42:40-23.log : top - 03:42:40

*** Add time distribution of dd.
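A rough sketch of what that dd time distribution could look like,
bucketing the elapsed-seconds field of dd's "copied, X s," output. The
sample lines are the ones quoted in this mail; the bucket boundaries are
arbitrary and just for illustration:

```shell
# Sample dd output lines (copied from this build's logs).
cat > dd-times.txt <<'EOF'
102400 bytes (102 kB) copied, 0.0163975 s, 6.2 MB/s
102400 bytes (102 kB) copied, 0.0054732 s, 18.7 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.0984687 s, 1.0 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.304233 s, 337 kB/s
EOF

# The field just before the literal "s," is the elapsed time; locating it
# by scanning fields copes with both dd output formats seen above.
hist=$(awk '{
    for (i = 2; i <= NF; i++) if ($i == "s,") t = $(i - 1) + 0
    if (t < 0.01) b = "<0.01s"; else if (t < 0.1) b = "0.01s-0.1s"; else b = ">=0.1s"
    n[b]++
} END { for (b in n) print b, n[b] }' dd-times.txt | sort)
echo "$hist"
```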
Oddly enough, the output of dd does not seem to be consistent:

102400 bytes (102 kB) copied, 0.0163975 s, 6.2 MB/s
102400 bytes (102 kB) copied, 0.0054732 s, 18.7 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.0984687 s, 1.0 MB/s
102400 bytes (102 kB, 100 KiB) copied, 0.304233 s, 337 kB/s

I'm not sure what's going on there yet. Sakib?

I think the cooker output will be quite useful.

I'm not going to put any more time into this data set since the time
threshold is still 5 seconds.

All for now.

../Randy

--
# Randy MacLeod
# Wind River Linux

