[PATCH yocto-autobuilder-helper 1/4] config.json: add "collect-data" template
sakib.sajal@...
On 2021-04-15 9:52 a.m., Richard Purdie wrote:
[Please note: This e-mail is from an EXTERNAL e-mail address]
On Tue, 2021-04-13 at 13:02 -0400, sakib.sajal@... wrote:
collect-data template can run arbitrary commands/scripts
on a regular basis and logs the output in a file.
See oe-core for more details:
edb7098e9e buildstats.bbclass: add functionality to collect build system stats
Signed-off-by: Sakib Sajal <sakib.sajal@...>
Signed-off-by: Randy MacLeod <Randy.MacLeod@...>
---
config.json | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/config.json b/config.json
index 5bfa240..c43d231 100644
--- a/config.json
+++ b/config.json
@@ -87,6 +87,13 @@
"SANITYTARGETS" : "core-image-full-cmdline:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage"
}
},
+ "collect-data" : {
+ "extravars" : [
+ "BB_HEARTBEAT_EVENT = '10'",
+ "BB_LOG_HOST_STAT_ON_INTERVAL = '1'",
+ "BB_LOG_HOST_STAT_CMDS = 'oe-time-dd-test.sh 100'"
+ ]
+ },
Is the template used anywhere? I can't remember if we support nesting
templates, in which case this is useful, or not?
Cheers,
Richard
The template is not used anywhere yet; the initial patchset enables the
data collection by default.
I have left the template in case the data collection is removed from the
defaults and needs to be used on a case-by-case basis.
I am not entirely sure if nesting templates work. I have not seen any
examples of it, nor did I try it myself. If nesting does work, the
template should be useful.
Sakib
On 2021-04-15 4:48 p.m., Randy MacLeod wrote:
Is the system back to normal and operational now?
What was the impact of running the heartbeat and the dd test every
10 seconds on the system build performance?
Should we increase the interval to 30, 60, or more seconds?
I spent some time looking at the first bit of data along with
Sakib and Saul from time to time.
General conclusions:
1. It seems like ALL triggers involve oe-selftest being active.
2. xz might be a problem but we're not sure yet.
3. We need more data and tools and time to think about it.
To Do:
1. increase top cmdline length from 512 to 16K
2. sometimes we see:
Command '['oe-time-dd-test.sh', '100']' timed out after 10.0 seconds
That should not happen, so we should understand why and either increase
the time between runs or fix the tooling. This seems to happen under load,
so it's hiding the interesting data that we are looking for! (A rough sketch
of the dd probe idea follows this list.)
3. tail the cooker console in addition to top. Present that before top.
It would be nice to have a top equivalent for bitbake.
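For reference, the idea behind the trigger is roughly: time a fixed-size dd
write and only capture a host snapshot when it is slow. The following is a
minimal sketch of that idea, not the actual oe-time-dd-test.sh from oe-core;
the size, threshold, and top options are illustrative.

#!/bin/bash
# Minimal sketch of the dd-based latency probe idea (not the real
# oe-time-dd-test.sh): time a fixed-size write and only dump a top
# snapshot when the write was slow.
SIZE_MB=${1:-100}       # amount of data to write, in MB
THRESHOLD=${2:-5}       # seconds before the host is considered "slow"
TESTFILE=$(mktemp ./dd-probe.XXXXXX)
start=$(date +%s.%N)
dd if=/dev/zero of="$TESTFILE" bs=1M count="$SIZE_MB" conv=fsync 2>/dev/null
end=$(date +%s.%N)
rm -f "$TESTFILE"
elapsed=$(echo "$end - $start" | bc)
echo "Event Time: $(date)  dd of ${SIZE_MB}MB took ${elapsed}s"
if [ "$(echo "$elapsed > $THRESHOLD" | bc)" -eq 1 ]; then
    # only pay for the full process listing when the write was slow
    top -b -n 1 -c -w 512
fi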
We did collect some triggered host data last night as seen in:
https://autobuilder.yocto.io/pub/non-release/
https://autobuilder.yocto.io/pub/non-release/20210415-16/
Only one a-full build was run. There were 10 log files produced.
There were 21 times that the dd time exceeded the 5 second limit
out of a total of 21581 (or so!) invocations, and those triggers were
captured by 10 log files:
testresults/beaglebone-alt/2021-04-16--00-19/host_stats_0_top.txt
testresults/qa-extras2/2021-04-15--22-43/host_stats_2_top.txt
testresults/qa-extras2/2021-04-15--22-43/host_stats_4_top.txt
testresults/qa-extras2/2021-04-15--22-43/host_stats_6_top.txt
testresults/qa-extras2/2021-04-15--22-43/host_stats_8_top.txt
testresults/qemuarm/2021-04-16--00-02/host_stats_0_top.txt
testresults/qemuarm/2021-04-16--00-02/host_stats_1_top.txt
testresults/qemumips-alt/2021-04-15--23-36/host_stats_1_top.txt
testresults/qemumips64/2021-04-16--02-46/host_stats_0_top.txt
testresults/qemux86-world/2021-04-16--00-00/host_stats_0_top.txt
We knew that our naming convention needed work in that the files
are generically named and differ only by the directory datestamp and,
where the logs contain 'top' output, the _top suffix. We'd like to help
whoever is looking at the data understand what the context of
the build was. That's not clear to Sakib and me given that we are still YP AB newbies.
Do you have any suggestions about what the directory structure or file naming convention should be?
The other thing we need to do is correlate these higher-latency times
with the intermittent problems we've encountered. We can do that manually,
I suppose, via the SWAT team, but ideally there would be an automated process.
More quick analysis...
The number of times that top ran per log file:
$ grep "^top - " `fd _top autobuilder.yocto.io/` | cut -d":" -f1 | uniq -c | \
sed -e 's|autobuilder.yocto.io/pub/non-release/20210415-16/||'
2 testresults/beaglebone-alt/2021-04-16--00-19/host_stats_0_top.txt
3 testresults/qa-extras2/2021-04-15--22-43/host_stats_2_top.txt
1 testresults/qa-extras2/2021-04-15--22-43/host_stats_4_top.txt
2 testresults/qa-extras2/2021-04-15--22-43/host_stats_6_top.txt
1 testresults/qa-extras2/2021-04-15--22-43/host_stats_8_top.txt
5 testresults/qemuarm/2021-04-16--00-02/host_stats_0_top.txt
2 testresults/qemuarm/2021-04-16--00-02/host_stats_1_top.txt
1 testresults/qemumips-alt/2021-04-15--23-36/host_stats_1_top.txt
2 testresults/qemumips64/2021-04-16--02-46/host_stats_0_top.txt
2 testresults/qemux86-world/2021-04-16--00-00/host_stats_0_top.txt
Some of these are duplicates in that the different steps (_2, _4, _6, _8 above) are from the same qa-extras2 build.
A little shell hacking (with ample access to Stack Overflow!) can produce
one file per top output:
# Split each *_top.txt log into one file per top snapshot, keyed by the
# HH:MM:SS timestamp on the "top - " header line, with a counter appended
# to keep the output file names unique.
COUNTER=1
for i in `fd _top`; do
    for j in `grep "^top - " $i | cut -c 7-15`; do
        sed -n "/top - ${j}/,/Event Time:/p" $i >> host-stats-$j--$COUNTER.log
        ((COUNTER++))
    done
done
This works because the first line of each top snapshot is similar to:
top - 15:40:53 up 2 days, 22:17, 1 user, load average: 0.36, 0.58, 0.85
so cutting out chars 7-15 gives a fairly unique timestamp string for the
filename, and adding the counter makes it fully unique.
Now we have 21 log files:
$ ls host-stats-2* | wc -l
21
How big are these files, i.e. how many processes/kernel threads were running
when top ran?
$ wc -l host-stats-2* | sort -n
757 host-stats-22:12:32--17.log
778 host-stats-22:18:21--8.log
784 host-stats-21:59:42--5.log
785 host-stats-21:59:42--12.log
792 host-stats-22:18:01--7.log
800 host-stats-22:18:21--14.log
811 host-stats-22:07:40--13.log
812 host-stats-22:07:59--6.log
821 host-stats-21:56:21--3.log
850 host-stats-21:59:33--11.log
856 host-stats-21:59:33--4.log
869 host-stats-22:29:49--16.log
884 host-stats-22:29:14--9.log
886 host-stats-22:29:36--15.log
981 host-stats-21:55:40--10.log
985 host-stats-22:47:27--2.log
987 host-stats-22:47:27--21.log
1124 host-stats-22:37:33--1.log
1193 host-stats-22:37:26--20.log
1304 host-stats-23:19:14--19.log
1321 host-stats-23:18:57--18.log
19380 total
I noticed that several, but not all, of the log files show xz running with args like:
xz -a --memlimit=50% --threads=56
$ for i in `ls host-stats-2*`; do echo -n $i ": "; grep "xz " $i | wc -l; done
host-stats-21:55:40--10.log : 28
host-stats-21:56:21--3.log : 4
host-stats-21:59:33--11.log : 1
host-stats-21:59:33--4.log : 1
host-stats-21:59:42--12.log : 1
host-stats-21:59:42--5.log : 1
host-stats-22:07:40--13.log : 2
host-stats-22:07:59--6.log : 2
host-stats-22:12:32--17.log : 0
host-stats-22:18:01--7.log : 6
host-stats-22:18:21--14.log : 3
host-stats-22:18:21--8.log : 3
host-stats-22:29:14--9.log : 1
host-stats-22:29:36--15.log : 0
host-stats-22:29:49--16.log : 0
host-stats-22:37:26--20.log : 56
host-stats-22:37:33--1.log : 16
host-stats-22:47:27--21.log : 0
host-stats-22:47:27--2.log : 0
host-stats-23:18:57--18.log : 0
host-stats-23:19:14--19.log : 18
In this case, I don't think it's a problem, but if we had several packages
running xz like that at once, each with a limit of 50% of memory,
that could be a problem. Has anyone looked at the time impact of,
say, reducing the number of threads to 32 and the memory limit to
15%?
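One way to get a first answer would be a quick timing sweep over a few
thread/memlimit combinations on a representative tarball. This is only a
sketch; the input file name and the combinations below are placeholders,
not anything the autobuilder actually uses.

# Time xz at a few thread/memlimit settings on a representative tarball.
# The input file and the combinations below are placeholders.
INPUT=core-image-sato-rootfs.tar
for opts in "--threads=56 --memlimit=50%" \
            "--threads=32 --memlimit=25%" \
            "--threads=32 --memlimit=15%"; do
    echo "=== xz $opts ==="
    # -k keeps the input, -f overwrites any previous .xz output
    /usr/bin/time -v xz -k -f $opts "$INPUT" 2>&1 | \
        grep -E "Elapsed|Maximum resident"
    rm -f "$INPUT.xz"
done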
All of the top output logs seem to be running oe-selftest:
$ for i in host-stats-2*; do grep -H -c "DISPLAY.*oe-selftest " $i ; done
host-stats-21:55:40--10.log:1
host-stats-21:56:21--3.log:1
host-stats-21:59:33--11.log:1
host-stats-21:59:33--4.log:1
host-stats-21:59:42--12.log:1
host-stats-21:59:42--5.log:1
host-stats-22:07:40--13.log:1
host-stats-22:07:59--6.log:1
host-stats-22:12:32--17.log:1
host-stats-22:18:01--7.log:1
host-stats-22:18:21--14.log:1
host-stats-22:18:21--8.log:1
host-stats-22:29:14--9.log:1
host-stats-22:29:36--15.log:1
host-stats-22:29:49--16.log:1
host-stats-22:37:26--20.log:1
host-stats-22:37:33--1.log:1
host-stats-22:47:27--21.log:1
host-stats-22:47:27--2.log:1
host-stats-23:18:57--18.log:2
host-stats-23:19:14--19.log:2
$ for i in host-stats-2*; do grep -H -c "DISPLAY.*oe-selftest " $i ; done | wc -l
21
Yikes, that seems like more than just random chance.
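To see whether particular oe-selftest invocations keep showing up at trigger
time, one could tally the matching command tails across the split logs with a
variation of the grep above; this is just a sketch, and it will only become
really useful once the top cmdline length is raised (To Do item 1).

# Tally the oe-selftest command tails seen across the trigger snapshots,
# most frequent first. grep -o keeps only the matching part so the varying
# PID/%CPU columns of top don't defeat the uniq count.
grep -oh "oe-selftest .*" host-stats-2*.log | sort | uniq -c | sort -rn | head -20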
The logs do not seem to be duplicates, in that there isn't a single cluster
of identical or similar timestamps, although some are close and are
likely from the same file. That said, they certainly don't seem to
be spread out uniformly over time, and that's what we all expect,
in that the system response time is okay for much of the time and
is poor for quite a while every now and then.
$ for i in host-stats-2*; do echo -n $i ": "; head -1 $i | cut -c -15; done
host-stats-21:55:40--10.log : top - 21:55:40
host-stats-21:56:21--3.log : top - 21:56:21
host-stats-21:59:33--11.log : top - 21:59:33
host-stats-21:59:33--4.log : top - 21:59:33
host-stats-21:59:42--12.log : top - 21:59:42
host-stats-21:59:42--5.log : top - 21:59:42
host-stats-22:07:40--13.log : top - 22:07:40
host-stats-22:07:59--6.log : top - 22:07:59
host-stats-22:12:32--17.log : top - 22:12:32
host-stats-22:18:01--7.log : top - 22:18:01
host-stats-22:18:21--14.log : top - 22:18:21
host-stats-22:18:21--8.log : top - 22:18:21
host-stats-22:29:14--9.log : top - 22:29:14
host-stats-22:29:36--15.log : top - 22:29:36
host-stats-22:29:49--16.log : top - 22:29:49
host-stats-22:37:26--20.log : top - 22:37:26
host-stats-22:37:33--1.log : top - 22:37:33
host-stats-22:47:27--21.log : top - 22:47:27
host-stats-22:47:27--2.log : top - 22:47:27
host-stats-23:18:57--18.log : top - 23:18:57
host-stats-23:19:14--19.log : top - 23:19:14
All for now.
../Randy
# Randy MacLeod
# Wind River Linux
On 2021-04-15 1:55 p.m., Randy MacLeod wrote:
Richard,
Thanks for fixing the fall-out due to assumptions in other tests.
I think that the web server for:
https://autobuilder.yocto.io/pub/non-release/
runs every 30 seconds via cron so if you are happy with
this crude dd trigger once things have soaked in master-next
and we want to gather some data overnight, could you merge to master?
I ran a simpler test with fewer io stressors from:
$ stress -hdd N
and have attached a graph with up to 3000(!) stressors that
we looked at this morning, and another with up to 35 stressors.
It's a crude indicator, but once we get beyond 18-20 io stressors
on the system I tested (48 cores, 128 GB RAM, 12 TB magnetic disk),
dd times become erratic.
Running qemu from tmpfs has clearly helped.
Let's gather some data and decide if we want to spend more time
learning how to monitor the system to tune how we are using it.
../Randy
--
# Randy MacLeod
# Wind River Linux
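For anyone who wants to reproduce the shape of those graphs, a sweep along
these lines should do it; the stressor counts, dd size, and timings below are
illustrative rather than the exact settings used for the attached graphs.

#!/bin/bash
# Sweep the number of I/O stressors and record how long a fixed dd takes
# at each level.
for n in 1 2 4 8 12 16 20 24 28 32; do
    stress --hdd "$n" --timeout 60s >/dev/null 2>&1 &
    STRESS_PID=$!
    sleep 10                        # let the stressors ramp up
    start=$(date +%s.%N)
    dd if=/dev/zero of=ddtest.bin bs=1M count=100 conv=fsync 2>/dev/null
    end=$(date +%s.%N)
    echo "$n stressors: $(echo "$end - $start" | bc) s"
    rm -f ddtest.bin
    kill "$STRESS_PID" 2>/dev/null
    wait "$STRESS_PID" 2>/dev/null
done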
Richard Purdie
On Thu, 2021-04-15 at 11:31 -0400, Sakib Sajal wrote:
The template is not used anywhere yet; the initial patchset enables the
data collection by default. We were using it for testing on the YP AB
and thought it would be useful if at some point the monitoring was
dropped from the default config.
I think we can just add it later if needed.
I have left the template in case the data collection is removed from
defaults and needs to be used on a case by case basis.
I am not entirely sure if nesting templates work. I have not seen any
examples of it, nor did I try it myself. If nesting does work, the
template should be useful.
I had a quick look at the code and sadly, it doesn't appear I implemented
nesting, so this wouldn't be that useful as things stand.
Cheers,
Richard
Richard Purdie
On Tue, 2021-04-13 at 13:02 -0400, sakib.sajal@... wrote:
collect-data template can run arbitrary commands/scripts
on a regular basis and logs the output in a file.
Is the template used anywhere? I can't remember if we support nesting
templates, in which case this is useful, or not?
Cheers,
Richard
sakib.sajal@...
collect-data template can run arbitrary commands/scripts
on a regular basis and logs the output in a file.
See oe-core for more details:
edb7098e9e buildstats.bbclass: add functionality to collect build system stats
Signed-off-by: Sakib Sajal <sakib.sajal@...>
Signed-off-by: Randy MacLeod <Randy.MacLeod@...>
---
config.json | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/config.json b/config.json
index 5bfa240..c43d231 100644
--- a/config.json
+++ b/config.json
@@ -87,6 +87,13 @@
"SANITYTARGETS" : "core-image-full-cmdline:do_testimage core-image-sato:do_testimage core-image-sato-sdk:do_testimage"
}
},
+ "collect-data" : {
+ "extravars" : [
+ "BB_HEARTBEAT_EVENT = '10'",
+ "BB_LOG_HOST_STAT_ON_INTERVAL = '1'",
+ "BB_LOG_HOST_STAT_CMDS = 'oe-time-dd-test.sh 100'"
+ ]
+ },
"ptest-qemu" : {
"BUILDINFO" : true,
"BBTARGETS" : "core-image-sato-ptest",
--
2.25.1
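For anyone wanting the same collection in a local build rather than via the
autobuilder template, the equivalent would presumably be to set the same
variables in the build's conf/local.conf; the snippet below is only a sketch
that mirrors the values from the patch.

# Append the same monitoring settings to an existing build's local.conf
# (values mirror the collect-data template above).
cat >> conf/local.conf <<'EOF'
BB_HEARTBEAT_EVENT = "10"
BB_LOG_HOST_STAT_ON_INTERVAL = "1"
BB_LOG_HOST_STAT_CMDS = "oe-time-dd-test.sh 100"
EOF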