[PATCH yocto-autobuilder-helper] config.json: set oe-time-dd-test.sh timeout to 3 seconds


Randy MacLeod
 

For the month of January 2023, the distribution of dd times has a long
tail that extends to 13 seconds with no events exceeding the current
limit of 15 seconds.

Reduce the timeout to 3 seconds based on the observed distribution of
dd times, which would result in the timeout triggering about 20 times a month.
That's enough data to be useful but not so much that it overwhelms the
logging or the people who will analyze it. It also avoids the rapid increase
in the tail of the distribution, which starts to rise exponentially below 2 seconds.
It's also a sensible response time for people to expect the system to have.

Signed-off-by: Randy MacLeod <Randy.MacLeod@...>
---
config.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.json b/config.json
index 446528a..e50ec44 100644
--- a/config.json
+++ b/config.json
@@ -68,7 +68,7 @@
"RUNQEMU_TMPFS_DIR = '/home/pokybuild/tmp'",
"BB_HEARTBEAT_EVENT = '60'",
"BB_LOG_HOST_STAT_ON_INTERVAL = '1'",
- "BB_LOG_HOST_STAT_CMDS_INTERVAL = 'oe-time-dd-test.sh -c 100 -t 15'",
+ "BB_LOG_HOST_STAT_CMDS_INTERVAL = 'oe-time-dd-test.sh -c 100 -t 3'",
"BB_LOG_HOST_STAT_ON_FAILURE = '1'",
"BB_LOG_HOST_STAT_CMDS_FAILURE = 'oe-time-dd-test.sh -l'",
"SDK_TOOLCHAIN_LANGS += 'rust'",
--
2.34.1
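
For readers unfamiliar with the test, here is a minimal Python sketch of the kind of measurement the timeout applies to. It is not the oe-time-dd-test.sh script itself: it assumes that -c is a count of 1 KiB blocks and -t is a limit in seconds, and the function name timed_sync_write and its parameters are hypothetical. Unlike a real timeout, it only reports after the fact whether the limit was exceeded.

```python
import os
import tempfile
import time


def timed_sync_write(directory, block_count=100, block_size=1024, limit_seconds=3.0):
    """Write zero-filled blocks to a temporary file, fsync, and time it.

    Returns (elapsed_seconds, exceeded), where exceeded is True when the
    write plus fsync took longer than limit_seconds.
    The defaults mirror the assumed meaning of '-c 100 -t 3'.
    """
    payload = bytes(block_size)  # block_size zero bytes
    fd, path = tempfile.mkstemp(dir=directory, prefix="dd-test-")
    try:
        start = time.monotonic()
        for _ in range(block_count):
            os.write(fd, payload)
        os.fsync(fd)  # force the data to disk so I/O latency is measured
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)  # always clean up the scratch file
    return elapsed, elapsed > limit_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        elapsed, exceeded = timed_sync_write(workdir)
        print(f"write+fsync took {elapsed:.3f} s, over limit: {exceeded}")
```

On a lightly loaded host the elapsed time should be far below the limit; the point of the patch is to log host statistics on the occasions when it is not.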


Randy MacLeod
 

On 2023-02-02 17:00, Randy MacLeod via lists.yoctoproject.org wrote:
For the month of January 2023, the distribution of dd times has a long
tail that extends to 13 seconds with no events exceeding the current
limit of 15 seconds.

Reduce the timeout to 3 seconds based on the observed distribution of
dd times, which would result in the timeout triggering about 20 times a month.
That's enough data to be useful but not so much that it overwhelms the
logging or the people who will analyze it. It also avoids the rapid increase
in the tail of the distribution, which starts to rise exponentially below 2 seconds.
It's also a sensible response time for people to expect the system to have.


See attached graphs!

I don't know why there are two peaks that you can easily see on the linear scale distribution
but the 3 graphs show why I picked the 3 second cutoff.

Below is histogram data, in 0.1 second bins, for the tail of the distribution.

I can share the raw data or the 0.001 ms binned version if anyone is interested.

../Randy

  count seconds
     53 1.0
     39 1.1
     31 1.2
     22 1.3
     14 1.4
     23 1.5
     10 1.6
     14 1.7
      7 1.8
      5 1.9
      6 2.0
      7 2.1
      3 2.2
      3 2.3
      2 2.4
      5 2.5
      1 2.6
      2 2.7
      1 2.9
      1 3.1
      3 3.3
      1 3.4
      4 3.6
      1 3.8
      1 3.9
      1 4.1
      2 4.2
      2 4.5
      1 5.0
      1 5.1
      1 5.4
      1 6.3
      1 13.
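
As a quick cross-check of the "about 20 times a month" estimate, the histogram above can be summed for any candidate cutoff. The following is a minimal Python sketch; the helper names parse_histogram and events_at_or_above are hypothetical, and the truncated last bin ("13.") is kept exactly as posted and simply parsed as 13.0.

```python
# Tail histogram copied from the message above: "<count> <bin in seconds>".
HISTOGRAM = """\
53 1.0
39 1.1
31 1.2
22 1.3
14 1.4
23 1.5
10 1.6
14 1.7
7 1.8
5 1.9
6 2.0
7 2.1
3 2.2
3 2.3
2 2.4
5 2.5
1 2.6
2 2.7
1 2.9
1 3.1
3 3.3
1 3.4
4 3.6
1 3.8
1 3.9
1 4.1
2 4.2
2 4.5
1 5.0
1 5.1
1 5.4
1 6.3
1 13.
"""


def parse_histogram(text):
    """Return a list of (count, seconds) tuples, one per non-empty line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, seconds = line.split()
        rows.append((int(count), float(seconds)))
    return rows


def events_at_or_above(rows, threshold):
    """Sum the counts of all bins whose label is >= threshold seconds."""
    return sum(count for count, seconds in rows if seconds >= threshold)


rows = parse_histogram(HISTOGRAM)
for threshold in (2.0, 3.0, 15.0):
    print(f"{threshold:>5.1f} s: {events_at_or_above(rows, threshold)} events")
```

With this data, 21 events fall in bins at or above 3 seconds, 51 at or above 2 seconds, and none at or above 15 seconds.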





-- 
# Randy MacLeod
# Wind River Linux


Randy MacLeod
 

On 2023-02-02 17:10, Randy MacLeod via lists.yoctoproject.org wrote:
I don't know why there are two peaks that you can easily see on the linear scale distribution
but the 3 graphs show why I picked the 3 second cutoff.

I'm guessing, but I suspect that the lower latency distribution is just from when
the Yocto AB workers are idle. The builders in the cluster at WR are always
busy, sometimes needlessly, so I didn't consider that initially.

-- 
# Randy MacLeod
# Wind River Linux


Randy MacLeod
 

On 2023-02-02 18:32, Randy MacLeod via lists.yoctoproject.org wrote:
On 2023-02-02 17:10, Randy MacLeod via lists.yoctoproject.org wrote:
I don't know why there are two peaks that you can easily see on the linear scale distribution
but the 3 graphs show why I picked the 3 second cutoff.

I'm guessing, but I suspect that the lower latency distribution is just from when
the Yocto AB workers are idle. The builders in the cluster at WR are always
busy, sometimes needlessly, so I didn't consider that initially.


Sigh, that's obviously wrong since this data is only collected when running bitbake!

Maybe it's the Arm vs. the Intel workers?

I'll stop speculating until I look at the data a bit more...


--
# Randy MacLeod
# Wind River Linux