I wanted to document the best practice for debugging a hung ltp build when we
encounter one. The steps are:
a) ssh to the worker where the build is hanging
b) Look at the output of "ps ax" or similar and identify the process which is
hanging. You can filter with "ps ax | grep /qemuarm64-ltp/" since the path for
an ltp build contains its name (change this to x86 where appropriate).
c) From the qemu process command line, spot its IP address. It is often
192.168.7.2, but the last digit can vary.
d) Run "ssh root@....2" to try to log in to the qemu VM. You may need to handle
host key mismatches as you normally would with ssh.
e) Within the VM, work out where it is hanging. Often "top" will show nothing
actively using the CPU, so the output of "ps" is key: from it we can try to spot
which ltp test is, or was, running. "cgroup_xattr" and "proc01" are two examples
of tests which we've seen hang and have now disabled. If you can't see what is
hanging, save the ps output into the bug and ping me and Alexandre for further
analysis.
f) If we know which process is hanging, another tip is to run
"ls -la /proc/<pid>/fd", which lists the files the test has open.
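For anyone who wants to script the worker-side part of steps b) to d), here is a
minimal Python sketch. It is not an existing autobuilder tool, and the helper
names find_hung_qemu, guest_ip and ssh_command are hypothetical. It assumes the
VM shows up in "ps ax" as a qemu-system-* process and that the guest address
appears on the qemu command line as a kernel "ip=<addr>:..." argument, which is
how runqemu's tap networking usually passes it; check both against a real worker.

```python
import re

# Assumptions (verify on a real worker): the VM is a qemu-system-* process, and
# the guest address is passed to the kernel as "ip=<addr>:..." in -append.
IP_RE = re.compile(r"\bip=(\d{1,3}(?:\.\d{1,3}){3})")


def find_hung_qemu(ps_output, build_marker="/qemuarm64-ltp/"):
    """Return (pid, command line) for qemu processes belonging to the ltp build."""
    matches = []
    for line in ps_output.splitlines():
        # Requiring "qemu-system" also skips a "grep /qemuarm64-ltp/" line.
        if build_marker in line and "qemu-system" in line:
            # "ps ax" columns are PID TTY STAT TIME COMMAND; keep COMMAND whole.
            fields = line.split(None, 4)
            if len(fields) == 5 and fields[0].isdigit():
                matches.append((int(fields[0]), fields[4]))
    return matches


def guest_ip(cmdline):
    """Extract the guest IP from a qemu command line, or None if absent."""
    m = IP_RE.search(cmdline)
    return m.group(1) if m else None


def ssh_command(ip):
    """argv for logging in to a throwaway VM without host key checks."""
    return ["ssh", "-o", "StrictHostKeyChecking=no",
            "-o", "UserKnownHostsFile=/dev/null", f"root@{ip}"]
```

On a worker it could be fed with
subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout, and the list
from ssh_command() passed straight to subprocess.run(). Turning off host key
checking is only reasonable here because these are short-lived local VMs; don't
copy those options elsewhere.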
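Steps e) and f) can likewise be gathered in one pass inside the VM. This is a
rough sketch that assumes a python3 interpreter is present in the image (it may
not be) and that LTP is installed under /opt/ltp; LTP_BIN and the function names
are assumptions for illustration. It reads /proc directly rather than parsing
ps, and matches on /proc/<pid>/exe as well as argv in case a test was started by
name via PATH. For each match it prints the process state and open files; a
state of "D" (uninterruptible sleep) is often a useful hint about where a test
is stuck.

```python
import os

# Assumption: the image installs LTP under /opt/ltp, so test binaries live in
# /opt/ltp/testcases/bin. Change LTP_BIN if your image uses another prefix.
LTP_BIN = "/opt/ltp/testcases/bin/"


def _read(path, mode="r"):
    """Read a /proc file; return None if the process vanished or is unreadable."""
    try:
        with open(path, mode) as f:
            return f.read()
    except OSError:
        return None


def proc_state(pid):
    """One-letter state from /proc/<pid>/stat: R, S, D (uninterruptible), Z, ..."""
    stat = _read(f"/proc/{pid}/stat")
    if not stat:
        return "?"
    # comm is wrapped in parentheses and may contain spaces, so split after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def open_files(pid):
    """Map fd number -> target for /proc/<pid>/fd, like "ls -la /proc/<pid>/fd"."""
    fd_dir = f"/proc/{pid}/fd"
    fds = {}
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return fds
    for name in names:
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:  # fd closed between listdir and readlink
            pass
    return dict(sorted(fds.items()))


def ltp_processes(prefix=LTP_BIN):
    """Return (pid, state, argv) for processes whose binary or script is under prefix."""
    found = []
    for pid in sorted(int(n) for n in os.listdir("/proc") if n.isdigit()):
        try:
            exe = os.readlink(f"/proc/{pid}/exe")
        except OSError:  # kernel thread, exited process, or no permission
            exe = ""
        raw = _read(f"/proc/{pid}/cmdline", "rb") or b""
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        # Also check argv[1] so script tests run as "sh /opt/ltp/..." are caught.
        if exe.startswith(prefix) or any(a.startswith(prefix) for a in argv[:2]):
            found.append((pid, proc_state(pid), argv))
    return found


if __name__ == "__main__":
    for pid, state, argv in ltp_processes():
        print(f"{pid} [{state}] {' '.join(argv)}")
        for fd, target in open_files(pid).items():
            print(f"    fd {fd} -> {target}")
```

If python3 isn't available in the image, "cat /proc/<pid>/stat" and the ls
command from f) give the same information by hand.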
I appreciate not everyone has ssh access to the workers, so if you don't,
please let someone who does (Alexandre, Ross, Micheal, Armin, Saul, myself)
know when you spot a hung build.