How to provide info for a hung ltp build


Richard Purdie
 

Hi All,

When we encounter a hung ltp build, I wanted to document what we need to do
as best practice for debugging it. The steps are:

a) ssh to the worker where the build is hanging

b) Look at the output of "ps ax" or similar and identify the process that is
hanging. You can filter with "ps ax | grep /qemuarm64-ltp/", since the path for
an ltp build contains its name (substitute the x86 build name where
appropriate).

c) From the qemu process command line, spot its IP address. It is often
192.168.7.2, but the last digit can and will vary.

d) Run "ssh root@....2" to try to log in to the qemu VM. You may need to handle
host key mismatches as you normally would for ssh.

e) Within the VM, spot where it is hanging. Often "top" will show nothing actively
using the CPU. The output of "ps" is key: from it we can try to spot which ltp
test is or was running. "cgroup_xattr" and "proc01" are two examples of tests
which we've seen hang and have now disabled. If you can't see what is hanging,
save the ps output into the bug and ping me and Alexandre for further analysis.

f) If we know which process is hanging, another tip is to run
"ls -la /proc/<pid>/fd", which lists the files the test has open.
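For anyone who wants to script steps b) to d) on the worker, here is a minimal Python sketch. It assumes runqemu passes the guest address to the kernel as an "ip=<address>::..." parameter inside qemu's -append string, and that the VM is a "qemu-system-*" process; both are assumptions to check against a real command line. The helper names (find_hung_qemu, guest_ip, ssh_command) are made up for illustration. Ignoring host key mismatches is only reasonable because these VMs are throwaway test images.

```python
import re

# Assumption: runqemu puts the guest address in the kernel "ip=" parameter,
# e.g. "ip=192.168.7.2::192.168.7.1:255.255.255.0::eth0:off", inside -append.
KERNEL_IP_RE = re.compile(r"\bip=(\d{1,3}(?:\.\d{1,3}){3})")


def find_hung_qemu(ps_output, build_dir="/qemuarm64-ltp/"):
    """Return (pid, command) for qemu processes belonging to the given build."""
    matches = []
    for line in ps_output.splitlines():
        # "ps ax" columns: PID TTY STAT TIME COMMAND (COMMAND may contain spaces)
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header or malformed line
        pid, command = int(fields[0]), fields[4]
        # Assumption: the VM itself is a "qemu-system-*" binary; this skips
        # bitbake and other processes that also run from the build directory.
        if build_dir in command and "qemu-system" in command:
            matches.append((pid, command))
    return matches


def guest_ip(command):
    """Return the guest IP from a qemu command line, or None if not found."""
    match = KERNEL_IP_RE.search(command)
    return match.group(1) if match else None


def ssh_command(ip):
    """Build an ssh argv that ignores host key mismatches for a throwaway VM."""
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        f"root@{ip}",
    ]
```

Feed find_hung_qemu() the text of "ps ax" captured on the worker (passing a different build_dir for the x86 builds), then hand the list from ssh_command() to subprocess.run() to log in.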
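For step e), when the ps listing is long, a small Python helper can pick the test processes out of a saved copy of it. This sketch assumes the VM's ps supports "ps -eo pid,ppid,stat,args" and that the runltp driver process is named "ltp-pan"; both are assumptions to check against the image, and running_ltp_tests is a hypothetical name. It reports every descendant of the driver and keeps the STAT column, since a test stuck in "D" (uninterruptible sleep) is a useful detail to paste into the bug.

```python
def running_ltp_tests(ps_output, driver="ltp-pan"):
    """Return (pid, stat, args) for every descendant of the LTP driver.

    Expects the text of "ps -eo pid,ppid,stat,args" (assumed to be supported
    by the ps in the image). "ltp-pan" as the driver name is an assumption.
    """
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header or truncated line
        procs[int(fields[0])] = (int(fields[1]), fields[2], fields[3])

    # Index children by parent pid so we can walk down the process tree.
    children = {}
    for pid, (ppid, _, _) in procs.items():
        children.setdefault(ppid, []).append(pid)

    # Start from the driver process(es), matched on the executable name only.
    stack = [pid for pid, (_, _, args) in procs.items()
             if driver in args.split()[0]]
    found = []
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)

    return [(pid, procs[pid][1], procs[pid][2]) for pid in sorted(found)]
```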
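Where python3 happens to be available (on the worker, or in the VM if the image includes it, which is not guaranteed), the check in step f) can also be done programmatically. This sketch maps each file descriptor under /proc/<pid>/fd to its symlink target, the same information "ls -la" shows; open_files is a hypothetical helper name.

```python
import os


def open_files(pid):
    """Map each fd number of a process to its target, like "ls -la /proc/<pid>/fd"."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
    return dict(sorted(result.items()))
```

Reading another user's /proc/<pid>/fd needs the same privileges as "ls" would, so run it as root inside the VM.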

I appreciate not everyone has worker ssh access, so if you do not, please let
someone who does (Alexandre, Ross, Micheal, Armin, Saul or myself) know if
you spot one of these.

Cheers,

Richard

Join swat@lists.yoctoproject.org to automatically receive all group messages.