[PATCH 0/4] Re-implement prserv on top of asyncrpc


Richard Purdie
 

Hi Paul,

On Fri, 2021-05-28 at 09:42 +0100, Paul Barker wrote:
These changes replace the old XML-based RPC system in prserv with the
new asyncrpc implementation originally used by hashserv. A couple of
improvements are required in asyncrpc to support this.

I finally stumbled across the issue which led to the hanging builds
seen on the autobuilder when testing the initial RFC series.
It was a fairly dumb mistake on my behalf and I'm not sure how it
didn't trigger in my initial testing! The
`PRServerClient.handle_export()` function was missing a call to
`self.write_message()` so the client just ended up stuck waiting for a
response that was never to come. This issue is fixed here.

I've run these changes through both `bitbake-selftest` and
`oe-selftest -a` and all looks good on my end. A couple of failures
were seen in oe-selftest but these are related to my host system
configuration (socat not installed, firewall blocking ports, etc) so
I'm fairly confident they aren't caused by this patch series.
Thanks for these. Unfortunately I think there is still a gremlin somewhere
as this was included in an autobuilder test build that is showing as this:

https://autobuilder.yoctoproject.org/typhoon/#/builders/83/builds/2203

i.e. all four selftests have not finished and I'd have expected them to 
by now.

I'm trying not to work today so I haven't debugged them or confirmed where
they are hanging but it seems likely related.

Cheers,

Richard


Paul Barker <pbarker@...>
 

On Mon, 31 May 2021 at 12:25, Richard Purdie
<richard.purdie@...> wrote:

Hi Paul,

On Fri, 2021-05-28 at 09:42 +0100, Paul Barker wrote:
These changes replace the old XML-based RPC system in prserv with the
new asyncrpc implementation originally used by hashserv. A couple of
improvements are required in asyncrpc to support this.

I finally stumbled across the issue which led to the hanging builds
seen on the autobuilder when testing the initial RFC series.
It was a fairly dumb mistake on my behalf and I'm not sure how it
didn't trigger in my initial testing! The
`PRServerClient.handle_export()` function was missing a call to
`self.write_message()` so the client just ended up stuck waiting for a
response that was never to come. This issue is fixed here.

I've run these changes through both `bitbake-selftest` and
`oe-selftest -a` and all looks good on my end. A couple of failures
were seen in oe-selftest but these are related to my host system
configuration (socat not installed, firewall blocking ports, etc) so
I'm fairly confident they aren't caused by this patch series.
Thanks for these. Unfortunately I think there is still a gremlin somewhere
as this was included in an autobuilder test build that is showing as this:

https://autobuilder.yoctoproject.org/typhoon/#/builders/83/builds/2203

i.e. all four selftests have not finished and I'd have expected them to
by now.
(╯°□°)╯︵ ┻━┻


I'm trying not to work today so I haven't debugged them or confirmed where
they are hanging but it seems likely related.
If you're planning to take the day off don't worry about investigating
these. I'll take a look at the patches again on Wednesday. I think the
best approach may be to add some timeouts and maybe more error
handling to the asyncrpc code I extracted from hashserv - if we can
turn these hangs into a proper error then we can reduce the amount of
autobuilder time they take to test and hopefully we'll get a better
insight into what is actually going wrong. My guess is that there's
something in the autobuilder config or just the level of load on the
machines which is aggravating this as the tests finish successfully on
my build machine (with a few expected test failures as noted
previously).
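
As a rough illustration of the timeout idea, here is a minimal sketch of how a missing reply could surface as an error rather than a hang. It is not the actual asyncrpc code: `RequestTimeout`, `send_request()`, `handle_client()` and `demo()` are hypothetical names, and the newline-delimited JSON framing is an assumption made only to keep the example self-contained. The toy server deliberately "forgets" to answer one message type, mimicking a handler that never writes its response.

```python
import asyncio
import json


class RequestTimeout(Exception):
    """Hypothetical error raised when no reply arrives before the deadline."""


async def send_request(reader, writer, msg, timeout=1.0):
    # Send one newline-terminated JSON message (framing is an assumption).
    writer.write((json.dumps(msg) + "\n").encode("utf-8"))
    await writer.drain()
    try:
        # Bound the wait for the reply instead of blocking forever.
        line = await asyncio.wait_for(reader.readline(), timeout)
    except asyncio.TimeoutError:
        raise RequestTimeout("no reply to %r after %.1fs" % (msg, timeout))
    if not line:
        raise ConnectionError("server closed the connection")
    return json.loads(line.decode("utf-8"))


async def handle_client(reader, writer):
    # Toy server: answers "ping" but never replies to anything else,
    # which is the kind of bug that leaves a client waiting indefinitely.
    while True:
        line = await reader.readline()
        if not line:
            break
        msg = json.loads(line.decode("utf-8"))
        if "ping" in msg:
            writer.write((json.dumps({"alive": True}) + "\n").encode("utf-8"))
            await writer.drain()
    writer.close()


async def demo():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    results = []
    try:
        results.append(await send_request(reader, writer, {"ping": {}}))
        try:
            await send_request(reader, writer, {"export": {}}, timeout=0.2)
        except RequestTimeout:
            # The hang has become an error the caller can log and act on.
            results.append("timeout")
    finally:
        writer.close()
        await writer.wait_closed()
        server.close()
    return results
```

Running `asyncio.run(demo())` exercises both paths: the answered request returns its reply and the unanswered one raises after the deadline. In a real client the timeout value would need to be generous enough for a loaded autobuilder worker, which is a tuning question this sketch does not address.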

Thanks,

--
Paul Barker
Konsulko Group