Yocto Technical Team Minutes, Engineering Sync, for December 14, 2021

Trevor Woerner

archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However,
errors sometimes happen. If any errors or omissions are found, please feel
free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Bruce Ashfield, Daiane, Jan-Simon Möller,
Jon Mason, Joshua Watt, Saul Wold, Steve Sakoman, Randy MacLeod, Richard
Purdie, Scott Murray, Rephael C, Peter Kjellerstedt, Ross Burton, Michael
Opdenacker, Armin Kuster, Nathan Glimsdale, Ryan Eatmon

== project status ==
- 3.5 M1 (kirkstone) in QA
- 3.1.13 (dunfell) to be built this week
- maintenance for AB, updating SSDs and updating distros, next week (Dec 20-24)
- significant improvements to patch count, some changes might affect other layers
- CVE metrics improved for dunfell and master
- rising AB-int issues (new high!)

== discussion ==
RP: looked at more patches last week. removed some patches related to a MIPS
platform (support for which was also removed from the latest kernel, see
Netlogic). these patches were never added to upstream binutils. an SH4 gdb
patch was also removed, not sure if it even works anymore. if users need
these things, they can be re-introduced in separate layers if required,
but not appropriate for oe-core going forward. Ross wins the award for
most invasive change, and changes most likely to require changes in
other layers, but these are good changes. seeing good trends: 50 patches
removed, and 50 patches moved out of pending state. i’m still working
with upstream gcc to get some patches merged and am still hoping to get
some libtool patches upstream too.

RP: re: AB-int issues: there is a recurring bitbake selftest issue with the
runqueue tests that does have a fix ready. also lttng: there appear to
be a number of tools issues going on but all logged as one bug. upstream
has fixed some of the original bugs we reported, but there are some other
things. it all appears to have started around June

RP: AB downtime next week. there’s never a good time to do it. reimaging of
most of the cluster. Michael has permission to replace all the OSes on all
the workers (bring in new ones, remove old ones). it’s a good time to bring
in new distros and get rid of older versions. for all we know the AB may
never work again! (lol). if you have anything that needs to be preserved
make sure to let us know.

RP: it also means that if we’re going to have a 3.1.13 release, it has to be
this week.
SS: i’m ready, there is a small set of patches
RP: so the plan is to get those in, then do the build?
SS: yes. i don’t think there’s anything controversial there at all
RP: there’s a chance the parts might not arrive in time, so the update to
the AB might get delayed

Randy: if we upgrade all the AB to SSDs then we won’t have a control to see
how things go, other than historical data? does everyone only use SSD? are
magnetic disks still important?
SS: i’ve been SSD-only for a couple years now
RP: conversely i only use spinning rust
JPEW: i would expect that the intersection of people doing things as extreme
as the AB and still using spinning disks is confined to just the AB
Randy: you would be wrong (lol). at least half of WR is still using magnetic
disks. however we do plan to upgrade.
RP: i understand the desire to have a control, but that would add to the
maintenance burden. we’ll have to see how it goes. we have 2 performance
testing workers as well, one is running CentOS 7 and the other is running
Ubuntu 16.04; those will need upgrading as well (we’ve been putting
it off for too long now). so we might end up with 2 more performance
workers (that will run in parallel with the existing 2) or the existing
ones might just get replaced. it’s up to Michael
Randy: what about the ARM worker, any sign of that machine arriving?
RP: there’s talk of it, but getting stuff into the US is not easy
Saul: is anyone talking to Ampere?
RP: the people involved are the ARM people, so they know what they’re doing
Randy: will the ARM worker get an SSD as well?
RP: i think it already has one. if it doesn’t then it will
Ross: the ARM worker is pretty old hardware, unfortunately
RP: we have 2, one is older but bulletproof. the other one is faster but has a
tendency to report CPU temps that are high

JPEW: i sent an RFC to switch the bitbake-worker to asyncio
RP: i had a look. i hadn’t thought of using asyncio in bitbake-worker
because generally it is one of the more self-contained bits of bitbake
that generally actually works and i had wanted to leave it alone. the
patch adds more lines than it removes. is it an improvement?
JPEW: given what it’s doing, i don’t think it’s going to be more
efficient. most of the time it sits waiting for things. the big advantage
would be the maintainability. asyncio is easier to read than the polling
loop it was doing. the adding of lines might just be my way of writing
RP: i don’t object to it as such. if *i* had done the conversion then
i could read that code more easily, however, since i didn’t do the
conversion, it makes maintainability harder for me. that’s not a
criticism of the work itself. the diff is too big, maybe easier to just
look at the updated code
JPEW: yes the diff is worthless. also, we could simplify it even more if we
slightly changed the protocol between bitbake-worker and bitbake-server.
would fit better with how asyncio works and what’s already included
in asyncio (i.e. asyncio already knows how to handle reading text
line-by-line, but we do a tagged XML thing, which i had to write
explicitly). if we change it to be more like the hashserv protocol
(newline-delimited JSON) then that would fit very well with asyncio. that
would reduce the size
RP: i think the data (that goes over the bitbake-worker to bitbake-server
link) can have newlines in it, so we’d need an escape mechanism
JPEW: yes, it’s pickled data. it wouldn’t have to be newlines, you could
split on any character
RP: also, there are some lines removing some multiprocessing locking, is that
still safe for workers that call into multiprocessing?
JPEW: the lock was never used in bitbake-worker itself, just the child
processes. so i moved it to the child processes. the child processes have
a pipe to bitbake-worker and i left the lock in the child process. so if
they’re multithreaded (or whatever) they still have a lock when they
write into the parent process. but each of them has a dedicated pipe into
bitbake-worker parent process, so that doesn’t need locking
RP: yes, fair enough. i need to look at the final code and think about it some

RP: i’m worried about the bitbake server process (i.e. not the worker but
the cooker). i have a pile of bugs but the general theme is: someone
presses Ctrl-C and bitbake is off doing something else and doesn’t
respond. in general (by design) we tend to defer things off (tasks are run
by bitbake-worker and not bitbake itself). the trouble is once the parsing
occurs in sub-processes it can starve the connection handling. i’m
worried about the threading model (or lack thereof) in our design. there
are 2 types of commands that can be run against the server: synchronous
and asynchronous. but if something goes wrong in some of those synchronous
commands then you can’t even send a stop event to the server. asyncio
doesn’t necessarily help us with any of this stuff
JPEW: in order for asyncio to help, everything has to be done asynchronously.
e.g. long-running tasks have to punt it to a thread (if it’s not I/O)
RP: asyncio probably isn’t going to be the answer here, we might have to
push some of this out to a separate cooker thread with the server running
in its own thread and handling the actual UI and commands (etc) separately
JPEW: we’ll probably need a hybrid approach: asyncio for the main loop, and
long-running stuff in a thread
RP: it’s one of the bigger problems we have with bitbake right now. if
anyone has any ideas…
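
The hybrid approach mentioned above (asyncio for the main loop, long-running
work in a thread) could look roughly like the sketch below. This is not
bitbake code: long_running_parse and handle_commands are hypothetical
stand-ins, and the timings are chosen only to make the example finish quickly.

```python
import asyncio
import threading
import time


def long_running_parse(stop_requested):
    """Hypothetical stand-in for blocking work such as parsing.

    It checks the stop flag between steps so it can bail out early.
    """
    steps_done = 0
    for _ in range(50):
        if stop_requested.is_set():
            break
        time.sleep(0.01)  # blocking call; would stall the loop if run inline
        steps_done += 1
    return steps_done


async def handle_commands(stop_requested, handled):
    """Hypothetical stand-in for connection handling.

    Pretends a 'stop' command (e.g. after Ctrl-C) arrives after 50 ms.
    """
    await asyncio.sleep(0.05)
    handled.append("stop")
    stop_requested.set()


async def main():
    stop_requested = threading.Event()
    handled = []
    loop = asyncio.get_running_loop()
    # The blocking work runs in the default thread pool, so the event loop
    # stays free to service commands while it is in progress.
    work = loop.run_in_executor(None, long_running_parse, stop_requested)
    await handle_commands(stop_requested, handled)
    steps_done = await work
    return handled, steps_done


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The stop command is handled while the work is still running, and the work
stops early instead of completing all 50 steps. One caveat: time.sleep
releases the GIL, whereas CPU-bound Python code in a thread would still
compete with the event loop for it.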

RP: re: meetings over the holidays. i’m guessing we’ll cancel meetings on
the 28th, and most will be back by the 4th of January? will enough people
be around for a meeting on the 21st?
<several>: i’ll be around
RP: okay, we’ll cancel the 28th and keep the others

Randy: i heard that someone got the terminal working in phosh? has anyone else
played with phosh and got it working?
JPEW: yes, mostly working. you can download the daily build
Randy: i’ll give it a try shortly. is it something we’ll keep until after
3.5? or are we going to rip out sato and replace it with phosh for this
JPEW: oh no, not this release
RP: that’s a bad idea. we’ll run with sato for the LTS

RP: any other patches in oe-core that we should be doing things with? we
have some good success cases (e.g. the puzzles app in sato, binutils,
gdb). tcp-wrappers is appearing on my radar; upstream is dead and we’re
carrying about 15 patches. also the musl systemd patches need attention
ScottM: the two people who would care are not on this call
Ross: i think there’s been some improvement to systemd accepting musl
ScottM: maybe alpine would drive this issue, but maybe not
RP: there are 2 sets of issues with systemd and musl: 1) headers issue (which
i think is relatively work-around-able) and i think systemd is willing to
negotiate on some of those patches 2) pieces of the C library are missing and
patching them out causes security holes, therefore we probably won’t
see systemd accepting those. systemd has made it quite clear that they
want to rely on those libc features and they’re simply not there in musl
(as is my understanding)
ScottM: they’re quite vocal about being fine with being very linux-centric
RP: i want to get this done early in the cycle, rather than waiting for the
week before feature-freeze
