Yocto Technical Team Minutes, Engineering Sync, for December 14, 2021
Trevor Woerner
archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However, errors sometimes happen. If any errors or omissions are found, please feel free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Bruce Ashfield, Daiane, Jan-Simon Möller, Jon Mason, Joshua Watt, Saul Wold, Steve Sakoman, Randy MacLeod, Richard Purdie, Scott Murray, Rephael C, Peter Kjellerstedt, Ross Burton, Michael Opdenacker, Armin Kuster, Nathan Glimsdale, Ryan Eatmon

== project status ==
- 3.5 M1 (kirkstone) in QA
- 3.1.13 (dunfell) to be built this week
- maintenance for AB, updating SSDs and updating distros, next week (Dec 20-24)
- significant improvements to patch count, some changes might affect other layers
- CVE metrics improved for dunfell and master
- rising AB-int issues (new high!)

== discussion ==
RP: looked at more patches last week. removed some patches related to a MIPS platform (support for which was also removed from the latest kernel, see https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.16-Drops-MIPS-Netlogic). these patches were never added to upstream binutils. an SH4 gdb patch was also removed, not sure if it even works anymore. if users need these things, they can be re-introduced in separate layers, but they are not appropriate for oe-core going forward. Ross wins the award for most invasive change, and changes most likely to require changes in other layers, but these are good changes. seeing good trends: 50 patches removed, and 50 patches moved out of pending state. i’m still working with upstream gcc to get some patches merged and am still hoping to get some libtool patches upstream too.

RP: re: AB-int issues: there is a recurring bitbake selftest issue with the runqueue tests that does have a fix ready. also lttng: there appear to be a number of tools issues going on but all logged as one bug.
upstream has fixed some of the original bugs we reported, but there are some other things. it all appears to have started around June

RP: AB downtime next week. there’s never a good time to do it. reimaging of most of the cluster. Michael has permission to replace all the OSes on all the workers (bring in new ones, remove old ones). it’s a good time to bring in new distros and get rid of older versions. for all we know the AB may never work again! (lol). if you have anything that needs to be preserved make sure to let us know.
RP: it also means that if we’re going to have a 3.1.13 release, it has to be this week.
SS: i’m ready, there is a small set of patches
RP: so the plan is to get those in, then do the build?
SS: yes. i don’t think there’s anything controversial there at all
RP: there’s a chance the parts might not arrive in time, so the update to the AB might get delayed
Randy: if we upgrade all the AB to SSDs then we won’t have a control to see how things go, other than historical data? does everyone only use SSD? are magnetic disks still important?
SS: i’ve been SSD-only for a couple years now
RP: conversely i only use spinning rust
JPEW: i would expect that the intersection of people doing things as extreme as the AB and still using spinning disks is confined to just the AB
Randy: you would be wrong (lol). at least half of WR is still using magnetic disks. however we do plan to upgrade.
RP: i understand the desire to have a control, but that would add to the maintenance burden. we’ll have to see how it goes. we have 2 performance testing workers as well, one is running CentOS 7 and the other is running Ubuntu 16.04; those will need upgrading as well (we’ve been putting it off for too long now). so we might end up with 2 more performance workers (that will run in parallel with the existing 2) or the existing ones might just get replaced. it’s up to Michael
Randy: what about the ARM worker, any sign of that machine arriving?
RP: there’s talk of it, but getting stuff into the US is not easy
Saul: is anyone talking to Ampere?
RP: the people involved are the ARM people, so they know what they’re doing
Randy: will the ARM worker get an SSD as well?
RP: i think it already has one. if it doesn’t then it will
Ross: the ARM worker is pretty old hardware, unfortunately
RP: we have 2, one is older but bulletproof. the other one is faster but has a tendency to report CPU temps that are high

JPEW: i sent an RFC to switch the bitbake-worker to asyncio
RP: i had a look. i hadn’t thought of using asyncio in bitbake-worker because it is one of the more self-contained bits of bitbake that generally actually works and i had wanted to leave it alone. the patch adds more lines than it removes. is it an improvement?
JPEW: given what it’s doing, i don’t think it’s going to be more efficient. most of the time it sits waiting for things. the big advantage would be the maintainability. asyncio is easier to read than the polling loop it was doing. the adding of lines might just be my way of writing code.
RP: i don’t object to it as such. if *i* had done the conversion then i could read that code more easily, however, since i didn’t do the conversion, it makes maintainability harder for me. that’s not a criticism of the work itself. the diff is too big, maybe easier to just look at the updated code
JPEW: yes the diff is worthless. also, we could simplify it even more if we slightly changed the protocol between bitbake-worker and bitbake-server. that would fit better with how asyncio works and what’s already included in asyncio (i.e. asyncio already knows how to handle reading text line-by-line, but we do a tagged XML thing, which i had to write explicitly). if we change it to be more like the hashserv protocol (newline-delimited JSON) then that would fit very well with asyncio.
that would reduce the size
RP: i think the data (that goes over the bitbake-worker to bitbake-server link) can have newlines in it, so we’d need an escape mechanism
JPEW: yes, it’s pickled data. it wouldn’t have to be newlines, you could split on any character
RP: also, there are some lines removing some multiprocessing locking, is that still safe for workers that call into multiprocessing?
JPEW: the lock was never used in bitbake-worker itself, just the child processes. so i moved it to the child processes. the child processes have a pipe to bitbake-worker and i left the lock in the child process. so if they’re multithreaded (or whatever) they still have a lock when they write into the parent process. but each of them has a dedicated pipe into the bitbake-worker parent process, so that doesn’t need locking
RP: yes, fair enough. i need to look at the final code and think about it some more

RP: i’m worried about the bitbake server process (i.e. not the worker but the cooker). i have a pile of bugs but the general theme is: someone presses Ctrl-C and bitbake is off doing something else and doesn’t respond. in general (by design) we tend to defer things off (tasks are run by bitbake-worker and not bitbake itself). the trouble is that once the parsing occurs in sub-processes it can starve the connection handling. i’m worried about the threading model (or lack thereof) in our design. there are 2 types of commands that can be run against the server: synchronous and asynchronous. but if something goes wrong in some of those synchronous commands then you can’t even send a stop event to the server. asyncio doesn’t necessarily help us with any of this stuff
JPEW: in order for asyncio to help, everything has to be done asynchronously. e.g.
long-running tasks have to be punted to a thread (if they’re not I/O bound)
RP: asyncio probably isn’t going to be the answer here, we might have to push some of this out to a separate cooker thread with the server running in its own thread and handling the actual UI and commands (etc) separately
JPEW: we’ll probably need a hybrid approach: asyncio for the main loop, and long-running stuff in a thread
RP: it’s one of the bigger problems we have with bitbake right now. if anyone has any ideas…

RP: re: meetings over the holidays. i’m guessing we’ll cancel meetings on the 28th, and most will be back by the 4th of January? will enough people be around for a meeting on the 21st?
<several>: i’ll be around
RP: okay, we’ll cancel the 28th and keep the others

Randy: i heard that someone got the terminal working in phosh? has anyone else played with phosh and got it working?
JPEW: yes, mostly working. you can download the daily build
Randy: i’ll give it a try shortly. is it something we’ll keep until after 3.5? or are we going to rip out sato and replace it with phosh for this release?
JPEW: oh no, not this release
RP: that’s a bad idea. we’ll run with sato for the LTS

RP: any other patches in oe-core that we should be doing things with? we have some good success cases (e.g. the puzzles app in sato, binutils, gdb). tcp-wrappers is appearing on my radar; upstream is dead and we’re carrying about 15 patches. also the musl systemd patches need attention
ScottM: the two people who would care are not on this call
Ross: i think there’s been some improvement to systemd accepting musl patches
ScottM: maybe alpine would drive this issue, but maybe not
RP: there are 2 sets of issues with systemd and musl: 1) a headers issue (which i think is relatively work-around-able), and i think systemd is willing to negotiate on some of those patches 2) pieces of the C library are missing and patching them out causes security holes, therefore we probably won’t see systemd accepting those.
systemd has made it quite clear that they want to rely on those libc features and they’re simply not there in musl (as is my understanding)
ScottM: they’re quite vocal about being fine with being very linux-centric
RP: i want to get this done early in the cycle, rather than waiting for the week before feature-freeze
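
== sketch: newline-delimited framing for pickled data ==
The protocol idea from the bitbake-worker discussion above (newline-delimited JSON read with asyncio, plus an escape mechanism because the pickled data can contain newlines) could look roughly like the following. This is only an illustrative sketch, not the RFC patch and not bitbake's actual protocol. The function names (encode_message, decode_message, read_messages, demo), the "tag" and "data" field names, and the choice of base64 as the escape mechanism are all assumptions made for the example.

```python
import asyncio
import base64
import json
import pickle


def encode_message(tag, payload):
    """Frame one message as a single JSON line (hypothetical format).

    The payload is pickled and then base64-encoded, so newline bytes inside
    the pickle can never be mistaken for the end-of-message delimiter.
    """
    body = base64.b64encode(pickle.dumps(payload)).decode("ascii")
    return (json.dumps({"tag": tag, "data": body}) + "\n").encode("utf-8")


def decode_message(line):
    """Reverse encode_message() for one line read from the stream.

    pickle.loads() must only be used on data from a trusted peer.
    """
    msg = json.loads(line.decode("utf-8"))
    return msg["tag"], pickle.loads(base64.b64decode(msg["data"]))


async def read_messages(reader):
    """Yield (tag, payload) pairs until the peer closes the stream."""
    while True:
        line = await reader.readline()  # asyncio does the line splitting
        if not line:  # empty bytes means EOF
            break
        yield decode_message(line)


async def demo():
    """Feed two framed messages into an in-memory reader and read them back."""
    reader = asyncio.StreamReader()
    reader.feed_data(encode_message("event", {"text": "line one\nline two"}))
    reader.feed_data(encode_message("exitcode", 0))
    reader.feed_eof()
    return [msg async for msg in read_messages(reader)]
```

Running asyncio.run(demo()) returns both messages intact, including the payload that contains an embedded newline. base64 makes each payload about a third larger, which is the price of keeping the delimiter unambiguous; splitting on some other character, as JPEW suggested, would need the same kind of escaping for that character.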
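
== sketch: asyncio main loop with long-running work in a thread ==
The hybrid approach mentioned in the bitbake server discussion (asyncio for the main loop, long-running work in a thread, and a server that can still accept a stop command) might be sketched as below. This is a toy under stated assumptions, not bitbake code: long_running_job, serve, demo, and the "ping" and "stop" commands are hypothetical names, and time.sleep() stands in for real work.

```python
import asyncio
import threading
import time


def long_running_job(stop_requested, steps=200):
    """Blocking stand-in for heavy work; checks a stop flag between steps."""
    done = 0
    for _ in range(steps):
        if stop_requested.is_set():
            break
        time.sleep(0.01)  # placeholder for one slice of real work
        done += 1
    return done


async def serve(commands):
    """Run the job in a worker thread while the loop keeps taking commands."""
    loop = asyncio.get_running_loop()
    stop_requested = threading.Event()
    # None selects the loop's default thread pool executor.
    job = loop.run_in_executor(None, long_running_job, stop_requested)
    handled = []
    while True:
        cmd = await commands.get()  # the loop stays responsive here
        handled.append(cmd)
        if cmd == "stop":
            stop_requested.set()  # ask the thread to finish early
            break
    steps_done = await job
    return handled, steps_done


async def demo():
    """Send a ping and then a stop while the job is still running."""
    commands = asyncio.Queue()

    async def ui():
        await asyncio.sleep(0.05)
        await commands.put("ping")
        await asyncio.sleep(0.05)
        await commands.put("stop")

    ui_task = asyncio.create_task(ui())
    result = await serve(commands)
    await ui_task
    return result
```

asyncio.run(demo()) returns the handled commands and the number of steps the job completed before it saw the stop request. The job has to check the flag itself, since a thread cannot be interrupted from outside, and CPU-bound Python code in a thread still competes with the event loop for the interpreter lock, so this pattern alone may not be enough for heavy parsing work.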