Yocto Technical Team Minutes, Engineering Sync, for May 18, 2021
Trevor Woerner
Yocto Technical Team Minutes, Engineering Sync, for May 18, 2021
archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit == announcements == The upcoming Yocto Project Summit is taking place May 25-26 2021 details: https://www.yoctoproject.org/yocto-project-virtual-summit-2021/ registration: https://www.cvent.com/d/yjq4dr/4W?ct=868bfddd-ca91-46bb-aaa5-62d2b61b2501 == disclaimer == Best efforts are made to ensure the below is accurate and valid. However, errors sometimes happen. If any errors or omissions are found, please feel free to reply to this email with any corrections. == attendees == Trevor Woerner, Stephen Jolley, Peter Kjellerstedt, Steve Sakoman, Joshua Watt, Scott Murray, Michael Halstead, Ross Burton, Randy MacLeod, Tim Orling, Jon Mason, Jan-Simon Möller, Alexandre Belloni, Bruce Ashfield, Richard Purdie, Trevor Gamblin, Tony Tascioglu, Alejandro H == notes == - 3.2.4 was released (gatesgarth), this is the last 3.2.x release before community support - 3.3.1 is in QA (hardknott) - round of AUH updates being added now - Anuj not available, RP acting as maintainer for hardknott - huge drop in CVEs against master - multiconfig issues in bitbake - smp enabled on qemu arm/x86 and switched to newer MACHINE for x86 - OOM issue tracked to glibc - bitbake heartbeat events causing bitbake to hang, patch pending - sstate bug tracked down (thanks JaMa), fix merged - enabled more resource control == general == RP: the bitbake heartbeat issues seems to be cause by additional logging that was added RP: mips/ppc/arm glibc usermode testing seems to cause an OOM issues (it was found to be using 83GB and up) Ross: that’s a lot of memory for a glibc test RP: appears to loop infinitely, even if glibc exits, it leaves things behind that eat up memory. there are some tests that segfault in malloc(), seeing 12GB of memory being used, we might have to get rid of them until we can figure it out. this seems unloved, if nobody fixes this we'll pull it out Ross: glibc tests running inside qemu-user? Ross: what about downgrading qemu RP: the tests are things we supply (system mode and user mode) x86 is fast enough (thanks to kvm) that they can run okay, but off architectures struggle Ross: hopefully we’ll just find some tests to disable (if they’re not important enough) RP: i’ve identified 4 tests specifically (e.g. pthread timed locked loop) Ross: be interesting to run on real hardware to see if there’s an actual leak in qemu or an issue with the test RP: maybe we can add a resource constraint so qemu dies and doesn’t take the whole system down with it instead Ross: we can look at it from an arm point of view RP: this is an oe-selftest, so not run on arm RP: rpm and deb compression not covered, and we know rpm (for example) is using all available threads (because it uses liblzma directly instead of xz) so our constraints aren’t trickling down to tools like this. we need to look to see if there are other places and examples of this happening RP: open letter about challenges (maintainers, resources, contributions to day-to-day running of an open source project). was written because people have asked me to. let me know if you know of anyone wanting to follow up. for example, it would be nice to know who is using the project (and list it publicly). spread the word. see: https://lists.openembedded.org/g/openembedded-architecture/topic/open_source_maintainers_an/82722442 ScottM: multiconfig issues on hardknott? any details? about to start some dunfell stuff RP: i don’t think some things have been backported to dunfell, a fix was added to master but it didn’t fix the problem. looked at it last week but got dragged away by other things. JPEW: we use multiconfig a lot, i’ll take a look at it RP: instead of a deferred list that is continuously updated, i want it to calculate things once (the rehash list), so take a look at what’s in master-next. it shouldn’t be too hard to pull back into dunfell JPEW: we use dunfell at work, so it shouldn't be too hard to test. i can test master-next at home RP: the master-next patch should apply to dunfell quite easily RP: i’m worried about multiconfig RP: we’ve improved the sstate cleanups quite a bit, tmp cleanliness is quite good (hardknott) but people are writing recipes that don’t care if data is machine-specific, this causes multiconfig to blow up. people are writing bad recipes and then complaining when their bad recipes blow things up JPEW: how to tell if recipe is bad? can we automate something? RP: no, not easily JPEW: the problem happens when all multiconfigs share the same tmp directory? RP: yes. writing a test would be hard, it would have to go into oe not bitbake JPEW: curious to know how many people are using multiconfig and using multiconfig with tmp directories RP: according to irc, many Ross: i think there are lots of people using multiconfig too soon, multiconfig is overkill for them, and then they end up abusing the tmp dirctory JPEW: because multiconfig uses a common tmp directory by default? Ross: yes AlejandroH: i submitted doc fix to recommend people use separate tmp dirs when using multiconfig JPEW: i will be talking about it next week in my hands-on class at the YP Summit RP: to Ross’s point, too many people are now turning to multiconfig to build completely separate builds for multiple separate machines. this ignores how the build works, historically ScottM: sstate stamps are not machine-specific RP: exactly. i think we have to figure out how to detect it, but not sure TrevorW: do we want to steer people away from using multiconfig to build unrelated images for unrelated machines? RP: i don’t think we can, i think this is what people have been wanting, so we need to make sure people are doing it right. JPEW: on the stamp files, would it be possible (expedient) to run bitbake in a way that all it does is generate the stamp files for all tasks? then maybe we could run bitbake to generate all these stamps and then look for overlaps? RP: that’s what deferred tasks are. where it sees sstate stamp files that overlap it will do one first, then look at the other overlapping ones. ideally doing one first will put stuff into sstate that the others can then use. so it does do these checks. the problem is that when there are mistakes in the recipes these tasks don’t actually have anything to do with each other: 1. hashes are identical or not (deferred tasks) 2. checks for stamp overlaps (don’t run 2 tasks with the same hash) TimO: that explains why i’m not seeing issues because i always build with separate tmp dirs. we can use shared download dir and shared sstate dirs, but it’s only with shared tmp dirs? RP: yes. but you will be seeing deferred tasks, they just won’t interfere with each other if you’re using separate tmp dirs AlejandroH: i test dunfell multiconfig but haven’t seen it (because i too use separate tmp dirs for each build) JPEW: in theory we should be able to use the same tmp dir, we just have to do it right RP: actually, in my testing i have been using separate tmp dirs, but i’m still seeing issues. so i don’t think separate tmp dirs is completely related. qemu cross wrapper script was building it twice and the license was the same but causes a rehash event for one of the builds, this is what starts the problems. so the cause is: deferred builds, multiconfig, reusing hashes TrevorW: cancel next week’s call due to YP Summit? RP: (gathers consensus) agreed RP: software bill of materials, SBOM (in the US) we don’t have one currently, but we could do something similar following the model of the license SPDX. this could really put our project in the limelight Randy: i think Saul has looked into this TrevorW: i’ll add it to the OEDVM list JPEW: we have a tool internally that does something like this, if Saul is working on it then i’d like to be looped in RP: meta-doubleopen seems to be doing something like this: https://github.com/doubleopen-project/meta-doubleopen https://lists.yoctoproject.org/g/yocto/topic/vs_vs_yocto_make/82910309 we don’t need multiple solutions, we need something simple and be able to say we have it, then build on it JPEW: do you do that at WR? RP: the tool i linked does SPDX PeterK: easy as long as it’s open source, the closed code we bring in gets harder TrevorW: isn’t this what we get with buildhistory? RP: yes, apart from the spdx format JPEW: since we’re building everything, we have all the data RP: i’m sure there will be lots of corner cases, but we need to say “we do this” JPEW: json format in version 2 RP: and we can translate accordingly (xls, rdf, yaml, json, xml) RP: let’s use the licensing email list to discuss this
|
|