Yocto Technical Team Minutes, Engineering Sync, for August 24, 2021
Trevor Woerner
archive: https://docs.google.com/document/d/1ly8nyhO14kDNnFcW2QskANXW3ZT7QwKC5wWVDg9dDH4/edit

== disclaimer ==
Best efforts are made to ensure the below is accurate and valid. However, errors sometimes happen. If any errors or omissions are found, please feel free to reply to this email with any corrections.

== attendees ==
Trevor Woerner, Stephen Jolley, Peter Kjellerstedt, Randy MacLeod, Armin Kuster, Jan-Simon Möller, Joshua Watt, Richard Elberger, Scott Murray, Steve Sakoman, Richard Purdie, Saul Wold, Tim Orling, Alejandro Hernandez, Bruce Ashfield, Denys Dmytriyenko, Jon Mason, Ross Burton, Trevor Gamblin

== project status ==
- now in feature freeze for 3.4 (honister)
- read-only prserv and switch to asyncio merged
- rust merge is problematic (issues with uninative), will need to be fixed in the next day or two to make it into 3.4
- glibc 2.34 causes significant issues for pseudo; this will get worse as more host distros upgrade
- tune file refactorization merged
- still hoping to get some sbom stuff into 3.4

== discussion ==
RP: we’re now at feature freeze

RP: the asyncio stuff is finally working, thanks Scott

RP: the news isn’t so good with rust. there’s some weird uninative issue (something to do with the linker relocations that we do). we were seeing issues on debian 8, but it looks like we can reproduce that issue by using the buildtools extended tarball as the compiler, which also provides its own libc, which then seems to cause the problems. i could get rid of the relocations that uninative causes, but at the cost of it not working with the eSDK; i decided to ignore that. but even if we do that, there’s always the relocation issue with the buildtools tarball, which we can’t avoid. for a while i could reproduce it reliably, but then it stopped and i can’t reproduce it anymore

Randy: i tried reproducing but couldn't. my impression is that the rust community is happy with meta-rust and uses it for specific use-cases, but they don’t go beyond that very much (and therefore aren’t seeing issues). even if we fixed the things you call blockers, i’d still call it beta quality for oe-core if we merge it. do you want to merge it now (as beta quality) or wait for the next window?

RP: there’s no winning scenario. if we merge it then i’m signing myself up to maintain and fix it (esp. before release). on the other hand, if we push it out then we’ll be in feature freeze and nobody will pay any attention to it until later, then other things will bump its priority down. i can see that there are some open issues dating back to 2016 that obviously nobody cares much about, so pushing it out isn’t going to change anything.

Randy: not having rust in is holding back a bunch of things, but i, relatively, don’t know rust very well and without the rust community’s help i don’t know how to move this forward. ideally someone with rust experience could step up; maybe ARM?

Ross: we’d like to see it in core. we’re using it, but with meta-rust, and we’re happy with it so far. my preference would be to hone it and push it early in the next release cycle

Randy: schedules are dancing around, so we’ll try to get things moving along

RP: the pseudo glibc problem has me scared. any distro that upgrades to glibc 2.34 (natively) will break. we have a ticking timebomb, and it was discovered by our toolchain testing (thanks Ross)

RP: we make interesting assumptions with uninative and pseudo. we end up with host tools that are linked, potentially, against a newer glibc, therefore pseudo has to run as an LD_PRELOAD against multiple libc versions, so if it links against a newer one but then has to run against an older one it breaks with symbol location problems. we’ve had these issues before, and we’ve implemented various fixes. libpseudo only links against libdl and libpthread and we can’t get rid of those things (libdl because that’s how it works (fundamentally loading libraries dynamically), and threads because of the mutex that we use for locking). the release can’t go out if, when people upgrade their host systems, it’s going to break; badly. we’ve tried every technique that we’ve tried before and then some. in 2.34 all the symbols are merged back into the main library, so there are no libpthread symbols; it’s all part of libc.so. in the past we’ve been able to link against uninative 2.33 (libdl and libpthread) and then link pseudo-native against those binaries, thereby force-linking against older versions using the newer glibc headers (which is horrible). what worries me is i’m basically the only one paying attention; i don’t even have anyone to bounce ideas off of or talk to about it. so we have a solution, it is horrendous, but it’s the only thing we’ve got right now. so if there’s anyone who knows about weak linking or strong linking or mutex locks without pthreads, i’d like to talk to them.

JPEW: would you be opposed to making the direct kernel call to do the locking? that would bypass pthreads

RP: i’m not averse to it, you mean the futex calls?

JPEW: yes

RP: i’m not opposed, but i don’t think it’s as simple as making direct calls to the kernel. i read up on it but decided implementing our own locks wasn’t quite the direction i wanted to take. the number of ways to get this wrong is… interesting.

JPEW: i know the futex call does a million things, and that’s one of the problems with it. i wonder if it would be possible to look at the pthreads mutex code and copy the parts that deal with futex?

RP: i did think of doing that; just distilling the pthreads code into what we need. we just need a very simple lock so it might be possible. may be something we need to look at

PeterK: wouldn’t you still need to link against libdl?

RP: yes, but the scary stuff that goes on is in pthreads (headers and declarations). the libdl stuff is 3 function calls that are plain; no dependencies, no crazy symbols, etc. long term, ideally, we’d get rid of the libpthread dependency, then libdl should be comparatively simpler

TrevorW: i could take a stab at it, i’ve done dynamic library things before: loading a library, looking for a symbol, doing one thing or another based on whether it’s found

RP: it’s more complicated than that. what they’ve done in libc is there’s now a libdl with weak globbing symbols that redirect the previous symbols back to libc, so you only get a libc linkage. i haven’t worked out how you’d force it to link to the libdl (which you have to do if you run against an older binary). specifying versions is one thing (easy to do), specifying the library… there’s no way to specify the library, it’s hard-coded at link time… as far as i can tell. as another viable solution (instead of the current one, which is to use an older libc and force the link), my other plan was to create a dummy binary to link against that would put the symbols in the right place. so we could just take the linker and generate a specially-crafted binary, and then use it in the linking process to force libpseudo to look in the correct form. however i realized that it was probably easier, just for testing purposes, to download the glibc 2.33 binaries rather than try to create a specially-crafted one. so another thing to potentially look at (besides those pre-built 2.33 binaries) would be a binary that would do the right things. then we could do it as part of the build process. so that could be something to look at

TrevorW: my first step would be to reduce the problem to a simple test case

RP: generating a simple test case isn’t so much the issue, it’s the fact it only breaks when you have a build within a build. but creating a test case would be easy. there is a bugzilla (https://bugzilla.yoctoproject.org/show_bug.cgi?id=14521). longer term, getting rid of the pthread dependency would be helpful; then the libdl thing would be relatively simple.

RP: i’ve talked about the things i know about which are gating m3. is there anything i don’t know about or haven’t mentioned?

JPEW: sbom stuff. it’s pretty hands-off; the only thing it touches that might affect anyone is the extended package data.

RP: we should try that

JPEW: is everyone okay with it (Saul and Ross)? has anyone had a look at it? is it ready to go in (i know there still are things to add)?

Ross: looks good to me. the only thing i would mention is the path that’s used, but there’s a fix for that

JPEW: yep

Ross: i haven’t run selftest myself, but i don’t think there’s any massive problem with what’s there now

Saul: i agree with Ross. there is one thing, but we can work around it, so i’m okay with it going in

RP: Anuj and i have started and killed loads of builds so quickly recently that the AB is keeling over because it can’t delete things fast enough, so it’s running out of space

SS: i think i’ve been contributing to that as well this week

JPEW: is it a matter of “rm -fr” being slow?

RP: when we delete, we actually move stuff to a junk area, then do the actual deletion when idle, but there hasn’t been enough idle time lately, so it’s running out of disk space

Randy: is this something that TrevorG should look at? i.e. I/O load too high meaning builds won’t take place

RP: not sure how we’d go about solving it

TrevorG: i could look at it once i’m done with my current stuff

RP: maybe adding a task that runs early in a build that would block the start of new builds until a certain amount of resources are available

TrevorG: sounds good

JonM: with the last mesa update (2 days ago), anything that doesn’t have hard float on arm won’t compile. i don’t know if we’re going to need to have that as a requirement. it tries to do neon regardless of anything else

RP: is it something they did intentionally, or by mistake?

JonM: according to the mesa build logs, they were trying to speed up their build times by using neon instructions. this isn’t a problem if you have semi-modern arm hardware. anything with cortex is going to have hard float, but we’re blowing up on the armv5 stuff because it’s ancient and we’re intentionally using it for soft-float

RP: is there something in mesa that we can configure to disable this?

JonM: don’t think so, it looks like it’s just checking for arm and then going ahead and doing it

RP: maybe Ross has a friend or two who we could ask. perhaps ask upstream why the change was made and if we could at least configure it

TrevorW: curiously enough i do know of at least 1 armv5 soc that does have hard float (or vfp at least) because it is optional. but the vast majority of them don’t do hard float. i’m wondering about the pi 0’s and the pi 1’s; i believe those are armv6.

JonM: the qemu that we’re using has hard float natively

TrevorW: so you’re saying the pi’s shouldn’t be affected?

JonM: probably not. although it would be affected if you had one of those but purposefully disabled hard float. you could configure yourself into a hole

RP: we should figure out if they did this intentionally or not, because it is easy to do things like this unintentionally

RP: the tune updates seem to have gone well

TrevorW: speaking of tunes, i did run into one that doesn’t seem happy (mips32r2el-24kc)

JonM: i could take a look at it

RP: speaking of older platforms, we’re seeing an issue with serial port emulation on qemuppc that is causing lots of problems. paulg is looking at it. hopefully we can get a band-aid that will keep the AB happy. i do wonder how many people are using ppc, but every time i try to remove it i get lots of pushback. it does show the project is multiplatform

PeterK: i did the conversion to the new override syntax the other day. we now have a brand new syntax that is used for real overrides and wannabe overrides (e.g. FILES:${PN} and RDEPENDS:). these look like real overrides but they aren’t. ${PN} has to be first, but with real overrides the order doesn't matter. also you can’t say the :append has to be first because it has to come after the override-wannabe

RP: i can see what you’re saying because the ordering is important. it’s not fair to call them wannabe overrides because the code does treat them as overrides

PeterK: but they, technically, don’t use the override mechanism, so you can’t change the order of them

RP: you can, it’s just that they get appended to the overrides variable in a limited context. e.g. when it’s writing the pn-package it will have ${PN} in overrides, when it’s writing the pn-debug it will put the ${PN} in overrides. so they are used as overrides.

PeterK: yeah, but there are a lot of places where you do things like getvar_foo:${PN} to get these variables

RP: right. it is a compromise. going forward, when you do a getvar_files:${PN}, behind the scenes we put ${PN} in overrides then fetch that variable. in the future we can get creative and use this more effectively. i can’t promise you what the future will look like, but, code-wise, we had painted ourselves into a corner and we had to do something. so i don’t think they should be considered wannabe-overrides; they are overrides, they’re just used in a slightly different context than, say, a “machine override”. i know what you mean about the :append being a little bit tricky, because i have seen a couple of cases where some code was using the alternative format you alluded to, which doesn’t quite make sense, but sorta does. the nice thing is we can at least now detect this, which gives us more options going forward. this opens up the possibility to be more creative in the future, but it’s not like i have a concrete plan yet. in my spare time i have been looking into the bitbake code; there’s a huge override data variable bitbake uses globally and it was hard to tell what was a variable and what was an override (e.g. SRC_URI). so we can move things from global scope to local scope, which will give us a cleaner syntax and make things faster. as a worst case, even if there were no parsing advantages, it would at least make the syntax cleaner, which i think is a huge win.

TrevorW: any plans to do the corner cases, e.g. layer.conf? these might not be overrides in the code, technically, but conceptually they are overrides

RP: i do have a branch where i played with this (making layer.conf variables overrides). there are some interesting side effects. yes, they do look like overrides, but they aren’t ever used as overrides, which is why they weren’t converted, and there would be problems if some of them were converted because of the way they get used. the nice thing is that it is a very specific namespace. the : change was huge and global, but this is localized so it might not be too bad. maybe in the next release. there are things to do with collections and things that perhaps could go away. nobody today knows what a collection is; it’s only something you’d know if you used bitbake 12 years ago.

SteveS: there are some updates we’re still waiting for on the AB restart

RP: there are patches to swatbot that need to be applied as well. remind MichaelH. it’ll happen as part of the regular maintenance

JPEW: did you want to enable spdx output on the AB?

RP: we should at least have some tests for it

JPEW: there are a couple of knobs to balance the time it takes to generate it vs the amount of stuff you’re generating

RP: we should at least have something somewhere exercising those

TimO: recipetool/devtool don’t know about the spdx license identifiers, so they failed to pick up the right license for a couple of things i was looking at recently

RP: please open a bug

TimO: OEHH tomorrow!

RP: there’s a patch on the list involving changes to glibc testing that concerns me. there has always been a dilemma regarding glibc’s testing: whether to include it as a ptest or run it with its own test runner (in other words, run it as a special case). we already have a handful of special cases: binutils, gcc, and glibc. they’re big and unwieldy and aren’t easy to turn into ptests, therefore we run them standalone. the patch enables turning it into a ptest. so we now have the options of running it under system emulation using NFS, user-mode emulation, or using ptests. i’m worried it enables too many options where we have too many half-working solutions.

Randy: for people who are concerned about the integrity of the toolchains it sounds like a good idea; more options sounds good

RP: options are good to a point, but if you have two things doing, effectively, the same thing, then that can be problematic

Randy: is there a way to run the current glibc tests on a target?

RP: yes, it’s not easy to set up, but it can be done (give it an IP address, etc.)

Randy: maybe give it to the doc person (MO)

RP: there might be other higher priority things for docs right now

TrevorW: are the 2 sets of tests orthogonal?

RP: exact same tests, just run different ways

Denys: OEHH tomorrow, Asia-Pacific, 9pm UTC

Randy: do we have a test suite for self-hosted builds?

RP: Ross’s tests for buildtools are close to that

Randy: how do i find that? do you have a keyword?

RP: the way you would run it is: bitbake buildtools-extended-tarball -c testsdk

Ross: it only builds libc, as it depends on how much of the build works

RP: it’s the closest thing we have; it could be easily extended

Denys: the nomination period for the OE TSC ends today

TrevorW: Joshua: was your video posted?

JPEW: not yet, i think it should be soon

Ross: what i read said it should be soon

RP: something else i heard today says it would be soon, if not today. it was a good presentation, thanks Joshua
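
Regarding RP's request in the pseudo discussion for ideas on "mutex locks without pthreads", and JPEW's suggestion of distilling the pthreads mutex down to direct futex calls: the following is a minimal sketch of the classic three-state futex mutex from Ulrich Drepper's "Futexes Are Tricky". It is not pseudo's code and not what was agreed in the meeting; the simple_lock names are hypothetical, and it assumes Linux, the GCC/Clang __atomic builtins, and a lock shared only by threads of one process (hence the *_PRIVATE futex ops). It needs only libc's syscall() wrapper, so it carries no libpthread dependency.

```c
#define _GNU_SOURCE            /* for the syscall() declaration */
#include <assert.h>
#include <linux/futex.h>       /* FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE */
#include <stddef.h>            /* NULL */
#include <sys/syscall.h>       /* SYS_futex */
#include <unistd.h>            /* syscall() */

/* Hypothetical minimal lock (NOT pseudo's implementation).
 * state: 0 = unlocked, 1 = locked with no waiters,
 *        2 = locked and waiters may be sleeping. */
typedef struct {
    int state;
} simple_lock;

#define SIMPLE_LOCK_INIT { 0 }

/* Thin wrapper around the raw futex system call; no libpthread involved. */
static long futex_call(int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* If *p == expected, store desired. Returns the value *p held before. */
static int cas(int *p, int expected, int desired)
{
    __atomic_compare_exchange_n(p, &expected, desired, 0,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
    return expected;
}

/* Non-blocking attempt: returns 1 if the lock was taken, 0 otherwise. */
static int simple_lock_try(simple_lock *l)
{
    return cas(&l->state, 0, 1) == 0;
}

static void simple_lock_acquire(simple_lock *l)
{
    int c = cas(&l->state, 0, 1);
    if (c == 0)
        return; /* fast path: lock was free, no syscall needed */

    /* Slow path: mark the lock as contended (2), then sleep in the
     * kernel until the holder releases it. */
    if (c != 2)
        c = __atomic_exchange_n(&l->state, 2, __ATOMIC_ACQUIRE);
    while (c != 0) {
        /* Sleeps only while state is still 2; returns immediately
         * otherwise (or on a signal), so the loop just retries. */
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = __atomic_exchange_n(&l->state, 2, __ATOMIC_ACQUIRE);
    }
}

static void simple_lock_release(simple_lock *l)
{
    int prev = __atomic_fetch_sub(&l->state, 1, __ATOMIC_RELEASE);
    assert(prev != 0 && "releasing a lock that is not held");
    if (prev != 1) {
        /* State was 2: someone may be sleeping, so unlock fully and
         * wake one waiter. */
        __atomic_store_n(&l->state, 0, __ATOMIC_RELEASE);
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

The third state exists so that the common, uncontended release never enters the kernel; only a release from state 2 pays for a FUTEX_WAKE. As RP warned, there are many ways to get this wrong: this sketch has no recursion, error checking, robustness, or fork handling, any of which pseudo might need.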