What are the key factors for Yocto build speed?


Oliver Westermann
 

Hey,

We're currently using a VM on Windows and it's a lot slower than a native Linux build (which is expected).
We're looking into getting a dedicated build server for our team (basically a self-built tower PC). Any suggestions on what to put in that build to get the most out of it?

Currently we're looking at a big Ryzen, 64 GB of RAM and one or more SSDs on a "consumer grade" board like the X570.

Suggestions, hints and links welcome :)


Mikko Rapeli
 

On Wed, Mar 18, 2020 at 05:52:37AM -0700, Oliver Westermann wrote:
> Hey,
>
> We're currently using a VM on Windows and it's a lot slower than a native Linux build (which is expected).
> We're looking into getting a dedicated build server for our team (basically a self-built tower PC). Any suggestions on what to put in that build to get the most out of it?
>
> Currently we're looking at a big Ryzen, 64 GB of RAM and one or more SSDs on a "consumer grade" board like the X570.

Drop all virtualization and go for Linux on bare metal. Then make sure there
is enough(tm) physical RAM for each CPU thread. For a "big Ryzen", 64 GB of RAM sounds
too little. I'd go higher there, but it all depends on what kind of project
is being compiled.

I would also set up CPU, RAM, disk IO and network IO monitoring on the build machines
and review the build times and results as the project evolves. There are
times when most CPUs will be idling, and there will be times when IO to disk
happens even though RAM is available. The Linux kernel can be tuned to avoid
disk IO while RAM is still available, for example.
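To get started before investing in a full monitoring stack, the standard vmstat/iostat/sar tools already show where a build is bound (a minimal sketch; iostat and sar come from the sysstat package):

$ vmstat 5         # memory, swap and aggregate CPU every 5 seconds
$ iostat -x 5      # per-device IO utilization and wait times
$ sar -n DEV 5     # per-interface network throughput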

Hope this helps,

-Mikko


 

Paul Barker
 

On Wed, 18 Mar 2020 at 13:01, Mikko Rapeli <mikko.rapeli@bmw.de> wrote:
> Drop all virtualization and go for Linux on bare metal. Then make sure there
> is enough(tm) physical RAM for each CPU thread.
> ...
What Mikko said is excellent advice.

I'd recommend NVMe drives if you can. My build machine has two large
M.2 NVMe drives, one is used for the working directory and the other
for sstate, downloads and source checkouts to make the best use of the
available bandwidth. I find that XFS is a better fit than ext4 when
you've got fast drives and highly parallel I/O workloads.
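For concreteness, that split maps onto bitbake configuration roughly like this (the mount points are hypothetical; the variables are standard):

$ grep -E 'TMPDIR|SSTATE_DIR|DL_DIR' conf/local.conf
TMPDIR = "/mnt/nvme0/build/tmp"
SSTATE_DIR = "/mnt/nvme1/sstate-cache"
DL_DIR = "/mnt/nvme1/downloads"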

Make sure you spread your RAM across the different memory channels on
your motherboard, as this increases memory bandwidth. This usually
means either filling your RAM slots or half-filling them with RAM in
every 2nd slot, but you'll need to check the motherboard manual to
confirm the channel layout.
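On an assembled machine you can check the slot population against the manual with dmidecode (a standard tool; needs root):

$ sudo dmidecode -t memory | grep -E 'Size|Locator'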

Let me know if you need a detailed review of your proposed setup.

Thanks,
Paul


Mike Looijmans
 

A big Ryzen is a good choice.

My home rig is a Ryzen 1900 with only 8GB RAM. It's way faster at OE/Yocto builds than the i7 at work that has 32GB RAM installed. My 8GB rig does not use swap space while building huge images (like satellite receivers and full-blown XFCE desktops).

For the CPU, the more cores the better. OE loves cores. Real cores are better than SMT cores (a.k.a. hyperthreading), and AMD's SMT has more effect than Intel's.

Hard disk speed has very little impact on your build time. It helps with the "setscene" parts, but doesn't affect actual compile time at all. I recall someone did a build from RAM disks only, and it was only about a minute faster on a one-hour build compared to rotating disks.

SSD or NVMe doesn't make much difference. If you have budget to spend, spend it on RAM and CPU, not on disks.

I'd go for a reasonable NVMe disk, mostly for storing and booting the OS itself.

If you plan to share the sstate-cache and downloads from this machine with other clients, I'd even suggest a plain big (4TB or so) rotating disk for storing that kind of stuff.
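For the clients, pointing at such a share is just a couple of standard mirror variables (the server name and paths are made up for illustration):

$ grep -E 'MIRROR|own-mirrors' conf/local.conf
SSTATE_MIRRORS = "file://.* http://buildserver.example/sstate/PATH;downloadfilename=PATH"
INHERIT += "own-mirrors"
SOURCE_MIRROR_URL = "http://buildserver.example/downloads/"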

Again, don't spend your budget on disks. Disks are way easier to add later than any other component.

On 18-03-2020 13:52, Oliver Westermann via Lists.Yoctoproject.Org wrote:
> ...
--
Mike Looijmans


Jean-Marie Lemetayer
 

Hi,

In my company we have tested several "big Ryzen" configurations to speed up Yocto builds.

The conclusion of these tests is that build time tracks almost exclusively with the CPU benchmark score: https://www.cpubenchmark.net/high_end_cpus.html

The speed (overclock) and size of the RAM do not influence the build time, nor does the use of a SATA SSD versus an NVMe drive.

For example, one of our build servers is using:
- AMD Ryzen 9 3900X
- ASUS PRIME X570-P
- 32 GB DDR4 3200 MHz CL14
- 500 GB SSD

It is a configuration with a really good price/build-time ratio.

Best regards,
Jean-Marie

On Mar 18, 2020, at 2:29 PM, Paul Barker pbarker@konsulko.com wrote:
> ...


Alexander Kanavin
 

I have to say that AMD aggressively pursuing higher core counts on consumer-level CPUs is awesome news for the YP. Previously you had to buy hugely overpriced Xeons or similar to be able to work efficiently, or rely on CI doing builds for you, which makes interactive development complicated.

Alex


On Wed, 18 Mar 2020 at 15:12, Jean-Marie Lemetayer <jean-marie.lemetayer@...> wrote:
> ...


Mike Looijmans
 

On 18-03-2020 15:09, Mike Looijmans via Lists.Yoctoproject.Org wrote:
> A big Ryzen is a good choice.
> My home rig is a Ryzen 1900 with only 8GB RAM. It's way faster at OE/Yocto builds than the i7 at work that has 32GB RAM installed.

Sorry - wrong number. My rig does not have a 1900, but an "AMD Ryzen 7 1700".


Adrian Bunk
 

On Wed, Mar 18, 2020 at 10:12:26AM -0400, Jean-Marie Lemetayer wrote:
> ...
> For example, one of our build servers is using:
> - AMD Ryzen 9 3900X
> ...
> - 32 GB DDR4 3200 MHz CL14
> ...
> It is a configuration with a really good price/build-time ratio.

Depends on what you are building.

Building non-trivial C++ code (e.g. webkitgtk) with 24 threads
but only 32 GB RAM will not work; such code needs more than
2 GB per thread, and 24 threads x 2 GB is already 48 GB.

On Wed, Mar 18, 2020 at 05:52:37AM -0700, Oliver Westermann wrote:
> ...
> Any suggestions on what to put in that build to get the most out of it?
>
> Currently we're looking at a big Ryzen, 64 GB of RAM and one or more
> SSDs on a "consumer grade" board like the X570.
> ...

I would buy 128 GB of RAM so you never run into problems due to lack of RAM,
and Linux will also automatically use unused RAM as disk cache.

As long as you aren't running out of RAM or disk space, all that matters
is CPU speed. A Ryzen 9 3950X with 128 GB RAM would be my choice unless
you are on a tight budget.

cu
Adrian


Mike Looijmans
 

On 18-03-2020 15:49, Adrian Bunk via Lists.Yoctoproject.Org wrote:
> On Wed, Mar 18, 2020 at 10:12:26AM -0400, Jean-Marie Lemetayer wrote:
>> ...
>> For example, one of our build servers is using:
>> - AMD Ryzen 9 3900X
>> ...
>> - 32 GB DDR4 3200 MHz CL14
>> ...
>> It is a configuration with a really good price/build-time ratio.
>
> Depends on what you are building.
>
> Building non-trivial C++ code (e.g. webkitgtk) with 24 threads
> but only 32 GB RAM will not work; such code needs more than
> 2 GB per thread.
Seems a bit excessive to buy hardware just to handle a particular corner case. Most of OE/Yocto code is plain C, not even C++.

My rig only has 8GB but doesn't run into memory issues during big GUI builds. The only thing that made it swap was the populate_sdk task, which created a 1.1GB file and needed 20GB of RAM to compress it. That took a few minutes more due to swapping.
I submitted a patch today to fix that in OE.

Your mileage may vary. But RAM is easy to add.

> On Wed, Mar 18, 2020 at 05:52:37AM -0700, Oliver Westermann wrote:
>> ...
>> Any suggestions on what to put in that build to get the most out of it?
>>
>> Currently we're looking at a big Ryzen, 64 GB of RAM and one or more
>> SSDs on a "consumer grade" board like the X570.
>> ...
>
> I would buy 128 GB of RAM so you never run into problems due to lack of RAM,
> and Linux will also automatically use unused RAM as disk cache.
>
> As long as you aren't running out of RAM or disk space, all that matters
> is CPU speed. A Ryzen 9 3950X with 128 GB RAM would be my choice unless
> you are on a tight budget.
Of course he's on a tight budget. He wouldn't need to ask for advice otherwise...

Most consumer boards support up to 64GB RAM, and pushing to 128GB may suddenly double the price of the motherboard as well. I'd go for 32GB (as 2x16GB) and do an easy upgrade to 64GB when there's trouble. Even at 4x16GB that's not a bad investment; if it turns out to be a bottleneck, 16GB modules will be easy to sell (contrary to smaller modules).


Martin Jansa
 

On Wed, Mar 18, 2020 at 05:52:37AM -0700, Oliver Westermann wrote:
> Hey,
>
> We're currently using a VM on Windows and it's a lot slower than a native Linux build (which is expected).
> We're looking into getting a dedicated build server for our team (basically a self-built tower PC). Any suggestions on what to put in that build to get the most out of it?
>
> Currently we're looking at a big Ryzen, 64 GB of RAM and one or more SSDs on a "consumer grade" board like the X570.
>
> Suggestions, hints and links welcome :)
Other replies look good to me, here are a few additions:

If you want to compare how badly your current VM performs against some
other builders, you can use:
https://github.com/shr-project/test-oe-build-time
I wouldn't be surprised if your VM performs even worse than a roughly
200 USD Ryzen 1600 system.

I would be happy to apply pull requests from other people in this
thread with their suggestions.

You didn't mention the budget, but a big Ryzen is definitely a good choice.

You also didn't mention how "big" your typical builds are; if you're
building something as big as the web engines used in test-oe-build-time,
then it might be worth spending a bit extra on a 3970X Threadripper if
the budget allows.

I'm still looking for someone with access to an Epyc (ideally a 7702P or
7502P), because it's only a bit more expensive than the corresponding
Threadripper, but without the unfortunate limitation of 8 DIMM slots
and hard-to-find 256GB kits:
https://www.gskill.com/community/1502239313/1574739775/G.SKILL-Announces-New-High-Performance,-Ultra-Capacity-DDR4-Memory-Kits-for-HEDT-Platforms
will be nice when it finally becomes available. On Epyc, on the other hand,
you get 8 memory channels instead of 4 and far fewer issues finding a
compatible kit (even with ECC support). If the performance is
significantly better than a 3990X, then the 7702P might be a much better
option for a "professional" builder, as long as you can cool a server
motherboard in a tower PC efficiently enough.

Cheers,


Mikko Rapeli
 

On Wed, Mar 18, 2020 at 04:09:39PM +0100, Mike Looijmans wrote:
> ...
> Your mileage may vary. But RAM is easy to add.
Well, I can't build with under 2 gigs per core or I run out of physical memory
and the kernel oom-killer kicks in to kill the build. I also can't run
with the Yocto default parallel settings, which only take into account the
number of cores, so I have a custom script which caps the threads
so that 2 gigs of RAM are available for each.
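The core of such a script is only a few lines (a hypothetical sketch, not the actual script):

$ cat cap-bitbake-threads.sh
#!/bin/sh
# allow at most one thread per 2 GB of physical RAM,
# and never more threads than cores
ram_kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
threads=$(( ram_kb / (2 * 1024 * 1024) ))
cores=$(nproc)
if [ "$threads" -gt "$cores" ]; then threads=$cores; fi
echo "BB_NUMBER_THREADS = \"$threads\""
echo "PARALLEL_MAKE = \"-j $threads\""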

Though I'm sure plain C and plain poky projects have lower RAM requirements.

-Mikko


Ross Burton
 

On 18/03/2020 14:09, Mike Looijmans wrote:
> Hard disk speed has very little impact on your build time. It helps with the "setscene" parts, but doesn't affect actual compile time at all. I recall someone did a build from RAM disks only, and it was only about a minute faster on a one-hour build compared to rotating disks.
My build machine has lots of RAM and I do builds in a 32GB tmpfs with rm_work (and no, I don't build webkit, which would make this impractical).

As you say, with sufficient RAM the build speed is practically the same as on disk thanks to caching (especially if you tune the mount options), so I'd definitely spend money on more RAM instead of super-fast disks. I just prefer doing tmpfs builds because it saves my spinning rust. :)
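A tmpfs build needs nothing more exotic than this (the size and paths here are illustrative):

$ sudo mount -t tmpfs -o size=32G tmpfs /mnt/tmpfs-build
$ grep -E 'TMPDIR|rm_work' conf/local.conf
TMPDIR = "/mnt/tmpfs-build/tmp"
INHERIT += "rm_work"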

Ross


Mikko Rapeli
 

On Wed, Mar 18, 2020 at 10:56:50PM +0000, Ross Burton wrote:
> On 18/03/2020 14:09, Mike Looijmans wrote:
>> Hard disk speed has very little impact on your build time. It helps with
>> the "setscene" parts, but doesn't affect actual compile time at all. I
>> recall someone did a build from RAM disks only, and it was only
>> about a minute faster on a one-hour build compared to rotating disks.
>
> My build machine has lots of RAM and I do builds in a 32GB tmpfs with
> rm_work (and no, I don't build webkit, which would make this impractical).
>
> As you say, with sufficient RAM the build speed is practically the same as
> on disk thanks to caching (especially if you tune the mount options), so
> I'd definitely spend money on more RAM instead of super-fast disks. I just
> prefer doing tmpfs builds because it saves my spinning rust. :)
An alternative to a tmpfs with its hard size limit is to keep file system caches in
memory as long as possible and only start writing to disk when the page cache gets
too full. This scales, but still uses all the RAM available. Here's how to do this:

$ cat /etc/sysctl.d/99-build_server_fs_ops_to_memory.conf
# fs cache can use 90% of memory before system starts io to disk,
# keep as much as possible in RAM
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 90
# keep stuff for 12h in memory before writing to disk,
# allows reusing data as much as possible between builds
vm.dirty_expire_centisecs = 4320000
vm.dirtytime_expire_seconds = 432000
# allow single process to use 60% of system RAM for file caches, e.g. image build
vm.dirty_bytes = 0
vm.dirty_ratio = 60
# disable periodic background writes, only write when running out of RAM
vm.dirty_writeback_centisecs = 0
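These settings take effect on the next boot, or immediately with the standard reload command:

$ sudo sysctl --system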

Once this is done, IO still happens when anything calls sync() or fsync(),
and the worst offenders are package management tools. In Yocto builds, the
package manager's flushes to disk are always useless, since rootfs images
are going to be compressed and the originals wiped by rm_work anyway.
I've tried to hook the eatmydata library into the build, which makes sync() and fsync()
calls no-ops, but I've still failed to catch all the tools and processes called
during the build from Python code. For shell-based tasks this does it:

$ export LD_LIBRARY_PATH=/usr/lib/libeatmydata
$ export LD_PRELOAD=libeatmydata.so
$ grep -rn LD_PRELOAD conf/local.conf
conf/local.conf:305:BB_HASHBASE_WHITELIST_append = " LD_PRELOAD"
conf/local.conf:306:BB_HASHCONFIG_WHITELIST_append = " LD_PRELOAD"

The effect is clearly visible during the build when using Performance Co-Pilot (pcp)
or similar tools to monitor CPU, memory, IO and network IO. The usage of RAM
as page cache grows until the limits are hit, and only then do writes to disk
start, except for the Python image classes... Hints to fix this are welcome!

From monitoring our builds, I know there is a lot of optimization
potential in build times. CPUs are underutilized during bitbake recipe parsing,
fetch, configure, package and rootfs tasks. Memory is not fully utilized
either, since IO through sync()/fsync() happens everywhere, and due to background
writes enabled by default on ext4 and other file systems. Only do_compile() tasks
saturate all CPUs, and when linking lots of C++, all of RAM too. The dependencies
between various recipes and tasks leave large gaps in CPU utilization as well.

-Mikko


Richard Purdie
 

On Thu, 2020-03-19 at 08:05 +0000, Mikko Rapeli wrote:
> Once this is done, IO still happens when anything calls sync() or
> fsync(), and the worst offenders are package management tools. In Yocto
> builds, the package manager's flushes to disk are always useless,
> since rootfs images are going to be compressed and the originals
> wiped by rm_work anyway.
> I've tried to hook the eatmydata library into the build, which makes
> sync() and fsync() calls no-ops, but I've still failed to catch all the
> tools and processes called during the build from Python code.
> ...
Doesn't pseudo intercept and stop these sync calls already? It's
supposed to, so if it's not, we should fix that.

> The effect is clearly visible during the build when using Performance
> Co-Pilot (pcp) or similar tools to monitor CPU, memory, IO and network
> IO. The usage of RAM as page cache grows until the limits are hit, and
> only then do writes to disk start, except for the Python image
> classes... Hints to fix this are welcome!
>
> From monitoring our builds, I know there is a lot of optimization
> potential in build times. CPUs are underutilized during
> bitbake recipe parsing
Recipe parsing should hit 100% CPU; it's one of the few places we can do
that.

> , fetch, configure, package and rootfs tasks.
Sadly these tasks are much harder.

> Memory is not fully utilized either, since IO through sync()/fsync()
> happens everywhere
non-pseudo tasks?

Cheers,

Richard


Mikko Rapeli
 

On Thu, Mar 19, 2020 at 11:04:26AM +0000, Richard Purdie wrote:
> ...
> Doesn't pseudo intercept and stop these sync calls already? It's
> supposed to, so if it's not, we should fix that.
I will double-check, but I'm sure I see IO going to disk while plenty of RAM
is still available in the page cache.

>> ...
>> CPUs are underutilized during
>> bitbake recipe parsing
>
> Recipe parsing should hit 100% CPU; it's one of the few places we can do
> that.
I'm not fully aware of what bitbake does before starting task execution.
With sumo, there is an initial spike in CPU use and then a long
single-threaded wait where the log shows "Initialising tasks..." and the
Cooker process uses a single core. For me this takes at least a minute
on every build. The same is visible with zeus too.

Example graph from pmchart:

https://mcfrisk.kapsi.fi/temp/bitbake_start_to_task_execution.png

>> , fetch, configure, package and rootfs tasks.
>
> Sadly these tasks are much harder.
Yep.

>> Memory is not fully utilized either, since IO through sync()/fsync()
>> happens everywhere
>
> non-pseudo tasks?
I'll try to check this case once more.

-Mikko


Richard Purdie
 

On Thu, 2020-03-19 at 11:43 +0000, Mikko.Rapeli@bmw.de wrote:
> On Thu, Mar 19, 2020 at 11:04:26AM +0000, Richard Purdie wrote:
>> Recipe parsing should hit 100% CPU; it's one of the few places we
>> can do that.
>
> I'm not fully aware of what bitbake does before starting task execution.
> With sumo, there is an initial spike in CPU use and then a long
> single-threaded wait where the log shows "Initialising tasks..." and the
> Cooker process uses a single core. For me this takes at least a minute
> on every build. The same is visible with zeus too.
This isn't recipe parsing but runqueue setup and task-graph calculation,
which happens after parsing but before task execution. More recent
bitbake is probably a bit better at it, but it is unfortunately a
single-threaded process :(

Cheers,

Richard


Mike Looijmans
 

On 19-03-2020 12:04, Richard Purdie via Lists.Yoctoproject.Org wrote:
>> , fetch, configure, package and rootfs tasks.
>
> Sadly these tasks are much harder.
It would be really great if some sort of "weight" could be attached to a task. This relates to memory usage.

My system has 16 cores but only 8GB RAM. With both parallelization options set to "16", I might end up with 16 compile tasks running 16 compile threads each, i.e. 256 running processes. In practice this doesn't actually happen, but the memory load gets high sometimes, so I reduce the tasks setting to 8 at most (see the settings below). That has kept my system out of swap trouble for the time being.
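For reference, those two parallelization options live in local.conf; with the cap described above they would look like this:

$ grep -E 'BB_NUMBER_THREADS|PARALLEL_MAKE' conf/local.conf
BB_NUMBER_THREADS = "8"
PARALLEL_MAKE = "-j 16"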

The idea is that each task gets a "weight" in terms of the cores it will use, and the scheduler takes that into account. It would run 16 do_configure tasks (weight=1) in parallel, but it would not start a new task that pushes the total weight over some limit (say 40 in my case). So it would start a third compile but not a fourth, while it could still start another do_configure task.

Does that make sense?

In builds involving FPGAs I have tasks that take up about 48GB of RAM (my machine cannot run them) but only a single CPU core. Attempting to run several of these in parallel (it happened to me when I changed some shared recipe content) will bring most machines to their knees. Currently my only way of handling that is manual intervention...

--
Mike Looijmans


Yann Dirson
 



On Thu, 19 Mar 2020 at 17:07, Mike Looijmans <mike.looijmans@...> wrote:
> On 19-03-2020 12:04, Richard Purdie via Lists.Yoctoproject.Org wrote:
>>> , fetch, configure, package and rootfs tasks.
>>
>> Sadly these tasks are much harder.
>
> It would be really great if some sort of "weight" could be attached to a
> task. This relates to memory usage.
>
> My system has 16 cores but only 8GB RAM. With both parallelization
> options set to "16", I might end up with 16 compile tasks running 16
> compile threads each, i.e. 256 running processes. In practice this
> doesn't actually happen, but the memory load gets high sometimes, so I
> reduce the tasks setting to 8 at most. That has kept my system out of
> swap trouble for the time being.

This could be neatly handled by using the GNU make jobserver mechanism.
If bitbake itself provided a jobserver, all make-based recipes would
automatically get their jobs properly limited. There is a (sadly not yet merged)
MR [1] for ninja to gain jobserver support as well, through which we should have
pretty good coverage of the recipe set (as a backend for cmake, meson, and more).

[1] https://github.com/ninja-build/ninja/issues/1139
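To sketch the mechanism: the top-level GNU make owns a token pipe and advertises it to every child through MAKEFLAGS, so a single -jN limit covers the whole process tree (an illustration of plain make behaviour, not of bitbake):

$ make -j16
# the parent preloads a pipe with job tokens; each sub-make inherits
# roughly MAKEFLAGS="-j16 --jobserver-auth=R,W" (fd numbers vary),
# takes a token before starting a job and returns it afterwards,
# so at most 16 jobs run across all recursive makes combined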


--
Yann Dirson <yann@...>
Blade / Shadow -- http://shadow.tech


Richard Purdie
 

On Thu, 2020-03-19 at 17:29 +0100, Yann Dirson wrote:
> This could be neatly handled by using the GNU make jobserver mechanism.
> If bitbake itself provided a jobserver, all make-based recipes would
> automatically get their jobs properly limited.
> ...
> [1] https://github.com/ninja-build/ninja/issues/1139
You mean like:

http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/wipqueue4&id=d66a327fb6189db5de8bc489859235dcba306237

? :)

Sadly we never fixed all the issues needed to let that merge.

Cheers,

Richard


Adrian Bunk
 

On Thu, Mar 19, 2020 at 05:07:17PM +0100, Mike Looijmans wrote:
> ...
> With both parallelization options
> set to "16", I might end up with 16 compile tasks running 16 compile
> threads each, i.e. 256 running processes.
> ...
...
This is a bug:
http://bugzilla.yoctoproject.org/show_bug.cgi?id=13306

I sometimes wonder whether something basic like "no more than one
compile task at a time" would be sufficient in practice to avoid
overloading all cores.

It would also help with RAM usage; there are some combinations of
recipes where the build gets aborted by the OOM killer on my laptop
(8 cores, 32 GB RAM) when bitbake runs the compile tasks in parallel.

cu
Adrian