Creating a build system which can scale. #yocto


philip.lewis@...
 

Hi,

I'm looking for some advice about the best way to implement a build environment in the cloud for multiple dev teams which will scale as the number of dev teams grows.

Our devs are saying:

What do we want?

To scale our server-based build infrastructure so that engineers can build branches using the same infrastructure that produces a releasable artefact, before pushing into develop. As much automation of this as possible is desired.

Blocker: We can’t just scale the current system – we can’t keep throwing more hardware at it, particularly storage. The main contributor to the storage requirement is the local cache in each build workspace, and there will be one workspace for each branch, per Jenkins agent: 3 teams x 10 branches per team x 70 GB per branch/workspace x number of build agents (let's say 5) ≈ 10 TB. As you can see, this doesn’t scale well as we add branches, teams or build agents. Most of this 10 TB is the caches in each workspace, where most of the contents of each individual cache is identical.

A possible solution:

Disclaimer/admission: I’ve not really researched/considered _all_ possible solutions to the problem above; I just started searching and reading and came up with/felt led towards this. I think there is some value in spending some of the meeting exploring other options to see if anything sounds better (for some definition of better?).

 

Something using the built-in cache mirror in Yocto – there are a few ways to do this, as it’s essentially a file share somewhere. https://pelux.io/2017/06/19/How-to-create-a-shared-sstate-dir.html, for example, shows how to share it via NFS, but you can also use HTTP or FTP.
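As a rough sketch of what that looks like in configuration (the mount point and URL here are hypothetical), every build workspace points at the shared location:

$ cat conf/site.conf
# shared sstate directory on an NFS mount visible to all agents
SSTATE_DIR = "/mnt/shared/sstate-cache"
# alternatively, a read-only mirror served over HTTP:
# SSTATE_MIRRORS ?= "file://.* http://sstate.example.com/sstate-cache/PATH;downloadfilename=PATH"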

Having a single cache largely solves the storage issue, as there is only one cache, but it introduces a few more questions and constraints:

  1. How do we manage the size of the cache?

There’s no built-in expiry mechanism I could find. This means we’d probably have to create something ourselves (parse access logs from the server hosting the cache and apply a garbage collector process).

  2. How/when do we update the cache?

All environments contributing to the cache need to be identical (that Ansible playbook just grabs the latest of everything) to avoid subtle differences in the build artefacts depending on which environment populated the cache.

  3. How much time will fetching the cache from a remote server add to the build?

I think this is probably something we will have to just live with, but if it’s all in the cloud, the network speed between VMs is fast.

This shared-cache solution removes the per-agent storage cost, and also – to a varying extent – the per-branch costs (assuming that you’re not working on something at the top/start of the dependency tree), from the equation above.

 

I’d love to see some other ideas as well, as I worry I’m missing something easier, more obvious or simply better.

Any thoughts?
Thanks
Phill

 

 


Quentin Schulz
 

Hi Philip,

*Very* quick and vague answer, as it's not something I'm doing right now;
I can only give hints as to where to look next.

On Mon, Feb 17, 2020 at 04:27:17AM -0800, philip.lewis@domino-uk.com wrote:
> [...]
>
> *Blocker*: We can’t just scale the current system – we can’t keep throwing more hardware at it, particularly storage. The main contributor to the storage requirement is the local cache in each build workspace, and there will be one workspace for each branch, per Jenkins agent: 3 teams x 10 branches per team x 70 GB per branch/workspace x number of build agents (let's say 5) ≈ 10 TB. As you can see, this doesn’t scale well as we add branches, teams or build agents. Most of this 10 TB is the caches in each workspace, where most of the contents of each individual cache is identical.
Have you had a look at INHERIT += "rm_work"? It should get rid of most of
the space in the work directory (we use it; tremendous benefit in
terms of storage space).

c.f. https://www.yoctoproject.org/docs/current/mega-manual/mega-manual.html#ref-classes-rm-work

Incidentally, it also highlights broken recipes (e.g. one grabbing files from other
sysroots/elsewhere in the FS).
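A minimal sketch of enabling it build-wide:

$ cat conf/local.conf
# delete each recipe's work directory once it is no longer needed
INHERIT += "rm_work"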

> [...]

> * How do we manage the size of the cache?
>
> There’s no built-in expiry mechanism I could find. This means we’d probably have to create something ourselves (parse access logs from the server hosting the cache and apply a garbage collector process).
Provided you're not using a webserver with a cache in front (or a cache that is
refreshed every now and then), a cronjob with find -atime -delete and
you're good.
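For example (the path and retention period are made up, and this assumes atime updates are enabled on the mount, i.e. it is not mounted noatime):

$ crontab -l
# every Sunday at 03:00, delete sstate objects not read in the last 30 days
0 3 * * 0  find /srv/sstate-cache -type f -atime +30 -delete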

> * How/when do we update the cache?
>
> All environments contributing to the cache need to be identical (that Ansible playbook just grabs the latest of everything) to avoid subtle differences in the build artefacts depending on which environment populated the cache.
>
> * How much time will fetching the cache from a remote server add to the build?
>
> I think this is probably something we will have to just live with, but if it’s all in the cloud, the network speed between VMs is fast.
I remember (wrongly?) reading that sharing the sstate-cache over NFS isn't a very
good idea (latency outweighs the benefits in terms of storage/shared
sstate cache).

> This shared-cache solution removes the per-agent storage cost, and also – to a varying extent – the per-branch costs (assuming that you’re not working on something at the top/start of the dependency tree), from the equation above.
>
> I’d love to see some other ideas as well, as I worry I’m missing something easier, more obvious or simply better.
I'm not sure I've understood the exact use case, but maybe you would want
to have a look at (a sketch of the first two follows the list):

- a shared DL_DIR (this one can be served over NFS; there isn't too much
access to it during a build).
- SSTATE_MIRRORS (c.f. https://www.yoctoproject.org/docs/current/mega-manual/mega-manual.html#var-SSTATE_MIRRORS),
which is basically a webserver serving the sstate-cache from an already-built
image/system. This is read-only, and would make sense if your Jenkins is building
a system and then your devs are basing their work on top of it. They
would get the sstate-cache from your Jenkins and, AFAIK, it does not
duplicate the sstate-cache locally = more free storage space.
- investigate Docker containers for a guaranteed identical build environment;
Pyrex has often been suggested on IRC:
https://github.com/garmin/pyrex/
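A minimal sketch of the first two (the NFS mount and mirror URL are hypothetical):

$ cat conf/site.conf
# shared download cache, served over NFS
DL_DIR = "/mnt/shared/downloads"
# read-only sstate mirror served over HTTP by the release builder
SSTATE_MIRRORS ?= "file://.* http://jenkins.example.com/sstate-cache/PATH;downloadfilename=PATH"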

That's all I could think of for your issue; I unfortunately do not
have more knowledge to share on this topic.

Good luck, let us know what you decided to do :)

Quentin


Robert P. J. Day
 

On Mon, 17 Feb 2020, Quentin Schulz wrote:

> Hi Philip,
>
> *Very* quick and vague answer, as it's not something I'm doing right now;
> I can only give hints as to where to look next.

> On Mon, Feb 17, 2020 at 04:27:17AM -0800, philip.lewis@domino-uk.com wrote:
> > [...]
> Have you had a look at INHERIT += "rm_work"? It should get rid of most of
> the space in the work directory (we use it; tremendous benefit in
> terms of storage space).
>
> c.f.
> https://www.yoctoproject.org/docs/current/mega-manual/mega-manual.html#ref-classes-rm-work
In addition, you can always override that build-wide setting with
RM_WORK_EXCLUDE if you want to keep the generated work from a small set of
recipes for debugging.
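For instance (the recipe names are just examples):

$ cat conf/local.conf
INHERIT += "rm_work"
# but keep the work directories of recipes we are actively debugging
RM_WORK_EXCLUDE += "busybox linux-yocto"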

rday


Thomas Goodwin
 

Since Docker was mentioned: I use the community's CROPS containers via Docker in GitLab CI on a shared build server, providing the builders' downloads and sstate caches to the team to accelerate their own builds (these paths are volume-mounted into the runners). One caveat to this approach is that if you use the containers on a shared build host, you should limit each builder's bitbake environment in terms of parallelization (PARALLEL_MAKE and the like). This prevents containers from causing one another to fail by not sharing resources effectively (yes, you can set GitLab Docker runner limits, but those limits are invisible to the container). The good news is that these variables are in the whitelist, so you do not have to set them in a conf file; exporting them in the build environment is enough, meaning each runner can be tuned according to the build host executing it.
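A sketch of that per-runner tuning (the values are examples to tune per host; PARALLEL_MAKE and BB_NUMBER_THREADS are in the default environment whitelist, so plain exports are enough):

# limit this runner to 4 concurrent bitbake tasks and 4-way make parallelism
export BB_NUMBER_THREADS="4"
export PARALLEL_MAKE="-j 4"
bitbake core-image-minimal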

I based my tuning on this person's work: https://elinux.org/images/d/d4/Goulart.pdf, a presentation from a few years back at an ELC event. It contains a significant amount of information about project flow and other topics that you might also find interesting.

Cheers,

Thomas

On Mon, Feb 17, 2020 at 7:52 AM rpjday@... <rpjday@...> wrote:
> [...]


Rudolf J Streif
 

Hi Philip,

We have done this with many Yocto Project builds using AWS EC2, Docker, Gitlab and Artifactory.

Rest inlined below.

:rjs

On 2/17/20 4:27 AM, philip.lewis@... wrote:
> [...]
>
> What do we want?
>
> To scale our server-based build infrastructure so that engineers can build branches using the same infrastructure that produces a releasable artefact, before pushing into develop. As much automation of this as possible is desired.

It can be configured so that any check-in to a branch triggers a build. That is what we do with developers on their own branches as well as with the master branches. The master branch is the integration branch. Then there are release and development branches, but they all use the same build environment.


> [...]
>
> A possible solution:
>
> Disclaimer/admission: I’ve not really researched/considered _all_ possible solutions to the problem above; I just started searching and reading and came up with/felt led towards this. [...]


We do this with GitLab runners and worker instances on EC2. Since it can take some time to spin up a new instance, we keep a certain number running during business hours. If more are needed, more are spun up transparently. Of course this costs money, in particular when large instances with a lot of memory and a lot of vCPUs are used. Instances can automatically be terminated if there is overcapacity. There are other cost-control options. Docker images inside the instances provide the controlled build environment.

> Something using the built-in cache mirror in Yocto – there are a few ways to do this, as it’s essentially a file share somewhere. https://pelux.io/2017/06/19/How-to-create-a-shared-sstate-dir.html, for example, shows how to share it via NFS, but you can also use HTTP or FTP.

AWS elastic storage (EFS) works via NFS (pretty straightforward). Artifactory can be used too.
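For example (the filesystem ID, region and mount point are hypothetical), an EFS share is mounted like any other NFS export and can then back the shared sstate directory:

# mount the shared EFS filesystem on each build instance
sudo mount -t nfs4 -o nfsvers=4.1 fs-0123456789abcdef0.efs.eu-west-1.amazonaws.com:/ /mnt/sstate
# then, in conf/site.conf:
#   SSTATE_DIR = "/mnt/sstate/sstate-cache"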

> Having a single cache largely solves the storage issue, as there is only one cache, but it introduces a few more questions and constraints:
>
> 1. How do we manage the size of the cache?
>
> There’s no built-in expiry mechanism I could find. This means we’d probably have to create something ourselves (parse access logs from the server hosting the cache and apply a garbage collector process).

You have to prune it yourself, typically based on age, and when development moves to a new release of YP.
> 2. How/when do we update the cache?
>
> All environments contributing to the cache need to be identical (that Ansible playbook just grabs the latest of everything) to avoid subtle differences in the build artefacts depending on which environment populated the cache.

We do this with the release builds only.
> 3. How much time will fetching the cache from a remote server add to the build?
>
> I think this is probably something we will have to just live with, but if it’s all in the cloud, the network speed between VMs is fast.

There is no generic answer to this. It depends on the storage and of course the networking infrastructure.

> This shared-cache solution removes the per-agent storage cost, and also – to a varying extent – the per-branch costs (assuming that you’re not working on something at the top/start of the dependency tree), from the equation above.

 

Yes, that is the idea. Since the builds run inside a Docker container there is a local cache, but it is discarded when the container is discarded and the VM is spun down. Cache misses require additional time, but that's the nature of it.

> I’d love to see some other ideas as well, as I worry I’m missing something easier, more obvious or simply better.
>
> Any thoughts?
> Thanks
> Phill

 

 

:rjs


    
-- 
-----
Rudolf J Streif
CEO/CTO ibeeto
+1.855.442.3386 x700


Richard Purdie
 

On Mon, 2020-02-17 at 04:27 -0800, philip.lewis@domino-uk.com wrote:
> Something using the built-in cache mirror in Yocto – there are a few
> ways to do this, as it’s essentially a file share somewhere.
> https://pelux.io/2017/06/19/How-to-create-a-shared-sstate-dir.html
> shows, for example, how to share it via NFS, but you can also use HTTP or FTP.
Sharing sstate between the workers is the obvious win, as is rm_work to
reduce individual build sizes.

> Having a single cache largely solves the storage issue, as there is
> only one cache, but it introduces a few more questions and constraints:
>
> How do we manage the size of the cache?
>
> There’s no built-in expiry mechanism I could find. This means we’d
> probably have to create something ourselves (parse access logs from
> the server hosting the cache and apply a garbage collector process).
The system is set up to "touch" files it uses if it has write access, so
you can tell which artefacts are being used.

> How/when do we update the cache?
>
> All environments contributing to the cache need to be identical (that
> Ansible playbook just grabs the latest of everything) to avoid subtle
> differences in the build artefacts depending on which environment
> populated the cache.
All environments contributing to the cache don't have to be identical; we
aim to build reproducible binaries regardless of the host OS.

Obviously you reduce risk by doing so, but I just wanted to be clear
that we have protection in place for this, and sstate does support it.

> How much time will fetching the cache from a remote server add to the
> build?
Mostly it depends on your interconnecting network speed.

Someone mentioned NFS; we do support NFS for sstate, and our autobuilders make
extensive use of it.

Cheers,

Richard


Mikko Rapeli
 

Hi,

Good pointers in this thread already. Here are mine:

* Share the sstate mirror and download cache from release builds with
developer topic builds. NFS, a web server, or an rsync before calling bitbake
will all work.
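For example (host and paths are hypothetical), the rsync variant is just a pre-build step:

# pull the shared caches from the release builder before building
rsync -a builder.example.com:/srv/yocto/sstate-cache/ ./sstate-cache/
rsync -a builder.example.com:/srv/yocto/downloads/ ./downloads/
bitbake core-image-minimal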

* I've added the buildhistory and prserv database as extra files to the sstate mirror
and use that to initiate new developer topic and release builds. This way
we don't put the prserv or buildhistory git trees on the critical path of builds,
but still get the benefits of QA checks, binary package versions, full history etc.

* Don't use virtual machines or clouds to build. Bare-metal throwaway machines
are much faster and more reliable. We've broken all the clouds.

* Use rm_work to reduce disk space usage during builds.

* Tune build machines to keep things in memory and not flush them to disk
all the time, since bitbake tmp, images etc. are going to be tar'ed up
as build output anyway. If they fit into the page cache in RAM, you can avoid a lot of IO and
save your disks/SSDs. This Linux kernel VM tuning does it:

$ cat /etc/sysctl.d/99-build_server_fs_ops_to_memory.conf
# fs cache can use 90% of memory before system starts io to disk,
# keep as much as possible in RAM
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 90
# keep stuff for 12h in memory before writing to disk,
# allows reusing data as much as possible between builds
vm.dirty_expire_centisecs = 4320000
vm.dirtytime_expire_seconds = 432000
# allow single process to use 60% of system RAM for file caches, e.g. image build
vm.dirty_bytes = 0
vm.dirty_ratio = 60
# disable periodic background writes, only write when running out of RAM
vm.dirty_writeback_centisecs = 0
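Dropped into /etc/sysctl.d/, the settings are applied with:

$ sudo sysctl --system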

* Finding the optimal cost and power combination for build slaves is tricky.
Track CPU, memory, IO and network usage for your project and find out which
one is the bottleneck. For us it was RAM. CPUs are not effectively used by bitbake
builds, except when all hell breaks loose with C++ projects and their templates.
Lots of CPU time is wasted running single-threaded bitbake tasks and
creating images. Avoiding IO to disk and caching in RAM helps. I've not seen benefits
from having more than 64 GB of RAM or more than 32 CPUs (with hyperthreading).
Also, projects evolve over time and may suddenly start eating more RAM and triggering
the kernel OOM killer, shivers...

Hope this helps,

-Mikko