Re: Creating a build system which can scale. #yocto


Quentin Schulz

Hi Philip,

A *very* quick and rough answer, as it's not something I'm working on
right now; I can only give hints as to where to look next.

On Mon, Feb 17, 2020 at 04:27:17AM -0800, philip.lewis@domino-uk.com wrote:
> Hi,
>
> I'm looking for some advice on the best way to implement a build
> environment in the cloud for multiple dev teams, one that will scale
> as the number of teams grows.
>
> Our devs are saying:
>
> *What do we want?*
>
> To scale our server-based build infrastructure so that engineers can
> build branches using the same infrastructure that produces a
> releasable artefact, before pushing into develop. As much automation
> of this as possible is desired.
>
> *Blocker*: We can't just scale the current system; we can't keep
> throwing more hardware at it, particularly storage. The main
> contributor to storage requirements is the local cache in each build
> workspace, and there will be one workspace per branch, per Jenkins
> agent: 3 teams x 10 branches per team x 70 GB per branch/workspace x
> number of build agents (say 5) = 10,500 GB, roughly 10.5 TB. As you
> can see, this doesn't scale well as we add branches, teams or build
> agents. Most of that storage is the caches in each workspace, and
> most of the contents of each individual cache is identical.
Have you had a look at INHERIT += "rm_work"? It should get rid of most
of the space in the work directories (we use this one; tremendous
benefit in terms of storage space).

c.f. https://www.yoctoproject.org/docs/current/mega-manual/mega-manual.html#ref-classes-rm-work

Incidentally, it also highlights broken recipes (e.g. ones fetching
files from other sysroots or from elsewhere in the filesystem).
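
For reference, enabling it is a one-line addition to local.conf;
RM_WORK_EXCLUDE lets you keep work directories you still need for
debugging (the recipe name below is hypothetical):

    # local.conf
    INHERIT += "rm_work"
    # Keep the work directory for recipes being actively debugged
    # ("my-app" is a made-up recipe name):
    RM_WORK_EXCLUDE += "my-app"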

> *A possible solution:*
>
> Disclaimer/admission: I haven't really researched/considered _all_
> possible solutions to the problem above; I just started searching and
> reading and came up with/felt led towards this. I think there is some
> value in spending some of the meeting exploring other options to see
> if anything sounds better (for what definition of better?).
>
> Something using the built-in sstate cache mirror in Yocto. There are
> a few ways to do this, as it's essentially a file share somewhere.
> https://pelux.io/2017/06/19/How-to-create-a-shared-sstate-dir.html
> shows an example of sharing it via NFS, but you can also use HTTP or
> FTP.
>
> Having a single cache largely solves the storage issue, as there is
> only one cache. Solving that, though, introduces a few more questions
> and constraints:
>
> * How do we manage the size of the cache?
>
> There's no built-in expiry mechanism that I could find, so we'd
> probably have to create something ourselves (parse access logs from
> the server hosting the cache and apply a garbage-collection process).
Provided you're not using a webserver with a cache (or a cache that is
refreshed every now and then), a cron job with find -atime -delete and
you're good.
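
Something like this, assuming the cache lives in /srv/sstate-cache
(path illustrative) and the filesystem still records access times
(relatime is fine at this granularity):

    # /etc/cron.d/sstate-gc: prune sstate objects not read in 30 days
    0 3 * * * root find /srv/sstate-cache -type f -atime +30 -delete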

> * How/when do we update the cache?
>
> All environments contributing to the cache need to be identical (that
> Ansible playbook just grabs the latest of everything) to avoid subtle
> differences in the build artefacts depending on which environment
> populated the cache.
>
> * How much time will fetching the cache from a remote server add to
> the build?
>
> I think this is probably something we will just have to live with,
> but if it's all in the cloud, the network speed between VMs is fast.
I remember (wrongly?) reading that sharing the sstate-cache over NFS
isn't a very good idea (the latency outweighs the benefits in terms of
storage/shared sstate cache).
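
For completeness, the NFS variant from the pelux.io link above boils
down to pointing every agent's sstate directory at the same mount
(path illustrative); the read-only HTTP mirror mentioned below keeps
NFS latency off the critical path:

    # local.conf on each agent; /mnt/sstate is an NFS mount (illustrative)
    SSTATE_DIR = "/mnt/sstate"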

> This shared-cache solution removes the per-agent storage cost from
> the equation above, and also, to a varying extent, the per-branch
> cost (assuming you're not working on something at the start of the
> dependency tree).
>
> I'd love to see some other ideas as well, as I worry I'm missing
> something easier, more obvious, or simply better.
I'm not sure I've fully understood the exact use case, but maybe you
would want to have a look at:

- a shared DL_DIR (this one can be served over NFS; there isn't much
access to it during a build); see the first sketch after this list.
- SSTATE_MIRRORS (c.f. https://www.yoctoproject.org/docs/current/mega-manual/mega-manual.html#var-SSTATE_MIRRORS):
basically a webserver serving the sstate-cache from an already-built
image/system. This is read-only and would make sense if your Jenkins
builds a system and your devs base their work on top of it. They would
get the sstate-cache from your Jenkins and, AFAIK, it is not duplicated
locally = more free storage space. See the first sketch after this
list.
- investigating Docker containers for a guaranteed identical build
environment; Pyrex has often been suggested on IRC:
https://github.com/garmin/pyrex/ (a container example follows after
this list).
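
For the first two points, a minimal local.conf sketch; every path and
URL below is made up for illustration, not taken from your setup:

    # local.conf on each build agent (paths/URLs illustrative)
    # Shared download directory, e.g. an NFS mount:
    DL_DIR = "/mnt/shared/downloads"
    # Optionally make SCM checkouts mirrorable as tarballs too:
    BB_GENERATE_MIRROR_TARBALLS = "1"

    # Read-only sstate mirror served over HTTP by the release builder.
    # PATH is a literal token that BitBake expands, not something to edit:
    SSTATE_MIRRORS = "file://.* http://sstate.example.com/sstate/PATH;downloadfilename=PATH"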
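
For the container point, if you want something lighter than adopting
Pyrex straight away, the Yocto Project's CROPS images give each agent
an identical userspace; a hedged example (image tag and host path are
assumptions):

    # Run the build inside the crops/poky container (tag/path illustrative):
    docker run --rm -it -v /srv/yocto:/workdir crops/poky:ubuntu-18.04 --workdir=/workdir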

That's all I could think of about your issue, I unfortunately do not
have more knowledge to share on that topic.

Good luck, let us know what you decided to do :)

Quentin
