SCM usage in source urls and bandwidth


Richard Purdie
 

[list address fixed, sorry]

We've been having bandwidth trouble with downloads.yoctoproject.org so we did
some quick analysis to see what the issue is. Basically, in speeding up the
server, which had been the rate limit, we hit the limits of the hosting pipe.
I'd note a few things:

a) it isn't the sstate mirroring, it is nearly all being used by downloads.

b) 25% of all our bandwidth is going on
"git2_sourceware.org.git.binutils-gdb.git.tar.gz" - i.e. downloading the source
mirror binutils tarball

c) 15% is on git2_sourceware.org.git.glibc.git.tar.gz i.e. glibc

d) OE-Core has downloads.yoctoproject.org as a MIRROR

e) poky has it as a PREMIRROR
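
As an aside on the file names in b) and c): both look like "git2_" followed by the host and the repository path with each "/" turned into ".", plus ".tar.gz". Below is a minimal Python sketch of that mapping. It is inferred only from the two names above and is not taken from the fetcher code; the function name is hypothetical, and the binutils-gdb URL is assumed by analogy with the glibc one.

```python
from urllib.parse import urlparse


def mirror_tarball_name(git_url: str) -> str:
    """Guess the source-mirror tarball name for a git URL.

    Assumption: the pattern is inferred from the two file names quoted
    above; this is not the BitBake fetcher implementation.
    """
    parsed = urlparse(git_url)
    # Host and path are joined, with every "/" in the path turned into ".".
    flattened = parsed.netloc + parsed.path.replace("/", ".")
    return "git2_" + flattened + ".tar.gz"


print(mirror_tarball_name("git://sourceware.org/git/binutils-gdb.git"))
print(mirror_tarball_name("git://sourceware.org/git/glibc.git"))
```

This only reproduces the two names seen in the logs; other URL forms (ports, protocols, unusual characters) may well be handled differently by the real fetcher.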

What are our options? As far as I can see we could:

a) increase the pipe from downloads.yoctoproject.org but that does come at a
non-trivial cost to the project.

b) Seek help with hosting some of the larger mirror tarballs from people better
able to host them and have that as a first premirror?

c) Switch the binutils and glibc recipes to tarballs and patches. I know Khem
finds this less convenient and they keep moving back and forward but we keep
running into this issue and having to switch back from git.

d) To soften the blow of c) we could add devupstream support to the recipes? We
could script updating the recipe to add the patches?

e) We could drop the PREMIRRORS from poky. This would stop the SCM targets from
hitting our mirrors first. That does transfer load to the upstream project SCMs
though and I'm not sure that will be appreciated. I did send that patch, I'm not
sure about it though.

We are going to need to do *something* though as the current situation can't
continue. I'm open to other ideas...

Cheers,

Richard


Richard Purdie
 

On Wed, 2022-03-30 at 11:42 +0100, Richard Purdie via lists.yoctoproject.org
wrote:
> What are our options? As far as I can see we could:
>
> a) increase the pipe from downloads.yoctoproject.org but that does come at a
> non-trivial cost to the project.
>
> b) Seek help with hosting some of the larger mirror tarballs from people better
> able to host them and have that as a first premirror?
>
> c) Switch the binutils and glibc recipes to tarballs and patches. I know Khem
> finds this less convenient and they keep moving back and forward but we keep
> running into this issue and having to switch back from git.
>
> d) To soften the blow of c) we could add devupstream support to the recipes? We
> could script updating the recipe to add the patches?
>
> e) We could drop the PREMIRRORS from poky. This would stop the SCM targets from
> hitting our mirrors first. That does transfer load to the upstream project SCMs
> though and I'm not sure that will be appreciated. I did send that patch, I'm not
> sure about it though.

I meant to add:

f) Switch the problematic recipes to use shallow clones with something like:

BB_GIT_SHALLOW:pn-binutils = "1"
BB_GIT_SHALLOW:pn-binutils-cross-${TARGET_ARCH} = "1"
BB_GIT_SHALLOW:pn-binutils-cross-canadian-${TRANSLATED_TARGET_ARCH} = "1"
BB_GIT_SHALLOW:pn-binutils-cross-testsuite = "1"
BB_GIT_SHALLOW:pn-binutils-crosssdk-${SDK_SYS} = "1"
BB_GIT_SHALLOW:pn-glibc = "1"

The challenge here is that in order to be effective, there needs to be a
PREMIRROR setup with the shallow tarballs on it. This means we couldn't do e)
above and have this have much effect unless we craft some very specific
PREMIRROR entries too.

Cheers,

Richard


Ross Burton <ross@...>
 

On Wed, 30 Mar 2022 at 12:10, Richard Purdie
<richard.purdie@...> wrote:
> f) Switch the problematic recipes to use shallow clones with something like:
>
> BB_GIT_SHALLOW:pn-binutils = "1"
> BB_GIT_SHALLOW:pn-binutils-cross-${TARGET_ARCH} = "1"
> BB_GIT_SHALLOW:pn-binutils-cross-canadian-${TRANSLATED_TARGET_ARCH} = "1"
> BB_GIT_SHALLOW:pn-binutils-cross-testsuite = "1"
> BB_GIT_SHALLOW:pn-binutils-crosssdk-${SDK_SYS} = "1"
> BB_GIT_SHALLOW:pn-glibc = "1"
>
> The challenge here is that in order to be effective, there needs to be a
> PREMIRROR setup with the shallow tarballs on it. This means we couldn't do e)
> above and have this have much effect unless we craft some very specific
> PREMIRROR entries too.

Even without premirrors this is a lot faster for glibc:

$ time git clone git://sourceware.org/git/glibc.git
Cloning into 'glibc'...
remote: Enumerating objects: 6956, done.
remote: Counting objects: 100% (6956/6956), done.
remote: Compressing objects: 100% (2938/2938), done.
remote: Total 670093 (delta 5328), reused 4750 (delta 3932), pack-reused 663137
Receiving objects: 100% (670093/670093), 205.19 MiB | 16.39 MiB/s, done.
Resolving deltas: 100% (573265/573265), done.
Updating files: 100% (19011/19011), done.

real 1m56.255s

$ time git clone git://sourceware.org/git/glibc.git --depth 1
Cloning into 'glibc'...
remote: Enumerating objects: 18809, done.
remote: Counting objects: 100% (18809/18809), done.
remote: Compressing objects: 100% (9704/9704), done.
remote: Total 18809 (delta 8812), reused 12185 (delta 7968), pack-reused 0
Receiving objects: 100% (18809/18809), 41.79 MiB | 11.96 MiB/s, done.
Resolving deltas: 100% (8812/8812), done.
Updating files: 100% (19011/19011), done.

real 0m8.701s

A full clone fetches 200MB and takes 2 minutes (a lot of that is
actually resolving the deltas, not the fetch). A shallow clone of the
current HEAD fetches 40MB and is done in 8 seconds.

Why would we need a premirror?

Ross


Richard Purdie
 

On Wed, 2022-03-30 at 12:18 +0100, Ross Burton wrote:
> A full clone fetches 200MB and takes 2 minutes (a lot of that is
> actually resolving the deltas, not the fetch). A shallow clone of the
> current HEAD fetches 40MB and is done in 8 seconds.
>
> Why would we need a premirror?

The code doesn't do "--depth=1".

https://git.yoctoproject.org/poky/commit/?id=27d56982c7ba05e86a100b0cca2411ee5ac7a85e

"""
This implements support for shallow mirror tarballs, not shallow clones.
Supporting shallow clones directly is not really doable for us, as we'd need
to hardcode the depth between branch HEAD and the SRCREV, and that depth would
change as the branch is updated.
"""

Put another way, you didn't specify a revision in your clone above and if you
try, it becomes rather tricky.
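
To illustrate the point in the commit message, here is a toy Python model of why a fixed depth does not work. It assumes a purely linear branch and uses made-up commit ids; the function name is hypothetical, and nothing here talks to git.

```python
from typing import List


def required_depth(branch: List[str], srcrev: str) -> int:
    """Number of commits a shallow clone must fetch so SRCREV is included.

    `branch` is a linear history, oldest commit first, newest (HEAD) last.
    """
    head_index = len(branch) - 1
    return head_index - branch.index(srcrev) + 1


# Hypothetical commit ids; the recipe pins SRCREV to "c3".
branch = ["c1", "c2", "c3"]
print(required_depth(branch, "c3"))  # SRCREV is HEAD here

# Upstream adds two commits; the same SRCREV now needs a deeper clone.
branch += ["c4", "c5"]
print(required_depth(branch, "c3"))
```

The required depth grows every time the branch moves on, which is why a depth hardcoded in a recipe would go stale.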

To make this work we therefore need a mirror with the shallow tarballs on it.

Just for info, the binutils mirror tarball is ~1.3GB, the shallow tarball is
65MB.
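
As a rough back-of-the-envelope check, the Python sketch below combines those two sizes with the 25% bandwidth figure from the start of the thread. It assumes 1GB = 1000MB and that the number of downloads stays the same; both are assumptions, and the function name is hypothetical.

```python
# Figures quoted in this thread; GB is taken as 1000 MB (an assumption).
FULL_TARBALL_MB = 1300.0       # "~1.3GB" binutils mirror tarball
SHALLOW_TARBALL_MB = 65.0      # binutils shallow tarball
BINUTILS_TRAFFIC_SHARE = 0.25  # "25% of all our bandwidth"


def remaining_share(share: float, full_mb: float, shallow_mb: float) -> float:
    """Share of today's total traffic still used once downloads shrink.

    Assumes the number of downloads stays the same.
    """
    return share * (shallow_mb / full_mb)


size_ratio = FULL_TARBALL_MB / SHALLOW_TARBALL_MB
after = remaining_share(BINUTILS_TRAFFIC_SHARE, FULL_TARBALL_MB, SHALLOW_TARBALL_MB)
print(f"shallow tarball is {size_ratio:.0f}x smaller")
print(f"binutils traffic: {BINUTILS_TRAFFIC_SHARE:.2%} -> {after:.2%} of today's total")
```

Under those assumptions the binutils tarball alone would drop from a quarter of the traffic to a little over one percent of today's total.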

Cheers,

Richard


Alexandre Belloni
 

On 30/03/2022 11:42:46+0100, Richard Purdie wrote:
> e) We could drop the PREMIRRORS from poky. This would stop the SCM targets from
> hitting our mirrors first. That does transfer load to the upstream project SCMs
> though and I'm not sure that will be appreciated. I did send that patch, I'm not
> sure about it though.

I would simply drop PREMIRRORS, this is actually a privacy concern for
some of our customers that didn't realize they are leaking the names of
their internal git repositories to downloads.yoctoproject.org.



--
Alexandre Belloni, co-owner and COO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Claude Bing
 

On 3/30/22 09:53, Alexandre Belloni via lists.yoctoproject.org wrote:
> I would simply drop PREMIRRORS, this is actually a privacy concern for
> some of our customers that didn't realize they are leaking the names of
> their internal git repositories to downloads.yoctoproject.org.

Indeed, that would be concerning for us as well. Would it be possible to
ignore PREMIRRORS based on the recipe layer? Alternatively, we could
create blocklists for heavy packages that need to fetch from upstream
first rather than drop PREMIRRORS completely. Sometimes, having a
secondary source could save valuable time when the upstream is not
responsive.








Richard Purdie
 

On Wed, 2022-03-30 at 10:05 -0400, Claude Bing wrote:
> Indeed, that would be concerning for us as well. Would it be possible to
> ignore PREMIRRORS based on the recipe layer? Alternatively, we could
> create blocklists for heavy packages that need to fetch from upstream
> first rather than drop PREMIRRORS completely. Sometimes, having a
> secondary source could save valuable time when the upstream is not
> responsive.

We don't have any support for "per-layer" overrides at this time which would be
the way to do that. It is something I think we probably do want to consider
adding but I haven't had the bandwidth to look at it.

I'd note that these mirrors in PREMIRRORS are also in MIRRORS already in OE-Core
so there is a fallback, it just controls the order they're tried in.
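
A simplified Python sketch of that ordering follows: PREMIRRORS are tried first, then the upstream URL, then MIRRORS. The function name is hypothetical and pattern matching of mirror entries is deliberately left out.

```python
from typing import List


def fetch_order(premirrors: List[str], upstream: str, mirrors: List[str]) -> List[str]:
    """Return the locations in the order they would be tried.

    Simplified model: PREMIRRORS first, then the upstream SRC_URI, then
    MIRRORS. Real matching of mirror patterns is not modelled here.
    """
    return premirrors + [upstream] + mirrors


YP = "downloads.yoctoproject.org"
UPSTREAM = "git://sourceware.org/git/glibc.git"

# poky today: the Yocto mirror is both a PREMIRROR and (via OE-Core) a MIRROR.
print(fetch_order([YP], UPSTREAM, [YP]))
# With PREMIRRORS dropped: upstream is tried first, the mirror stays as fallback.
print(fetch_order([], UPSTREAM, [YP]))
```

In the second case the mirror is only contacted when upstream fails, which is the load shift described in option e).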

Cheers,

Richard


Khem Raj
 

On Wed, Mar 30, 2022 at 4:29 AM Richard Purdie
<richard.purdie@...> wrote:

> To make this work we therefore need a mirror with the shallow tarballs on it.
>
> Just for info, the binutils mirror tarball is ~1.3GB, the shallow tarball is
> 65MB.

right, I think shallow clone should be default IMO for all git fetcher tarballs


kergoth@...
 



On Wed, Mar 30, 2022 at 10:24 AM Khem Raj <raj.khem@...> wrote:
> right, I think shallow clone should be default IMO for all git fetcher tarballs

We've been using shallow git tarballs for all recipes for years at Mentor. It definitely speeds up fetches from local mirrors and reduces how much we need to ship to customers to allow them to use BB_NO_NETWORK out of the box.
--
Christopher Larson
chris_larson@..., chris.larson@..., kergoth@...
Principal Software Engineer, Embedded Linux Solutions, Siemens Digital Industries Software