Integrating meta-doubleopen into OE


Peter Kjellerstedt
 

-----Original Message-----
From: licensing@lists.yoctoproject.org <licensing@lists.yoctoproject.org>
On Behalf Of Mikko Murto
Sent: 2 June 2021 11:50
To: Richard Purdie <richard.purdie@linuxfoundation.org>; Joshua Watt
<JPEWhacker@gmail.com>; licensing@lists.yoctoproject.org
Subject: VS: [licensing] Integrating meta-doubleopen into OE

Hi,

Given the recent interest in SBOM support for OE-core, I'm going to
start looking at integrating meta-doubleopen as an SBOM solution for
OE.

Sounds great, I'd be happy to assist in any way I can!

It's possible that meta-doubleopen does more than what would be
considered a minimum viable SBOM solution for OE, but it seems quite
useful and I suspect any extra functionality is something we would
want anyway. I'd be curious for Mikko (the author) to chime in
and let us know what works well and what doesn't with the layer.
In addition to what I think works well and what doesn't, I'll also try
to describe some of the things that are saved and the conventions we're
trying out. If something seems odd, please let me know!

The basic data about packages works decently. Three different types of
packages are saved to the packages field in the SPDX, one package
describing the final image, packages for the recipes built and packages
for the sub-packages of the recipes. These are differentiated with
different SPDXIDs: core-image-minimal's image package is
"SPDXRef-Image-core-image-minimal-qemux86-64...", zlib's recipe is
"SPDXRef-Recipe-zlib" and the sub-package zlib-dev is
"SPDXRef-Package-zlib-dev". "SPDXRef" is required by the SPDX spec, and
"Image", "Recipe" and "Package" identify which of the three the package
is. For the recipes we currently save the
declared licensing information, but for the sub-packages we don't save
anything. We could save the information of the recipe for its sub-packages
as well; do you think this information would be accurate?
The licenses of packages default to the same LICENSE as specified for
the recipe. However, it is possible for a recipe to specify different
licenses for the packages by using, e.g., LICENSE_${PN}-dev = "...".
The licenses specified for a package must be a subset of the licenses
specified for the recipe. This is typically done when a recipe produces
a library that is, e.g., using the LGPL-2.1 license, while the main
application may be GPL-3.0. In that case the recipe's LICENSE would
be "GPL-3.0 & LGPL-2.1", while the lib package would use
LICENSE_lib${PN} = "LGPL-2.1". It could even be that it is only the
lib that uses LGPL-2.1, in which case the main package might have a
LICENSE_${PN} = "GPL-3.0".
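
The fallback described above can be sketched in a few lines of Python. This
is only an illustration, not how BitBake resolves variables: the recipe's
variables are modelled as a plain dict, the function name package_license
and the recipe name "foo" are made up for the example, and only the
LICENSE_<package> naming shown above is assumed.

```python
def package_license(recipe_vars, pn, package):
    """Return the license of `package`, falling back to the recipe LICENSE.

    `recipe_vars` is a plain dict standing in for the recipe's variables
    (an assumption for this sketch, not BitBake's datastore API).
    """
    # Expand ${PN} in the names of the per-package LICENSE overrides.
    overrides = {
        key.replace("${PN}", pn): value
        for key, value in recipe_vars.items()
        if key.startswith("LICENSE_")
    }
    # Packages without an override inherit the recipe-level LICENSE.
    return overrides.get("LICENSE_" + package, recipe_vars["LICENSE"])


# Hypothetical recipe "foo" mirroring the GPL/LGPL example above.
recipe_vars = {
    "LICENSE": "GPL-3.0 & LGPL-2.1",
    "LICENSE_lib${PN}": "LGPL-2.1",
    "LICENSE_${PN}": "GPL-3.0",
}

for package in ("foo", "libfoo", "foo-dev"):
    print(package, "->", package_license(recipe_vars, "foo", package))
```

Here foo-dev has no override of its own, so it inherits the recipe's full
LICENSE string.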

//Peter

These packages are linked to each other with SPDX relationships. Each
recipe is related to its sub-packages with relationships such as
"SPDXRef-Recipe-zlib GENERATES SPDXRef-Package-zlib-dev". The image and
sub-packages are linked with relationships like
"SPDXRef-Package-zlib-dev PACKAGE_OF SPDXRef-Image-core-image-minimal",
which are extracted from the IMAGE_MANIFEST.
This all works decently well, I think. One thing that could possibly be
useful additional information here would be some sort of dependency
information describing that a package is included because some other
package depends on it.
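
As a rough sketch of the naming and relationship conventions described
above (not the layer's actual code), the following Python builds the
SPDXIDs and the GENERATES / PACKAGE_OF triples. The helper names and the
input shapes (a dict of recipe to sub-packages, and a set of installed
packages standing in for the parsed IMAGE_MANIFEST) are assumptions made
for the example.

```python
KINDS = ("Image", "Recipe", "Package")


def spdx_id(kind, name):
    """Build an SPDXID such as SPDXRef-Recipe-zlib."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    return f"SPDXRef-{kind}-{name}"


def package_relationships(image, recipe_packages, installed):
    """Return (subject, relationship, object) triples.

    recipe_packages: dict mapping a recipe name to its sub-package names.
    installed: set of package names assumed to come from the image manifest.
    """
    rels = []
    for recipe, packages in recipe_packages.items():
        for package in packages:
            rels.append(
                (spdx_id("Recipe", recipe), "GENERATES", spdx_id("Package", package))
            )
            if package in installed:
                rels.append(
                    (spdx_id("Package", package), "PACKAGE_OF", spdx_id("Image", image))
                )
    return rels


rels = package_relationships(
    "core-image-minimal", {"zlib": ["zlib", "zlib-dev"]}, {"zlib-dev"}
)
for subject, relationship, obj in rels:
    print(subject, relationship, obj)
```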

In the files field we save two different types of files: files included
in the recipes' source and files packaged with the sub-packages. These are
again differentiated with the ID, for example "SPDXRef-SourceFile-zlib-1"
being a file in zlib's source and "SPDXRef-PackagedFile-zlib-dev-1" being
a file packaged with the sub-package zlib-dev. These are also linked to
the packages with relationships such as
"SPDXRef-Recipe-zlib CONTAINS SPDXRef-SourceFile-1" and
"SPDXRef-Package-zlib-dev CONTAINS SPDXRef-PackagedFile-1". This also
seems to be in decent shape, if I've understood everything correctly.
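
A minimal Python sketch of the file numbering and CONTAINS links described
above follows. It is an illustration only: the function name, the dict
layout of each entry and the file paths are hypothetical, and the IDs
follow the "SPDXRef-SourceFile-zlib-1" form with a running counter per
owner.

```python
def file_entries(owner_id, prefix, name, paths):
    """Number the files of one recipe or sub-package and link them to it.

    prefix is "SourceFile" or "PackagedFile"; the counter starts at 1.
    """
    entries = []
    for index, path in enumerate(paths, start=1):
        file_id = f"SPDXRef-{prefix}-{name}-{index}"
        entries.append(
            {
                "SPDXID": file_id,
                "fileName": path,
                "relationship": (owner_id, "CONTAINS", file_id),
            }
        )
    return entries


# Hypothetical file lists, for illustration only.
source_files = file_entries(
    "SPDXRef-Recipe-zlib", "SourceFile", "zlib", ["inflate.c", "zlib.h"]
)
packaged_files = file_entries(
    "SPDXRef-Package-zlib-dev", "PackagedFile", "zlib-dev", ["usr/include/zlib.h"]
)

for entry in source_files + packaged_files:
    print(*entry["relationship"])
```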

The next bit of information is the one where I'm maybe the most uncertain.
For the binary files, we run the dwarfsrcfiles utility to try to determine
the source files used to build those binaries. Then we try to find those
source files and link them to the binaries with relationships like
"SPDXRef-PackagedFile-zlib-dev-1 GENERATED_FROM SPDXRef-SourceFile-zlib-1".
This is done across package borders, so binaries are also related to
source files from glib, for example. Locating these source files based on
the information from dwarfsrcfiles may have some problems: not all files
are found. The logic for getting the file information is at
https://github.com/doubleopen-project/meta-doubleopen/blob/d4e1d9a4e566ba6e74789f8a9d2376dea808eef3/classes/create-srclist.bbclass#L48-L66
and the files that are not found are logged at
https://github.com/doubleopen-project/meta-doubleopen/blob/d4e1d9a4e566ba6e74789f8a9d2376dea808eef3/classes/combine-spdx.bbclass#L70.
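
The linking step can be illustrated with a short Python sketch. It does
not run dwarfsrcfiles or reproduce the bbclass logic linked above; it
assumes the source paths for a binary have already been extracted, and
that a lookup table from source path to SPDXID exists. The function name,
the paths and the index contents are hypothetical.

```python
def link_sources(binary_id, debug_sources, source_index):
    """Relate one binary to the source files it was built from.

    debug_sources: source paths reported for the binary (assumed to be
        already extracted from its debug information).
    source_index: dict mapping a known source path to its SPDXID; it may
        span several recipes, so links can cross package borders.
    Returns the GENERATED_FROM triples and the paths that were not found.
    """
    relationships = []
    not_found = []
    for path in debug_sources:
        source_id = source_index.get(path)
        if source_id is None:
            # Keep the miss so it can be logged instead of silently dropped.
            not_found.append(path)
        else:
            relationships.append((binary_id, "GENERATED_FROM", source_id))
    return relationships, not_found


# Hypothetical inputs, for illustration only.
source_index = {"zlib/inflate.c": "SPDXRef-SourceFile-zlib-1"}
found, missing = link_sources(
    "SPDXRef-PackagedFile-zlib-dev-1",
    ["zlib/inflate.c", "generated/unknown.c"],
    source_index,
)
print(found)
print("not found:", missing)
```

Returning the misses separately keeps the "not all files are found" case
visible to whoever consumes the result.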

Is this of any help? As said, if I can help in any way, please let me
know. If a call would be easier at some point, I'm available that way as
well.

I'm also worried that if we generate the complex SPDX files that
meta-doubleopen does, we'll have people running away. We may need to
default to something simpler, with the option of adding in a lot of the
information. Unless you're handing things off to fossology or other
tools, it is probably overkill for most users, and if it is the default
it may actually put people off?

I agree that some of the data we currently gather may be quite a lot. We
started with that as it's required for the project we're working on, but
some feature gates could perhaps be used to limit what data is collected
and saved. For a lot of projects, just the packages and their declared
licensing data may very well be enough. For the project currently at hand,
detailed file level information is required. Just the packages may be a
sane default though.
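
One way such a feature gate could look is sketched below in Python. This
is purely hypothetical: no such setting exists in the layer as far as this
thread says, and the function name, the "detail" parameter and its values
are invented for the illustration.

```python
def gated_document(document, detail="packages"):
    """Return only the parts of the collected data that `detail` allows.

    detail="packages" keeps packages and their declared licensing only
    (the suggested default); detail="files" also keeps file-level data.
    """
    if detail not in ("packages", "files"):
        raise ValueError(f"unknown detail level: {detail}")
    result = {"packages": document["packages"]}
    if detail == "files":
        result["files"] = document["files"]
    return result


# Hypothetical collected data, for illustration only.
document = {
    "packages": [{"SPDXID": "SPDXRef-Recipe-zlib"}],
    "files": [{"SPDXID": "SPDXRef-SourceFile-zlib-1"}],
}
print(sorted(gated_document(document)))
print(sorted(gated_document(document, detail="files")))
```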

Best regards,
Mikko