Re: [licensing] Integrating meta-doubleopen into OE

Mikko Murto


Given the recent interest in SBOM support for OE-core, I'm going to
start looking at integrating meta-doubleopen as an SBOM solution for OE.
Sounds great, I'd be happy to assist in any way I can!

It's possible that meta-doubleopen does more than what would be
considered a minimum viable SBOM solution for OE, but it seems quite
useful and I suspect any extra functionality is something we would
want anyway. I'd be curious for Mikko (the author) to chime in
and let us know what works well and what doesn't with the layer.
In addition to what I think works well and what doesn't, I'll try to describe some of the data that is saved and the conventions we're trying out. If something seems odd, please let me know!

The basic data about packages works decently. Three different types of packages are saved to the packages field in the SPDX document: one package describing the final image, packages for the recipes built, and packages for the sub-packages of the recipes. These are differentiated by their SPDXIDs: core-image-minimal's image package is "SPDXRef-Image-core-image-minimal-qemux86-64...", zlib's recipe is "SPDXRef-Recipe-zlib" and the sub-package zlib-dev is "SPDXRef-Package-zlib-dev". The "SPDXRef" prefix is required by the SPDX spec, and "Image", "Recipe" and "Package" identify which of the three the package is. For the recipes we currently save the declared licensing information, but for the sub-packages we don't save anything. We could save the information of the recipe for its sub-packages as well; do you think this information would be accurate?
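To make the ID convention above concrete, here is a minimal Python sketch. The helper names (spdx_id, kind_of) are made up for illustration and are not taken from meta-doubleopen; only the "SPDXRef-<Kind>-<name>" pattern comes from the description above.

```python
# Hypothetical helpers illustrating the "SPDXRef-<Kind>-<name>" convention.
# The function names are assumptions, not code from meta-doubleopen.
KINDS = ("Image", "Recipe", "Package")


def spdx_id(kind: str, name: str) -> str:
    """Build an SPDXID for an image, recipe or sub-package."""
    if kind not in KINDS:
        raise ValueError(f"unknown package kind: {kind!r}")
    return f"SPDXRef-{kind}-{name}"


def kind_of(identifier: str) -> str:
    """Return which of the three package kinds an SPDXID denotes."""
    # Split only twice so that names containing "-" (e.g. "zlib-dev") survive.
    parts = identifier.split("-", 2)
    if len(parts) != 3 or parts[0] != "SPDXRef" or parts[1] not in KINDS:
        raise ValueError(f"not a package SPDXID: {identifier!r}")
    return parts[1]


recipe_id = spdx_id("Recipe", "zlib")
package_id = spdx_id("Package", "zlib-dev")
```

With this, kind_of(package_id) gives "Package", so a consumer of the SPDX document could tell the three package types apart from the ID alone.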

These packages are linked to each other with SPDX relationships. Each recipe is related to its sub-packages, such as "SPDXRef-Recipe-zlib GENERATES SPDXRef-Package-zlib-dev". The image and sub-packages are linked with relationships like "SPDXRef-Package-zlib-dev PACKAGE_OF SPDXRef-Image-core-image-minimal", which are extracted from the IMAGE_MANIFEST. This all works decently well, I think. One piece of additional information that could be useful here is some sort of dependency information describing that a package is included because some other package depends on it.
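As an illustration of how these relationships can be followed, here is a small Python sketch that walks from an image back to the recipes that produced its packages. The Relationship tuple and the recipes_in_image function are hypothetical names, and the zlib-dbg entry is an invented example of a sub-package that is not in the image; only the GENERATES and PACKAGE_OF relationship types and the zlib-dev IDs come from the text above.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    element: str   # left-hand SPDXID
    rel_type: str  # SPDX relationship type
    related: str   # right-hand SPDXID


relationships = [
    Relationship("SPDXRef-Recipe-zlib", "GENERATES", "SPDXRef-Package-zlib-dev"),
    # Invented entry: a sub-package that was built but not installed in the image.
    Relationship("SPDXRef-Recipe-zlib", "GENERATES", "SPDXRef-Package-zlib-dbg"),
    Relationship("SPDXRef-Package-zlib-dev", "PACKAGE_OF",
                 "SPDXRef-Image-core-image-minimal"),
]


def recipes_in_image(rels, image_id):
    """Return the recipes that generated at least one package in the image."""
    packaged = {r.element for r in rels
                if r.rel_type == "PACKAGE_OF" and r.related == image_id}
    return sorted({r.element for r in rels
                   if r.rel_type == "GENERATES" and r.related in packaged})


image_recipes = recipes_in_image(relationships, "SPDXRef-Image-core-image-minimal")
```

A dependency relationship between packages, as suggested above, would simply be one more relationship type in the same list.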

In the files field we save two different types of files: files included in the recipes' source and files packaged with the sub-packages. These are again differentiated by the ID, for example "SPDXRef-SourceFile-zlib-1" being a file in zlib's source and "SPDXRef-PackagedFile-zlib-dev-1" being a file packaged with the sub-package zlib-dev. These are also linked to the packages with relationships such as "SPDXRef-Recipe-zlib CONTAINS SPDXRef-SourceFile-zlib-1" and "SPDXRef-Package-zlib-dev CONTAINS SPDXRef-PackagedFile-zlib-dev-1". This also seems to be in decent shape, if I've understood everything correctly.

The next bit of information is the one where I'm maybe the most uncertain. For the binary files, we run the dwarfsrcfiles utility to try to determine the source files used to build those binaries. Then we try to find those source files and link them to the binaries with relationships like "SPDXRef-PackagedFile-zlib-dev-1 GENERATED_FROM SPDXRef-SourceFile-zlib-1". This is done across package borders, so binaries are also related to source files from glib, for example. Locating these source files based on the information from dwarfsrcfiles may have some problems: not all files are found. The logic for getting the file information is at https://github.com/doubleopen-project/meta-doubleopen/blob/d4e1d9a4e566ba6e74789f8a9d2376dea808eef3/classes/create-srclist.bbclass#L48-L66 and the files that are not found are logged at https://github.com/doubleopen-project/meta-doubleopen/blob/d4e1d9a4e566ba6e74789f8a9d2376dea808eef3/classes/combine-spdx.bbclass#L70.
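The general shape of that linking step can be sketched in Python as below. This is not the logic from the linked bbclass files: it assumes the source paths reported for a binary have already been parsed into a list, and that a lookup table from source path to SPDXID exists. The function name, the table and the example paths are all hypothetical.

```python
def link_binary_to_sources(binary_id, source_paths, source_index):
    """Relate a packaged binary to the source files it was built from.

    source_paths: paths reported for the binary (assumed already parsed).
    source_index: mapping from source path to SPDXID (assumed to exist).
    Returns the relationships found and the paths that could not be matched.
    """
    relationships = []
    not_found = []
    for path in source_paths:
        source_id = source_index.get(path)
        if source_id is None:
            # Keep unmatched paths so they can be logged instead of dropped.
            not_found.append(path)
        else:
            relationships.append((binary_id, "GENERATED_FROM", source_id))
    return relationships, not_found


# Hypothetical example data; the paths are illustrative only.
source_index = {"zlib/inflate.c": "SPDXRef-SourceFile-zlib-1"}
found, missing = link_binary_to_sources(
    "SPDXRef-PackagedFile-zlib-dev-1",
    ["zlib/inflate.c", "generated/unknown.c"],
    source_index,
)
```

Because the lookup table can cover source files from every recipe, the same approach relates a binary to sources outside its own package; the missing list corresponds to the files that are not found and end up in the log.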

Is this of any help? As said, if I can help in any way, please let me know. If a call would be easier at some point, I'm available that way as well.

I'm also worried that if we generate the complex SPDX files that
meta-doubleopen does, we'll have people running away. We may need to
default to something simpler, with the option of adding in a lot of the
information, as unless you're handing things off to FOSSology or other
tools, it is probably overkill for most users and, if it is the default, may actually put people off.
I agree that some of the data we currently gather may be quite a lot. We started with that as it's required for the project we're working on, but some feature gates could perhaps be used to limit what data is collected and saved. For a lot of projects, just the packages and their declared licensing data may very well be enough. For the project currently at hand, detailed file level information is required. Just the packages may be a sane default though.

Best regards,