[yocto] Is curated SPDX data sharing a thing?

Richard Purdie

On Fri, 2020-12-18 at 16:51 -0500, Jérôme Carretero wrote:
On Fri, 18 Dec 2020 20:34:01 +0000
"Richard Purdie" <richard.purdie@linuxfoundation.org> wrote:

The challenge is that Yocto Project lets you build your own custom
software, which means you also end up in your own BoM situation. We
generally therefore provide tooling that can help you generate the
information you need but there usually isn't "one size fits all".
Of course different choices can be made regarding obligations (where
licenses are shown, how sources are distributed) but in the same
way that today ${LICENSE_DIRECTORY}/${P}/recipeinfo contains a
LICENSE key which is very useful for figuring out obligations, SPDX could
be used to have more information and more trust.
It's going to take someone to stand up and provide the first "version"
of that and I'm not sure anyone wants to step up and be that

In most of my experience, a product mostly contains F/LOSS code from
major Yocto/OE layers, maybe a couple of other 3rd party libraries, a
couple of patches here and there, and a few 100kSLOC of "original"
the BoM consists... in an image manifest file.

A huge portion of the SPDX data could be reused, to get an
almost-complete better BoM.
It does depend on which data we're talking about. You also have the
issue that it's fine to generate tons of data but at some point you
have to interpret what it means too...

I would mention the meta-spdxscanner layer as having
support/integration for some of the more recent scanning and
generation tools.
Yeah, I used it. I can see that it mostly works except for the fact
that you either spend a lifetime doing source code analysis, or just
a few years because you trust the agreement of multiple robots on the
license verdict, which only leaves you the ambiguous files to process
(and that's time-consuming work).
I watched and helped our older LICENSE field work and I can say it's a
thankless task which it's very hard to get people to do. I fear that the
SPDX scans you refer to are so complex it will be hard to do this
consistently across the codebase. I'm actually hoping things may go a
slightly different route such as ultimately a majority of code having
license identifiers in it (we've tried to ensure YP code has them).
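
To make the "license identifiers in the code" route concrete, here is a
minimal Python sketch that looks for SPDX-License-Identifier tags near the
top of each file in a source tree. The function names and the 4096-byte
read limit are assumptions made for illustration only; this is not how any
Yocto Project or meta-spdxscanner tooling does it.

```python
import re
from pathlib import Path

# Matches tags such as "SPDX-License-Identifier: GPL-2.0-only OR MIT".
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def find_license_identifier(text):
    """Return the first SPDX license expression found in text, or None."""
    match = SPDX_TAG.search(text)
    if match is None:
        return None
    expression = match.group(1).strip()
    # Drop a trailing comment closer if the tag sat inside a block comment.
    for closer in ("*/", "-->"):
        if expression.endswith(closer):
            expression = expression[: -len(closer)].strip()
    return expression


def scan_tree(root, max_bytes=4096):
    """Map each file under root (relative path) to its identifier or None.

    Only the first max_bytes of each file are read (hypothetical limit),
    on the assumption that the tag sits in the file header.
    """
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with open(path, "rb") as handle:
                head = handle.read(max_bytes).decode("utf-8", errors="replace")
            results[path.relative_to(root).as_posix()] = find_license_identifier(head)
    return results
```

Files that come back as None are the ones that would still need a scanner
or a human to look at them.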

I'm sure there are services provided, particularly by some of the
member OSVs but as I mention above, it's hard to have a one size fits
all since you can patch or reconfigure the sources at will.
SPDX data contains package and also source file info (based on
so if a patch is applied, an analysis would only need to concern
modified files. Provided a development history and a baseline SPDX
available, it would significantly reduce the amount of work one would
Sure, how do we get people to build such a baseline though?
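
As a rough illustration of the baseline idea discussed above, the sketch
below hashes a source tree (for example as it stands after do_patch) and
splits the files into those whose checksum matches a baseline and those
that would still need review. The baseline is assumed to be a plain
mapping of relative path to SHA-1 hex digest, already extracted from SPDX
data by some other means; that mapping and all function names are
hypothetical, and no real SPDX parsing is shown.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file under root (relative path) to its SHA-1 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha1_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def files_needing_review(baseline, current):
    """Split current files into (covered, review) lists.

    A file is covered when the baseline (assumed: path -> SHA-1) holds the
    same checksum for the same path; new or modified files go to review.
    """
    covered, review = [], []
    for path, checksum in sorted(current.items()):
        if baseline.get(path) == checksum:
            covered.append(path)
        else:
            review.append(path)
    return covered, review
```

With a trusted baseline, only the review list (patched or added files)
would need fresh analysis, which is the saving described above.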

We are hoping to have better tools integration where the build
may be able to generate better SBoM and SPDX information
Unfortunately it's an area where it's hard to find people willing to
It's certainly easy to verify after do_patch (or after do_compile in
some cases) that sources correspond to existing SPDX files, or to
look up SPDX files in an external database based on hashes of sources,
but automatically generating SPDX:

- is very time-consuming and I don't see it as something that one
even do eg. in continuous integration;
- is not perfect; I don't think the build process could automatically
generate more than "candidate SPDX" information except maybe for a
couple of really-clean packages where the developers care about
There are certainly ways it could be done, if there are people who
agree on a common objective and are willing/able to contribute time to

Is there a more focused discussion list on that topic or here is
I may have a lot of questions/ideas but don't want to cause off-topic
We did set one up so there is
https://lists.yoctoproject.org/g/licensing/topics but it hasn't really
taken off (yet?)...