Outreachy internship project - license tracing enhancement
Paul Eggleton <paul.eggleton@...>
I'd like to propose we put forward the following project proposal for an Outreachy internship (https://www.outreachy.org/communities/cfp/). I'm prepared to be the mentor for the project and Microsoft will provide funding. (Note that we haven't yet re-registered our community with Outreachy or set this up as an intern project proposal; the deadline for YP community registration is September 17th and for project submissions is September 24th.) Here's the brief:
Yocto Project License tracing enhancement
The Yocto Project build system is typically used to build customised Linux images from source for embedded applications. Along with the image, a manifest of packages and their corresponding licenses is prepared; however, the accuracy of the license information depends on the accuracy of the metadata we have for each package (i.e. what is in the recipe file). As part of the build, we have an internal mapping from output files to source files, which is currently used to prepare source packages to aid in debugging. With the presence of SPDX headers in source files, that mapping could also be used to trace the licenses of the sources used in building a package/image, helping to improve our metadata and future license manifests. A proof-of-concept implementation of this has been put together. During this internship, a successful intern will:
1) take the proof-of-concept implementation and get it to a state where it can be merged into the poky repository
2) use the functionality to examine the accuracy of our license tagging (LICENSE fields in recipes); look for errors / noise in the comparison, and produce a simple report with the results
3) run a check over sources in a world build looking for percentage coverage of SPDX headers, and run it for several past releases to see the change over time
Bonus: assess the current state of meta-spdx-scanner; investigate what it would take to produce SPDX documents from build output (would likely require integration with Fossology).
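To give a rough idea of the kind of check item 3 involves, here is a minimal sketch in Python. It is not the proof-of-concept implementation: the function names (spdx_identifier, spdx_coverage), the file extensions and the 20-line header window are all assumptions made for illustration, and a real check would run against the unpacked sources from a world build rather than an arbitrary directory.

```python
import os
import re

# Matches tags such as "SPDX-License-Identifier: GPL-2.0-only".
# The "*" exclusion stops the match before a closing C comment marker.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([^\n\r*]+)")


def spdx_identifier(path, max_lines=20):
    """Return the SPDX expression found in the first max_lines of path, or None.

    Hypothetical helper: the 20-line window is an assumption, not a rule.
    """
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            for _, line in zip(range(max_lines), f):
                match = SPDX_RE.search(line)
                if match:
                    return match.group(1).strip()
    except OSError:
        # Unreadable files are treated as untagged.
        return None
    return None


def spdx_coverage(root, extensions=(".c", ".h", ".py", ".sh")):
    """Return (tagged, total, percent) for source files under root.

    Hypothetical helper: the extension list is only an example.
    """
    tagged = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            total += 1
            if spdx_identifier(os.path.join(dirpath, name)):
                tagged += 1
    percent = 100.0 * tagged / total if total else 0.0
    return tagged, total, percent
```

Running spdx_coverage() over the source trees from several past releases and recording the percentage each time would give the change over time mentioned in item 3. The identifiers returned by spdx_identifier() could similarly be collected per recipe and compared against its LICENSE field for item 2.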
I'm assuming that we're OK with merging the PoC functionality rather than just keeping it separate and using it for analysis; let me know if not. What I'd really like to know is whether people think this is sufficient for a 3-month internship, assuming that the intern has limited to moderate familiarity with our codebase. Do we need to flesh it out further? Are there any modifications to the work that you'd suggest?