Aggregation of distro and pkg data sets to create a searchable DB #203

Open
TheFoxAtWork opened this issue Jul 6, 2022 · 4 comments

@TheFoxAtWork

Background: As more vulnerabilities continue to be discovered in packages and libraries present in various distributions, practitioners across organizations need a single place to query for a particular dependency, package, or other component and discover which distributions, and which versions of them, contain it (or vice versa).

Comparable Queries: The following tools and resources provide some of the functionality of the desired search tool (in order of best match to the use case described in the background):

Proposal: Ideally there'd be a single system that draws on both libraries.io and pkgs.org. pkgs.org API access requires a membership, which may be worth OpenSSF funding (or some other financial offset that lets the pkgs.org API be free) so that both APIs can be queried and brought into a single tool. We should look to include as many distros as possible in this central tool.

Why this issue on package-feeds?
It is unclear which group is best placed to tackle this project. Given that package-feeds appears to have some initial functionality, this issue is being submitted here as the best-possible match for a home this could be created under, or as an extension to it.

For more information on the discussion that sparked this issue:
https://openssf.slack.com/archives/C019M98JSHK/p1657119043352399

@alilleybrinker

Commented in the thread, but repeating here:

I think a solution which incorporates existing systems, rather than building a new package-finding system from scratch, is definitely the ideal. It would also enable us to cast the widest net in supporting many different platforms (both OS and language package managers). That said, imagining a tool which queries multiple sources, we'd want to be clear in the UI where information is being sourced. So if a result for a package comes from, say, libraries.io, the end-user should be informed.
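To illustrate the attribution point, here is a minimal sketch of a search that fans out to several sources and tags every result with where it came from. The two lookup functions are hypothetical stubs that return canned data, not real libraries.io or pkgs.org API calls; the `Result` type and `SOURCES` registry are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Result:
    package: str
    version: str
    source: str  # name of the backend that produced this record


# Hypothetical stand-ins for real backends. A real tool would call each
# service's API here; these stubs only return canned rows.
def fake_libraries_io(name: str) -> List[dict]:
    return [{"package": name, "version": "1.2.3"}]


def fake_pkgs_org(name: str) -> List[dict]:
    return [{"package": name, "version": "1.2.3-1.el9"}]


# Registry mapping a display name to a lookup function (assumed design).
SOURCES: Dict[str, Callable[[str], List[dict]]] = {
    "libraries.io": fake_libraries_io,
    "pkgs.org": fake_pkgs_org,
}


def search(name: str) -> List[Result]:
    """Query every registered source and keep the source name on each row."""
    results = []
    for source_name, lookup in SOURCES.items():
        for row in lookup(name):
            results.append(Result(row["package"], row["version"], source_name))
    return results


for r in search("openssl"):
    print(f"{r.package} {r.version} (source: {r.source})")
```

Carrying the source on each record, rather than merging rows early, keeps it possible for a UI to show the end-user exactly which upstream a given hit came from.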

@TheFoxAtWork

💯

@bureado

bureado commented Jul 6, 2022

Only tangentially related, ossf/wg-securing-critical-projects#41. There is some overlap with the component/threat intelligence elements in certain commercial vendors/offerings, so it'd be interesting to ask members and commercial entities more broadly about this, too. Also https://ossindex.sonatype.org/, https://deps.dev/, and here's a list of by-hash links I collected a while ago over at https://github.com/bureado/awesome-software-supply-chain-security#dependency-intelligence:

It'd be good to model the query keys. Should we expect to pass a string, or a purl and it'll give us CPEs? Or a hash, or a filename and it gives us purls? Or will it help us normalize a partial search? See https://github.com/repology/repology-rules. And what kind of information about a package would it return? For example, I don't think Repology would give us debtags or buildinfo files that we could bring in from dedicated Debian infrastructure, or even what the UDD does (I'm sure there are similar data sources for OBS, Koji, etc.)
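One way to start modeling the query keys is to classify the raw input before dispatching it to any backend. The sketch below is only a rough heuristic under stated assumptions: the `KeyKind` and `QueryKey` names are hypothetical, the purl check looks only for the `pkg:` scheme, the CPE check looks only for the `cpe:2.3:` and `cpe:/` prefixes, hashes are assumed to be hex digests of MD5, SHA-1, or SHA-256 length, and anything containing a `/` is treated as a file path.

```python
import re
from dataclasses import dataclass
from enum import Enum


class KeyKind(Enum):
    PURL = "purl"
    CPE = "cpe"
    HASH = "hash"
    FILENAME = "filename"
    NAME = "name"  # fallback: free-text package name


@dataclass(frozen=True)
class QueryKey:
    kind: KeyKind
    value: str


_HEX_DIGEST = re.compile(r"^[0-9a-fA-F]+$")
_DIGEST_LENGTHS = {32, 40, 64}  # hex lengths of MD5, SHA-1, SHA-256


def classify(raw: str) -> QueryKey:
    """Guess what kind of key the user passed; prefix checks only, no validation."""
    text = raw.strip()
    if text.startswith("pkg:"):
        return QueryKey(KeyKind.PURL, text)
    if text.startswith("cpe:2.3:") or text.startswith("cpe:/"):
        return QueryKey(KeyKind.CPE, text)
    if len(text) in _DIGEST_LENGTHS and _HEX_DIGEST.match(text):
        return QueryKey(KeyKind.HASH, text.lower())
    if "/" in text:
        return QueryKey(KeyKind.FILENAME, text)
    return QueryKey(KeyKind.NAME, text)


for sample in ("pkg:deb/debian/curl@7.50.3-1", "/usr/lib/libssl.so.3", "openssl"):
    key = classify(sample)
    print(f"{sample!r} -> {key.kind.value}")
```

A classifier like this leaves the harder questions open (normalizing partial names, mapping purls to CPEs), but it makes explicit which key types the tool claims to accept.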

Edit: forgot https://artifacthub.io/docs/topics/repositories/

@scovetta

scovetta commented Jul 8, 2022

If helpful, you're welcome to leverage the logic (or implementation) we built into https://github.com/Microsoft/OSSGadget, which handles at least some of this abstraction.
