Some possible data sources to identify package managers, build systems and compilers (build toolchain) #41

Open
bureado opened this issue Jan 11, 2022 · 0 comments

bureado commented Jan 11, 2022

The current spreadsheet lists package managers as candidate projects and has build toolchains (generally comprising build systems, compilers, and associated tooling) in the considered list. While the list is not overwhelmingly large, I suggest using existing taxonomies to seed it. Here are a few examples:

WikiData

https://en.wikipedia.org/wiki/List_of_software_package_management_systems

Since that list doesn't "sound" structured, see https://www.wikidata.org/wiki/Q6639720 and then something like https://www.wikidata.org/wiki/Q98400269, which leads to an in-Wikidata index or a Wikidata query. Either can help with the transitive closure (e.g., package managers, compilers, and tools involved in delivering another critical component).
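As a rough sketch of what such a Wikidata query could look like, the snippet below builds a SPARQL string that lists items that are instances of a given class or of any of its subclasses (`wdt:P31/wdt:P279*`). The helper name `build_instances_query` is hypothetical, and treating Q98400269 as the right root class is an assumption; this only covers the subclass closure, not the "involved in delivering another component" relation. The string would be submitted to the public query service at https://query.wikidata.org/sparql.

```python
def build_instances_query(class_qid: str, limit: int = 500) -> str:
    """Return a SPARQL query listing instances of class_qid or its subclasses.

    Hypothetical helper: it only builds the query text and does no network I/O.
    """
    # Accept only well-formed Wikidata item IDs such as "Q98400269".
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Assumption: Q98400269 is a suitable root class for this listing.
    print(build_instances_query("Q98400269", limit=100))
```

The printed query can be pasted into the query service UI as-is; swapping the item ID is enough to try other root classes.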

GitHub topics

Note that a significant number of components in this category predate GitHub (and git) and might not have a mirror or otherwise have a clear footprint in the following list.

https://github.com/topics/package-manager

debtags

Several Debian packages are tagged via debtags; relevant tags include:

https://debtags.debian.org/reports/taginfo/devel::compiler (n=150)
https://debtags.debian.org/reports/taginfo/devel::buildtools (n=150)
https://debtags.debian.org/reports/taginfo/devel::runtime (n=50)
https://debtags.debian.org/reports/taginfo/admin::package-management (n=150)
https://debtags.debian.org/reports/taginfo/works-with::software:package (n=160)
https://debtags.debian.org/reports/taginfo/works-with::software:source (n=400)

Such tags can be queried via a local debtags utility, a facility like apt-xapian-index, or a point-in-time snapshot of the UDD database (e.g., in a Postgres instance). A benefit of using UDD is that the tagged package names can be joined with other data, such as the upstream project URL, which can help resolve the Debian package name to something more universal. Alternatively, approaches such as https://github.com/repology/repology-rules can be used.
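For the local route, a minimal sketch of filtering packages by tag might look like the following. It assumes the tag data has been exported as text with one `package: tag1, tag2` entry per line; the sample rows are invented for illustration, and the function names are hypothetical rather than part of any debtags tooling.

```python
# Invented sample in the assumed "package: tag1, tag2" text format.
SAMPLE = """\
gcc: devel::compiler, role::program
cmake: devel::buildtools, role::program
apt: admin::package-management, role::program
hello: role::program
"""

# Tags of interest, taken from the list above.
TOOLCHAIN_TAGS = {
    "devel::compiler",
    "devel::buildtools",
    "admin::package-management",
}


def parse_package_tags(text: str) -> dict[str, set[str]]:
    """Map each package name to its set of tags."""
    tags_by_pkg: dict[str, set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Split on the first ": " only, since tags themselves contain "::".
        if ": " not in line:
            continue
        name, _, rest = line.partition(": ")
        tags = {t.strip() for t in rest.split(",") if t.strip()}
        tags_by_pkg.setdefault(name, set()).update(tags)
    return tags_by_pkg


def packages_with_any(tags_by_pkg: dict[str, set[str]], wanted: set[str]) -> list[str]:
    """Return the sorted names of packages carrying at least one wanted tag."""
    return sorted(pkg for pkg, tags in tags_by_pkg.items() if tags & wanted)


if __name__ == "__main__":
    print(packages_with_any(parse_package_tags(SAMPLE), TOOLCHAIN_TAGS))
```

The resulting package names would still need to be mapped to upstream projects, which is where a join against UDD or the repology rules comes in.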

A few tags can also help approximate a "critical to trust" definition. I'll open another issue on that topic in particular.

In general, the counts (n=) for the tags above are small enough that the work could also be done manually or crowdsourced. One example of a more direct list of package managers/ecosystems of interest would be https://libraries.io/languages.
