Extending the Scorecard API for historical data #376

Open
naveensrinivasan opened this issue Sep 17, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@naveensrinivasan
Member

As of 8/13/22, the Scorecard API (https://api.securityscorecards.dev/) provides an endpoint to GET the result of the latest Scorecard run for a repository.

It also accepts an optional commit SHA to fetch the result for a specific commit.
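
For illustration, the two lookups described above could be expressed as URLs built like this. This is a minimal sketch: the `/projects/{platform}/{org}/{repo}` route and the `commit` query parameter are assumptions about the API's shape (the issue itself only gives the base URL), and `scorecard_url` is a hypothetical helper, not part of Scorecard.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_BASE = "https://api.securityscorecards.dev"


def scorecard_url(platform: str, org: str, repo: str,
                  commit: Optional[str] = None) -> str:
    """Build the URL for a Scorecard result lookup.

    Assumed route: /projects/{platform}/{org}/{repo}
    Assumed query parameter for a specific commit: commit=<SHA>
    """
    # Percent-encode each path segment so unusual names cannot break the path.
    path = "/".join(quote(part, safe="") for part in (platform, org, repo))
    url = f"{API_BASE}/projects/{path}"
    if commit:
        # Without a commit, the API is expected to return the latest run.
        url += "?" + urlencode({"commit": commit})
    return url


if __name__ == "__main__":
    # Latest result for a repository.
    print(scorecard_url("github.com", "kubernetes", "kubernetes"))
    # Result for a specific commit ("abc123" is a placeholder, not a real SHA).
    print(scorecard_url("github.com", "kubernetes", "kubernetes", "abc123"))
```

The resulting URL can be passed to any HTTP client to issue the GET request.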

Historical Data

The Scorecard BigQuery (BQ) dataset holds historical data: SELECT count(*) FROM openssf.scorecardcron.scorecard-v2 returns 35,337,975 rows. This historical data isn't accessible via the API.

Also, end users can only view the latest results, even though new results are also stored by commit SHA. End users have no way to discover which commit SHAs are available to query.

Scorecard historical data can indicate an OSS repository's health and show how it trends over time. It helps our customers understand the historical timeline of any given repository. At the moment, this can be done only via BigQuery.

Proposed Solution

  • Similar to deps.dev (bigquery-public-data.deps_dev_v1.Snapshots), provide an endpoint for scans that returns the list of dates on which the Scorecard cron scans ran.
  • Store each cron run under a date-prefixed path (see https://github.com/ossf/scorecard/blob/bde0ae166a7f56f957d983a60e7316054255624f/cron/internal/worker/main.go#L147-L150), similar to how the commit SHA is used. Example - GCS/SCANDATE/github.com/kubernetes/kubernetes
  • Let consumers query this data by passing an additional new parameter called "scandate", similar to the commit SHA.
  • For the existing data in BQ, write an export job that dumps all of the historical data into the GCS bucket under paths prefixed with the date and commit SHA.
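
The path layout and the scan-date listing proposed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the ISO `YYYY-MM-DD` date format, the position of the commit SHA in the path, and the helper names `scan_object_path` and `scan_dates` are hypothetical and do not come from the Scorecard code base.

```python
from datetime import date
from typing import Iterable, List, Optional


def scan_object_path(scan_date: date, repo: str,
                     commit: Optional[str] = None) -> str:
    """Build a date-prefixed object path: SCANDATE/<repo>[/<commit>].

    The date format (ISO 8601) and the trailing commit segment are
    assumptions; the proposal only fixes the SCANDATE/<repo> shape.
    """
    parts = [scan_date.isoformat(), repo]
    if commit:
        parts.append(commit)
    return "/".join(parts)


def scan_dates(object_paths: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated date prefixes of the given paths.

    This mimics what a hypothetical "scans" endpoint could return.
    """
    return sorted({path.split("/", 1)[0] for path in object_paths})


if __name__ == "__main__":
    repo = "github.com/kubernetes/kubernetes"
    paths = [
        scan_object_path(date(2022, 9, 17), repo),
        scan_object_path(date(2022, 9, 10), repo),
        scan_object_path(date(2022, 9, 17), "github.com/ossf/scorecard"),
    ]
    print(paths[0])
    print(scan_dates(paths))
```

With a layout like this, a "scandate" lookup reduces to a prefix match on the object path, the same way a commit SHA lookup would.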

This helps customers analyze the data without jumping through the hoops of understanding the pattern, and it is all available through a single API.

@naveensrinivasan naveensrinivasan added the enhancement New feature or request label Sep 17, 2022
@rheironimus

This solution will be very beneficial for end users (like my team) that leverage scorecard in our compliance pipeline to evaluate dependencies that are not the latest version by passing a specific SHA. It also will help avoid hitting the GitHub rate limits.

@naveensrinivasan naveensrinivasan transferred this issue from ossf/scorecard Apr 20, 2023