Hardcode known wrapper checksums to avoid network requests #167

Closed

@Marcono1234 Marcono1234 commented Jan 27, 2024

Fixes #161

If a checksum is not found in the hardcoded list, the action falls back to fetching the checksums from the Gradle API, as before.
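The lookup order described above could be sketched roughly as follows. This is only an illustration, not the code from this PR: `findUnknownChecksums` and the `fetchRemoteChecksums` parameter are hypothetical names, and the remote request is passed in as a function so that the sketch stays self-contained.

```typescript
// Hypothetical sketch: consult the hardcoded checksums first and only
// fall back to the network when something is not covered by them.
type ChecksumFetcher = () => Promise<Set<string>>

async function findUnknownChecksums(
  wrapperChecksums: string[],
  knownChecksums: Set<string>,
  fetchRemoteChecksums: ChecksumFetcher // assumed to wrap the Gradle API request
): Promise<string[]> {
  // Step 1: drop everything that the hardcoded list already covers
  const notKnownLocally = wrapperChecksums.filter(c => !knownChecksums.has(c))
  if (notKnownLocally.length === 0) {
    return [] // no network request needed
  }
  // Step 2: check the remainder against the remotely fetched checksums
  const remote = await fetchRemoteChecksums()
  return notKnownLocally.filter(c => !remote.has(c))
}
```

With this shape, a run in which every wrapper JAR matches a hardcoded checksum never calls the fetcher at all.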

For now there is no logic for automatically updating the list of known checksums; that has to be done manually. But maybe it could be automated in some way in the future.

Here is my somewhat hacky (but hopefully bug-free) Java code which I used to generate the list of checksums:

Checksums list creator
import java.io.IOException;
import java.lang.reflect.Type;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

import com.google.gson.Gson;
import com.google.gson.JsonDeserializationContext;
import com.google.gson.JsonDeserializer;
import com.google.gson.JsonElement;
import com.google.gson.JsonParseException;
import com.google.gson.annotations.JsonAdapter;
import com.google.gson.reflect.TypeToken;

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class GradleChecksumsFetcher {
    private static final OkHttpClient client = new OkHttpClient();

    private static String fetch(String url) throws IOException {
      Request request = new Request.Builder()
          .url(url)
          .build();

      try (Response response = client.newCall(request).execute()) {
        return response.body().string();
      }
    }

    static class VersionData {
        String version;
        String wrapperChecksumUrl;
        @JsonAdapter(BuildTimeAdapter.class)
        Instant buildTime;

        boolean snapshot;
        boolean nightly;
        boolean releaseNightly;

        private static class BuildTimeAdapter implements JsonDeserializer<Instant> {
            private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyyMMddHHmmssxx", Locale.ROOT);

            @Override
            public Instant deserialize(JsonElement json, Type typeOfT, JsonDeserializationContext context)
                    throws JsonParseException {
                return FORMATTER.parse(json.getAsString(), Instant::from);
            }
        }
    }

    private static List<String> parseVersion(String v) {
        int i = v.indexOf('-');
        if (i != -1) {
            v = v.substring(0, i);
        }

        return List.of(v.split("\\."));
    }

    private static int compareVersions(String a, String b) {
        List<String> aParts = parseVersion(a);
        List<String> bParts = parseVersion(b);

        for (int i = 0; i < Math.min(aParts.size(), bParts.size()); i++) {
            int aPart = Integer.parseInt(aParts.get(i));
            int bPart = Integer.parseInt(bParts.get(i));

            if (aPart < bPart) {
                return -1;
            }
            if (aPart > bPart) {
                return 1;
            }
        }
        return Integer.compare(aParts.size(), bParts.size());
    }

    public static void main(String[] args) throws Exception {
        Gson gson = new Gson();
        List<VersionData> versions = gson.fromJson(fetch("https://services.gradle.org/versions/all"), new TypeToken<>() {});
        // First sort by version number, then by build time
        // That should (in most cases) achieve version ranges where all hashes only belong to one range
        Comparator<VersionData> comparator = (v1, v2) -> compareVersions(v1.version, v2.version);
        versions.sort(comparator.thenComparing(Comparator.comparing(v -> v.buildTime)));

        String firstVersionName = null;
        String lastVersionName = null;
        String lastHash = null;

        Set<String> seenHashes = new HashSet<>();

        for (VersionData version : versions) {
            if (version.wrapperChecksumUrl == null || version.snapshot || version.nightly || version.releaseNightly) {
                continue;
            }
            String versionName = version.version;
            String hash = fetch(version.wrapperChecksumUrl);

            if (lastHash == null) {
                firstVersionName = versionName;
                lastVersionName = versionName;
                lastHash = hash;
                // Record the first hash too, so the duplicate check below also covers it
                seenHashes.add(hash);
            } else {
                if (hash.equals(lastHash)) {
                    lastVersionName = versionName;
                } else {
                    if (firstVersionName.equals(lastVersionName)) {
                        System.out.println("// " + firstVersionName);
                    } else {
                        System.out.println("// " + firstVersionName + " - " + lastVersionName);
                    }
                    System.out.println("\"" + lastHash + "\",");

                    firstVersionName = versionName;
                    lastVersionName = versionName;
                    lastHash = hash;

                    // This acts mainly as assertion that the version sorting logic is correct
                    // There seems to be one case though where the version range `8.0-milestone-1 - 8.0-milestone-3`
                    // is using the same hash as a previous version range
                    if (!seenHashes.add(hash)) {
                        System.out.println("// WARNING: Duplicate hash: " + hash + "; for version " + versionName);
                    }
                }
            }
        }

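        // Print the final range; skip if no version was processed at all
        if (lastHash != null) {
            if (firstVersionName.equals(lastVersionName)) {
                System.out.println("// " + firstVersionName);
            } else {
                System.out.println("// " + firstVersionName + " - " + lastVersionName);
            }
            System.out.println("\"" + lastHash + "\",");
        }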
    }

}

⚠️ I am not that familiar with TypeScript, so any feedback is appreciated!

Contributor

@JLLeitschuh JLLeitschuh left a comment

In general, this looks like it's headed in the right direction!

I think the biggest ask is tooling/automation to keep the list up-to-date more easily, and ideally, with a GitHub action

src/checksums.ts Outdated
@@ -6,9 +6,165 @@ const httpc = new httpm.HttpClient(
{allowRetries: true, maxRetries: 3}
)

/** Known checksums from previously published Wrapper versions */
export const KNOWN_VALID_CHECKSUMS = new Set([
Contributor

Is there a way to pull this from a resource file instead of hard-coding it into the source code?

Ideally, there would be a resource file and a script that CI could run automatically to update it, then auto-push and auto-release whenever there is an update.

Member

Yep, that would be much better. If the file was formatted as JSON, it could be trivially loaded into an object: that way we could include the Gradle version as well as the checksum in the metadata.

@Marcono1234 you can see here an example of loading a resource file in TypeScript. It may not be the best way, but it works!
Then, you can use JSON.parse(fileContents) and use the result as a typed JavaScript object. Here are some examples.

Member

@bigdaz bigdaz left a comment

This looks pretty good to me. If we can extract the checksums list into a resource file, then I'm OK if this needs to be maintained by hand (for now). New Gradle versions will continue to require a network call until the wrapper action is updated.

As @JLLeitschuh mentioned, it would be ideal if we had a job that ran in this repository to update the versions file when a new Gradle version is released. But we'll still need to release a new action version each time.

An additional optimization would be to store any "discovered" checksums in the GitHub Actions cache. We read the set of known versions from the resource, add any discovered versions, and write the JSON file directly to the cache.
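The merge step of that caching idea might look something like the sketch below. It assumes a hypothetical entry shape with `version` and `checksum` fields, and the names `mergeChecksumEntries` and `toCachePayload` are made up for illustration; reading from and writing to the GitHub Actions cache is deliberately left out.

```typescript
// Assumed entry shape: one record per Gradle version
interface ChecksumEntry {
  version: string
  checksum: string
}

// Combine the bundled entries with entries discovered at runtime,
// keeping the first occurrence of each (version, checksum) pair.
function mergeChecksumEntries(
  bundled: ChecksumEntry[],
  discovered: ChecksumEntry[]
): ChecksumEntry[] {
  const seen = new Set<string>()
  const merged: ChecksumEntry[] = []
  for (const entry of bundled.concat(discovered)) {
    const key = `${entry.version}\n${entry.checksum}`
    if (!seen.has(key)) {
      seen.add(key)
      merged.push(entry)
    }
  }
  return merged
}

// Serialized form that a later run could read back with JSON.parse
function toCachePayload(entries: ChecksumEntry[]): string {
  return JSON.stringify(entries, null, 2)
}
```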

Member

bigdaz commented Jan 30, 2024

@Marcono1234 The changes to dist/index.js make the PR very difficult to merge. Can you please update the commit to remove that change?

@Marcono1234 Marcono1234 marked this pull request as draft January 30, 2024 22:43
@Marcono1234
Contributor Author

Thanks for the feedback! I will try to adjust this in the next few days.

Would it make sense though to use a plaintext / custom text format where for example each line represents a checksum, except if it is blank or starts with #? For example something like:

# 1.0
87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055
# 1.1
22c56a9780daeee00e5bf31621f991b68e73eff6fe8afca628a1fe2c50c6038e
# 1.2
5c91fa893665f3051eae14578fac2df14e737423387e75ffbeccd35f335a3d8b
...

JSON has the disadvantage that it doesn't support comments, and just having a large list of checksums without any indication to which versions they belong might make reviewing it difficult.
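For comparison, reading the line-based format proposed above would take only a few lines. The following is a minimal sketch under the rules stated in this comment (one checksum per line; blank lines and lines starting with `#` are skipped); `parseChecksumLines` is a hypothetical name.

```typescript
// Parses the proposed text format: one checksum per line,
// ignoring blank lines and lines that start with '#'.
function parseChecksumLines(text: string): Set<string> {
  const checksums = new Set<string>()
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line === '' || line.startsWith('#')) {
      continue
    }
    checksums.add(line)
  }
  return checksums
}

const exampleText = `# 1.0
87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055
# 1.1
22c56a9780daeee00e5bf31621f991b68e73eff6fe8afca628a1fe2c50c6038e
`
const parsedChecksums = parseChecksumLines(exampleText)
console.log(parsedChecksums.size) // 2
```

Note that this parser drops the version comments entirely, so the version information would only help human reviewers.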

However, with #145 maybe it would make sense to use JSON instead of a custom text format, but then include the version numbers. For example:

{
  "87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055": [
    "1.0"
  ],
  "22c56a9780daeee00e5bf31621f991b68e73eff6fe8afca628a1fe2c50c6038e": [
    "1.1"
  ]
}

Member

bigdaz commented Jan 31, 2024

However, with #145 maybe it would make sense to use JSON instead of a custom text format, but then include the version numbers.

Yes that's what I meant. Something like:

[
 { "version": "1.0", "checksum": "....." },
 { "version": "1.1", "checksum": "....." },
  ... etc ..
]

That way you could use JSON.parse to read this directly into an array of TypeScript objects, each with a version and a checksum attribute. Take a look at the links I posted for examples.

Once you have this array, you can get the list of checksums on their own using arrayOfObjects.map(it => it.checksum).
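As a rough sketch of that suggestion: the interface name `WrapperChecksum` and the inline JSON string are assumptions made for illustration (the string stands in for the contents of the resource file), and the two checksum values are the ones from the earlier example in this thread.

```typescript
// Assumed shape of one entry in the JSON file
interface WrapperChecksum {
  version: string
  checksum: string
}

// Inline JSON standing in for the contents of the resource file
const fileContents = `[
  { "version": "1.0", "checksum": "87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055" },
  { "version": "1.1", "checksum": "22c56a9780daeee00e5bf31621f991b68e73eff6fe8afca628a1fe2c50c6038e" }
]`

// JSON.parse returns `any`, so the annotation is an unchecked assumption
// about the file's shape rather than a validation of it.
const checksumEntries: WrapperChecksum[] = JSON.parse(fileContents)

// Extract only the checksums; a Set gives constant-time membership checks
const checksumSet = new Set(checksumEntries.map(entry => entry.checksum))
```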

Contributor Author

Marcono1234 commented Jan 31, 2024

I have performed the following changes now:

  • Included checksums as JSON file
    The JSON file is imported as a module instead of being read via path.resolve(__dirname, ...). This seems more reliable to me because __dirname differs (I assume) depending on whether the unit tests or the action (from index.js) is run, so both src/... and dist/... would need to have the same nesting level for the path-based approach to work.
    I am not sure though whether importing the JSON as a module could become a performance problem (possibly during development only).
  • Changed KNOWN_VALID_CHECKSUMS to be a Map<string, Set<string>>, mapping from checksum to a set of version names. This is for compatibility with #145; one could then, for example, calculate the checksum of the JAR and check whether the expected version is in the set for that checksum.
  • Added GitHub workflow for updating the checksums file and creating a pull request
    This runs every week but can also be run manually. Here is an example of what the created pull request looks like: https://github.com/Marcono1234/wrapper-validation-action/pull/6.
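To illustrate the checksum-to-versions mapping from the second point, here is a minimal sketch. It assumes the JSON file holds an array of entries with `version` and `checksum` fields; `buildChecksumMap` and `matchesVersion` are hypothetical names, not necessarily what the PR uses.

```typescript
// Assumed shape of one entry in the checksums JSON file
interface WrapperChecksum {
  version: string
  checksum: string
}

// Group version names by checksum, since several versions can share a wrapper JAR
function buildChecksumMap(entries: WrapperChecksum[]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>()
  for (const {version, checksum} of entries) {
    let versions = map.get(checksum)
    if (versions === undefined) {
      versions = new Set<string>()
      map.set(checksum, versions)
    }
    versions.add(version)
  }
  return map
}

// Check whether a computed checksum belongs to the expected Gradle version
function matchesVersion(
  map: Map<string, Set<string>>,
  checksum: string,
  expectedVersion: string
): boolean {
  const versions = map.get(checksum)
  return versions !== undefined && versions.has(expectedVersion)
}
```

Checking only whether a checksum is known at all then reduces to `map.has(checksum)`.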

I hope that is ok like this. Feedback is appreciated!

@Marcono1234 Marcono1234 marked this pull request as ready for review January 31, 2024 19:24
- name: Create or update pull request
  uses: peter-evans/create-pull-request@v6
  with:
    branch: wrapper-checksums-update
Contributor Author

Maybe the branch name should use a prefix such as bot/ to prevent accidentally manually editing this branch, which might then be overwritten by this workflow.

What do you think?

Contributor

Sounds completely reasonable to prefix it with bot/

Member

bigdaz commented Jan 31, 2024

Thanks @Marcono1234 . I'm not going to have capacity to take this further until Feb 13 at the earliest. But it's on my list!

Contributor

Awesome. Was thinking about writing this myself and wasn't looking forward to it. Solid bit of work here!

Javascript is definitely not my strongest area, ngl. 😆

@@ -6,7 +6,8 @@
 "rootDir": "./src", /* Specify the root directory of input files. Use to control the output directory structure with --outDir. */
 "strict": true, /* Enable all strict type-checking options. */
 "noImplicitAny": true, /* Raise error on expressions and declarations with an implied 'any' type. */
-"esModuleInterop": true /* Enables emit interoperability between CommonJS and ES Modules via creation of namespace objects for all imports. Implies 'allowSyntheticDefaultImports'. */
+"esModuleInterop": true, /* Enables emit interoperability between CommonJS and ES Modules via creation of namespace objects for all imports. Implies 'allowSyntheticDefaultImports'. */
+"resolveJsonModule": true, /* Enable importing JSON files as module; used for importing wrapper checksums JSON */
Contributor

This was what I was missing when I took an early crack at this! 🎉

*
* Maps from the checksum to the names of the Gradle versions whose wrapper has this checksum.
*/
export const KNOWN_VALID_CHECKSUMS = getKnownValidChecksums()
Contributor

Can you add a unit test that verifies that this is never empty, and always contains some reasonable versions as a spot-check that this logic never breaks?
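A spot-check along those lines might look like the sketch below. It is written without any test framework so that it stays self-contained; `spotCheckChecksums` and `SAMPLE_CHECKSUMS` are hypothetical names, and the sample map is a stand-in for the real KNOWN_VALID_CHECKSUMS, filled with the two checksums from the example earlier in this thread.

```typescript
// Stand-in for the real KNOWN_VALID_CHECKSUMS
const SAMPLE_CHECKSUMS = new Map<string, Set<string>>([
  ['87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055', new Set(['1.0'])],
  ['22c56a9780daeee00e5bf31621f991b68e73eff6fe8afca628a1fe2c50c6038e', new Set(['1.1'])]
])

// Returns a list of problems; an empty list means the spot-check passed.
// `expected` holds [checksum, version] pairs that must be present.
function spotCheckChecksums(
  known: Map<string, Set<string>>,
  expected: Array<[string, string]>
): string[] {
  const problems: string[] = []
  if (known.size === 0) {
    problems.push('checksum map is empty')
  }
  for (const [checksum, version] of expected) {
    const versions = known.get(checksum)
    if (versions === undefined || !versions.has(version)) {
      problems.push(`missing version ${version} for checksum ${checksum}`)
    }
  }
  return problems
}

const spotCheckProblems = spotCheckChecksums(SAMPLE_CHECKSUMS, [
  ['87e50531ca7aab675f5bb65755ef78328afd64cf0877e37ad876047a8a014055', '1.0']
])
```

In a real test suite, the returned list would simply be asserted to be empty.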

Member

@bigdaz bigdaz left a comment

I've never really worked on this code, and I don't want to delay the merge. So I'm going to approve (and merge) based on @JLLeitschuh's approval.
I'd like to include this in the first RC of v2.0.0, since that mitigates some of the risk of the change. (People have to explicitly update to get the new version).

Member

bigdaz commented Feb 1, 2024

This has been merged manually. I need to find a better process for merging external PRs.

Member

bigdaz commented Feb 1, 2024

I'd like to include this in the first RC of v2.0.0, since that mitigates some of the risk of the change. (People have to explicitly update to get the new version).

Actually, I forgot that I already released v2.0.0-rc.1 🤦🏼 .
This change will need to wait for a v2.1.0 release.

@JLLeitschuh
Contributor

@Marcono1234 seriously, thank you so much for working this problem out. Really truly. You can see by the number of issues that @bigdaz closed how much pain our network connection logic has caused over the years.

You've just significantly increased the stability and usability of this action for all of the users of it.

Truly, thank you so much for your contribution. I'm incredibly appreciative you took the time to contribute it.

@Marcono1234
Contributor Author

Thanks for the kind words and the feedback!

I have addressed the feedback (adding tests, and bot/ branch prefix) in #178.

Member

bigdaz commented Feb 7, 2024

I've just released v2.1.0 containing this change. Thanks again!

Successfully merging this pull request may close these issues.

Hardcode list of known checksums to avoid network requests in most cases
3 participants