Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoded purl components as per the PURL specification. #23

Merged
merged 3 commits into from
May 25, 2023

Conversation

morended
Copy link
Contributor

As per the PURL specification, some components are allowed to use other characters beyond ASCII: these components must then be UTF-8-encoded strings and percent-encoded.

This PR includes changes to encode version, subpath, name and qualifier values.

"purl": "pkg:docker/customer/dockerimage@sha256:244fd47e07d1004f0aed9c?repository_url=gcr.io",
"canonical_purl": "pkg:docker/customer/dockerimage@sha256:244fd47e07d1004f0aed9c?repository_url=gcr.io",
"purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
"canonical_purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io",
Copy link
Collaborator

@am11 am11 May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne according to the spec text:

  • the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. They may need to be encoded elsewhere
  • the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere

here ':' is part of version value and not separator. Should we interpret the They may need to be encoded elsewhere part as it is optional to encode them since both Python and JavaScript implementations honor the un-encoded form?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh - interesting point. It looks like both the Python and Javascript implementations simply call .Replace("%3A", ":") after percent-encoding the name, version, and subpath components.

Looking at qualifier values, the Javascript implementation also allows to be unencoded /, but still encodes :, whereas the Python implemetation does the opposite for qualifier values - so opinions seem to vary there.

@morended What do you think about using .Replace("%3A", ":") to allow : to stay unencoded in the name, version and subpath components (to match the other implementations)? We could punt on / and : in qualifier values for now since there isn't a consensus in those implementations, and leaving special characters unencoded is optional after all.

Copy link
Contributor Author

@morended morended May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@am11 I have addressed this as per @pmalmsten's comments in my latest revision. Let me know if there is anything.

Copy link
Contributor

@pmalmsten pmalmsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but we should be careful about which components we allow / to serialize unencoded for.

@@ -205,7 +209,7 @@ private void Parse(string purl)
if (remainder.Contains("#"))
{ // subpath is optional - check for existence
int index = remainder.LastIndexOf("#");
Subpath = ValidateSubpath(remainder.Substring(index + 1));
Subpath = WebUtility.UrlDecode(ValidateSubpath(remainder.Substring(index + 1)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought as below:

Suggested change
Subpath = WebUtility.UrlDecode(ValidateSubpath(remainder.Substring(index + 1)));
Subpath = ValidateSubpath(WebUtility.UrlDecode(remainder.Substring(index + 1)));

@morended morended force-pushed the morended/encode_purl_components branch from e2dec97 to ebb4d0f Compare May 23, 2023 18:02
Copy link
Collaborator

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Thank you.

@am11 am11 merged commit a42c0b8 into package-url:master May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants