-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoded purl components as per the PURL specification. #23
Encoded purl components as per the PURL specification. #23
Conversation
… PURL specifation.
"purl": "pkg:docker/customer/dockerimage@sha256:244fd47e07d1004f0aed9c?repository_url=gcr.io", | ||
"canonical_purl": "pkg:docker/customer/dockerimage@sha256:244fd47e07d1004f0aed9c?repository_url=gcr.io", | ||
"purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io", | ||
"canonical_purl": "pkg:docker/customer/dockerimage@sha256%3A244fd47e07d1004f0aed9c?repository_url=gcr.io", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne according to the spec text:
- the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. They may need to be encoded elsewhere
- the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere
here ':' is part of version
value and not separator. Should we interpret the They may need to be encoded elsewhere
part as it is optional to encode them
since both Python and JavaScript implementations honor the un-encoded form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh - interesting point. It looks like both the Python and Javascript implementations simply call .Replace("%3A", ":")
after percent-encoding the name
, version
, and subpath
components.
Looking at qualifier values, the Javascript implementation also allows to be unencoded /
, but still encodes :
, whereas the Python implemetation does the opposite for qualifier values - so opinions seem to vary there.
@morended What do you think about using .Replace("%3A", ":")
to allow :
to stay unencoded in the name
, version
and subpath
components (to match the other implementations)? We could punt on /
and :
in qualifier values for now since there isn't a consensus in those implementations, and leaving special characters unencoded is optional after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@am11 I have addressed this as per @pmalmsten's comments in my latest revision. Let me know if there is anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, but we should be careful about which components we allow /
to serialize unencoded for.
src/PackageUrl.cs
Outdated
@@ -205,7 +209,7 @@ private void Parse(string purl) | |||
if (remainder.Contains("#")) | |||
{ // subpath is optional - check for existence | |||
int index = remainder.LastIndexOf("#"); | |||
Subpath = ValidateSubpath(remainder.Substring(index + 1)); | |||
Subpath = WebUtility.UrlDecode(ValidateSubpath(remainder.Substring(index + 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thought as below:
Subpath = WebUtility.UrlDecode(ValidateSubpath(remainder.Substring(index + 1))); | |
Subpath = ValidateSubpath(WebUtility.UrlDecode(remainder.Substring(index + 1))); |
e2dec97
to
ebb4d0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Thank you.
As per the PURL specification, some components are allowed to use other characters beyond ASCII: these components must then be UTF-8-encoded strings and percent-encoded.
This PR includes changes to encode version, subpath, name and qualifier values.