Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent unnessary b' prefix in parse_isodate exception messages #1122

Merged
merged 1 commit into from May 20, 2021

Conversation

pawl
Copy link
Contributor

@pawl pawl commented May 4, 2021

Summary of changes

I started working on #383 by adding type hints to this branch: https://github.com/dateutil/dateutil/compare/master...pawl:monkeytype_merge_pyi?expand=1#diff-fba91b4117b31a0a29957adeb621fe79541fdb2424d10359e444df0d1cea2473R4

When I run mypy to see if those type hints work, I get a bunch of errors (a lot of which are probably my fault): https://gist.github.com/pawl/7ad1e9ce650b3241cc106d238d7edf04

This PR fixes one of those errors:

dateutil/parser/isoparser.py:167: error: On Python 3 '{}'.format(b'abc') produces "b'abc'", not 'abc'; use '{!r}'.format(b'abc') if this is desired behavior

That error is saying isoparser().parse_isotime('2014-04-19T') will give an error that looks like this with the b' prefix around the input:

String contains unknown ISO components: b'2014-04-19T'

This is happening because the _takes_ascii decorator turns the string input into a bytestring, and python 3 represents bytestrings with b' when you pass them into .format().

This PR fixes the issue by using .decode() on the bytestring before using it with .format() for the error message.

Pull Request Checklist

  • Changes have tests
  • Authors have been added to AUTHORS.md
  • News fragment added in changelog.d. See CONTRIBUTING.md for details

Copy link
Member

@mariocj89 mariocj89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR! It's almost there.

@@ -159,7 +159,7 @@ def parse_isodate(self, datestr):
components, pos = self._parse_isodate(datestr)
if pos < len(datestr):
raise ValueError('String contains unknown ISO ' +
'components: {}'.format(datestr))
'components: {}'.format(datestr.decode()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is happening as the function uses _takes_ascii. Can you pass "ascii" as the codec to decode the bytes? In case the default encoding is another one.

isoparser().parse_isodate(isostr)

# ensure the error message does not contain b' prefixes
assert "b'{}'".format(isostr) not in str(excinfo.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than checking for the not presence of b'X', would you mind adding a new test case for this (feel free to copy this one) and validate the fully generated exception str?

@pawl
Copy link
Contributor Author

pawl commented May 20, 2021

@mariocj89 Thanks so much for the good feedback! I've added fixes for the things you've mentioned. Let me know if I need to make any additional tweaks.

@@ -159,7 +159,7 @@ def parse_isodate(self, datestr):
components, pos = self._parse_isodate(datestr)
if pos < len(datestr):
raise ValueError('String contains unknown ISO ' +
'components: {}'.format(datestr))
'components: {}'.format(datestr.decode('ascii')))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it'd be even easier to read if we passed here: 'components: {!r}'.

That will put ' around the text that failed (Say for example that we have some invisible characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: f28e639

Definitely agree with removing the quotes making it more difficult to detect things like whitespace.

As the function is wrapped around a decorator that converts the input to
bytes, when an exception is raised that includes the value, it contains
the value as bytes (b"<val>"). To provide the expected str in Python 3,
decode that to unicode.

This will result in `u""` for Python 2 callers, but given that we plan
to drop support for Python 2 in the short term, it feels acceptable.
@mariocj89
Copy link
Member

Thanks, I've added a fixup for your tests to pass in Py2. I'll land this later today.

@mariocj89 mariocj89 merged commit 2ac5012 into dateutil:master May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants