Parser may be interpreting this string incorrectly #1029

jeffka11 · 2020-04-10T17:30:11Z

The output below was produced with Python 3.8 and dateutil 2.8.1.

>>> from dateutil import parser
>>> parser.parse("£14.99 (25% off, until April 20)", fuzzy_with_tokens=True)
(datetime.datetime(2014, 4, 25, 20, 0), ('£', '(', '% off, until ', ' ', ')'))

I understand that "April 20" is ambiguous, so some assumptions have to be made. Removing the %, £, 14, .99, and 25 in various combinations still produces dates that contain some form of 14, 99, or 25. The string below keeps as much detail as I can while still getting a result based on a reasonable assumption.

>>> parser.parse("£. (% off, until April 20)", fuzzy_with_tokens=True)
(datetime.datetime(2020, 4, 20, 0, 0), ('£. (% off, until ', ' ', ')'))

Is dateutil expected to parse that string? Do you have any suggestions on how I should parse it? My first thought is to filter out regexes like [0-9]+% and [$|£]+[0-9]+(.[0-9])*\s to remove percentage numbers and currency + value combinations, but you may have run into better ways to manage this.
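A minimal sketch of that filtering idea, using only the standard `re` module. The function name `strip_prices_and_percentages` and the exact patterns are illustrative assumptions, not part of dateutil. The patterns are slightly tightened compared to the ones quoted above: inside a character class `|` is a literal character, and an unescaped `.` matches any character.

```python
import re

# Hypothetical patterns for illustration; adjust them to the inputs you actually see.
CURRENCY_RE = re.compile(r"[$£€]\s*\d+(?:[.,]\d+)*")   # e.g. "£14.99", "$1,200.50"
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?\s*%")           # e.g. "25%", "12.5 %"


def strip_prices_and_percentages(text):
    """Remove currency amounts and percentages, then tidy up whitespace."""
    text = CURRENCY_RE.sub(" ", text)
    text = PERCENT_RE.sub(" ", text)
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


cleaned = strip_prices_and_percentages("£14.99 (25% off, until April 20)")
print(cleaned)  # ( off, until April 20)
```

The cleaned string no longer contains the stray numbers and can then be passed to `parser.parse(cleaned, fuzzy=True)`. Any other number left in the text may still be picked up as a date field.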


ffe4 · 2020-04-10T19:03:40Z

No, dateutil is not expected to parse that type of string, although this is a common question that we should address in the docs. The parser basically identifies all elements that could be part of the date and then tries to work out where they go in the datetime object. It assumes that the string you pass consists mostly of the date itself, with little extraneous text. Its main purpose is to parse different date formats without having to define every possible format beforehand.

You will have to sanitize your input before passing it to the parser, although the best strategy for that will depend on what kinds of strings you want to parse. It is usually stray numbers that cause problems, so if you know that you can reliably filter them out with a regular expression, that would be a good solution. Splitting your strings on punctuation marks and deciding which token is most likely to be a date (e.g. by identifying month names or prepositions) could also work. However, if your input is less predictable, you might want to consider natural language processing instead.
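The splitting approach could look roughly like the sketch below. It is one possible reading of the suggestion, not a dateutil feature: the helper name `pick_date_fragment`, the choice of delimiters, and the English-only month list are all assumptions made for illustration.

```python
import re

# English month names plus three-letter abbreviations (assumption: English input).
_MONTHS = (
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
)
_MONTH_RE = re.compile(
    r"\b(?:" + "|".join(_MONTHS + tuple(m[:3] for m in _MONTHS)) + r")\b",
    re.IGNORECASE,
)


def pick_date_fragment(text):
    """Return the first punctuation-delimited fragment that names a month, or None."""
    # Split on brackets, commas and semicolons; "." is left alone so that
    # decimal numbers such as "14.99" stay in a single fragment.
    for fragment in re.split(r"[(),;]", text):
        if _MONTH_RE.search(fragment):
            return fragment.strip()
    return None


print(pick_date_fragment("£14.99 (25% off, until April 20)"))  # until April 20
```

The returned fragment contains only the date and a preposition, so it is a much easier input for `parser.parse(fragment, fuzzy=True)`. Note that this simple month check also matches the ordinary word "may", so real inputs may need a stricter test.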

jbrockmendel · 2020-04-11T20:49:49Z

While this isn't likely to be supported anytime soon, "maybe someday" cases like this are collected in test.test_parser.TestParseUnimplementedCases; a test for this case could be added there.


ffe4 added enhancement parser labels Jun 17, 2020

ffe4 added a commit to ffe4/dateutil that referenced this issue Jun 17, 2020

Add and xfail unhandled case dateutil#1029

8f2fdab

ffe4 mentioned this issue Jun 17, 2020

Add and xfail unhandled case #1029 #1056

Merged

2 tasks

ffe4 added a commit to ffe4/dateutil that referenced this issue Jun 18, 2020

Add and xfail unhandled case dateutil#1029

6c6ef34

ffe4 linked a pull request Jun 18, 2020 that will close this issue

Add and xfail unhandled case #1029 #1056

Merged

2 tasks

mariocj89 closed this as completed in #1056 Jul 5, 2021

mariocj89 added a commit that referenced this issue Jul 5, 2021

Merge pull request #1056 from ffe4/issue_1029

9c2ad8f

Add and xfail unhandled case #1029
