Parser may be interpreting this string incorrectly #1029
Comments
No, you will have to sanitize your input before passing it to the parser, although the best strategy for that will depend on what kinds of strings you want to parse. Stray numbers are usually what cause problems, so if you can reliably filter them out with a regex, that would be a good solution. Splitting your strings on punctuation marks and deciding which token is most likely to be a date (e.g. by identifying month names or prepositions) could also work. However, if your input is less predictable, you might want to consider natural language processing instead.
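The splitting approach above could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `likely_date_fragments` and the sample string are hypothetical (the sample is not the string from this issue), month names come from the standard library `calendar` module in the default English locale, and the split deliberately avoids `.` so that decimal prices stay in one piece.

```python
import calendar
import re

# Month names and abbreviations from the standard library
# (assumes the default English locale).
MONTHS = {
    name.lower()
    for name in list(calendar.month_name)[1:] + list(calendar.month_abbr)[1:]
}


def likely_date_fragments(text):
    """Split on punctuation and keep fragments that mention a month name.

    Hypothetical helper, not part of dateutil.
    """
    # "." is left out of the separators so "14.99" is not split in two.
    fragments = re.split(r"[,;:!?()]+", text)
    keep = []
    for fragment in fragments:
        words = re.findall(r"[A-Za-z]+", fragment.lower())
        if any(word in MONTHS for word in words):
            keep.append(fragment.strip())
    return keep


# Hypothetical sample input.
print(likely_date_fragments("Save 25%, now £14.99, ends April 20"))
```

Each surviving fragment can then be handed to the parser on its own. One caveat: "may" is both a month and a common English word, so a real filter would probably need an extra check for it.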
While this isn't likely to be supported anytime soon, "maybe someday" cases like this are collected in
The below is using Python 3.8 and dateutil 2.8.1.
I understand April 20 is vague so some assumptions have to be made. Removing the %, £, 14, .99, 25, and various combinations of those all give various forms of 14, 99, or 25 in the date. Below is the most detail I can retain to get an answer with a valid assumption.
Is dateutil expected to parse that string? Do you have any suggestions on how I should parse it? My first thought is to filter out regexes like [0-9]+% and [$|£]+[0-9]+(.[0-9])*\s to remove percentage numbers and currency + value combinations, but you may have run into better ways to manage this.
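A minimal sketch of that filtering idea, using only the standard library `re` module. The function name `strip_numeric_noise` and the sample string are hypothetical, and the patterns are a tightened variant of the ones proposed above rather than anything dateutil provides: inside a character class `|` is a literal pipe, so `[$£]` is used instead of `[$|£]`, and the decimal point is escaped.

```python
import re

# Percentages such as "25%" or "12.5 %".
PERCENT = re.compile(r"\d+(?:\.\d+)?\s*%")
# Currency amounts such as "£14.99" or "$5" (only $ and £ are handled here).
CURRENCY = re.compile(r"[$£]\s*\d+(?:\.\d+)?")


def strip_numeric_noise(text):
    """Remove percentages and currency amounts, then tidy whitespace.

    Hypothetical pre-processing helper, not part of dateutil.
    """
    text = PERCENT.sub(" ", text)
    text = CURRENCY.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample input.
print(strip_numeric_noise("25% off, now £14.99 until April 20"))
```

The cleaned string could then be passed to `dateutil.parser.parse(cleaned, fuzzy=True)`, which skips tokens it does not recognize. Whether the result is the intended date still depends on which numbers survive the filter.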