[CSV-147] Better error message during faulty CSV record read #347
I'm OK with adding the position, but I am guessing someone will create a security issue for data exfiltration.
@elharo: Thank you for the feedback. The changes are now updated in the PR.
@gbidsilva
Thank you for your updates. Please see my comments.
@garydgregory: Let us know if any more changes are needed in this PR.
@@ -367,8 +367,7 @@ private Token parseEncapsulatedToken(final Token token) throws IOException {
                 }
                 if (!Character.isWhitespace((char)c)) {
                     // error invalid char between token and next delimiter
-                    throw new IOException("(line " + getCurrentLineNumber() +
-                            ") invalid char between encapsulated token and delimiter");
+                    throw new IOException("Invalid char between encapsulated token and delimiter at line: " + getCurrentLineNumber() + ", position: " + getCharacterPosition());
This probably shouldn't be an IOException but that issue is not new with this PR
        .build();

CSVParser csvParser = csvFormat.parse(stringReader);
Exception exception = assertThrows(UncheckedIOException.class, () -> {
UncheckedIOException is not right either, but again not new in this PR
@garydgregory: The Checkstyle issue has been fixed.
@garydgregory: Is anything pending on the development side before this can be merged?
Completed.
Codecov Report
@@ Coverage Diff @@
## master #347 +/- ##
=========================================
Coverage 97.87% 97.87%
Complexity 549 549
=========================================
Files 11 11
Lines 1178 1179 +1
Branches 204 204
=========================================
+ Hits 1153 1154 +1
Misses 13 13
Partials 12 12
LGTM
This fix is related to https://issues.apache.org/jira/browse/CSV-147.
If the CSV contains faulty data, the current error message looks similar to the following:
java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
With this fix, the error message looks similar to the following:
java.io.IOException: An error occurred while tying to parse the CSV content. Error in line: 2, position: 94, last parsed content: ...rec4,rec5,rec6,rec7,rec8
Update
It has been decided to add only the record position to the exception message and to treat the getLastParsedContent method as a new feature. Therefore, this PR contains only the position-related changes.
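For readers who want to see how a line number and a character position can end up in such a message, here is a minimal, self-contained sketch. It is not the Commons CSV lexer: the class PositionMessageSketch and its validate method are hypothetical, the comma delimiter and double-quote character are hard-coded, and the position is assumed to be the 1-based count of characters read so far. Only the message wording is taken from the diff in this PR.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; this is NOT the Commons CSV implementation.
final class PositionMessageSketch {

    /**
     * Scans the input and throws if a non-whitespace character follows the
     * closing quote of an encapsulated token before the next delimiter.
     */
    static void validate(final String csv) throws IOException {
        final Reader in = new StringReader(csv);
        long line = 1;      // assumed: 1-based line number
        long position = 0;  // assumed: number of characters read so far
        boolean inQuotes = false;
        boolean afterClosingQuote = false;
        int c;
        while ((c = in.read()) != -1) {
            position++;
            if (c == '\n') {
                line++;
            }
            if (inQuotes) {
                if (c == '"') {
                    // Either the closing quote or the first half of an escaped quote ("").
                    inQuotes = false;
                    afterClosingQuote = true;
                }
            } else if (afterClosingQuote) {
                if (c == '"') {
                    // Second half of an escaped quote: still inside the token.
                    inQuotes = true;
                    afterClosingQuote = false;
                } else if (c == ',' || c == '\n') {
                    // Delimiter or end of record: the token ended cleanly.
                    afterClosingQuote = false;
                } else if (!Character.isWhitespace((char) c)) {
                    throw new IOException(
                            "Invalid char between encapsulated token and delimiter at line: "
                                    + line + ", position: " + position);
                }
            } else if (c == '"') {
                inQuotes = true;
            }
        }
    }

    public static void main(final String[] args) {
        try {
            // The 'x' directly after the closing quote on the second line is the faulty character.
            validate("a,b\n\"c\"x,d");
        } catch (final IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch keeps both counters next to the read loop so the message can be built at the point of failure without re-scanning the input. How getCharacterPosition() counts characters in the real parser may differ from the assumption made here.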