Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CSV-147] Better error message during faulty CSV record read #347

Merged
merged 18 commits into from
Sep 13, 2023

Conversation

gbidsilva
Copy link
Contributor

@gbidsilva gbidsilva commented Aug 30, 2023

This fix is related to : https://issues.apache.org/jira/browse/CSV-147.

If we have some faulty data in the CSV then the current error message which we are getting is something similar to below.
java.io.IOException: (line 2) invalid char between encapsulated token and delimiter

With this fix, what we will be getting is something similar to below,
java.io.IOException: An error occurred while tying to parse the CSV content. Error in line: 2, position: 94, last parsed content: ...rec4,rec5,rec6,rec7,rec8

Update
It has been decided to only to add the record position into the exception message and treat getLastParsedContent method as a new feature. Therefore this PR only contains the position related changes.

@garydgregory
Copy link
Member

I'm OK with adding the position but I am guessing someone will create a security issue for data exfiltration.

@gbidsilva
Copy link
Contributor Author

@garydgregory :
That is a good point. But IMO, whenever someone log the exception, at that point they should consider the security risks.
Isn't usually exception provide such details? as I remember I have came across such scenarios.
What would be the best practice in this case ? - good to know :-)

src/main/java/org/apache/commons/csv/CSVParser.java Outdated Show resolved Hide resolved
src/main/java/org/apache/commons/csv/CSVParser.java Outdated Show resolved Hide resolved
src/main/java/org/apache/commons/csv/CSVParser.java Outdated Show resolved Hide resolved
@gbidsilva
Copy link
Contributor Author

@elharo: Thank you for the feedback. Changes are updated in the PR now.

@gbidsilva
Copy link
Contributor Author

@garydgregory:
I have looked around what would be the best practice in adding data records in Exceptions. It seems that arguments are there for both the sides. However, as you suggested, we should avoid adding such details in Exception and keep it consistent with how commons were creating exceptions so far.
To rectify this, I have removed the content part from the exception message and made 'getLastParsedContent()' method public. Therefore, if there is any need then anyone can call that method and can get the last parsed content.
Test cases are added to the PR :-)
Appreciate your feedback on this.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbidsilva
Thank you for your updates. Please see my comments.

src/main/java/org/apache/commons/csv/CSVParser.java Outdated Show resolved Hide resolved
src/main/java/org/apache/commons/csv/CSVParser.java Outdated Show resolved Hide resolved
@gbidsilva
Copy link
Contributor Author

@garydgregory @elharo :
Requested changes are done.

@gbidsilva
Copy link
Contributor Author

@garydgregory :
I understand your point. In this case, we will treat getLastParsedContent method as a new feature and decide whether it is actually needed or not in the product.
To make this PR work, I have removed the getLastParsedContent method content and only kept the positioning related changes.
Appreciate your feedback :-)

@gbidsilva
Copy link
Contributor Author

@garydgregory : let us know if there is anymore change to be done in this PR.

@gbidsilva
Copy link
Contributor Author

@garydgregory @elharo
Exception handling moved to Lexer class.

@@ -367,8 +367,7 @@ private Token parseEncapsulatedToken(final Token token) throws IOException {
}
if (!Character.isWhitespace((char)c)) {
// error invalid char between token and next delimiter
throw new IOException("(line " + getCurrentLineNumber() +
") invalid char between encapsulated token and delimiter");
throw new IOException("Invalid char between encapsulated token and delimiter at line: " + getCurrentLineNumber() + ", position: " + getCharacterPosition());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably shouldn't be an IOException but that issue is not new with this PR

.build();

CSVParser csvParser = csvFormat.parse(stringReader);
Exception exception = assertThrows(UncheckedIOException.class, () -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UncheckedIOException is not right either, but again not new in this PR

@gbidsilva
Copy link
Contributor Author

@garydgregory : Checkstyle issue has been fixed.

@gbidsilva
Copy link
Contributor Author

@garydgregory : Anything pending from development side for this to be merged ?

@gbidsilva
Copy link
Contributor Author

@gbidsilva Thank you for your updates. Please see my comments.

Completed.

@codecov-commenter
Copy link

Codecov Report

Merging #347 (c0759cd) into master (7104598) will increase coverage by 0.00%.
Report is 13 commits behind head on master.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master     #347   +/-   ##
=========================================
  Coverage     97.87%   97.87%           
  Complexity      549      549           
=========================================
  Files            11       11           
  Lines          1178     1179    +1     
  Branches        204      204           
=========================================
+ Hits           1153     1154    +1     
  Misses           13       13           
  Partials         12       12           
Files Changed Coverage Δ
src/main/java/org/apache/commons/csv/Lexer.java 98.81% <100.00%> (+<0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@garydgregory garydgregory merged commit a33b24c into apache:master Sep 13, 2023
6 checks passed
@garydgregory garydgregory changed the title [CSV-147] Error message optimization during faulty CSV record read [CSV-147] Better error message during faulty CSV record read Sep 13, 2023
asfgit pushed a commit that referenced this pull request Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants