Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch csv and tsv method 'sv' from ReadAll() to stream each record with Read() #355

Merged
Merged
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
fe45260
switch csv and tsv method 'sv' from ReadAll() to stream each record w…
viprerk Dec 16, 2022
6677711
add testcases for one/two line csv/tsv
viprerk Dec 17, 2022
6a0aa42
Bump codecov/codecov-action from 3.1.0 to 3.1.1 (#328)
dependabot[bot] Feb 21, 2023
b1f8105
Bump actions/setup-go from 3.3.0 to 3.5.0 (#351)
dependabot[bot] Feb 21, 2023
a60ff30
Bump actions/checkout from 3.0.2 to 3.3.0 (#357)
dependabot[bot] Feb 21, 2023
6102fd9
Bump github/codeql-action from 2.1.22 to 2.2.4 (#364)
dependabot[bot] Feb 21, 2023
f42eeae
Bump golangci/golangci-lint-action from 3.2.0 to 3.4.0 (#360)
dependabot[bot] Feb 21, 2023
e3ac364
Upgrade golang.org/x/net dependency (#365)
gabriel-vasile Feb 21, 2023
7780377
Bump github/codeql-action from 2.2.4 to 2.2.5 (#366)
dependabot[bot] Feb 28, 2023
44bb566
Bump golang.org/x/net from 0.7.0 to 0.8.0 (#367)
dependabot[bot] Mar 8, 2023
dbcd5ed
replace deprecated ioutil calls (#392)
testwill May 31, 2023
3b8e5ee
Bump golang.org/x/net from 0.8.0 to 0.10.0 (#387)
dependabot[bot] Jun 5, 2023
79aa93c
Cosmetics - no code behaviour changes (#397)
gabriel-vasile Jun 5, 2023
8bdb082
Remove old travis build status link from readme (#407)
gabriel-vasile Jun 28, 2023
189fcfd
Group all dependabot PRs together (#409)
gabriel-vasile Jul 5, 2023
35b2558
Bump the gomod group with 1 update
dependabot[bot] Jul 6, 2023
c6bfce2
Bump the github-actions group with 5 updates
dependabot[bot] Jul 17, 2023
c3ad178
Remove some test files to CSV, TSV
gabriel-vasile Oct 10, 2023
2d392b2
Merge branch 'master' into streaming-csv-tsv-detection
gabriel-vasile Oct 10, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
16 changes: 14 additions & 2 deletions internal/magic/text_csv.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package magic
import (
"bytes"
"encoding/csv"
"errors"
"io"
)

Expand All @@ -23,8 +24,19 @@ func sv(in []byte, comma rune, limit uint32) bool {
r.LazyQuotes = true
r.Comment = '#'

lines, err := r.ReadAll()
return err == nil && r.FieldsPerRecord > 1 && len(lines) > 1
lines := 0
for {
_, err := r.Read()
if errors.Is(err, io.EOF) {
break
}
if err != nil {
return false
}
lines++
}

return r.FieldsPerRecord > 1 && lines > 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this only checks for "lines > 1": The loop above could just break out, and there is no point at all to exhaust the reader (which could still be millions of lines?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReadAll() method would return error if any line in the file was not valid CSV. For this reason the loop calls Read() on each line and returns false overall if any line is not valid CSV.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking out of the loop when lines > 1 means input like this would wrongfully pass as csv:

val1, val2, val3
val4, val5, val6

some text that is not related to the csv above

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking and replying!

}

// dropLastLine drops the last incomplete line from b.
Expand Down