fix for issue 125 - lz4 data corruption when concurrency is used #127
It appears to take the shotgun approach and just alloc+copy everything. This will be very slow. The problem appears to be that For s2 I use a |
writer.go
Outdated
@@ -173,7 +173,10 @@ func (z *Writer) writeHeader() error {

 // Write compresses data from the supplied buffer into the underlying io.Writer.
 // Write does not return until the data has been written.
-func (z *Writer) Write(buf []byte) (int, error) {
+func (z *Writer) Write(buffer []byte) (int, error) {
+	buf := make([]byte, len(buffer))
Copying here is not required.
writer.go
Outdated
@@ -223,7 +226,10 @@ func (z *Writer) Write(buf []byte) (int, error) {
 }

 // compressBlock compresses a block.
-func (z *Writer) compressBlock(data []byte) error {
+func (z *Writer) compressBlock(dataBlock []byte) error {
+	data := make([]byte, len(dataBlock))
Move it up to before this is called, and use the getBuffer function call.
Oh, I missed this comment. It seems we might be able to use the pools that are already defined.
I have updated it, I am new to the code and didn't know there were already pools defined for each block size :)
I agree with @klauspost; please use the pool of buffers.
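For readers following the thread, the pooled-buffer suggestion might look roughly like the sketch below. It is not the library's actual code: `blockPool`, `blockSize`, `copyToPooled`, and `release` are hypothetical names chosen for illustration, and the real pools and the getBuffer helper mentioned above may have different shapes.

```go
package main

import (
	"fmt"
	"sync"
)

// blockSize is a hypothetical block size used only for this sketch.
const blockSize = 64 << 10

// blockPool is a hypothetical pool of fixed-size buffers, standing in for
// the per-block-size pools mentioned in the review.
var blockPool = sync.Pool{
	New: func() interface{} { return make([]byte, blockSize) },
}

// copyToPooled copies src into a buffer taken from the pool and returns the
// filled portion. It assumes len(src) <= blockSize; longer input is truncated.
func copyToPooled(src []byte) []byte {
	buf := blockPool.Get().([]byte)
	n := copy(buf, src)
	return buf[:n]
}

// release restores the buffer to its full capacity and returns it to the pool.
func release(buf []byte) {
	blockPool.Put(buf[:cap(buf)])
}

func main() {
	src := []byte("hello")
	block := copyToPooled(src)

	// The caller may now reuse src without affecting the pooled copy.
	copy(src, "XXXXX")
	fmt.Println(string(block))

	release(block)
}
```

Compared with calling make on every block, this keeps the copy that protects against aliasing but reuses the allocation once the block has been compressed and released.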
thanks for the feedback @klauspost and @pierrec, that makes a lot of sense! I have made the change to use |
LGTM |
This is an attempt to fix #125. I noticed that corruption of the LZ4 stream happens only when concurrency is used (everything works fine without it).
When compressing, the buffer gets sent to a goroutine, but when concurrency is used, the underlying array might get modified before the goroutine starts executing, causing the output of the lz4 Writer to be corrupt. Creating a copy of the buffer before handing it off solves this issue.
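The hazard described above can be reproduced in isolation. The sketch below does not use the lz4 package; `handOff` is a hypothetical stand-in for a writer that passes a block to a goroutine, and a channel is used to force the goroutine to run only after the caller has reused its buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// handOff simulates a writer passing a block to a goroutine. When copyFirst
// is true, the block is copied before the hand-off. It returns what the
// goroutine observed once it finally ran.
func handOff(copyFirst bool) string {
	buf := []byte("AAAA") // caller-owned buffer, reused between writes
	block := buf          // shares the same underlying array as buf
	if copyFirst {
		block = make([]byte, len(buf))
		copy(block, buf)
	}

	var wg sync.WaitGroup
	start := make(chan struct{})
	var seen string

	wg.Add(1)
	go func() {
		defer wg.Done()
		<-start // stand-in for the goroutine being scheduled late
		seen = string(block)
	}()

	copy(buf, "BBBB") // the caller refills its buffer with the next chunk
	close(start)
	wg.Wait()
	return seen
}

func main() {
	fmt.Println("without copy:", handOff(false)) // goroutine sees "BBBB"
	fmt.Println("with copy:   ", handOff(true))  // goroutine sees "AAAA"
}
```

Without the copy, the goroutine reads the caller's later data instead of the block it was given, which is the same kind of corruption reported in #125.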
I also did the same thing on the main Write() func because according to the documentation of Writer:

The existing tests don't seem to have run into this because we read the entire file into memory and then use bytes.NewReader() to create the reader for the Copy() operation; however, the implementation shows that the data is copied into a new buffer and returned, which mitigates this issue. The tests included in this PR use os.Open(), similarly to the implementation in cmd/lz4c.

Test before the fix:
After the fix: