provide size of uncompressed data #159

Open
jtmoon79 opened this issue May 6, 2024 · 5 comments

@jtmoon79

jtmoon79 commented May 6, 2024

Can lz4_flex provide an API for getting the uncompressed size of data before decompressing that data?
I assume the uncompressed size is in some kind of block header or something like that? (I don't know the LZ4 binary format; just guessing)

This would be particularly helpful for lz4 compressed files.

jtmoon79 added a commit to jtmoon79/super-speedy-syslog-searcher that referenced this issue May 6, 2024
support parsing lz4 compressed files `.lz4`

refactor processing sequential-only syslogs (compressed logs that can only
be read linearly; no binary searching)

add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those

add tests in compare-current-and-expected

add lz4 flamegraph in flamegraphs.sh

Fix Issue #201 with tests that had unknown panics for blockreader
processing gzip files.

Issue PSeitz/lz4_flex#159
Issue #201
Issue #128
Issue #291
Issue #283
jtmoon79 added a commit to jtmoon79/super-speedy-syslog-searcher that referenced this issue May 7, 2024
support parsing lz4 compressed files `.lz4`

refactor processing sequential-only syslogs (compressed logs that can only
be read linearly; no binary searching)

add logs for LZ4 `.lz4`, and old LZMA `.lz`, and variations of those

add tests in compare-current-and-expected

add lz4 flamegraph in flamegraphs.sh

Fix Issue #201 with tests that had unknown panics for blockreader
processing gzip files.

README.md mention lz4 and update comparison table

Issue PSeitz/lz4_flex#159
Issue #201
Issue #128
Issue #291
Issue #283
@PSeitz
Owner

PSeitz commented May 10, 2024

There's the content_size flag in the frame format header, but its existence is optional.
Also, there may be multiple frames, so the total would be the sum of each frame's content_size.
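
For readers who don't know the binary layout, here is a minimal sketch of where that field sits, based on my reading of the LZ4 frame format description. It uses only the standard library and does not go through lz4_flex. The names `frame_content_size`, `LZ4_FRAME_MAGIC` and `FLG_CONTENT_SIZE` are made up for this example, and the header checksum is not verified.

```rust
/// Magic number that opens every LZ4 frame, as it appears on disk
/// (0x184D2204 stored little-endian).
const LZ4_FRAME_MAGIC: [u8; 4] = [0x04, 0x22, 0x4D, 0x18];
/// Bit 3 of the FLG byte says an 8-byte content size follows the BD byte.
const FLG_CONTENT_SIZE: u8 = 0b0000_1000;

/// Returns the content size stored in the header of the frame that starts at
/// `data[0]`. Returns `None` if the data is too short, does not start with
/// the LZ4 frame magic, or the optional field is absent.
/// (Hypothetical helper, not part of lz4_flex.)
fn frame_content_size(data: &[u8]) -> Option<u64> {
    if !data.starts_with(&LZ4_FRAME_MAGIC) {
        return None;
    }
    let flg = *data.get(4)?;
    if flg & FLG_CONTENT_SIZE == 0 {
        return None;
    }
    // Layout: magic (4 bytes), FLG (1), BD (1), then the content size (8).
    let mut size_bytes = [0u8; 8];
    size_bytes.copy_from_slice(data.get(6..14)?);
    Some(u64::from_le_bytes(size_bytes))
}

fn main() {
    // Hand-built header: magic, FLG with the content-size bit set, BD,
    // then a content size of 1000. The header checksum byte is left out
    // because this sketch never reads it.
    let mut header = vec![0x04, 0x22, 0x4D, 0x18, 0x68, 0x40];
    header.extend_from_slice(&1000u64.to_le_bytes());
    println!("{:?}", frame_content_size(&header));
}
```

A `None` result is the common case to plan for. As far as I know, the reference `lz4` command line tool only writes the field when `--content-size` is passed.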

> This would be particularly helpful for lz4 compressed files.

Why would that be helpful?

@jtmoon79
Author

> There's the content_size flag in the frame format header, but its existence is optional.

How do I access the content_size for some file? Can you complete this code example:

use std::fs::File;
use std::fs::OpenOptions;
use std::io::prelude::Read;
use std::io::BufReader;
use std::path::Path;
use ::lz4_flex;

fn main() {
    let mut open_options = OpenOptions::new();
    let path = String::from("/tmp/file.lz4");
    let path_std = Path::new(&path);
    let file_lz: File = match open_options.read(true).open(path_std) {
        Ok(val) => val,
        Err(err) => panic!("{}", err),
    };
    let bufreader = BufReader::<File>::new(file_lz);
    let mut lz4_decoder = lz4_flex::frame::FrameDecoder::new(bufreader);
    // ... how to access content_size ?
}

I created /tmp/file.lz4 with

$ lz4c --version
*** LZ4 command line interface 64-bits v1.9.3, by Yann Collet ***
$ lz4c -9kv /var/log/syslog -c > /tmp/file.lz4

@PSeitz
Owner

PSeitz commented May 28, 2024

current_frame_info in FrameDecoder is private and depends on the current frame

@jtmoon79
Author

jtmoon79 commented May 28, 2024

> There's the content_size flag in the frame format header, but its existence is optional... So it would be the sum of the content_size.

> current_frame_info in FrameDecoder is private and depends on the current frame

I'm confused. Could you please provide a code example?
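
To make the "sum of the content_size" idea concrete, here is a rough sketch that walks every frame in a byte buffer and adds up the header fields without decompressing anything. It follows my reading of the LZ4 frame format description and is not lz4_flex API. `total_content_size`, `read_u32_le` and `stored_frame` are hypothetical names. Skippable frames and checksum verification are not handled, and the sketch gives up with `None` as soon as one frame lacks the field.

```rust
/// Magic number that opens every LZ4 frame, as it appears on disk.
const LZ4_FRAME_MAGIC: [u8; 4] = [0x04, 0x22, 0x4D, 0x18];

/// Reads a little-endian u32 at `pos`, or `None` if the data ends first.
fn read_u32_le(data: &[u8], pos: usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(data.get(pos..pos.checked_add(4)?)?);
    Some(u32::from_le_bytes(buf))
}

/// Sums the content-size fields of all concatenated frames in `data`.
/// Returns `None` on malformed or truncated input, or if any frame omits
/// the optional field. (Hypothetical helper, not part of lz4_flex.)
fn total_content_size(data: &[u8]) -> Option<u64> {
    let mut pos = 0usize;
    let mut total = 0u64;
    while pos < data.len() {
        if data.get(pos..pos.checked_add(4)?)? != &LZ4_FRAME_MAGIC[..] {
            return None;
        }
        let flg = *data.get(pos + 4)?;
        let has_block_checksum = flg & 0x10 != 0;
        let has_content_size = flg & 0x08 != 0;
        let has_content_checksum = flg & 0x04 != 0;
        let has_dict_id = flg & 0x01 != 0;
        if !has_content_size {
            return None;
        }
        let mut size = [0u8; 8];
        size.copy_from_slice(data.get(pos + 6..pos + 14)?);
        total = total.checked_add(u64::from_le_bytes(size))?;
        // Skip magic (4), FLG + BD (2), content size (8), optional
        // dictionary ID (4) and the header checksum (1).
        pos += 4 + 2 + 8 + if has_dict_id { 4 } else { 0 } + 1;
        // Skip the data blocks until the all-zero EndMark.
        loop {
            let word = read_u32_le(data, pos)?;
            pos += 4;
            if word == 0 {
                break;
            }
            // The top bit marks an uncompressed block; the rest is the length.
            let len = (word & 0x7FFF_FFFF) as usize;
            pos = pos.checked_add(len)?;
            if has_block_checksum {
                pos = pos.checked_add(4)?;
            }
        }
        if has_content_checksum {
            pos = pos.checked_add(4)?;
        }
        if pos > data.len() {
            return None;
        }
    }
    Some(total)
}

/// Builds a test frame holding `content` in one uncompressed block.
/// The header checksum is a placeholder, so real decoders would reject it.
fn stored_frame(content: &[u8]) -> Vec<u8> {
    let mut frame = LZ4_FRAME_MAGIC.to_vec();
    frame.extend_from_slice(&[0x68, 0x40]); // FLG (content size present), BD
    frame.extend_from_slice(&(content.len() as u64).to_le_bytes());
    frame.push(0x00); // placeholder header checksum
    frame.extend_from_slice(&(0x8000_0000u32 | content.len() as u32).to_le_bytes());
    frame.extend_from_slice(content);
    frame.extend_from_slice(&0u32.to_le_bytes()); // EndMark
    frame
}

fn main() {
    let mut file = stored_frame(b"hello");
    file.extend(stored_frame(b"lz4 frames"));
    println!("{:?}", total_content_size(&file)); // Some(15)
}
```

Note that this still has to touch every block header in the file, so it is cheaper than decompressing but not free.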

@PSeitz
Owner

PSeitz commented May 28, 2024

It's not accessible currently. Its existence is optional, so I'm not sure what it could be used for externally.
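
Since the header field is optional and not exposed, one fallback that always works is to decode the stream once and count the bytes while discarding them. The sketch below is generic over any `std::io::Read`. Passing an `lz4_flex::frame::FrameDecoder` to it should work because that type implements `Read`, but the example uses a plain `Cursor` as a stand-in so it runs without the crate. `count_decoded_bytes` is a made-up name.

```rust
use std::io::{self, Read};

/// Drains `reader` and returns how many bytes it produced.
/// Nothing is kept in memory beyond the internal copy buffer.
/// (Hypothetical helper, not part of lz4_flex.)
fn count_decoded_bytes<R: Read>(mut reader: R) -> io::Result<u64> {
    io::copy(&mut reader, &mut io::sink())
}

fn main() -> io::Result<()> {
    // Stand-in for a decoder; a FrameDecoder wrapping a BufReader<File>
    // would be passed here instead.
    let reader = io::Cursor::new(b"hello world".to_vec());
    println!("{}", count_decoded_bytes(reader)?); // 11
    Ok(())
}
```

The cost is a full decompression pass, which is what the original request was hoping to avoid, so this is only a fallback for files whose frames carry no content size.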
