Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate ZIP Size #241

Merged
merged 1 commit into from
Apr 27, 2023
Merged

Calculate ZIP Size #241

merged 1 commit into from
Apr 27, 2023

Conversation

maennchen
Copy link
Owner

@maennchen maennchen commented Jan 15, 2023

Calculates the resulting file size with the minimal amount of work that is required.

Implements basics for #89

Follow up of discussion in stechstudio/laravel-zipstream#83

TODO

  • Replace \RuntimeError with proper error
  • Simplify API interface
  • Support files without resolving to stream
  • Expand Test Coverage

Sorry, something went wrong.

@maennchen maennchen self-assigned this Jan 15, 2023
@maennchen
Copy link
Owner Author

@jszobody Can you have a look if this is going into a direction that would work for you?

I introduced the parameter exactSize for the use case where the file size is known outside of the stream.

The PR is not done. I just wanted to try the points we talked about. The API for the user is quite bad at the moment and it would probably make sense to implement that completely differently.

@coveralls
Copy link

coveralls commented Jan 15, 2023

Pull Request Test Coverage Report for Build 4749066365

  • 143 of 143 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 4747870081: 0%
Covered Lines: 1044
Relevant Lines: 1044

💛 - Coveralls

@jszobody
Copy link

jszobody commented Jan 16, 2023

I love the lax/strict simulation modes, those look very useful. Nice improvement over what I was doing before.

One issue: it looks like your File class requires an open $stream right up front. If I'm adding thousands of files to a zip (quite common for our company!) this will grind to a halt. Memory issues, "too many files open" errors, etc. You need a way to accept a File without an already-opened stream, for the purposes of running a simulation first.

What do you think about optionally accepting a closure/callable for the $stream argument?

new File(
    ...
    stream: fn() => fopen($myFilePath, 'r')
);

This way you can run the simulation with all the files up front, before streams are opened. This would only work in strict mode, obviously. Then when zipping up the files for real, you can retrieve the actual stream from the closure.

My own package would then provide Adapters for different file sources, like S3. I would add files to your ZipStream with closures, and inside those closures handle the appropriate stream logic.

Thoughts?

@maennchen
Copy link
Owner Author

@jszobody I want to keep the amount of branching as low as possible to make the code as simple as possible.

I'll think about a way to add a simulated file with exactSize and completely without a stream or any other resource.

We could probably make the stream completely optional for File if the operation mode is strict simulation and have a new method addSimulatedFile($name, $comment, $exactSize) on the main class.


Do you have an idea on how to structure the API on this feature?

In the best case I would like to automatically set a content-length header if enabled. But I don't really want to store all the file parameter anywhere since that could cause some issues with larger zips / a lot of files.

@maennchen
Copy link
Owner Author

maennchen commented Feb 27, 2023

I have a better idea on the infterface of the changes:

  • When the OperationMode is set to SIMULATE_*, all files are recorded. (NORMAL mode does not record and directly stream)
  • We add a new function addCallbackFile for the concerns voiced in the Calculate ZIP Size #241 (comment)
    • In this case, the exactSize has to be supplied and a CLosure to resolve the file contents later when needed
  • finish returns the file size
  • A new function switches over the OperationMode to NORMAL after finish and can replay the recorded files
  • If headers are enabled and the file was simulated first, the Content-Length header is also set automatically.

This would look something like this:

$zip = new ZipStream(
  outputName: 'some_files.zip'
  operationMode: OperationMode::SIMULATE_STRICT
);

// Add files as usual
$zip->addFileFromPath(fileName: 'example.jpg', path: '/data/example.jpg');

// Add callback files (S3 use-case)
foreach($s3Files as $fileName => $fileSize) {
  $zip->addCallbackFile(
    fileName: $fileName,
    exactSize: $fileSize,
    callback: funtion() use($fileName) {
      // Returns `resource` / `StreamInterface` / raw data
     return getDataFromS3($fileName);
    }
}

$size = $zip->finish(); // Returns size & switches to `OperationMode` `Normal`

$zip->replaySimulation(); // Sets headers and replays recorded files

The dowside would be that if thousands of files are simulated, they could potentially clog up memory.

Alternatively, we can leave things as is in the PR and put the abstraction of it to the user.

WDYT?

In any case, I think this should be part of a new 3.1 release and we should go ahead with the release already as discussed in #237.

@maennchen maennchen added this to the v3.1.0 milestone Feb 27, 2023
@maennchen maennchen force-pushed the jm/calculate-zip-size branch 3 times, most recently from c1c6047 to 803fdf0 Compare April 19, 2023 21:55
@maennchen
Copy link
Owner Author

@jszobody Do you have any feedback to my last two comments? I would like to wrap this PR up :)

@jszobody
Copy link

@maennchen Apologies! I missed your previous update.

This approach looks very reasonable to me. Just a couple very minor thoughts on naming (which you are welcome to ignore and use whatever you think best!):

  1. First, if your method is addFileFromPath then it seems more consistent to use addFileFromCallback or just addFileFrom or even addFile. I lean towards that last one. You have explicit methods when adding from a path or stream, and one generic addFile that requires a callback to provide the file data.

2)replaySimulation feels like an odd method name if this is now building the zip and streaming it out for real. It's no longer a simulation at that point, right?

What about:

$size = $zip->finishSimulation(); // Returns size & switches to `OperationMode` `Normal`

$zip->replay(); // Sets headers and replays recorded files, building the zip for real this time

Again, super minor thoughts. I'm going to wrap this API anyway in my Laravel package so it doesn't ultimately matter all that much.

Thanks again, this is looking awesome!

@maennchen maennchen force-pushed the jm/calculate-zip-size branch 2 times, most recently from 0dcf9d4 to 3389870 Compare April 19, 2023 23:55
@maennchen
Copy link
Owner Author

@jszobody I think all the functionality should now be in place.

There's still a few details that I'm unhappy about:

  • When running a simulation and the generation, executeSimulation is called after finish. That seems a bit strange.
    • I also don't like the idea of adding finishSimulation which essentially is also just finish. That's just two functions doing the same thing.
    • Ideas...?
  • A simulation accumulates all files.
    • Should we also provide a completely stateless variation of it where executeSimulation is not available?
    • Otherwise memory could overflow for archives with really large amounts of files.

It would be cool if you could give it a spin and let me know what you think :)

@maennchen maennchen force-pushed the jm/calculate-zip-size branch from 3389870 to 577ebd8 Compare April 20, 2023 00:10
@maennchen maennchen marked this pull request as ready for review April 20, 2023 00:10
@maennchen
Copy link
Owner Author

If there's no more feedback for the moment, I'll merge it and publish it as 3.1.0-beta.

@maennchen maennchen force-pushed the jm/calculate-zip-size branch from 577ebd8 to 99b4179 Compare April 27, 2023 07:19
Calculates the resulting file size with the minimal amount of work
that is required.

Fixes #89
@maennchen maennchen force-pushed the jm/calculate-zip-size branch from 99b4179 to 54bfe26 Compare April 27, 2023 07:33
@maennchen maennchen merged commit c9a2e4f into main Apr 27, 2023
@maennchen maennchen deleted the jm/calculate-zip-size branch April 27, 2023 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants