Saving TIFF in chunks #4669

Open
DexterHill0 opened this issue Jun 5, 2020 · 10 comments

@DexterHill0

DexterHill0 commented Jun 5, 2020

What did you do?

I need to save huge image files (approx. 819200x460800 RGBA). This is too much for anyone's RAM, so I have to save it in chunks from disk. I start by saving the array to an HDF5 file. I then loop over the array in large steps and pass a slice of the array to .fromarray(). I then save this to a TIFF file. On each following iteration, it should add more to the TIFF file, and so on.

What did you expect to happen?

It should create a very large TIFF image.

What actually happened?

It errored out while saving, giving me the error:

  File "c:\Users\Dexter\Desktop\workspace\test.py", line 68, in test
    a.save("ff.tiff", format="tiff", quality=1)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\Image.py", line 2134, in save
    save_handler(self, fp, filename)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 1629, in _save
    offset = ifd.save(fp)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 865, in save
    result = self.tobytes(offset)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 808, in tobytes
    data = self._write_dispatch[typ](self, *values)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 644, in <lambda>
    b"".join(self._pack(fmt, value) for value in values)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 644, in <genexpr>
    b"".join(self._pack(fmt, value) for value in values)
  File "C:\Users\Dexter\AppData\Local\Programs\Python\Python38\lib\site-packages\PIL\TiffImagePlugin.py", line 611, in _pack
    return struct.pack(self._endian + fmt, *values)
struct.error: argument out of range

What are your OS, Python and Pillow versions?

  • OS: Windows 10 V. 1909
  • Python: 3.8.1
  • Pillow: Latest (7.1.2)

import gc

import h5py
import numpy as np
from PIL import Image

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')

    shp = dset.shape
    step = 25000

    for i in range(step, shp[0]+step, step):
        a = Image.fromarray(dset[:i])
        a.save("out.tiff", format="tiff", quality=80) #should error out here
        del a
        gc.collect()

    f.close()
test()

I mention the size 819200x460800 - that's the maximum possible size. I also get the error on the size shown above.
If the size of the image is lower (for instance, 10000x10000) with a step size of 1000, it will not error and will produce an output image in about 4 seconds.

@radarhere
Member

radarhere commented Jun 6, 2020

Trying to replicate this,
on Windows, I get

ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

on Ubuntu, I get

MemoryError: Unable to allocate array with shape (25000, 100000, 4) and data type uint8

on my macOS, it is just killed.

Is there some other code that you have run before the pasted code that prevents these errors?

@DexterHill0
Author

DexterHill0 commented Jun 6, 2020

Interesting. That isn't the full code, but I left out what I thought wouldn't be necessary.
This is all of the code:

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result
def get_step(shp):
    fctrs = sorted(factors(shp[0]))[::-1]
    i = 0
    while True:
        try:
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')

    shp = dset.shape
    step = get_step(shp)

    for i in range(step, shp[0]+step, step):
        a = Image.fromarray(dset[:i])
        a.save("out.tiff", format="tiff", quality=80)
        del a
        gc.collect()

    f.close()
test()

It finds all the factors of the first dimension of the numpy array's shape (e.g. for shape 100,100,4 it finds the factors of 100). It then loops through the factors from highest to lowest and picks the largest one for which a square chunk of that size can still be allocated in memory.
This means that not only should it not run out of memory, but the numpy array will also be split up evenly.
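
As a rough alternative to probing with try/except, the step could be picked against an explicit memory budget. This is only a sketch: step_within_budget and its budget_bytes default are hypothetical, not part of the code above. It also makes visible that np.zeros defaults to float64 (8 bytes per element), while the dataset itself is uint8 (1 byte per element), so the probe is stricter than the real chunk.

```python
def factors(x):
    """Return all factors of x, largest first."""
    result = set()
    i = 1
    while i * i <= x:
        if x % i == 0:
            result.add(i)
            result.add(x // i)
        i += 1
    return sorted(result, reverse=True)


def step_within_budget(side, channels=4, itemsize=1, budget_bytes=2**30):
    """Largest factor f of `side` such that an f x f x channels chunk fits the budget.

    budget_bytes is an assumed limit (1 GiB here), not a value from the thread.
    """
    for f in factors(side):
        if f * f * channels * itemsize <= budget_bytes:
            return f
    return 1


# uint8 chunks (itemsize=1) versus float64 probe arrays (itemsize=8)
print(step_within_budget(100000))
print(step_within_budget(100000, itemsize=8))
```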

EDIT: On Ubuntu, your error mentions the shape of the array. It says it's shape (25000, 100000, 4). If I wanted it in chunks then technically I would want it like (25000, 25000, 4).
I changed this line:

a = Image.fromarray(dset[:i])

to

a = Image.fromarray(dset[:i,:i])

Still got the same error sadly.

@DexterHill0
Author

DexterHill0 commented Jun 6, 2020

It occurred to me what the issue was while I was trying it out with CV2.
In the for loop, if I print the values (step=25000), it goes:

25000
50000
75000
100000

When I'm slicing the array like arr[:i], that means slice from 0 to i. So once it gets to a value of, say, 75000, it's slicing between 0 and 75000, and that's too much for memory, so it errors.
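
The difference between the two slicing schemes can be sketched with plain integers, no arrays needed. The helper names cumulative_slices and windowed_slices are made up for illustration.

```python
def cumulative_slices(total, step):
    """Bounds produced by arr[:i] with i in range(step, total + step, step)."""
    return [(0, stop) for stop in range(step, total + step, step)]


def windowed_slices(total, step):
    """Bounds produced by arr[i:i + step] with i in range(0, total, step)."""
    return [(start, min(start + step, total)) for start in range(0, total, step)]


total, step = 100000, 25000
# Each cumulative slice starts at 0, so it keeps growing.
print(cumulative_slices(total, step))
# Each windowed slice covers only `step` rows.
print(windowed_slices(total, step))
```

With the cumulative form the last slice covers all 100000 rows, while the windowed form never covers more than 25000 rows at a time.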
With a smaller array, you would be fooled into thinking it's working because it's a small array so the whole thing can be stored in memory. The for loop is now this:

for i in range(0, shp[0], step):
    a = Image.fromarray(dset[i:i+step,i:i+step])
    a.save("out.tiff", format="tiff", quality=80)
    del a
    gc.collect()

What it is doing now though, is it's overwriting the previous image data every time I save it.
In the source code of the save function, I see an if statement that checks if there's an "append" parameter. I tried including that but it didn't work.

a.save("out.tiff", format="tiff", quality=80, params={"append", True})

Making it false also doesn't work.

@radarhere
Member

The way to specify the "append" parameter that you have linked to is

a.save("out.tiff", format="tiff", quality=80, append=True)

However, when Pillow talks about appending, it's talking about adding another image. A second page of a PDF, for example. It may come as a surprise, but yes, TIFF can also contain multiple images.
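
To illustrate what "another image" means here, a minimal sketch using save_all and append_images, writing to an in-memory buffer. The tiny 4x4 solid-colour images are placeholders chosen for the example.

```python
import io

from PIL import Image

# Two small placeholder images standing in for two "chunks"
first = Image.new("RGB", (4, 4), "red")
second = Image.new("RGB", (4, 4), "blue")

buf = io.BytesIO()
# save_all with append_images writes each image as its own page (frame)
first.save(buf, format="TIFF", save_all=True, append_images=[second])

buf.seek(0)
with Image.open(buf) as im:
    n_frames = im.n_frames
    size = im.size

print(n_frames, size)
```

The result has two frames that are each still 4x4. The second image becomes a new page and is not stitched onto the first.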

Pillow isn't currently set up to be able to help you batch process a single image and then combine the result without loading the complete image into memory. If you would like to be able to do that, this is a feature request.

@DexterHill0
Author

Yeah, reading the documentation pointed that out to me. And also the fact that the output tiff file is corrupted. I can only assume that every time it appends it creates a new header (or something along those lines) so when anything tries to read it, it looks corrupted.

To have this feature as a feature request, would I need to open a new issue?

@radarhere
Member

No, you don't need to create a new issue. I was just pointing that out.

@radarhere changed the title from "Saving tiff in chunks errors out (struct.error: argument out of range)" to "Saving TIFF in chunks" on Jun 6, 2020
@radarhere added the TIFF label on Jun 11, 2020
@radarhere
Member

radarhere commented Dec 29, 2023

When I run your initial code,

import h5py
import gc
from PIL import Image
import numpy as np

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')

    shp = dset.shape
    step = 25000

    for i in range(step, shp[0]+step, step):
        a = Image.fromarray(dset[:i])
        a.save("out.tiff", format="tiff", quality=80) #should error out here
        del a
        gc.collect()

    f.close()
test()

I get

Traceback (most recent call last):
  File "demo.py", line 20, in <module>
    test()
  File "demo.py", line 15, in test
    a.save("out.tiff", format="tiff", quality=80, strip_size=65536*65536) #should error out here
  File "PIL/Image.py", line 2440, in save
    save_handler(self, fp, filename)
  File "PIL/TiffImagePlugin.py", line 1857, in _save
    offset = ifd.save(fp)
  File "PIL/TiffImagePlugin.py", line 956, in save
    result = self.tobytes(offset)
  File "PIL/TiffImagePlugin.py", line 901, in tobytes
    data = self._write_dispatch[typ](self, *values)
  File "PIL/TiffImagePlugin.py", line 708, in <lambda>
    b"".join(self._pack(fmt, value) for value in values)
  File "PIL/TiffImagePlugin.py", line 708, in <genexpr>
    b"".join(self._pack(fmt, value) for value in values)
  File "PIL/TiffImagePlugin.py", line 675, in _pack
    return struct.pack(self._endian + fmt, *values)
struct.error: 'L' format requires 0 <= number <= 4294967295

Pillow is calculating StripByteCounts as 10000000000. The tag can be a SHORT or LONG, but the maximum for LONG looks like 4294967295, less than 10000000000. So the error is just because a limit of the TIFF specification has been hit.
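
The arithmetic can be reproduced with the standard library alone. This is a sketch of the overflow itself, not of Pillow's internals. The "<L" format is an assumption standing in for whichever endianness the file uses.

```python
import struct

# First chunk in the loop: 25000 rows x 100000 columns x 4 uint8 channels
rows, width, channels = 25000, 100000, 4
strip_byte_counts = rows * width * channels

# Largest value a TIFF LONG (unsigned 32-bit) can hold
LONG_MAX = 2**32 - 1


def pack_long(value):
    """Pack value as a little-endian unsigned 32-bit integer."""
    return struct.pack("<L", value)


try:
    pack_long(strip_byte_counts)
    overflowed = False
except struct.error:
    # Raised because the value does not fit in 32 bits
    overflowed = True

print(strip_byte_counts, LONG_MAX, overflowed)
```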

@radarhere
Member

radarhere commented Dec 29, 2023

You might be interested to know that because your image isn't being saved with any compression, the quality argument isn't having any effect.

However, because your image isn't using any compression, the saving process is simpler. I've created #7650 to allow saving TIFF images without compression in chunks.

With that PR, the following should work.

from PIL import Image, TiffImagePlugin

im = Image.open("Tests/images/hopper.png")
with open("out.tiff", "wb") as fp:
  for i, chunk in enumerate([
    im.crop((0, 0, 128, 32)),
    im.crop((0, 32, 128, 64)),
    im.crop((0, 64, 128, 96)),
    im.crop((0, 96, 128, 128)),
  ]):
    if i == 0:
      chunk.save(fp, "TIFF", tiffinfo={
        TiffImagePlugin.IMAGEWIDTH: 128,
        TiffImagePlugin.IMAGELENGTH: 128
      })
    else:
      fp.write(chunk.tobytes())

@radarhere
Member

Pillow 10.2.0 has now been released with #7650.

@DexterHill0 is this working now?

@radarhere
Member

radarhere commented Jan 21, 2024

Following through the comments above, here is your last version.

import h5py
from PIL import Image
import numpy as np
import gc

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_step(shp):
    fctrs = sorted(factors(shp[0]))[::-1]
    i = 0
    while True:
        try:
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')

    shp = dset.shape
    step = get_step(shp)

    for i in range(0, shp[0], step):    
        a = Image.fromarray(dset[i:i+step,i:i+step])
        a.save("out.tiff", format="tiff", quality=80)
        del a
        gc.collect()

    f.close()
test()

If you run the following with Pillow 10.2.0 with a reduced final size, it runs successfully, saving TIFF in chunks.

import h5py
from PIL import Image, TiffImagePlugin
import numpy as np
import gc

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_step(shp):
    return 2500

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (10000,1000,4), dtype=np.uint8, compression='gzip')

    shp = dset.shape
    step = get_step(shp)

    with open("out.tiff", "wb") as fp:
        for i in range(0, shp[0], step):
            print(i)
            a = Image.fromarray(dset[i:i+step,i:i+step])
            if i == 0:
                a.save(fp, format="tiff", quality=80, tiffinfo={
                    TiffImagePlugin.IMAGEWIDTH: 10000,
                    TiffImagePlugin.IMAGELENGTH: 1000
                })
            else:
                fp.write(a.tobytes())
            del a
            gc.collect()

    f.close()
test()

https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pdf

The largest possible TIFF file is 2**32 bytes in length.

This means that (100000, 100000, 4) cannot be saved as an uncompressed TIFF file.
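
A quick way to check a shape against that limit, as a rough sketch. It counts raw pixel bytes only and ignores header and tag overhead, so a shape that passes is necessary but not quite sufficient. The helper names are hypothetical.

```python
# File length limit quoted above
TIFF_LIMIT = 2**32


def raw_size_bytes(shape, itemsize=1):
    """Uncompressed pixel data size for an array of the given shape."""
    size = itemsize
    for dim in shape:
        size *= dim
    return size


def fits_in_tiff(shape, itemsize=1):
    """True if the raw pixel data alone stays under the file length limit."""
    return raw_size_bytes(shape, itemsize) < TIFF_LIMIT


print(raw_size_bytes((100000, 100000, 4)), fits_in_tiff((100000, 100000, 4)))
print(raw_size_bytes((10000, 1000, 4)), fits_in_tiff((10000, 1000, 4)))
```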

@radarhere radarhere reopened this Jan 21, 2024