Saving TIFF in chunks #4669
Comments
Trying to replicate this,
on Ubuntu, I get
on my macOS, it is just killed. Is there some other code that you have run before the pasted code that prevents these errors? |
Interesting. That isn't the full code, but I left out what I thought wouldn't be necessary.

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_step(shp):
    fctrs = sorted(factors(shp[0]))[::-1]
    i = 0
    while True:
        try:
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')
    shp = dset.shape
    step = get_step(shp)
    for i in range(step, shp[0]+step, step):
        a = Image.fromarray(dset[:i])
        a.save("out.tiff", format="tiff", quality=80)
        del a
        gc.collect()
    f.close()

test()

What it does is get all the factors of the first number in the shape of the numpy array (i.e. if the shape was 100,100,4 it gets the factors of 100). It then loops through the factors from highest to lowest and returns the largest factor for which a test array of that size can be allocated without a MemoryError; that factor is the step used to split up the numpy array. EDIT: On Ubuntu, your error mentions the shape of the array. I changed a = Image.fromarray(dset[:i]) to a = Image.fromarray(dset[:i,:i]). Still got the same error, sadly. |
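As a quick worked example of the factor search described above, the sketch below reuses `factors` from the comment and lists the candidate step sizes for a first dimension of 100, largest first. The helper name `candidate_steps` is made up for illustration and is not part of the posted code.

```python
def factors(x):
    """Return every divisor of x (unordered), as in the comment above."""
    result = []
    i = 1
    while i * i <= x:
        if x % i == 0:
            result.append(i)
            if x // i != i:
                result.append(x // i)
        i += 1
    return result


def candidate_steps(length):
    """Hypothetical helper: divisors of length, largest first."""
    return sorted(factors(length), reverse=True)


# Step sizes that get_step would try, in order, for a (100, 100, 4) array.
steps_for_100 = candidate_steps(100)
print(steps_for_100)
```

One thing worth keeping in mind when reading `get_step`: `np.zeros` defaults to float64, so the probe array asks for eight bytes per element, while the uint8 slices that are actually saved need one.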
It occurred to me what the issue was while I was trying it out with CV2.
When I'm slicing the array, like

for i in range(0, shp[0], step):
    a = Image.fromarray(dset[i:i+step,i:i+step])
    a.save("out.tiff", format="tiff", quality=80)
    del a
    gc.collect()

What it's doing now, though, is overwriting the previous image data every time I save it.

a.save("out.tiff", format="tiff", quality=80, params={"append", True})

Making it false also doesn't work. |
The way to specify the "append" parameter that you have linked to is a.save("out.tiff", format="tiff", quality=80, append=True) However, when Pillow talks about appending, it's talking about adding another image. A second page of a PDF, for example. It may come as a surprise, but yes, TIFF can also contain multiple images. Pillow isn't currently set up to be able to help you batch process a single image and then combine the result without loading the complete image into memory. If you would like to be able to do that, this is a feature request. |
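To illustrate the point that Pillow's multi-image saving adds pages rather than growing one picture, here is a minimal sketch using `save_all` and `append_images`. The tiny 4x2 images and the temporary file path are assumptions chosen only to keep the example small.

```python
import os
import tempfile

from PIL import Image

# Two tiny stand-in images (sizes and colours are arbitrary).
first = Image.new("RGBA", (4, 2), (255, 0, 0, 255))
second = Image.new("RGBA", (4, 2), (0, 0, 255, 255))

path = os.path.join(tempfile.mkdtemp(), "pages.tiff")

# save_all writes a multi-page TIFF: `second` becomes page 2,
# it is not stitched onto `first`.
first.save(path, format="TIFF", save_all=True, append_images=[second])

with Image.open(path) as im:
    n_frames = im.n_frames
    page_size = im.size

print(n_frames, page_size)
```

The file ends up with two separate frames of the original size, which is why this route does not build one large image out of chunks.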
Yeah, reading the documentation pointed that out to me. And also the fact that the output tiff file is corrupted. I can only assume that every time it appends it creates a new header (or something along those lines) so when anything tries to read it, it looks corrupted. To have this feature as a feature request, would I need to open a new issue? |
No, you don't need to create a new issue. I was just pointing that out. |
When I run your initial code,

import h5py
import gc
from PIL import Image
import numpy as np

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')
    shp = dset.shape
    step = 25000
    for i in range(step, shp[0]+step, step):
        a = Image.fromarray(dset[:i])
        a.save("out.tiff", format="tiff", quality=80) #should error out here
        del a
        gc.collect()
    f.close()

test()

I get

Traceback (most recent call last):
  File "demo.py", line 20, in <module>
    test()
  File "demo.py", line 15, in test
    a.save("out.tiff", format="tiff", quality=80, strip_size=65536*65536) #should error out here
  File "PIL/Image.py", line 2440, in save
    save_handler(self, fp, filename)
  File "PIL/TiffImagePlugin.py", line 1857, in _save
    offset = ifd.save(fp)
  File "PIL/TiffImagePlugin.py", line 956, in save
    result = self.tobytes(offset)
  File "PIL/TiffImagePlugin.py", line 901, in tobytes
    data = self._write_dispatch[typ](self, *values)
  File "PIL/TiffImagePlugin.py", line 708, in <lambda>
    b"".join(self._pack(fmt, value) for value in values)
  File "PIL/TiffImagePlugin.py", line 708, in <genexpr>
    b"".join(self._pack(fmt, value) for value in values)
  File "PIL/TiffImagePlugin.py", line 675, in _pack
    return struct.pack(self._endian + fmt, *values)
struct.error: 'L' format requires 0 <= number <= 4294967295

Pillow is calculating StripByteCounts as 10000000000. The tag can be a SHORT or LONG, but the maximum for LONG looks like 4294967295, less than 10000000000. So the error is just because a limit of the TIFF specification has been hit. |
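The StripByteCounts arithmetic above can be checked with the standard library alone. This is only a sketch: it assumes the whole 25000-row slice ends up in a single strip (which is what the value 10000000000 suggests), and the helper `fits_in_long` is a name invented here.

```python
import struct

rows_in_slice = 25000  # step used in the reproduction
width = 100000         # pixels per row
channels = 4           # RGBA, one byte per channel (uint8)

# Bytes in one strip if the whole slice is written as a single strip.
strip_byte_count = rows_in_slice * width * channels

LONG_MAX = 2**32 - 1   # largest value a 32-bit unsigned LONG can hold


def fits_in_long(value):
    """Hypothetical helper: can value be packed as an unsigned 32-bit int?"""
    try:
        struct.pack("<L", value)
        return True
    except struct.error:
        return False


# Most rows of this width whose byte count still fits in a LONG.
max_rows = LONG_MAX // (width * channels)

print(strip_byte_count, fits_in_long(strip_byte_count), max_rows)
```

Under these assumptions, a strip would have to hold about 10737 rows or fewer for its byte count to fit; the sketch says nothing about other limits of the format.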
You might be interested to know that, because your image isn't being saved with any compression, the saving process is simpler. I've created #7650 to allow saving TIFF images without compression in chunks. With that PR, the following should work.

from PIL import Image, TiffImagePlugin

im = Image.open("Tests/images/hopper.png")
with open("out.tiff", "wb") as fp:
    for i, chunk in enumerate([
        im.crop((0, 0, 128, 32)),
        im.crop((0, 32, 128, 64)),
        im.crop((0, 64, 128, 96)),
        im.crop((0, 96, 128, 128)),
    ]):
        if i == 0:
            chunk.save(fp, "TIFF", tiffinfo={
                TiffImagePlugin.IMAGEWIDTH: 128,
                TiffImagePlugin.IMAGELENGTH: 128
            })
        else:
            fp.write(chunk.tobytes())
|
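A small sketch of why the chunks in the example above are full-width horizontal bands: assuming the data written after the first chunk is plain row-major pixel bytes, concatenating bands of whole rows reproduces the full buffer, while pieces that cover only part of each row do not. The fake 8x6 image and the helper names are made up for illustration.

```python
width, height, channels = 8, 6, 4
row_bytes = width * channels

# Fake row-major RGBA buffer: every pixel gets a distinct byte value.
full = bytes(
    (y * width + x) % 256
    for y in range(height)
    for x in range(width)
    for _ in range(channels)
)


def row_bands(data, rows_per_band):
    """Split the buffer into bands of whole rows (hypothetical helper)."""
    band_bytes = rows_per_band * row_bytes
    return [data[i:i + band_bytes] for i in range(0, len(data), band_bytes)]


def column_halves(data):
    """Split into a left half and a right half of every row (hypothetical)."""
    half = row_bytes // 2
    rows = [data[y * row_bytes:(y + 1) * row_bytes] for y in range(height)]
    left = b"".join(r[:half] for r in rows)
    right = b"".join(r[half:] for r in rows)
    return [left, right]


bands = row_bands(full, 2)
rebuilt_from_bands = b"".join(bands)
rebuilt_from_halves = b"".join(column_halves(full))

print(rebuilt_from_bands == full, rebuilt_from_halves == full)
```

This is why a slice such as `dset[i:i+step]` (all columns of a range of rows) fits this approach, whereas a square tile that covers only part of each row would interleave incorrectly.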
Pillow 10.2.0 has now been released with #7650. @DexterHill0 is this working now? |
Following through the comments above, here is your last version.

import h5py
from PIL import Image
import numpy as np
import gc

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_step(shp):
    fctrs = sorted(factors(shp[0]))[::-1]
    i = 0
    while True:
        try:
            a = np.zeros((fctrs[i], fctrs[i], 4))
            return fctrs[i]
        except MemoryError:
            pass
        i += 1

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (100000,100000,4), dtype=np.uint8, compression='gzip')
    shp = dset.shape
    step = get_step(shp)
    for i in range(0, shp[0], step):
        a = Image.fromarray(dset[i:i+step,i:i+step])
        a.save("out.tiff", format="tiff", quality=80)
        del a
        gc.collect()
    f.close()

test()

If you run the following with Pillow 10.2.0 with a reduced final size, it runs successfully, saving TIFF in chunks.

import h5py
from PIL import Image, TiffImagePlugin
import numpy as np
import gc

def factors(x):
    result = []
    i = 1
    while i*i <= x:
        if x % i == 0:
            result.append(i)
            if x//i != i:
                result.append(x//i)
        i += 1
    return result

def get_step(shp):
    return 2500

def test():
    f = h5py.File("test.hdf5", "w")
    dset = f.create_dataset("test", (10000,1000,4), dtype=np.uint8, compression='gzip')
    shp = dset.shape
    step = get_step(shp)
    with open("out.tiff", "wb") as fp:
        for i in range(0, shp[0], step):
            print(i)
            a = Image.fromarray(dset[i:i+step,i:i+step])
            if i == 0:
                a.save(fp, format="tiff", quality=80, tiffinfo={
                    TiffImagePlugin.IMAGEWIDTH: 10000,
                    TiffImagePlugin.IMAGELENGTH: 1000
                })
            else:
                fp.write(a.tobytes())
            del a
            gc.collect()
    f.close()

test()

https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pdf
This means that |
What did you do?
I need to save huge image files (approx. 819200x460800 RGBA). This is too much for anyone's RAM so I have to save it in chunks from disk. I start by saving the array to an HDF5 file. I then loop over the array in large steps, and pass a slice of the array into .fromarray(). I then save this to a tiff file. Once it loops again, it will add more to the tiff file and so on.
What did you expect to happen?
It should create a tiff image that is very large.
What actually happened?
It errored out while saving, giving me the error:
What are your OS, Python and Pillow versions?
I mention the size 819200x460800 - that's the maximum possible size. I also get the error on the size shown above.
If the size of the image is lower (for instance, 10000x10000) with a step size of 1000, it will not error and will produce an output image in about 4 seconds.
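For a sense of scale, the raw sizes mentioned above can be worked out directly. This is a back-of-the-envelope sketch assuming one byte per channel (uint8 RGBA) and no compression; `raw_size_bytes` is a name invented here, and the 2**32 figure reflects the 32-bit offsets used by classic TIFF.

```python
def raw_size_bytes(width, height, channels=4, bytes_per_channel=1):
    """Hypothetical helper: uncompressed size of an image in bytes."""
    return width * height * channels * bytes_per_channel


largest = raw_size_bytes(819200, 460800)  # the maximum size mentioned
small = raw_size_bytes(10000, 10000)      # the size reported to work

# Classic TIFF stores offsets in 32 bits, so this is the most a file can address.
CLASSIC_TIFF_LIMIT = 2**32

print(largest, round(largest / 2**40, 2), "TiB")
print(small, round(small / 2**20, 1), "MiB")
print(small < CLASSIC_TIFF_LIMIT < largest)
```

Under these assumptions the largest image is roughly 1.5 TB of raw pixel data, while the 10000x10000 case is about 400 MB.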