Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"close" syscall run twice on same file descriptor when loading binary LZW-compressed TIFF #7042

Closed
philip-bl opened this issue Mar 29, 2023 · 10 comments · Fixed by #7199
Closed
Labels
Anaconda Issues with Anaconda's Pillow TIFF

Comments

@philip-bl
Copy link

philip-bl commented Mar 29, 2023

If I open an LZW-compressed binary TIFF file using pillow and load its data, pillow calls the "close" syscall on one of the two relevant file descriptors twice. The second call results in a "EBADF (Bad file descriptor)" error. The program doesn't crash and everything seems to work ok. But if multithreading is used and a race condition happens, this might crash the program.

In this example, the expected behaviour would be that there is only one "close" syscall on each of the two relevant file descriptors and that there is no "EBADF" error. Below, I show how to reproduce the problem.

$ cat create_tiff_file.py
#!/usr/bin/env python

import numpy as np
import PIL.Image as Image

Image.fromarray(np.random.rand(500, 600) > 0.9) \
    .save("example.tif", format="TIFF", compression="tiff_lzw")
$ cat read_tiff_file.py
#!/usr/bin/env python

import PIL.Image as Image

print("Right before the with-block", flush=True)
with Image.open("example.tif") as mask:
    print("Right in the beginning of the with-block", flush=True)
    mask.load()
    print("Right after load()", flush=True)
print("Right after the with-block", flush=True)
$ mamba create -n pillow_double_close_env --override-channels --channel anaconda python=3.11  # omitting the output
$ mamba activate pillow_double_close_env  # omitting the output
$ pip install numpy pillow  # omitting the output
$ conda list
# packages in environment at /home/philip/soft/mambaforge/envs/pillow_double_close_env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    anaconda
_openmp_mutex             5.1                       1_gnu    anaconda
bzip2                     1.0.8                h7b6447c_0    anaconda
ca-certificates           2023.01.10           h06a4308_0    anaconda
certifi                   2022.9.24       py311h06a4308_0    anaconda
ld_impl_linux-64          2.38                 h1181459_1    anaconda
libffi                    3.4.2                h6a678d5_6    anaconda
libgcc-ng                 11.2.0               h1234567_1    anaconda
libgomp                   11.2.0               h1234567_1    anaconda
libstdcxx-ng              11.2.0               h1234567_1    anaconda
libuuid                   1.41.5               h5eee18b_0    anaconda
ncurses                   6.4                  h6a678d5_0    anaconda
numpy                     1.24.2                   pypi_0    pypi
openssl                   1.1.1s               h7f8727e_0    anaconda
pillow                    9.4.0                    pypi_0    pypi
pip                       22.2.2          py311h06a4308_0    anaconda
python                    3.11.0               h7a1cb2a_2    anaconda
readline                  8.2                  h5eee18b_0    anaconda
setuptools                65.5.0          py311h06a4308_0    anaconda
sqlite                    3.40.1               h5082296_0    anaconda
tk                        8.6.12               h1ccaba5_0    anaconda
tzdata                    2022a                hda174b7_0    anaconda
wheel                     0.37.1             pyhd3eb1b0_0    anaconda
xz                        5.2.10               h5eee18b_1    anaconda
zlib                      1.2.13               h5eee18b_0    anaconda
$ pip freeze
certifi @ file:///croot/certifi_1669749558946/work/certifi
numpy==1.24.2
Pillow==9.4.0
$ python create_tiff_file.py
$ strace -e decode-fds=path -e trace=openat,close ./read_tiff_file.py
# omitting all the output except for the few last lines, which are the most relevant
Right in the beginning of the with-block
close(4</pillow_double_close_bug_report/example.tif>) = 0
close(4)                                = -1 EBADF (Bad file descriptor)
close(3</pillow_double_close_bug_report/example.tif>) = 0
Right after load()
Right after the with-block
+++ exited with 0 +++

Also, I want to note that this caused my training of a neural network to crash once. It happened because Pytorch's DataLoader uses multithreading and a race condition happened.

@nurumaik
Copy link

nurumaik commented Mar 29, 2023

Updated example that actually leads to bugs

create_tiff_file.py

#!/usr/bin/env python

import numpy as np
import PIL.Image as Image

# Create really big file so it will take time to load. But even bigger file will cause libtiff to complain that we are trying to zip-bomb it
Image.fromarray(np.random.rand(5000, 30000) > 0.9) \  
    .save("example.tif", format="TIFF", compression="tiff_lzw")

reproduce.py

#!/usr/bin/env python
from threading import Thread
import os
from collections import deque

finish = False

def load_files():
    descriptors = deque()
    while not finish:
        try:
            fd = os.open("test", os.O_RDONLY)
            descriptors.append(fd)
        except:  # in case of too many descriptors
            os.close(descriptors.pop())
    
    for d in descriptors:
        os.close(d)
    
    print("success")


t = Thread(target=load_files)

import PIL.Image as Image

print("Right before the with-block", flush=True)

with Image.open("example.tif") as mask:
    print("Right in the beginning of the with-block", flush=True)
    t.start()
    mask.load()  # replace it with time.sleep(10) and it works just fine without exceptions
    print("Right after load()", flush=True)
print("Right after the with-block", flush=True)
finish = True
t.join()

Steps to reproduce:

touch test
./create_tiff_file.py
./reproduce.py

Note: you need to delete example.tif and test before every iteration of running this. otherwise the OS might cache the files and thus the problem won't reproduce because libtiff will load file too fast

Expected behavior:
load_files outputs "success"

Actual behavior:

Right in the beginning of the with-block
Right after load()
Right after the with-block
Exception in thread Thread-1 (load_files):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "reproduce.py", line 17, in load_files
    os.close(d)
OSError: [Errno 9] Bad file descriptor

Cause of bug

  1. tiff loader creates file duplicate of file descriptor for use in external c library code

    fp = hasattr(self.fp, "fileno") and os.dup(self.fp.fileno())

  2. As comments say "libtiff closes the file descriptor, so pass in a dup.", libtiff probably closes fp after decoding here

    n, err = decoder.decode(b"fpfp")

  3. Pillow closes fp again

    os.close(fp)

If another thread decides to open new file between steps 2 and 3 (like load_files in example), it may get file descriptor that was just closed by libtiff. Then, in step 3, pillow closes this file, making descriptor invalid

@nurumaik
Copy link

nurumaik commented Mar 29, 2023

This fixes the bug, but I'm not sure that libtiff will always close the descriptor, even if some error occurs

diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..d5ea99075 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1296,6 +1296,8 @@ class TiffImageFile(ImageFile.ImageFile):
                 self.fp.seek(0)
             # 4 bytes, otherwise the trace might error out
             n, err = decoder.decode(b"fpfp")
+            # fp should be closed after call to decode
+            fp = None
         else:
             # we have something else.
             logger.debug("don't have fileno or getvalue. just reading")

@radarhere radarhere added the TIFF label Mar 29, 2023
@radarhere
Copy link
Member

Because it's unclear if libtiff will close the file descriptor or not, what about this as an alternative solution? It checks if the file descriptor is open first, by trying to read 0 bytes.

I've tested it against create_tiff_file.py and reproduce.py and it works.

diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..97d184f5f 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1305,6 +1305,9 @@ class TiffImageFile(ImageFile.ImageFile):
 
         if fp:
             try:
+                # Test if the file descriptor is still open
+                os.read(fp, 0)
+
                 os.close(fp)
             except OSError:
                 pass

@nurumaik
Copy link

nurumaik commented Mar 30, 2023

I reproduced the problem with this fix. Unfortunately, reproduction code doesn't work in all cases, and I don't know how to make a better version without hooking something in libtiff. I can only suggest storing file "test" on some very slow drive to ensure read is really slow

How this can still lead to a bug:

  1. We call os.dup on self.fd and get descriptor, let's say 4
  2. decode finishes and calls close(4)
  3. Other thread opens another file, and os gives it the same number 4 as descriptor, since it's already freed
  4. We call os.read(4, 0) on file descriptor and it successfully reads that other file
  5. We close file that doesn't belong to us

I checked libtiff code and it should always close the descriptor after performing decode. It will stay open only if libtiff contains a bug (but this will probably lead to much more serious errors anyway)

https://github.com/python-pillow/Pillow/blob/main/src/libImaging/TiffDecode.c#L723

If libtiff may not close the fd in some cases, the proper fix would be to either:

  1. Somehow report back if it closed fd or not
  2. Disable fd closing in libtiff

@radarhere
Copy link
Member

As to why we are closing the file pointer ourselves in the first place, it is because of #5936

@wiredfool
Copy link
Member

wiredfool commented Apr 7, 2023

As to why we're duping the file pointer in the firster place, I think the original logic was:

  • Libtiff always closes the pointer when read on an image is complete.
  • If we have a multiple frames in an image, we're looping over calls into libtiff to decode individual frames. If libtiff closes on the first frame, we don't get subsequent frames
  • Our original source for the file pointer may be one of a filename, tempfile, or open file like object. We can't necessarily re-open a file once it's been closed.

One option was to use python/our code for all the byte shuffling at the file level. IIRC there was a pretty decent performance and complexity improvement for cases when we did have a file pointer to be able to send that into libtiff and let it manage all of that.

(edit: yeah, that's definitely my comment style)

@crazyyao0
Copy link

crazyyao0 commented Apr 20, 2023

I met the same problem. Agree with @nurumaik.
A workaround is changing the code from

im = Image.open(file_path)

to

im = Image.open(io.BytesIO(open(file_path, "rb").read()))

@radarhere
Copy link
Member

@crazyyao0 do you have a different way to reproduce the bug that you could share in a self-contained example?

@aclark4life aclark4life added the Anaconda Issues with Anaconda's Pillow label May 19, 2023
@radarhere
Copy link
Member

To be clear, applying the suggested fix from #7042 (comment) does cause the scenario from #5936 to fail, which would otherwise have passed.

@radarhere radarhere changed the title Pillow performs the "close" syscall twice on the same file descriptor when loading a binary LZW-compressed TIFF file "close" syscall run twice on same file descriptor when loading binary LZW-compressed TIFF May 21, 2023
@radarhere
Copy link
Member

radarhere commented Jun 5, 2023

I've created PR #7199 to stop duplicating the file pointer in the first place. Does that resolve this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Anaconda Issues with Anaconda's Pillow TIFF
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants