JPEG image being identified as MPO #1138

Closed
xtagon opened this issue Mar 20, 2015 · 43 comments

Comments

@xtagon

xtagon commented Mar 20, 2015

Hi,

We're using Pilbox which depends on Pillow and performs image resizing as a service. A few of our images in production are failing because Pillow is returning the image format as MPO and not JPEG, so Pilbox throws an error because MPO is not a supported format.

The image identifies as a JPEG in ImageMagick, Photoshop, etc. and opens fine in those programs as a JPEG. However, if we open the failing image in Photoshop and save a new copy as a new JPEG file, the new image no longer reproduces the issue. Obviously there's something funky going on with the image file, but we can't identify what.

Here is a minimal test case showing the issue:

Image (should be JPEG, identifies as MPO): https://drive.google.com/file/d/0Bx-VFyfqu1DlaFlKVVk1ZDh6S2lPbGFwNUFlN3liQW1PRER3/view?usp=sharing

import sys
import PIL.Image

img = PIL.Image.open(sys.argv[1])
print(img.format)

Save the image to example.jpg and the python script to jpeg_mpo_test.py, and run:

python jpeg_mpo_test.py example.jpg

It prints "MPO" and not "JPEG".

We'd really appreciate any help in figuring this out as we are depending on these libraries on some live apps. Let me know what other information you need. Thanks in advance!

@aclark4life
Member

Works for me. I can't reproduce:

alexclark@MH01936448MACLT:~/Developer/Pillow/ > bin/python
Python 2.7.9 (default, Dec 30 2014, 18:28:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL.Image
>>> img = PIL.Image.open('/Users/alexclark/Downloads/bob 15.jpg')
>>> img.format
'JPEG'

What Pillow and operating system version?

@aclark4life
Member

May be something to do with MPO sharing a factory with JPEG? E.g. https://github.com/python-pillow/Pillow/blob/master/PIL/MpoImagePlugin.py#L38

@wiredfool
Member

MPO is pretty similar to JPEG, there's an exif tag that determines it, and then there's a bit extra in the data.

If we detect MPO, we should be able to decode it. OTOH, I'm not totally sure what Pillow would do with an MPO file before we had support for it.

Edit: not seeing it here either.

@aclark4life
Member

@wiredfool I wonder what accounts for the difference in results between myself and @xtagon. If I could reproduce, that would indicate Pillow's failure to identify the JPEG correctly. Since I can't, I wonder if some environmental factor is having some effect e.g. operating system, Pillow version, etc.

@xtagon
Author

xtagon commented Mar 20, 2015

@aclark4life I'm double checking my environment/versions, and I'll check to see if Google Drive altered the file at all too. Stand by.

@xtagon
Author

xtagon commented Mar 20, 2015

OS: GNU/Linux Gentoo 3.12.13 locally, and the server that found the issue is Ubuntu 14.04.1 LTS
Python 3.3.5, also tried Python 2.7.7

Okay...I pip installed Pillow 2.7.0, but PIL.VERSION in the Python REPL says 1.1.7 - is it installing the wrong version, and why would it? I tried uninstalling with pip and installing, and also tried installing from requirements.txt.

I'm still able to reproduce with this image, even if I re-download it from Google Drive, so Drive isn't altering it like I thought it might be.

@aclark4life
Member

aclark4life commented Mar 20, 2015

How about PIL.PILLOW_VERSION?

@xtagon
Author

xtagon commented Mar 20, 2015

Ah, thank you, that does return '2.7.0'
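
For anyone checking versions the same way: a minimal sketch of a lookup that prefers the fork's own version attributes over the legacy PIL.VERSION value. It assumes newer Pillow releases expose PIL.__version__ and older ones expose PIL.PILLOW_VERSION, as seen above; the helper name pillow_version is hypothetical.

```python
def pillow_version(pil_module):
    """Return the Pillow version string from a PIL module object, or None.

    Hypothetical helper: tries __version__ first (assumed present on newer
    Pillow), then PILLOW_VERSION (reported as '2.7.0' in this thread).
    PIL.VERSION is skipped on purpose because it reports the legacy
    PIL version ('1.1.7'), not the Pillow version.
    """
    for name in ("__version__", "PILLOW_VERSION"):
        value = getattr(pil_module, name, None)
        if value is not None:
            return value
    return None


# Intended use (requires Pillow; not executed here):
#   import PIL
#   print(pillow_version(PIL))
```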

@xtagon
Author

xtagon commented Mar 20, 2015

Your REPL says "darwin" suggesting you're on a Mac, could you try to reproduce on Linux?

@aclark4life
Member

Yep, same thing. Can't reproduce:


# bin/python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL.Image
>>> img = PIL.Image.open('bob 15.jpg')
>>> img.format
'JPEG'
>>> PIL.PILLOW_VERSION
'2.7.0'

Sorry! Not sure what is going on.

@hugovk
Member

hugovk commented Mar 20, 2015

Reproducible on Windows 7:

Python 2.7.8 (default, Jun 30 2014, 16:08:48) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL.Image
>>> img = PIL.Image.open('bob 15.jpg')
>>> img.format
'MPO'
>>> PIL.PILLOW_VERSION
'2.7.0'

@hugovk
Member

hugovk commented Mar 20, 2015

Further, in JpegImagePlugin.py's jpeg_factory(), mpheader[45057] == 2, so it thinks it's MPO.

https://github.com/python-pillow/Pillow/blob/master/PIL/JpegImagePlugin.py#L718

mpheader in full:

{45056: '0100', 45057: 2, 45058: [{'Attribute': {'Reserved': 0L, 'ImageDataFormat': 'JPEG', 'MPType': 'Baseline MP Primary Image', 'DependentChildImageFlag': False, 'DependentParentImageFlag': True, 'RepresentativeImageFlag': False}, 'EntryNo1': 2, 'DataOffset': 0, 'EntryNo2': 0, 'Size': 4850597}, {'Attribute': {'Reserved': 0, 'ImageDataFormat': 'JPEG', 'MPType': 'Large Thumbnail (Full HD Equivalent)', 'DependentChildImageFlag': True, 'DependentParentImageFlag': False, 'RepresentativeImageFlag': False}, 'EntryNo1': 0, 'DataOffset': 4836492, 'EntryNo2': 0, 'Size': 621306}], 45059: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 45060: 1}
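
To make the dump easier to read, here is a rough sketch of how such a header could be interpreted. Key 45057 is 0xB001 (the number-of-images tag in the Multi-Picture index) and 45058 is 0xB002 (the per-image entries). The sample dict is trimmed from the dump above, and the function names are hypothetical, not Pillow API.

```python
NUMBER_OF_IMAGES = 0xB001  # 45057, the key checked in jpeg_factory()
MP_ENTRY = 0xB002          # 45058, one dict per image in the file


def is_multi_picture(mpheader):
    """Mirror the check described above: more than one image means MPO."""
    return mpheader is not None and mpheader.get(NUMBER_OF_IMAGES, 0) > 1


def entry_types(mpheader):
    """List the MPType string of every entry in the header."""
    return [entry["Attribute"]["MPType"] for entry in mpheader.get(MP_ENTRY, [])]


# Trimmed copy of the header printed above (only the fields used here).
sample = {
    45057: 2,
    45058: [
        {"Attribute": {"MPType": "Baseline MP Primary Image"}, "Size": 4850597},
        {"Attribute": {"MPType": "Large Thumbnail (Full HD Equivalent)"}, "Size": 621306},
    ],
}

print(is_multi_picture(sample))
print(entry_types(sample))
```

In other words, the file carries a primary image plus a large thumbnail, which is why the count is 2 and the MPO branch is taken.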

@hugovk
Member

hugovk commented Mar 20, 2015

Also reproducible in an Ubuntu 12.04 LTS VM:

Python 2.7.3 (default, Dec 18 2014, 19:10:20)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL.Image
>>> img = PIL.Image.open('bob 15.jpg')
>>> print(img.format)
MPO
>>> PIL.PILLOW_VERSION
'2.7.0'

When downloading the test image from Google Docs, make sure to click the down-arrow in the top bar to get the 5.21 MB (4320x3240) original and not the 265 KB (1105x829) resized one.

@homm
Member

homm commented Mar 20, 2015

@aclark4life did you download the original image, or just save the preview from Google Docs?

@homm
Member

homm commented Mar 20, 2015

MPO is pretty much the same as JPEG. I just added MPO to the list of supported formats when I upgraded to 2.7. But I also think there should be no difference in how MPO and JPEG files are identified. They are both JPEGs.

@aclark4life
Member

aclark4life commented Mar 20, 2015

Ah! Now I get it, thanks all. So on to the actual problem I guess. 😄

@aclark4life
Member

@xtagon Sorry about that! I should have examined the image better.

@xtagon
Author

xtagon commented Mar 20, 2015

@aclark4life No problem, glad you can reproduce now :)

@xtagon
Author

xtagon commented Mar 20, 2015

So correct me if I'm wrong, an MPO is still a valid JPEG but with extra data (more image frames)? Perhaps Pillow could report the format as JPEG but set an extra flag that indicates that it's an MPO?
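
A caller-side sketch of that idea, which needs no change in Pillow: map the reported format string to a base format plus a flag. The names FormatInfo and describe_format are hypothetical, and it assumes the calling code only needs the primary image, which in an MPO is stored as an ordinary JPEG stream.

```python
from typing import NamedTuple, Optional


class FormatInfo(NamedTuple):
    base_format: Optional[str]  # what downstream code should treat it as
    is_multi_picture: bool      # True when Pillow reported "MPO"


def describe_format(reported: Optional[str]) -> FormatInfo:
    """Treat 'MPO' as JPEG-compatible while remembering it was an MPO."""
    if reported == "MPO":
        return FormatInfo("JPEG", True)
    return FormatInfo(reported, False)


# Intended use (requires Pillow; not executed here):
#   info = describe_format(img.format)
#   if info.base_format == "JPEG": ...
print(describe_format("MPO"))
```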

@wiredfool
Member

I suspect that if you did something like:

# Hideous monkeypatch. ymmv
from PIL import JpegImagePlugin
JpegImagePlugin._getmp = lambda: None

before importing any images, you'd never get an MPO file out of it.

@xtagon
Author

xtagon commented Mar 20, 2015

Python 3.3.5 (default, Dec 12 2014, 18:22:37)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL.Image
>>> from PIL import JpegImagePlugin
>>> JpegImagePlugin._getmp = lambda: None
>>> img = PIL.Image.open("bob 15.jpg")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/site-packages/PIL/Image.py", line 2274, in open
    % (filename if filename else fp))
OSError: cannot identify image file 'bob 15.jpg'
>>>

@wiredfool
Member

oops, try lambda x: None. There's a self arg that I didn't see.

@xtagon
Author

xtagon commented Mar 20, 2015

Woohoo, that returned JPEG! Let me find out if this monkeypatch is workable in Pilbox and get back to you

@xtagon
Author

xtagon commented Mar 20, 2015

@wiredfool I dropped those lines into the top of Pilbox's config.py file, and guess what? It resized my MPO problem image just fine! Thank you so much!
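
If a process-wide monkeypatch is too blunt, the same trick can probably be scoped to a single open call. This is only a sketch: it assumes the module-level _getmp hook keeps working the way it does in this thread (it is a private name and may change), and the context manager name mpo_detection_disabled is hypothetical. It uses the standard library's unittest.mock.patch.object so the original function is restored afterwards.

```python
from contextlib import contextmanager
from unittest import mock


@contextmanager
def mpo_detection_disabled(plugin_module):
    """Temporarily replace plugin_module._getmp with a stub returning None.

    plugin_module is expected to be PIL.JpegImagePlugin; it is passed in
    so that this sketch does not import Pillow itself.
    """
    with mock.patch.object(plugin_module, "_getmp", lambda self: None):
        yield


# Intended use (requires Pillow; not executed here):
#   from PIL import Image, JpegImagePlugin
#   with mpo_detection_disabled(JpegImagePlugin):
#       img = Image.open("example.jpg")
```

Outside the with block, files with more than one image would be reported as MPO again.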

@wiredfool
Member

Sweet.

@aclark4life
Member

🍻

@dim-0

dim-0 commented Nov 7, 2016

The same issue now comes up in phatch under Ubuntu (16.04 and 16.10).
Obviously some changes in the library are now being reflected in depending software.

honzajavorek added a commit to honzajavorek/danube-delta that referenced this issue Feb 5, 2018
@paullovessearch

So this is still happening. PIL.PILLOW_VERSION = 5.0.0

@jcpayne

jcpayne commented Mar 31, 2021

I recently hit a similar problem where a malformed JPG file was identified as an MPO file. Oddly, img = PIL.Image.open(<file>) didn't throw any errors when opening the file. I was doing an AI project and I converted the file to a numpy array using img_array = np.array(img). This also worked without throwing errors, and it was only when I tried to convert the array to a Tensor that it failed. I discovered subsequently that the image wouldn't open with Preview on OS X. My interpretation is that PIL.Image.open() allows so much variation in image file structure that it may open images that other programs can't handle. I don't know if that's good or bad, but it may be worth remembering that you can't necessarily rely on Image.open() to catch errors.

@nopria

nopria commented Jun 30, 2021

I ran into the same problem of a JPG being identified as MPO, and solved it by adding to my script

from PIL import JpegImagePlugin
JpegImagePlugin._getmp = lambda x: None

as suggested above. After this addition, the JPG file format was correctly identified.

@radarhere
Member

For users still encountering this issue, please attach a copy of the problem image.

@nopria

nopria commented Jul 1, 2021

@radarhere
Here is the JPEG image reported as MPO:
test

@radarhere
Member

When I run exiftool image.jpeg | grep 'MP Image', I get

MP Image Flags                  : Dependent child image
MP Image Format                 : JPEG
MP Image Type                   : Large Thumbnail (full HD equivalent)
MP Image Length                 : 703515
MP Image Start                  : 3278608

You can read in the Multi-Picture format documentation about the 'Dependent child image' and 'Large Thumbnail' values.

If you feel that it should be a JPEG because the extension is '.jpg', the same documentation states that

To clarify the compatible range, a file consisting only of a main image and monitor display image (extension .JPG) is a Baseline MP file, while a file including other types (extension .MPO) is an Extended MP file.

Also, as a general rule, Pillow does not interpret images according to their file extensions. It reads the data and acts accordingly.

So I think this is indeed an MPO.
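
To illustrate what "reads the data" can mean here, the following standard-library sketch walks the JPEG header segments and looks for an APP2 segment whose payload starts with the MPF identifier, which is where the Multi-Picture index is stored. It is not Pillow's implementation, and finding the segment says nothing about how many images the file holds; the function names and the synthetic test bytes are made up for the example.

```python
import struct


def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG marker segment (used only for the demo bytes below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


def has_mpf_segment(data: bytes) -> bool:
    """Return True if the JPEG header contains an APP2 'MPF' segment."""
    if data[:2] != b"\xff\xd8":          # no SOI marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost sync; give up rather than guess
            return False
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):       # SOS or EOI: header segments are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:                   # malformed length field
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload[:4] == b"MPF\x00":
            return True
        pos += 2 + length                # jump to the next segment
    return False


# Synthetic headers: SOI, an APP0 segment, optionally an APP2 MPF segment, SOS.
plain = b"\xff\xd8" + make_segment(0xE0, b"JFIF\x00") + b"\xff\xda\x00\x02"
multi = (b"\xff\xd8" + make_segment(0xE0, b"JFIF\x00")
         + make_segment(0xE2, b"MPF\x00MM\x00*") + b"\xff\xda\x00\x02")
print(has_mpf_segment(plain), has_mpf_segment(multi))
```

A tool that only looks at the leading JPEG stream never needs to consult this segment, which fits the guess further down about why other programs report such files as plain JPEG.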

@nopria

nopria commented Jul 1, 2021

@radarhere
I'm not an image expert, but imagemagick identify gives Format: JPEG (Joint Photographic Experts Group JFIF format), so I reported it. By the way, if it is NOT a JPEG, why is it recognized by Pillow as a JPEG if I add

from PIL import JpegImagePlugin
JpegImagePlugin._getmp = lambda x: None

to my script?

@radarhere
Member

The MPO format is based on JPEG. By adding that line of code in, you're overriding the method that detects if it is an MPO instead of a JPEG.

mpheader = im._getmp()
if mpheader[45057] > 1:
    # It's actually an MPO
    from .MpoImagePlugin import MpoImageFile

As to why ImageMagick doesn't identify it, I would guess it's because MPO is not in their list of supported formats - https://imagemagick.org/script/formats.php

@nopria

nopria commented Jul 1, 2021

@radarhere
So using that workaround is not correct, it's better to accept the fact that it is an MPO with incorrect extension and deal with it.

@radarhere
Member

...actually (and you might think this is crazy), from the way that I read the spec (or the simpler https://wiki.mobileread.com/wiki/MPO), your image is a Baseline MP, which should have a JPG extension. So the extension isn't incorrect - it's just that not every file with a JPG extension is purely in the JPEG format.

If there's something that you feel like you can't do with an MPO with Pillow that you can do with a JPEG in Pillow, let us know.

menuRivera added a commit to menuRivera/google-colab-training that referenced this issue Jan 12, 2022
… creating tf records because of this problem python-pillow/Pillow#1138, just fixed it up
bourdakos1 pushed a commit to cloud-annotations/google-colab-training that referenced this issue Jan 12, 2022
… creating tf records because of this problem python-pillow/Pillow#1138, just fixed it up (#5)
liZe added a commit to Kozea/WeasyPrint that referenced this issue Dec 27, 2022
Some cameras store multiple images in JPEG photos, using the MPO format in EXIF
metadata. As Pillow supports this format, it sets the "MPO" format attribute to
the image object, not the "JPEG" one.

This leads to a useless PNG conversion, as MPO are (often, always?) supposed to
be "normal" JPEG files.

See python-pillow/Pillow#1138.

Fix #1777.
@xinhong99

DSC06669
I'm having the same problem with the attached picture. With the MPO type, img.size reports half the width and height, and img.show() gives an error saying the file is corrupted.

@xinhong99

I'm on Pillow 10.2.0

@radarhere
Member

radarhere commented Feb 21, 2024

Using exiftool to inspect your image, it says it is a 'Dependent child image'.

You can read in the Multi-Picture format documentation about the 'Dependent child image' and 'Large Thumbnail' values.

If you feel that it should be a JPEG because the extension is '.jpg', the same documentation states that

To clarify the compatible range, a file consisting only of a main image and monitor display image (extension .JPG) is a Baseline MP file, while a file including other types (extension .MPO) is an Extended MP file.

Also, as a general rule, Pillow does not interpret images according to their file extensions. It reads the data and acts accordingly.

So I think this is indeed an MPO. What makes you think it is not?

If I run

from PIL import Image
with Image.open("input.jpeg") as im:
    print(im)

with your image, I get <PIL.MpoImagePlugin.MpoImageFile image mode=RGB size=4912x3264 at 0x1005DF700>

When I open your image in macOS Preview, or inspect it with exiftool, they both say the image is 4912px by 3264px. So I don't see a discrepancy in the size. What were you expecting? If it is different from my experience, what program are you using that makes you think it should be a different size?

I've created #7821 to fix the img.show() error. If you would like an immediate fix, you can use im.copy().show()

@radarhere
Member

#7821 has now been merged.

@a0s

a0s commented May 16, 2024

Lastly i found that iphone's "live photo" is detecting as MPO.

(By the way, it's a little tricky to get the original file. If you send it with AirDrop, it is converted to HEIC (HEIF). If you send it as a file with, e.g., Telegram, it asks you to choose HEIC or JPEG. The only way I found was to send it as raw HTTP form upload data, which is my use case.)

@wiredfool's trick successfully switches img.format from MPO to JPEG.
But, img.show() shows it (with mac's Preview) like a PNG , not JPEG.

@radarhere
Member

Lastly i found that iphone's "live photo" is detecting as MPO.

If you'd like to upload an image here, we can take a look at it, but do you have any reason to believe it's not an MPO? As I've said, MPO images might still have a '.jpg' extension

But, img.show() shows it (with mac's Preview) like a PNG , not JPEG.

https://pillow.readthedocs.io/en/stable/reference/ImageShow.html states that

All default viewers convert the image to be shown to PNG format.

The fact that the PNG format is used is intended as a feature, not a bug. Pillow supports some obscure formats, and not all applications support opening them. For example, on my Mac, Preview can't display MIC files, but it can display PNG files. By standardising the format, the support becomes better. 'But wait', you say, 'JPEG isn't obscure!' - true, but JPEG is a lossy format. This means that when an image is saved, it doesn't come out exactly the same. PNG, however, is lossless, meaning it does. https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.show says im.show() is

This method is mainly intended for debugging purposes.

If I'm debugging an image, I want the saved file to be exactly the same, not slightly different.

If you really do want to be able to show() in the JPEG format, you can

from PIL import Image, ImageShow
im = Image.open("Tests/images/hopper.jpg")

viewer = ImageShow.MacViewer()
del viewer.options["save_all"]
viewer.format = "JPEG"
viewer.show(im)
