Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JPEG images failling, being identified as MPO #34

Closed
xtagon opened this issue Mar 20, 2015 · 18 comments
Closed

JPEG images failling, being identified as MPO #34

xtagon opened this issue Mar 20, 2015 · 18 comments

Comments

@xtagon
Copy link
Contributor

xtagon commented Mar 20, 2015

Hi,

Some of our images are failing with the error "Unknown format: MPO". The images are JPEG but Pillow is for some reason thinking they are MPO. I've reported this issue to Pilllow here: python-pillow/Pillow#1138

I realize this is not your issue, it's Pillows, but I thought I'd let you know about the problem. Maybe there's a workaround you can do, where if it detects an MPO it tries to process it as a JPEG anyway? If not, I understand. Thanks for reading.

@aclark4life
Copy link

@xtagon You haven't demonstrated this issue to be Pillow's problem yet. It very well may be, but let's be sure about it first.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

@aclark4life Right you are, I didn't intend blame, that's just my best guess based on the fact that Pillow is returning "MPO" as the format, and Pilbox only throws this particular error if the format returned is not in Pilbox's list of known formats. If not Pillow's error, the blame would probably fall on my images being invalid.

@aclark4life
Copy link

@xtagon No problem. And as I said, "probably" is not particularly helpful in this case 😄. We need to know exactly what is going on and why. Additionally it sounds like pilbox may want to consider adding support for MPO in the event that "real" MPO images are discovered. That's not a workaround, it's a feature add. (Disclaimer: other than its similarity to JPEG I know nothing about MPO. Furthermore, I have no low-level knowledge of any image format; I'm the Pillow fork-author and project leader speaking on behalf of Pillow. @wiredfool is your best hope of getting programming help from the Pillow team.)

@agschwender
Copy link
Owner

Admittedly, I don't know much (anything) about MPO files, but here's what Pillow supports for it:

Pillow identifies and reads Multi Picture Object (MPO) files, loading the primary image 
when first opened. The seek() and tell() methods may be used to read other pictures
from the file. The pictures are zero-indexed and random access is supported.

So it looks like Pilbox could access the individual JPEG files that comprise the MPO file and Pilbox could resize those files, but I'm not sure that Pillow supports saving the correspondingly resized MPO file. If that's the case, the output for these file types would have to default to JPEG for only the first image in the MPO and MPO would not be an allowable output format. I suspect that would satisfy most needs, but seems less than the ideal behavior. Pilbox is really intended to return the images in format that it receives them and this would complicate the output format logic. Anyway, I'll do more research on it this weekend.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

I can't speak for everyone's needs, but that behavior would suit our needs just fine. As far as our clients are concerned they're giving us a JPEG and expect a JPEG back. If the problem images are truly MPO and not JPEG, they don't know it.

@agschwender
Copy link
Owner

From my perspective though, this is a client problem in that you should be validating the images and only serving the ones that are supported. I'm not sure what kind of validation you have in place for that, but it would need to be more than a simple extension check. For example, this could have just as easily been a JPS file, which I don't think Pillow supports. With that in mind, if you did want to support MPO, you could convert that to a JPEG and serve that as the source image to be resized.

Alternatively, I'd want to expand Pilbox to support more of these read-only formats, but I'm still not entirely convinced of the need yet.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

We don't have any validation other than MIME type/file extension checks before upload. The reason is because these images go directly from the browser to our Amazon S3 storage. The only processing that happens to these images happens via Pilbox behind a CDN. Before we did this, we had a background job triggered after each upload that loaded each image into memory and resized it with rmagick, and as much as we tried to optimize this process it was extremely slow and memory hungry. Pilbox solved this issue and we were super happy to get rid of that background worker all-together and just process images on the fly. The images are NEVER processed on our app server, only on a single Pilbox server for every app. Makes it easy to scale.

So, in order to convert from MPO to JPEG before it hits pilbox, or to load it in memory and check the EXIF/metadata, we would have to re-implement this background job (slowing down the upload workflow again, going a step backwards and costing us more to run another server process other than Pilbox). If there is a way to just make Pilbox do it's best with what it's given, we would prefer it. If that doesn't make sense for the Pilbox philosphy, let me know.

Thanks for bearing with me... :)

@agschwender
Copy link
Owner

Not sure what language your application is in, but if it's python, you could use python-magic to check the actual mime type without having to open the entire file in memory (it can just look at the first 1KB of file headers). Though I haven't confirmed that it can correctly identify these MPO files for you.

That said, if you can't accurately identify the file types, there's always going to be a risk that Pilbox can't support the supplied format (whether it supports MPO or not). In that case, you'll need to always expect and handle resizing errors. I do this in my applications by having a placeholder image drawn using a :before pseudo element and render images using background-image style property, or loading the image with javascript, or various other ways using HTML/CSS/JavaScript. As I mentioned in #35, you can also have an error/not found image that is returned via nginx/apache.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

The app dealing with uploads is written in Ruby. We can run background workers in other languages, but another thing for us is that we want images to be loadable as soon as possible after they're uploaded. If we have a background step, then when the user load's the page for the first time, they see that the images failed to load. With Pilbox only in the pipeline, all images load the first time, even if there is a small delay in processing. Still, you're right that this is going to be an issue with other image formats that purport to be JPEG.

We're working on client-side fallbacks right now. I thought we'd attack it from the server-side angle also, so thank you again for your nginx example, it will be useful. I didn't think of using a :before pseudo-element. That will be a great solution for a generic fallback-image. We'll still have to use JavaScript or Nginx rewriting to get the other functionality we want (fall back to noop of the original image if the processed image can't load).

@wiredfool provided a workaround solution. Adding the following to the top of my Pilbox instance's config.py makes my problem images process without error:

from PIL import JpegImagePlugin
JpegImagePlugin._getmp = lambda x: None

This makes img.format return "JPEG" instead of "MPO". This confirms my initial suspicion, which was that if Pillow returns it as a JPEG it would work as a JPEG.

@agschwender
Copy link
Owner

One more alternative, you can create your own custom application which catches specific errors raised from self.render_image and calls self.render_image again, but ensures that get_argument returns noop when an error has been encountered.

Here's a generic custom application:

https://github.com/agschwender/pilbox#extension

If you need more guidance on how to accomplish what you're trying to achieve with the noop operation, let me know.

@agschwender
Copy link
Owner

Actually, since it wasn't much work, here's what it would look like (warning: wrote this without any testing whatsoever)

#!/usr/bin/env python

import tornado.gen

from pilbox.app import PilboxApplication, ImageHandler, \
    start_server, parse_command_line
from pilbox.errors import ImageFormatError


class CustomApplication(PilboxApplication):
    def get_handlers(self):
        return [(r"/", CustomImageHandler)]


class CustomImageHandler(ImageHandler):
    def prepare(self):
        self.has_error = False

    @tornado.gen.coroutine
    def get(self):
        self.validate_request()
        resp = yield self.fetch_image()

        try: 
            self.render_image(resp)
        except ImageFormatError:
            self.has_error = True
            self.render_image(resp)

    def get_argument(self, name, default=None):
        if self.has_error and name == "op":
            return "noop"
        return super(CustomImageHandler, self).get_argument(name, default)


if __name__ == "__main__":
    parse_command_line()
    start_server(CustomApplication())

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

Sweet, I'll try it. I was starting to write my own but my Python is rusty.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

I got it to work (had to change def get to only take one argument). When it catches the exception, it doesn't log the original error, what's the best way to fix that?

@agschwender
Copy link
Owner

Looking at the tornado code, I believe it's just calling this:

self.log_exception(*sys.exc_info())

You'd want to put that as the first line in the except block. You'll also need to import sys at the top of the program.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

@agschwender Works great. I have it working only if fallback=noop is present in the original args. Thanks for holding my hand through this. Two last things:

  1. What's a good way to return a temporary redirect to a specific URL from my custom handler?
  2. When returning a fallback image, I don't want the CDN to cache it. Can I modify the cache headers sent in that case?

@agschwender
Copy link
Owner

No problem.

  1. The CustomHandler is just a Tornado RequestHandler so you can call anything it supports: redirect
  2. That's gonna be tough, you can put a self.set_header call in the except block before you call self.render_image, but if render_image sets any headers to the contrary, you'd be out of luck. Basically, you'd have to run the internals of self.render_image yourself and write your own headers before drawing the content. But that's dangerous because you'd be depending on "private" methods.

EDIT: That said, if I were you, I'd only do the fallback on permanent failures, e.g. ImageFormatError, that way, since they'll always fail, caching wouldn't matter, as opposed to a network related failure.

@xtagon
Copy link
Contributor Author

xtagon commented Mar 20, 2015

I'll have to rethink the caching thing then. Maybe instead of rendering a fallback, I should just redirect to the original image URL. Thanks again.

@xtagon
Copy link
Contributor Author

xtagon commented Oct 18, 2017

Hi @agschwender - I don't remember why I left this hanging but the workaround for this issue has been serving us well.

BTW I would like to thank you for all the support and for writing this library!

@xtagon xtagon closed this as completed Oct 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants