Optimized ImageStat.Stat.count #7599

Merged
merged 4 commits into from Dec 4, 2023

@florath florath commented Dec 4, 2023

This PR improves the performance of the ImageStat _getcount function by replacing the construct "functools.reduce(operator.add, ...)" with "sum". Tests showed that the new function is about three times faster than the original. It is also shorter and easier to read, and the dependency on the functools and operator modules can be removed.

Changes proposed in this pull request:

  • Performance optimized ImageStat _getcount function

Signed-off-by: Andreas Florath <andreas@florath.net>

florath commented Dec 4, 2023

The setup and measurement were done in the same way as for #7593.

I set up a virtualenv with the original and the optimized function side by side:

    def _getcount_orig(self):
        """Get total number of pixels in each layer"""

        v = []
        for i in range(0, len(self.h), 256):
            v.append(functools.reduce(operator.add, self.h[i : i + 256]))
        return v

    def _getcount(self):
        """Get total number of pixels in each layer"""

        return [sum(self.h[i: i + 256]) for i in range(0, len(self.h), 256)]
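
For readers without a patched Pillow checkout, the equivalence of the two variants can be checked on a plain list. This is only a minimal sketch: the free functions `getcount_reduce` and `getcount_sum` and the synthetic `histogram` are hypothetical stand-ins for the methods above and for `self.h`, assuming a histogram of 256 bins per band.

```python
import functools
import operator

# Synthetic stand-in for ImageStat.Stat.h (assumption): three bands of 256 bins.
histogram = [(i * 7) % 13 for i in range(3 * 256)]


def getcount_reduce(h):
    """Per-band pixel count via functools.reduce, mirroring the original method."""
    return [
        functools.reduce(operator.add, h[i : i + 256])
        for i in range(0, len(h), 256)
    ]


def getcount_sum(h):
    """Per-band pixel count via the built-in sum, mirroring the proposed method."""
    return [sum(h[i : i + 256]) for i in range(0, len(h), 256)]


counts_reduce = getcount_reduce(histogram)
counts_sum = getcount_sum(histogram)

# Both variants must agree, with one count per band.
assert counts_reduce == counts_sum
assert len(counts_sum) == 3
```

Both variants slice the list in steps of 256, so an empty histogram yields an empty list in either case and `reduce` is never called on an empty sequence.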

I ran the tests on the ImageNet dataset, which showed a speedup factor of about 3. Here is the adapted script, which can be run using the Pillow test images:

    import pathlib
    import timeit

    from PIL import Image, ImageStat

    # Directory containing the Pillow test images
    IMAGEDIR = "../Pillow/Tests/images"

    testdir = pathlib.Path(IMAGEDIR)

    NUMBER = 10000  # calls per timing run
    REPEAT = 10  # timing runs per function

    for image_file_name in testdir.rglob("*"):
        # Skip broken images
        try:
            img = Image.open(image_file_name)
            stat = ImageStat.Stat(img)
        except Exception:
            continue

        # Check for correctness
        res_orig = stat._getcount_orig()
        res_opt = stat._getcount()
        assert res_orig == res_opt

        # Measure improvement factor
        exec_times_orig = timeit.repeat(
            stmt=stat._getcount_orig, repeat=REPEAT, number=NUMBER)
        exec_times_opt = timeit.repeat(
            stmt=stat._getcount, repeat=REPEAT, number=NUMBER)

        print("%10.4f - %s" % (
            min(exec_times_orig) / min(exec_times_opt), image_file_name))

A typical output (partial):

    3.2243 - ../Pillow/Tests/images/itxt_chunks.png
    4.0136 - ../Pillow/Tests/images/tiff_wrong_bits_per_sample_3.tiff
    4.2918 - ../Pillow/Tests/images/hopper.iccprofile.tif
    3.3560 - ../Pillow/Tests/images/mmap_error.bmp
    4.2634 - ../Pillow/Tests/images/hopper.dds
    3.7895 - ../Pillow/Tests/images/uncompressed_rgb.png
    2.8594 - ../Pillow/Tests/images/imagedraw_polygon_kite_L.png
    3.5236 - ../Pillow/Tests/images/colr_bungee.png
    3.2473 - ../Pillow/Tests/images/imagedraw_rounded_rectangle_corners_yyny.png
    3.3909 - ../Pillow/Tests/images/clipboard_target.png
    3.7058 - ../Pillow/Tests/images/bc5s.png
    4.2684 - ../Pillow/Tests/images/hopper.sgi
    3.9689 - ../Pillow/Tests/images/pil123rgba.qoi
    3.2465 - ../Pillow/Tests/images/imagedraw_polygon_kite_RGB.png
    3.5031 - ../Pillow/Tests/images/dispose_bgnd_transparency.gif
    3.1712 - ../Pillow/Tests/images/palette_sepia.png
    3.3890 - ../Pillow/Tests/images/imagedraw_rounded_rectangle_corners_ynyn.png
    4.0848 - ../Pillow/Tests/images/balloon.jpf
    2.5656 - ../Pillow/Tests/images/no_palette_with_transparency.gif
    2.8980 - ../Pillow/Tests/images/imagedraw2_text.png
    4.2880 - ../Pillow/Tests/images/test_anchor_multiline_mm_right.png

The first number is the speedup factor, i.e. how many times faster the proposed function ran compared with the original.
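
The way the factor is derived can be sketched in isolation. The helper `speedup_factor` and the synthetic `histogram` below are hypothetical and not part of Pillow; the sketch only assumes that the best (minimum) time of several `timeit.repeat` runs is compared, as in the script above. No expected output is given, since the measured value depends on the machine and Python version.

```python
import functools
import operator
import timeit


def speedup_factor(baseline, candidate, repeat=5, number=1000):
    """Return min(baseline time) / min(candidate time).

    A value above 1 means the candidate ran faster than the baseline.
    """
    t_base = min(timeit.repeat(stmt=baseline, repeat=repeat, number=number))
    t_cand = min(timeit.repeat(stmt=candidate, repeat=repeat, number=number))
    return t_base / t_cand


# Synthetic three-band histogram (assumption), 256 bins per band.
histogram = [(i * 7) % 13 for i in range(3 * 256)]


def count_reduce():
    return [
        functools.reduce(operator.add, histogram[i : i + 256])
        for i in range(0, len(histogram), 256)
    ]


def count_sum():
    return [sum(histogram[i : i + 256]) for i in range(0, len(histogram), 256)]


factor = speedup_factor(count_reduce, count_sum)
print("%10.4f - synthetic histogram" % factor)
```

Taking the minimum rather than the mean is the usual choice with `timeit`, because slower runs mostly reflect interference from other processes.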

florath commented Dec 4, 2023

Hmmm.
Strange failure of the codecov check. All lines of the patched function are covered. Any idea?

hugovk commented Dec 4, 2023

ImageStat.py is 100% covered: https://app.codecov.io/gh/python-pillow/Pillow/pull/7599/blob/src/PIL/ImageStat.py

Coverage percentage can decrease if you delete covered lines.

For example, imagine 3 covered out of 4 total lines = 75%.
Delete 1 covered line: 2 / 3 = 67%.

So we're fine for coverage :)
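
The arithmetic in that example can be reproduced with a few lines. This is just an illustrative sketch; `coverage_percent` is a hypothetical helper, not a Codecov API.

```python
def coverage_percent(covered, total):
    """Coverage as a percentage of covered lines over total lines."""
    return 100 * covered / total


before = coverage_percent(3, 4)  # 3 of 4 lines covered
after = coverage_percent(2, 3)   # one covered line deleted

# Deleting a covered line lowers the percentage although nothing became uncovered.
assert before == 75.0
assert round(after) == 67
assert after < before
```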

nulano commented Dec 4, 2023

Additionally, looking at the codecov report for the PR, most of the Windows jobs are missing coverage.
Most of the missing coverage under "indirect changes" seems to be due to these missing Windows uploads.

I can see an error during the upload: https://github.com/python-pillow/Pillow/actions/runs/7084804763/job/19280550192?pr=7599#step:31:37

[2023-12-04T10:04:53.331Z] ['info'] => Project root located at: D:/a/Pillow/Pillow
[2023-12-04T10:04:53.335Z] ['info'] -> No token specified or token is empty
[2023-12-04T10:04:53.513Z] ['info'] Searching for coverage files...
[2023-12-04T10:04:53.565Z] ['info'] => Found 1 possible coverage files:
  ./coverage.xml
[2023-12-04T10:04:53.565Z] ['info'] Processing ./coverage.xml...
[2023-12-04T10:04:53.589Z] ['info'] Detected GitHub Actions as the CI provider.
[2023-12-04T10:04:54.163Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.4-uploader-0.7.1&token=*******&branch=ImageStat_getcount_opt&build=7084804763&build_url=https%3A%2F%2Fgithub.com%2Fpython-pillow%2FPillow%2Factions%2Fruns%2F7084804763&commit=90e1e945303a8a05f1dee3a6821bd40af2d2a384&job=Test+Windows&pr=7599&service=github-actions&slug=python-pillow%2FPillow&name=Windows+Python+3.12&tag=&flags=GHA_Windows&parent=
[2023-12-04T10:04:54.371Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

But it did not happen for all of the Windows jobs; pypy3.10 is fine.
The codecov report for main also looks complete.

@hugovk hugovk left a comment

Thanks! Nice to drop two imports as well :)

@hugovk hugovk merged commit fe26900 into python-pillow:main Dec 4, 2023
55 of 56 checks passed
@radarhere radarhere changed the title Optimization of ImageStat.Stat._getcount method Optimized ImageStat.Stat.count Dec 4, 2023
radarhere added a commit to radarhere/Pillow that referenced this pull request Dec 6, 2023
@radarhere

I've created #7605 to include this in the release notes.

radarhere added a commit to radarhere/Pillow that referenced this pull request Dec 7, 2023