Performance regression in pyproj._transformer._Transformer._transform from 3.4.1 to 3.5.0? #1268

Closed
vengroff opened this issue Apr 7, 2023 · 5 comments
Labels: proj (Bug or issue related to PROJ)
Milestone: 3.6.0

Comments

@vengroff

vengroff commented Apr 7, 2023

I believe there has been a performance regression from 3.4.1 to 3.5.0.

There are more details and a small repro at https://github.com/vengroff/pyproj-perf.

Code Sample

This code is in repro.py in the repo linked above, but you can also copy and paste it from here. Run it in a virtual environment with Python 3.9 and censusdis 0.12.3, which pulls in pyproj 3.4.1; the profiled call should finish in under a second. If you then upgrade to pyproj 3.5.0, the same call takes much longer (about 75 seconds in my runs).

```python
"""Quick repro case of performance change from pyproj 3.4.1 to 3.5.0."""
import cProfile

import censusdis.data as ced
import pyproj
from censusdis.states import ALL_STATES_AND_DC

# Get all the counties in the 50 states and DC.
gdf_counties = ced.download(
    'acs/acs5', 2020, ['NAME'], state=ALL_STATES_AND_DC, county='*',
    with_geometry=True
)

print(f"pyproj.__version__ = {pyproj.__version__}")

# Profile the reprojection to EPSG:9311, sorted by internal time.
cProfile.run("gdf_counties.to_crs(epsg=9311)", sort='time')
```

Problem description

When I run the repro under each version, the bulk of the run time is spent in pyproj._transformer._Transformer._transform: 0.335 seconds with 3.4.1 versus 75.181 seconds with 3.5.0, a slowdown of about 224 times.

See output files https://github.com/vengroff/pyproj-perf/blob/main/perf-3.4.1.txt and https://github.com/vengroff/pyproj-perf/blob/main/perf-3.5.0.txt for details.

Expected Output

I expected upgrading from 3.4.1 to 3.5.0 to cause no noticeable performance difference. Instead, comparing the two profiles, the 3.4.1 run looks something like this:

```
pyproj.__version__ = 3.4.1
         2678 function calls (2617 primitive calls) in 0.457 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.335    0.168    0.335    0.168 {method '_transform' of 'pyproj._transformer._Transformer' objects}
        1    0.096    0.096    0.096    0.096 {pyproj._transformer.from_crs}
        2    0.011    0.005    0.011    0.005 {built-in method shapely.lib.set_coordinates}
        2    0.005    0.002    0.005    0.002 {built-in method shapely.lib.get_coordinates}
...
```

and the 3.5.0 run looks like this:

```
pyproj.__version__ = 3.5.0
         2710 function calls (2649 primitive calls) in 75.326 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2   75.181   37.591   75.181   37.591 {method '_transform' of 'pyproj._transformer._Transformer' objects}
        1    0.119    0.119    0.119    0.119 {pyproj._transformer.from_crs}
        2    0.011    0.005    0.011    0.005 {built-in method shapely.lib.set_coordinates}
        2    0.005    0.002    0.005    0.002 {built-in method shapely.lib.get_coordinates}
...
```

In both runs, nearly all of the time is spent in pyproj._transformer._Transformer._transform, while the other top entries (pyproj._transformer.from_crs and the shapely coordinate calls) barely change between versions.
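The slowdown factor can also be computed directly from the two pasted reports instead of by hand. The sketch below uses a hypothetical helper, tottime_for, and assumes the default cProfile text layout shown above (five numeric columns followed by the function description); the report rows are copied from this issue.

```python
def tottime_for(report: str, needle: str) -> float:
    """Return the tottime column of the first row in a cProfile text
    report whose function description contains `needle`."""
    for line in report.splitlines():
        # ncalls, tottime, percall, cumtime, percall, function description
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            try:
                return float(fields[1])
            except ValueError:
                continue  # header or summary line, not a data row
    raise ValueError(f"no row matching {needle!r}")


# Rows copied from the 3.4.1 and 3.5.0 profiles in this issue.
REPORT_341 = """\
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.335    0.168    0.335    0.168 {method '_transform' of 'pyproj._transformer._Transformer' objects}
        1    0.096    0.096    0.096    0.096 {pyproj._transformer.from_crs}
"""
REPORT_350 = """\
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2   75.181   37.591   75.181   37.591 {method '_transform' of 'pyproj._transformer._Transformer' objects}
        1    0.119    0.119    0.119    0.119 {pyproj._transformer.from_crs}
"""

NEEDLE = "method '_transform'"
old = tottime_for(REPORT_341, NEEDLE)
new = tottime_for(REPORT_350, NEEDLE)
print(f"slowdown: {new / old:.0f}x")  # prints "slowdown: 224x"
```

The needle includes "method" because a bare "_transform" would also match "pyproj._transformer.from_crs".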

The difference is large enough to make the censusdis.maps.plot_us function painfully slow for users of censusdis.

Environment Information

Python 3.9 with pyproj 3.4.1 and 3.5.0, on a 2021 MacBook Pro with an M1 Max processor and 64 GB of RAM.

Installation method

I used poetry, but plain pip should work fine too.

@vengroff vengroff added the bug label Apr 7, 2023
@snowman2 snowman2 added proj Bug or issue related to PROJ and removed bug labels Apr 7, 2023
@snowman2
Member

snowman2 commented Apr 7, 2023

Likely related: OSGeo/PROJ#3661

@snowman2
Member

snowman2 commented Jun 8, 2023

3.6.0.dev0 wheels available with PROJ 9.2.1 to test:

@snowman2
Member

snowman2 commented Jun 9, 2023

I believe this issue is resolved with PROJ 9.2.1. See perf-3.6.0.txt

```
pyproj.__version__ = 3.6.0rc0
         2710 function calls (2649 primitive calls) in 0.961 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.534    0.267    0.534    0.267 {method '_transform' of 'pyproj._transformer._Transformer' objects}
        1    0.370    0.370    0.370    0.370 {pyproj._transformer.from_crs}
        2    0.026    0.013    0.026    0.013 {built-in method shapely.lib.set_coordinates}
```

@snowman2 snowman2 closed this as completed Jun 9, 2023
@snowman2 snowman2 added this to the 3.6.0 milestone Jun 9, 2023
@snowman2
Member

snowman2 commented Jun 9, 2023

#1291

@vengroff
Author

Confirmed that upgrading the dependency to pyproj 3.6.0 fixes the originally reported regression. Thanks for your help.
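Besides pinning the dependency, a downstream package could warn users who still have an affected pyproj installed. The sketch below is hypothetical: release_tuple and warn_if_slow_pyproj are names invented here, and treating every 3.5.x release as affected is an assumption, since the issue only measured 3.5.0 (slow) against 3.4.1 and 3.6.0 (fast). It parses versions with the standard library only and deliberately drops pre-release suffixes such as the rc0 in 3.6.0rc0.

```python
import re
import warnings
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """'3.6.0rc0' -> (3, 6, 0). Pre-release/dev suffixes are dropped,
    a simplification compared with full PEP 440 ordering."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def warn_if_slow_pyproj(version: str) -> bool:
    """Warn and return True if `version` is in the range assumed slow
    here (>= 3.5.0 and < 3.6.0); otherwise return False."""
    if (3, 5, 0) <= release_tuple(version) < (3, 6, 0):
        warnings.warn(
            f"pyproj {version} may transform much more slowly than 3.4.1 "
            "or 3.6.0; consider upgrading to pyproj>=3.6.0.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```

A package would call it as warn_if_slow_pyproj(pyproj.__version__) at import time. Declaring pyproj>=3.6.0 as the minimum in the package's dependencies makes such a guard unnecessary.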
