Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pairwise #711

Closed
pochmann opened this issue May 17, 2023 · 2 comments
Closed

Optimize pairwise #711

pochmann opened this issue May 17, 2023 · 2 comments
Labels
pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it!

Comments

@pochmann
Copy link
Contributor

pochmann commented May 17, 2023

Or rather "Don't artificially slow it down with a generator". Times for pairwise on a million elements (with Python 3.10.8):

34.7 ± 0.2 ms  itertools
65.4 ± 0.5 ms  more_itertools

The current code:

def _pairwise(iterable):
"""Returns an iterator of paired items, overlapping, from the original
>>> take(4, pairwise(count()))
[(0, 1), (1, 2), (2, 3), (3, 4)]
On Python 3.10 and above, this is an alias for :func:`itertools.pairwise`.
"""
a, b = tee(iterable)
next(b, None)
yield from zip(a, b)
try:
from itertools import pairwise as itertools_pairwise
except ImportError:
pairwise = _pairwise
else:
def pairwise(iterable):
yield from itertools_pairwise(iterable)
pairwise.__doc__ = _pairwise.__doc__

In both cases, yield from can be replaced with return.

I saw the Grudgingly satisfy mypy commit that made them generators and I can reproduce a mypy error with that old code using functools.partial, but with my above return suggestion, mypy has no issue.

Benchmark script
import itertools
import more_itertools
from time import perf_counter as time
from statistics import mean, stdev

modules = itertools, more_itertools

times = {m: [] for m in modules}
def stats(m):
    ts = [t * 1e3 for t in sorted(times[m])[:10]]
    return f'{mean(ts):4.1f} ± {stdev(ts):3.1f} ms '

for _ in range(100):
    for m in modules:
        iterable = itertools.repeat(None, 10**6)
        t0 = time()
        pairs = m.pairwise(iterable)
        more_itertools.consume(pairs)
        times[m].append(time() - t0)

for m in sorted(modules, key=stats):
    print(stats(m), m.__name__)
@bbayles bbayles added the pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it! label May 17, 2023
@bbayles
Copy link
Collaborator

bbayles commented May 17, 2023

PR is welcome - I presume your benchmark is running pre-3.10?

@pochmann
Copy link
Contributor Author

As-is, it doesn't run in pre-3.10, since it uses itertools.pairwise. Here's a version I ran with 3.8.12 (maybe on a different machine, not comparable with the above results):

22.7 ± 0.4 ms  proposal
47.4 ± 1.3 ms  more_itertools
code
import itertools
import more_itertools
from time import perf_counter as time
from statistics import mean, stdev

class proposal:
    def pairwise(iterable):
        a, b = itertools.tee(iterable) 
        next(b, None) 
        return zip(a, b)

modules = more_itertools, proposal

times = {m: [] for m in modules}
def stats(m):
    ts = [t * 1e3 for t in sorted(times[m])[:10]]
    return f'{mean(ts):4.1f} ± {stdev(ts):3.1f} ms '

for _ in range(100):
    for m in modules:
        iterable = itertools.repeat(None, 10**6)
        t0 = time()
        pairs = m.pairwise(iterable)
        more_itertools.consume(pairs)
        times[m].append(time() - t0)

for m in sorted(modules, key=stats):
    print(stats(m), m.__name__)

bbayles added a commit that referenced this issue May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it!
Projects
None yet
Development

No branches or pull requests

2 participants