Improve partition #677

Closed
pochmann opened this issue Feb 20, 2023 · 7 comments
Labels
pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it!

Comments

@pochmann (Contributor) commented Feb 20, 2023

I propose a new implementation. Benchmark results with the documentation's example predicate is_odd on the example's range(10) and on range(10000):

range(0, 10)
  4.98 μs ± 0.01 μs  official Python recipe
  5.24 μs ± 0.01 μs  new proposal
  6.54 μs ± 0.04 μs  current more-itertools

range(0, 10000)
  1.89 ms ± 0.01 ms  new proposal
  2.41 ms ± 0.01 ms  official Python recipe
  3.17 ms ± 0.01 ms  current more-itertools

Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]
Benchmark script
from more_itertools import partition, consume
import operator
from itertools import compress, tee, filterfalse
from timeit import timeit
from statistics import mean, stdev
import sys

def partition_new(pred, iterable):
    if pred is None:
        pred = bool
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return (
        compress(t1, map(operator.not_, p1)),
        compress(t2, p2)
    )

def partition_recipe(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

funcs = {
    partition: 'current more-itertools',
    partition_recipe: 'official Python recipe',
    partition_new: 'new proposal',
}

def test(args, number, unit, scale):
    print(args[1])

    # Correctness
    expect = None
    for f in funcs:
        result = list(map(list, f(*args)))
        expect = expect or result
        assert result == expect

    # Speed
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t*scale for t in sorted(times[f])[:5]]
        return f'{mean(ts):6.2f} {unit} ± {stdev(ts):4.2f} {unit} '
    for _ in range(25):
        for f in funcs:
            t = timeit(lambda: consume(map(consume, f(*args))), number=number) / number
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(stats(f), funcs[f])
    print()

is_odd = lambda x: x % 2 != 0
test((is_odd, range(10)), 10000, 'μs', 1e6)
test((is_odd, range(10000)), 10, 'ms', 1e3)

print('Python version:')
print(sys.version)

My new proposal:

def partition_new(pred, iterable):
    ...
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return (
        compress(t1, map(operator.not_, p1)),
        compress(t2, p2)
    )
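
For reference, a quick check that it matches the documented behavior (a minimal sketch, relying on the full definition above with its None handling):

is_odd = lambda x: x % 2 != 0
falses, trues = partition_new(is_odd, range(10))
print(list(falses))  # [0, 2, 4, 6, 8]
print(list(trues))   # [1, 3, 5, 7, 9]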

The current more-itertools implementation:

def partition(pred, iterable):
    ...
    evaluations = ((pred(x), x) for x in iterable)
    t1, t2 = tee(evaluations)
    return (
        (x for (cond, x) in t1 if not cond),
        (x for (cond, x) in t2 if cond),
    )

@stevecj, the author of the current implementation, said that the use of generator expressions wasn't intentional:

I know that the code in this branch is different in style from other code in the project because it uses generator expressions rather than reusing other functions. It was not obvious to me, however, how to accomplish the same thing more directly with reuse than with these expressions.

Python's official recipe:

def partition(pred, iterable):
    ...
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
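
The recipe's trade-off is that pred runs twice per element once both iterators are fully consumed, which is what the current implementation (and my proposal) avoid. A minimal sketch that counts the calls:

from itertools import filterfalse, tee

calls = 0
def is_odd(x):
    global calls
    calls += 1
    return x % 2 != 0

t1, t2 = tee(range(10))
print(list(filterfalse(is_odd, t1)), list(filter(is_odd, t2)))
print(calls)  # 20: two predicate evaluations per element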
@bbayles (Collaborator) commented Feb 20, 2023

Am I missing the comparison of the results? Thanks for the analysis.

@pochmann (Contributor, Author)

What do you mean?

@bbayles (Collaborator) commented Feb 20, 2023

I was missing it; I understand now.

It does look noticeably faster, but I'm not sure if the difference is enough to justify changing one implementation for a generally similar one. Are there people using partition in tight loops?

@pochmann (Contributor, Author)

I don't know, but why wouldn't they? Or do you mean something like

for ... in ...:
    falses, trues = partition(...)

instead of this?:

falses, trues = partition(...)
for f in falses:
    ...
for t in trues:
    ...

I believe my version also saves memory, since its additional tee iterators should take less memory than the current implementation's per-element tuples. At least for large input iterables, I saw mine take half as much memory. (But for smaller iterables, mine took a bit more memory, and I don't yet know why, so I'm not claiming a memory advantage yet.)
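
A rough way to compare, as a minimal sketch with tracemalloc (consuming the two halves sequentially so the tee buffers fill up; exact numbers will vary):

import operator
import tracemalloc
from itertools import compress, tee
from more_itertools import consume, partition

def partition_new(pred, iterable):
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return compress(t1, map(operator.not_, p1)), compress(t2, p2)

def peak_kib(func, n=100_000):
    tracemalloc.start()
    falses, trues = func(lambda x: x % 2 != 0, range(n))
    consume(falses)  # exhausting this first forces tee to buffer the other half
    consume(trues)
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak / 1024

for f in (partition, partition_new):
    print(f.__name__, f'{peak_kib(f):.0f} KiB')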

@rhettinger (Contributor) commented May 7, 2023

ISTM the premise of the current implementation is that the user pred function is slow relative to the overhead added by the two generators. For fast predicates like str.isspace and someset.__contains__, the extra overhead causes a net slowing as compared to the original filterfalse/filter version.

I like Stefan's proposed improvement because it avoids the creation and destruction of one tuple per element and because the C-speed itertools will give consistent performance relative to using pure Python generators.

FWIW, there is another way to avoid a repeated call to pred. Just cache the result. This is supposed to be the one obvious way to avoid recomputing slow functions.

def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    pred = pred if pred is None else functools.cache(pred)
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

Note 1: If the returned iterators get consumed in parallel with one another (like we recommend for any use of tee), then a small lru_cache would suffice. Also, it would ensure that the additional memory overhead is capped.
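
A sketch of that bounded variant (the maxsize here is arbitrary):

from functools import lru_cache
from itertools import filterfalse, tee

def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    if pred is not None:
        # a small bound suffices if the two iterators are consumed roughly in parallel
        pred = lru_cache(maxsize=128)(pred)
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)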

Note 2: The performance of the filterfalse/filter tools is better when pred is None than when pred is bool. The latter is logically equivalent, but slower.

@pochmann (Contributor, Author) commented May 7, 2023

@rhettinger Good idea to compare with Python's official recipe as well. I've included it in the first post's benchmark. Even with the docstring's is_odd = lambda x: x % 2 != 0 predicate it was faster than the current more-itertools implementation. For iterable length 10 it was also faster than my proposal, but for length 10000, mine was faster.

And here's a benchmark with str.isspace:

list('. ' * 5)
  3.15 μs ± 0.01 μs  official Python recipe
  4.24 μs ± 0.03 μs  new proposal
  5.52 μs ± 0.06 μs  current more-itertools

list('. ' * 5000)
  0.57 ms ± 0.00 ms  official Python recipe
  0.85 ms ± 0.00 ms  new proposal
  1.98 ms ± 0.03 ms  current more-itertools

Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]

I'd say the bigger issue with functools.cache is that it only works for hashable values.
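
For example, with a made-up predicate:

from functools import cache

is_empty = cache(lambda x: len(x) == 0)
is_empty((1, 2))  # fine, tuples are hashable
is_empty([1, 2])  # TypeError: unhashable type: 'list'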

Are filterfalse/filter really faster with None than with bool? Both filter and filterfalse detect bool and avoid calling it, using this check:

lz->func == Py_None || lz->func == (PyObject *)&PyBool_Type

Ostensibly that checks Py_None first, but with how compilers and CPUs optimize, I'm not sure that's what actually happens, and I'm unable to measure any difference between using bool and None:

  3.95 ± 0.00 ns  filter_None
  3.95 ± 0.00 ns  filter_bool
  4.66 ± 0.00 ns  filterfalse_None
  4.66 ± 0.00 ns  filterfalse_bool

3.10.6 (main, Jan  7 2023, 10:15:17) [GCC 12.2.0]

That's time per element (using elements False for filter and True for filterfalse).

I don't know why filterfalse was slower; that might be worth looking into. Their next functions look almost identical.

Benchmark code
from timeit import timeit
from statistics import mean, stdev
from itertools import repeat, filterfalse
import sys

def filter_None():
    return filter(None, repeat(False, n))

def filter_bool():
    return filter(bool, repeat(False, n))

def filterfalse_None():
    return filterfalse(None, repeat(True, n))

def filterfalse_bool():
    return filterfalse(bool, repeat(True, n))

funcs = filter_None, filter_bool, filterfalse_None, filterfalse_bool

n = 10**6

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e9 for t in sorted(times[f])[:10]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ns '

for _ in range(1000):
    for f in funcs:
        t = timeit(lambda: next(f(), None), number=1) / n
        times[f].append(t)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print(sys.version)

Attempt This Online!

@bbayles added the pr-welcome label May 10, 2023
@bbayles (Collaborator) commented May 10, 2023

I marked this as pr-welcome; feel free to submit the implementation.

pochmann added a commit to pochmann/more-itertools that referenced this issue May 15, 2023
bbayles added a commit that referenced this issue May 16, 2023