Improve partition #677

Closed
pochmann opened this issue Feb 20, 2023 · 7 comments
Labels
pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it!

Comments

@pochmann (Contributor) commented Feb 20, 2023

I propose a new implementation. Benchmark results with the documentation's example predicate is_odd on the example's range(10) and on range(10000):

range(0, 10)
  4.98 μs ± 0.01 μs  official Python recipe
  5.24 μs ± 0.01 μs  new proposal
  6.54 μs ± 0.04 μs  current more-itertools

range(0, 10000)
  1.89 ms ± 0.01 ms  new proposal
  2.41 ms ± 0.01 ms  official Python recipe
  3.17 ms ± 0.01 ms  current more-itertools

Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]
Benchmark script
from more_itertools import partition, consume
import operator
from itertools import compress, tee, filterfalse
from timeit import timeit
from statistics import mean, stdev
import sys

def partition_new(pred, iterable):
    if pred is None:
        pred = bool
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return (
        compress(t1, map(operator.not_, p1)),
        compress(t2, p2)
    )

def partition_recipe(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

funcs = {
    partition: 'current more-itertools',
    partition_recipe: 'official Python recipe',
    partition_new: 'new proposal',
}

def test(args, number, unit, scale):
    print(args[1])

    # Correctness
    expect = None
    for f in funcs:
        result = list(map(list, f(*args)))
        expect = expect or result
        assert result == expect

    # Speed
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t*scale for t in sorted(times[f])[:5]]
        return f'{mean(ts):6.2f} {unit} ± {stdev(ts):4.2f} {unit} '
    for _ in range(25):
        for f in funcs:
            t = timeit(lambda: consume(map(consume, f(*args))), number=number) / number
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(stats(f), funcs[f])
    print()

is_odd = lambda x: x % 2 != 0
test((is_odd, range(10)), 10000, 'μs', 1e6)
test((is_odd, range(10000)), 10, 'ms', 1e3)

print('Python version:')
print(sys.version)

My new proposal:

def partition_new(pred, iterable):
    ...
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return (
        compress(t1, map(operator.not_, p1)),
        compress(t2, p2)
    )
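
For reference, a quick check that it matches the documented behavior (a minimal sketch, relying on the full definition above with its None handling):

is_odd = lambda x: x % 2 != 0
falses, trues = partition_new(is_odd, range(10))
print(list(falses))  # [0, 2, 4, 6, 8]
print(list(trues))   # [1, 3, 5, 7, 9]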

The current more-itertools implementation:

def partition(pred, iterable):
    ...
    evaluations = ((pred(x), x) for x in iterable)
    t1, t2 = tee(evaluations)
    return (
        (x for (cond, x) in t1 if not cond),
        (x for (cond, x) in t2 if cond),
    )

@stevecj, the author of the current implementation, said that the use of generator expressions wasn't intentional:

I know that the code in this branch is different in style from other code in the project because it uses generator expressions rather than reusing other functions. It was not obvious to me, however, how to accomplish the same thing more directly with reuse than with these expressions.

Python's official recipe:

def partition(pred, iterable):
    ...
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
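
The recipe's trade-off is that pred runs twice per element once both iterators are fully consumed, which is what the current implementation (and my proposal) avoid. A minimal sketch that counts the calls:

from itertools import filterfalse, tee

calls = 0
def is_odd(x):
    global calls
    calls += 1
    return x % 2 != 0

t1, t2 = tee(range(10))
print(list(filterfalse(is_odd, t1)), list(filter(is_odd, t2)))
print(calls)  # 20: two predicate evaluations per element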
@bbayles (Collaborator) commented Feb 20, 2023

Am I missing the comparison of the results? Thanks for the analysis.

@pochmann (Contributor, Author)

What do you mean?

@bbayles (Collaborator) commented Feb 20, 2023

I was missing it; I understand now.

It does look noticeably faster, but I'm not sure if the difference is enough to justify changing one implementation for a generally similar one. Are there people using partition in tight loops?

@pochmann (Contributor, Author)

I don't know, but why wouldn't they? Or do you mean something like

for ... in ...:
    falses, trues = partition(...)

instead of this?:

falses, trues = partition(...)
for f in falses:
    ...
for t in trues:
    ...

I believe my version also saves memory, since its additional tee iterators should take less memory than the current implementation's per-element tuples. At least for large input iterables, I saw mine take half as much memory. (But for smaller iterables, mine took a bit more memory, and I don't yet know why, so I'm not claiming a memory advantage yet.)
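
A rough way to compare, as a minimal sketch with tracemalloc (consuming the two halves sequentially so the tee buffers fill up; exact numbers will vary):

import operator
import tracemalloc
from itertools import compress, tee
from more_itertools import consume, partition

def partition_new(pred, iterable):
    t1, t2, p = tee(iterable, 3)
    p1, p2 = tee(map(pred, p))
    return compress(t1, map(operator.not_, p1)), compress(t2, p2)

def peak_kib(func, n=100_000):
    tracemalloc.start()
    falses, trues = func(lambda x: x % 2 != 0, range(n))
    consume(falses)  # exhausting this first forces tee to buffer the other half
    consume(trues)
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak / 1024

for f in (partition, partition_new):
    print(f.__name__, f'{peak_kib(f):.0f} KiB')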

@rhettinger (Contributor) commented May 7, 2023

ISTM the premise of the current implementation is that the user pred function is slow relative to the overhead added by the two generators. For fast predicates like str.isspace and someset.__contains__, the extra overhead causes a net slowing as compared to the original filterfalse/filter version.

I like Stefan's proposed improvement because it avoids the creation and destruction of one tuple per element and because the C-speed itertools will give consistent performance relative to using pure Python generators.

FWIW, there is another way to avoid a repeated call to pred. Just cache the result. This is supposed to be the one obvious way to avoid recomputing slow functions.

def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    pred = pred if pred is None else functools.cache(pred)
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

Note 1: If the returned iterators get consumed in parallel with one another (like we recommend for any use of tee), then a small lru_cache would suffice. Also, it would ensure that the additional memory overhead is capped.
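
A sketch of that bounded variant (the maxsize here is arbitrary):

from functools import lru_cache
from itertools import filterfalse, tee

def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    if pred is not None:
        # a small bound suffices if the two iterators are consumed roughly in parallel
        pred = lru_cache(maxsize=128)(pred)
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)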

Note 2: The performance of the filterfalse/filter tools is better when pred is None than when pred is bool. The latter is logically equivalent, but slower.

@pochmann (Contributor, Author) commented May 7, 2023

@rhettinger Good idea to compare with Python's official recipe as well. I've included it in the first post's benchmark. Even with the docstring's is_odd = lambda x: x % 2 != 0 predicate it was faster than the current more-itertools implementation. For iterable length 10 it was also faster than my proposal, but for length 10000, mine was faster.

And here's a benchmark with str.isspace:

list('. ' * 5)
  3.15 μs ± 0.01 μs  official Python recipe
  4.24 μs ± 0.03 μs  new proposal
  5.52 μs ± 0.06 μs  current more-itertools

list('. ' * 5000)
  0.57 ms ± 0.00 ms  official Python recipe
  0.85 ms ± 0.00 ms  new proposal
  1.98 ms ± 0.03 ms  current more-itertools

Python version:
3.10.8 (main, Oct 11 2022, 11:35:05) [GCC 11.3.0]

I'd say the bigger issue with functools.cache is that it only works for hashable values.
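
For example, with a made-up predicate:

from functools import cache

is_empty = cache(lambda x: len(x) == 0)
is_empty((1, 2))  # fine, tuples are hashable
is_empty([1, 2])  # TypeError: unhashable type: 'list'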

Are filterfalse/filter really faster with None than with bool? Both filter and filterfalse detect bool and avoid calling it, using this check:

lz->func == Py_None || lz->func == (PyObject *)&PyBool_Type

Ostensibly that checks Py_None first, but with how compilers and CPUs optimize, I'm not sure that's what actually happens, and I'm unable to measure any difference between using bool and None:

  3.95 ± 0.00 ns  filter_None
  3.95 ± 0.00 ns  filter_bool
  4.66 ± 0.00 ns  filterfalse_None
  4.66 ± 0.00 ns  filterfalse_bool

3.10.6 (main, Jan  7 2023, 10:15:17) [GCC 12.2.0]

That's time per element (using elements False for filter and True for filterfalse).

I don't know why filterfalse was slower; that might be worth looking into. Their next functions look almost identical.

Benchmark code
from timeit import timeit
from statistics import mean, stdev
from itertools import repeat, filterfalse
import sys

def filter_None():
    return filter(None, repeat(False, n))

def filter_bool():
    return filter(bool, repeat(False, n))

def filterfalse_None():
    return filterfalse(None, repeat(True, n))

def filterfalse_bool():
    return filterfalse(bool, repeat(True, n))

funcs = filter_None, filter_bool, filterfalse_None, filterfalse_bool

n = 10**6

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e9 for t in sorted(times[f])[:10]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} ns '

for _ in range(1000):
    for f in funcs:
        t = timeit(lambda: next(f(), None), number=1) / n
        times[f].append(t)

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

print(sys.version)

Attempt This Online!

@bbayles added the pr-welcome label May 10, 2023
@bbayles (Collaborator) commented May 10, 2023

I marked this as pr-welcome; feel free to submit the implementation.

pochmann added a commit to pochmann/more-itertools that referenced this issue May 15, 2023
bbayles added a commit that referenced this issue May 16, 2023