Quadratic scaling in expression depth when collecting expressions #16224

Open

wence- opened this issue May 14, 2024 · 8 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

wence- (Contributor) commented May 14, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import sys
import time
import polars as pl

expr = pl.col("a")

for i in range(int(sys.argv[1])):
    expr += pl.col("a")


df = pl.LazyFrame({"a": [1]})

q = df.select(expr)

ldf = q._ldf.optimization_toggle(
    True,
    True,
    True,
    True,
    True,
    True,
    True,
    False,
    False,
)

start = time.perf_counter()
ldf.collect()

print(f"{sys.argv[1]}: {time.perf_counter() - start}")

for i in 1 200 400 800 1600; do python run.py $i; done
1: 0.0029712169998674653
200: 0.0200193919990852
materialized names collided in common subexpression elimination.
400: 0.07028934900154127
materialized names collided in common subexpression elimination.
800: 0.28725086899794405
materialized names collided in common subexpression elimination.
1600: 1.2589194770007452

Log output

 backtrace and run without CSE
 backtrace and run without CSE
 backtrace and run without CSE

Issue description

Optimising the logical plan has, I think, performance that is quadratic in the expression depth. I think the culprit is AExpr::to_field, which is called to attach a dtype to every node in the graph. Since the result is not cached, each call does O(N) work for a node, and it is done O(N) times.

Running samply python run.py 1600 shows that almost all the time is spent in get_arithmetic_field -> to_field -> stacker::maybe_grow -> get_arithmetic_field -> ...
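
As a rough illustration of the shape of the problem, here is a toy Python model of the recursion (not the actual Rust code in Polars): resolving each node's dtype by recursing into its children with no cache does O(subtree size) work per node, and the optimiser does it for every node.

# Toy model: a node is either a leaf (column name) or a 2-tuple of child
# nodes standing in for a binary "+". Dtypes are plain strings here.
def to_field(node):
    if isinstance(node, str):
        return "Int64"                                   # leaf: read from the schema
    lhs, rhs = to_field(node[0]), to_field(node[1])      # recurse, nothing cached
    return lhs if lhs == rhs else "Float64"              # stand-in for get_arithmetic_field

def annotate(node, out):
    # attach a dtype to every node, as the optimiser does
    out.append(to_field(node))                           # O(size of this subtree)
    if not isinstance(node, str):
        annotate(node[0], out)
        annotate(node[1], out)
    return out

# left-deep chain like the repro: N "+" nodes => ~N^2 work in total
tree = "a"
for _ in range(300):
    tree = (tree, "a")
annotate(tree, [])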

Expected behavior

Although this is somewhat pathological, I would expect this to be linear in the expression depth.

Installed versions

--------Version info---------
Polars:               0.20.26
Index type:           UInt32
Platform:             Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Python:               3.11.9 (main, Apr  3 2024, 16:33:49) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

wence- added the bug, needs triage, and python labels on May 14, 2024
ritchie46 (Member) commented May 14, 2024

How does it perform if you turn off comm_subexpr_elim?

And how do you get such deep expressions? 😉

wence- (Contributor, Author) commented May 14, 2024

How does it perform if you turn off comm_subexpr_elim?

Same; indeed, with all optimisations off it's still quadratic, just faster overall:

for i in 1 200 400 800 1600; do python slow-optimise.py ${i}; done
1: 0.0027743840000766795
200: 0.010377078000601614
400: 0.030803929999819957
800: 0.12257709299956332
1600: 0.47342776200093795

And how do you get such deep expressions? 😉

I've seen things...

It was more that I was transpiling the plan and noticed performance slowdowns when annotating the logical plan with dtypes for every expression node.

ritchie46 (Member)

Will see if we can apply some memoization in the to_field call.
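
In the toy Python model from the issue description, that memoization would amount to caching the resolved dtype per node so each subtree is only walked once. A sketch of the idea only; in the real crate the cache would presumably be keyed by the AExpr arena index rather than object identity.

# Hypothetical memoized variant of the toy to_field above: keyed by object
# identity, which plays the role of the arena node id. Total work drops
# from O(N^2) to O(N).
DTYPE_CACHE = {}

def to_field_cached(node):
    key = id(node)
    if key not in DTYPE_CACHE:
        if isinstance(node, str):
            DTYPE_CACHE[key] = "Int64"
        else:
            lhs = to_field_cached(node[0])
            rhs = to_field_cached(node[1])
            DTYPE_CACHE[key] = lhs if lhs == rhs else "Float64"
    return DTYPE_CACHE[key]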

ritchie46 (Member)

I am not entirely sure the tree itself isn't quadratic: I need to do double the number of to_field calls relative to the depth, because at every depth we branch.

wence- (Contributor, Author) commented May 15, 2024

I don't think so. The expression is (for $N = 4$)

((((a + a) + a) + a) + a)

Or

(+)
 | \
(+) a
 | \
(+) a
 | \
(+) a
 | \
 a  a

So if you need to call to_field on every node, that's roughly $2N$ calls ($N$ internal + nodes plus the leaves), which is why you're seeing double the depth, I think. But at every node, to_field has to recurse into the whole subtree, unless we either memoise or do a one-pass, bottom-up production of the dtypes of every node given a schema context.
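
For reference, the bottom-up variant in the same toy tuple-tree model from above would be a single post-order pass that fills a table of dtypes for every node, so nothing is ever recomputed (a sketch only, not Polars code):

def annotate_bottom_up(root, schema):
    # one iterative post-order pass: every node's dtype is computed exactly once
    dtypes = {}                                    # id(node) -> dtype
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if isinstance(node, str):
            dtypes[id(node)] = schema[node]        # leaf: read the schema
        elif not children_done:
            stack.append((node, True))             # revisit after the children
            stack.append((node[0], False))
            stack.append((node[1], False))
        else:
            lhs, rhs = dtypes[id(node[0])], dtypes[id(node[1])]
            dtypes[id(node)] = lhs if lhs == rhs else "Float64"
    return dtypes

# e.g. annotate_bottom_up(tree, {"a": "Int64"}) yields a dtype for every node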

wence- (Contributor, Author) commented May 15, 2024

BTW: should I expect that the AExpr processing treats expressions with common sub-expressions as DAGs, or as trees (increasing the perceived size)?

That is, if I write:

expr = pl.col("a")

expr = expr + expr
expr2 = expr + expr

Which is:

    (+)
    / \
    \ /
    (+)
    / \
    \ /
     a

Is it "seen" as:

           (+)
          /   \
        (+)   (+)
       /  |   |  \
      a   a   a   a

ritchie46 (Member)

It is seen as trees. They are turned into DAGs during execution if CSE recognizes it.

wence- (Contributor, Author) commented May 21, 2024

It is seen as trees. They are turned into DAGs during execution if CSE recognizes it.

Thanks. I suspect that means there are (perhaps pathological) cases where expression processing time can be exponential: any time there is sharing in the expression, the tree the planner sees is much larger than the DAG the user wrote. Though perhaps that is an unlikely case for typical queries.
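
For what it's worth, a small variation of the repro would make that visible: doubling the sharing at every step keeps the DAG at depth + 1 nodes, while the tree view has about 2**depth leaves. This is only a measurement script; I have not checked how the numbers actually scale.

import sys
import time

import polars as pl

depth = int(sys.argv[1])          # keep this small (e.g. <= 20): the tree
                                  # view has about 2**depth leaves

expr = pl.col("a")
for _ in range(depth):
    expr = expr + expr            # both operands are the same sub-expression

df = pl.LazyFrame({"a": [1]})

start = time.perf_counter()
df.select(expr).collect()
print(f"{depth}: {time.perf_counter() - start}")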
