Migrate most shrinker functions to the ir #3962

tybug · 2024-05-01T22:49:50Z

part of #3921.

This is a big one, apologies! Hard to avoid; the remaining functions had cascading dependencies on each other.

I've benchmarked these changes by hooking minimal with derandomize=False and 5 trials.

latest (866256e)

old (a9f50c6)

old (d9a077d)

old (f7d0d29)

old (d81d50e)

old (901d636)

I think there are a few places we are still not optimal:

doubling up on Integer.shrink from the negative and positive direction (see review comment)
to know whether an ir tree A is better than B ahead of time, we have to convert both to a buffer and sort_key the buffers, which costs a call. This will be improved when we move our ordering function to the ir tree.

Conversations of note:

disabling shrinking dfas (Migrate most shrinker functions to the ir #3962 (comment))

tybug · 2024-05-01T22:51:02Z

hypothesis-python/src/hypothesis/internal/conjecture/data.py

@@ -2292,12 +2327,16 @@ def _pop_ir_tree_node(self, ir_type: IRTypeName, kwargs: IRKWargsType) -> IRNode
        # (in fact, it is possible that giving up early here results in more time
        # for useful shrinks to run).
        if node.ir_type != ir_type:
+            # needed for try_shrinking_nodes to see what node was *attempted*
+            # to be drawn.
+            self.invalid_at = (ir_type, kwargs)
            self.mark_invalid(f"(internal) want a {ir_type} but have a {node.ir_type}")


Now that we've moved to a stronger typed representation in the ir, we do pretty badly on a class of problems: crossing typed boundaries. In particular shrinking one_of(A, B) where A has a different type/kwargs than B. Before, we would shrink the integer for one_of causing it to choose A instead of B. A interprets the bits differently than B, but we would get lucky most of the time and find an counterexample; and now that we're in A we do more robust shrinking. In the ir we're much more strict and mark_invalid as soon as we have a misalignment, so the initial coincidental slip from B to A never happens.

One thing that works here is drawing randomly instead of giving up when we hit a misalignment, which I would say is basically equivalent to master behavior.

Concretely:

# condition avoids trivial counterexamples assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 0]) == 102

fails on this branch and passes on master (I'll add this as a test case as well).

idle thoughts: I wonder if a short "randomly generate ir trees smaller than our current value" phase when the shrinker would otherwise exit as stuck would get us out of many of these coincidental slips. Possibly not, since if we didn't have to exploit some fixed structure while shrinking, we likely would have found this counterexample during the generation phase.

I haven’t touched Hypothesis in a few years, so take this with a grain of salt, but I vaguely recall that having the shrinker introduce randomness into the data stream has tended to be a bad idea.

Normally, the shrinker tends to make its input both shorter and simpler over time, despite the fact that (999 random bytes) is technically “smaller” than (1000 zero bytes).

But if the shrinker starts reintroducing randomness, this falls apart, and the shrinker will get obsessed with some high-entropy input because it happens to be shorter the current low-entropy input.

Seconded on this. Specifically the problem that tends to result is that the performance becomes terrible, because you end up repeatedly reintroducing lots of expensive-to-shrink regions in order to make minor size gains.

My guess is that you could possibly fix this with some sort of low-entropy generation mode (e.g. always generate one of the two smallest values for a given type) but I'm not sure how robust that would be and it potentially still has the same problem.

That intuition makes sense to me. I think performance may still be ok if we only introduce randomness as a last-resort phase in the shrinker when we are otherwise at a fixpoint. iirc the shrinker used to have some dynamic enabling logic similar to this. Of course we're now complicating the shrink pass logic and have to justify the tradeoff in complexity as worthwhile. Alternatively low-entropy generation also seems reasonable.

It just feels quite bad to not be able to shrink st.integers() | st.floats() because we happened to hit an error in the floats region first, and we are seeing a few regressions in the shrink quality test suite due to this (though how much that maps to reality is unclear to me).

hypothesis-python/src/hypothesis/internal/conjecture/engine.py

tybug · 2024-05-01T22:57:09Z

hypothesis-python/src/hypothesis/internal/conjecture/shrinker.py

+            # try shrinking from both sides towards shrink_towards
+            shrink_towards = kwargs["shrink_towards"]
+            Integer.shrink(
+                abs(shrink_towards - value),
+                lambda n: self.try_shrinking_nodes(nodes, shrink_towards + n),
+            )
+            Integer.shrink(
+                abs(shrink_towards - value),
+                lambda n: self.try_shrinking_nodes(nodes, shrink_towards - n),
+            )


This is pretty brute force and does ~double the shrinks necessary in the worst case.

I think master behavior for integer shrinking is something like "shrink towards 0 on the current sign (only positive or only negative) and occasionally try and flip the sign bit". I think master gets a bit luckier/smarter than this in practice, because a straight-line translation of that here shrinks minimal(st.integers(), lambda n: not (-1 < n < 2)) to 2 instead of -1; to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1.

A complicating factor here is shrinking behavior is different for bounded and unbounded integers:

assert minimal(st.integers(), lambda n: not (-1 < n < 2)) == -1 assert minimal(st.integers(-5, 5), lambda n: not (-1 < n < 2)) == 2 assert minimal(st.integers(max_value=5), lambda n: not (-1 < n < 2)) == -1 assert minimal(st.integers(min_value=-5), lambda n: not (-1 < n < 2)) == -1

which we can't change until we move the ordering to the ir.

to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1

I suspect it's getting lucky with some sort of punning here, as the integers strategy is a mixture under the hood. I don't think it can reliably do this strategy - there's no backtracking in the shrinker - but it might be doing something that causes previous bytes for a larger integer width to be reinterpreted as a negative integer.

yeah, good point. master fails on a trivial extension of this problem:

assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 102, 0]) == 103

so it's just getting lucky.

Zac-HD · 2024-05-03T09:13:41Z

Can we get a secondary (log) y-axis on the plot showing proportional change, in addition to the absolute number? +800 calls is quite different on a base of 50 to 2,000!

hypothesis-python/src/hypothesis/internal/conjecture/shrinking/floats.py

tybug · 2024-05-03T20:38:47Z

I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Zac-HD · 2024-05-03T21:40:29Z

I've updated the graph to include an n✕ change axis. I'm looking into the worst discrepancies there next.

Fwiw this looks fine to me - the large-negative multipliers are relative to a small baseline, and the tails with large absolute changes aren't that large in relative terms. Might as well look into it, but don't stress too much + consider doing so as a follow-up later.

tybug · 2024-05-04T01:08:31Z

agree it doesn't look hugely out of line, but I won't feel confident in this pull until I look into it, for my own sake more than anything 😄

I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

hypothesis-python/src/hypothesis/internal/conjecture/engine.py

Zac-HD · 2024-05-04T07:05:41Z

I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.

Nice! You could get a buffer-without-dfas baseline if you wanted to, but I don't think this is really necessary - you're already measuring the change as we're going to ship it, and it's looking good. I guess in principle the downside might be some loss of shrink quality, but I don't expect anyone to notice that.

this fixes a nasty flaky error that I spent many hours tracking down

tybug · 2024-05-25T20:30:47Z

@Zac-HD how important would you say the test_removes_needless_steps and test_prints_equal_values_with_correct_variable_name stateful failures (eg) are as blockers? It appears stateful shrinking with bundles may have gotten worse. It's going to take me a while to track down why though, because the ir trace for bundle draws isn't exactly easy to follow.

Zac-HD · 2024-05-25T23:14:06Z

How important are test_removes_needless_steps and test_prints_equal_values_with_correct_variable_name stateful failures (eg) are as blockers? It appears stateful shrinking with bundles may have gotten worse.

I'm OK with taking a regression on these, since I'm confident that we can fix it later and we're seeing a substantial win overall. Just make sure to track them in an issue, along with the other shrinker issues noted in comments / review below.

Zac-HD

I read through the whole diff, and it looks good to me - I've tagged everywhere that I thought we might want to revisit later, but overall I'll be happy with minimal changes to get the tests passing.

And- this is a huge contribution. Thanks, Liam 😍

hypothesis-python/RELEASE.rst

hypothesis-python/src/hypothesis/internal/conjecture/data.py

hypothesis-python/src/hypothesis/internal/conjecture/shrinking/floats.py

hypothesis-python/tests/conjecture/test_engine.py

hypothesis-python/tests/conjecture/test_minimizer.py

hypothesis-python/src/hypothesis/internal/conjecture/shrinker.py

tybug · 2024-05-26T04:16:01Z

ah good, now we get to the fun step of debugging newly flaky tests 🙂

tybug · 2024-05-26T04:29:23Z

looks like an actual failure: https://github.com/HypothesisWorks/hypothesis/actions/runs/9240572106/job/25421012169?pr=3962. We're shrinking to 102 instead of 101. I won't be getting to this tonight.

tybug · 2024-05-27T19:19:22Z

I pushed two fixes:

41b15e3 where we tried out of bounds integers while shrinking, leading to assertion errors. we should move our assertions about forced values in draw_* to assert ir_value_permitted(forced, ir_type, kwargs) (have a personal todo for this and will follow up)
866256e which fixes the above failure (explanation below).

Updated graphs in description.

Here's the cause of the above failure test_reports_target_results:

we generate two counterexamples: x=101 and x=n > 101, in that order. x=101's buffer is very large because of e.g. failed probes or differing size buckets.
we try to shrink x=n, and eventually try x=101 while shrinking. But we had already cached the ir tree [101] as corresponding to a large buffer above, so we're "blocked" from progressing any farther than x=102.

We could potentially solve a large number of these occurrences by removing discarded portions of the buffer before caching. A more general approach which also handles "larger because drawing from different size buckets" is not caching interesting buffer datas at all and relying on the ir tree corresponding to the smallest buffer via forced. I think this is the right approach given that the vast majority of datas with status.INTERESTING are on the ir via the shrinker, so are fine to cache. The overhead is one or two cache misses when trying the same interesting buffer data.

reproducer

@given(st.integers(min_value=0))
@settings(database=None)
def f(x):
    assert x <= 100

for i in range(500):
    print("-" * 50, i)
    try:
        f()
    except Exception as e:
        assert "x=101" in "".join(traceback.format_exception(e))

tybug · 2024-05-27T19:42:30Z

oops, I read the wrong number — shrinking is a mean of 2.12x faster, but a median of 1.38x faster. I fixed the release notes. I think the median is the more useful and realistic number to report, though getting big wins on the tail end is of course also important.

tybug · 2024-05-29T00:37:03Z

I looked into the stateful regression and I'm confident we can improve it in the near future. The short of it is we'll need to support slips from the integer 0 to the boolean False, and realign pseudo choices. ie it should be allowed to draw min_value=0 max_value=0 from an ir tree with a node of value=2.

I've completely skipped the problematic tests for now.

Zac-HD

🎊

migrate most shrinker functions to the ir

0ba943c

tybug requested a review from Zac-HD as a code owner May 1, 2024 22:49

tybug commented May 1, 2024

View reviewed changes

tybug added 3 commits May 2, 2024 01:04

format

9e3dbc0

temporarily hack around test_sets_of_fixed_length

b02a67d

fix region deletion logic using start instead of ir_start

901d636

jobh reviewed May 3, 2024

View reviewed changes

hypothesis-python/src/hypothesis/internal/conjecture/shrinking/floats.py Outdated Show resolved Hide resolved

tybug added 2 commits May 3, 2024 21:02

cache called ir_tree_nodes in addition to computed ones

50b2b13

disable dfa shrinking

d81d50e

tybug commented May 4, 2024

View reviewed changes

hypothesis-python/src/hypothesis/internal/conjecture/engine.py Outdated Show resolved Hide resolved

track ir tree datas in the DataTree

f7d0d29

tybug mentioned this pull request May 8, 2024

Improve caching for misaligned ConjectureDatas #3977

Merged

tybug added 13 commits May 9, 2024 19:18

Merge branch 'master' into shrinker-ir

f826f69

format

bc418d9

Merge branch 'master' into shrinker-ir

12d0019

respect forced status in datatree simulate for invalid nodes

172dbd9

this fixes a nasty flaky error that I spent many hours tracking down

only set invalid_at when required

a1b1732

track and report misaligned shrinking counts

627af8f

check initial value in minimize_nodes

870c20b

guard against out of bounds replace_all

e88ce7e

normalize to minimal buffer when shrinking

aeccbc5

check ir_value_permitted for all trees

b69cf17

fix invalid_at tests

866c846

improve datatree printing

d26fd9b

disable dfa shrinking tests for now

1fe712e

tybug added 3 commits May 25, 2024 16:01

add shrinker coverage tests and pragmas

31c445a

Merge branch 'master' into shrinker-ir

7c1f554

more clear float shrink logic

fdeeb0d

tybug added 2 commits May 25, 2024 17:31

add shrinking benchmark code

9ad13e4

add release notes

1b5093b

Zac-HD reviewed May 26, 2024

View reviewed changes

tybug added 4 commits May 25, 2024 23:45

strengthen test case, copy editing

674c8e6

add TODO_BETTER_SHRINK to where we can improve shrinking

237332f

extract function

6112e22

clarify comment

a9f50c6

try debug

25dbf2d

tybug added 3 commits May 26, 2024 14:50

refactor slightly

6a981fc

don't permit >128 bit integers

41b15e3

dont cache interesting buffer-based datas

866256e

minor fixes, reword release notes

66f2464

temporarily skip stateful shrink quality tests

3814b54

Zac-HD approved these changes May 29, 2024

View reviewed changes

Zac-HD enabled auto-merge May 29, 2024 02:55

Zac-HD merged commit 822e39d into HypothesisWorks:master May 29, 2024
54 checks passed

tybug deleted the shrinker-ir branch May 29, 2024 03:45

This was referenced May 29, 2024

Migrate our core representation to an IR layer #3921

Open

Follow up on IR shrinking tasks #4006

Open

Account for time spent in garbage collection #3979

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate most shrinker functions to the ir #3962

Migrate most shrinker functions to the ir #3962

tybug commented May 1, 2024 •

edited

tybug May 1, 2024 •

edited

tybug May 2, 2024

Zalathar May 3, 2024

DRMacIver May 3, 2024

tybug May 8, 2024

tybug May 1, 2024 •

edited

DRMacIver May 4, 2024

tybug May 25, 2024

Zac-HD commented May 3, 2024

tybug commented May 3, 2024

Zac-HD commented May 3, 2024

tybug commented May 4, 2024

Zac-HD commented May 4, 2024

tybug commented May 25, 2024 •

edited

Zac-HD commented May 25, 2024

Zac-HD left a comment

tybug commented May 26, 2024

tybug commented May 26, 2024

tybug commented May 27, 2024

tybug commented May 27, 2024

tybug commented May 29, 2024 •

edited

Zac-HD left a comment

Migrate most shrinker functions to the ir #3962

Migrate most shrinker functions to the ir #3962

Conversation

tybug commented May 1, 2024 • edited

tybug May 1, 2024 • edited

Choose a reason for hiding this comment

tybug May 2, 2024

Choose a reason for hiding this comment

Zalathar May 3, 2024

Choose a reason for hiding this comment

DRMacIver May 3, 2024

Choose a reason for hiding this comment

tybug May 8, 2024

Choose a reason for hiding this comment

tybug May 1, 2024 • edited

Choose a reason for hiding this comment

DRMacIver May 4, 2024

Choose a reason for hiding this comment

tybug May 25, 2024

Choose a reason for hiding this comment

Zac-HD commented May 3, 2024

tybug commented May 3, 2024

Zac-HD commented May 3, 2024

tybug commented May 4, 2024

Zac-HD commented May 4, 2024

tybug commented May 25, 2024 • edited

Zac-HD commented May 25, 2024

Zac-HD left a comment

Choose a reason for hiding this comment

tybug commented May 26, 2024

tybug commented May 26, 2024

tybug commented May 27, 2024

tybug commented May 27, 2024

tybug commented May 29, 2024 • edited

Zac-HD left a comment

Choose a reason for hiding this comment

tybug commented May 1, 2024 •

edited

tybug May 1, 2024 •

edited

tybug May 1, 2024 •

edited

tybug commented May 25, 2024 •

edited

tybug commented May 29, 2024 •

edited