Internal error: tests using unique collections containing nan are flaky #3926

Closed
Zac-HD opened this issue Mar 17, 2024 · 5 comments · Fixed by #3931
Labels
bug (something is clearly wrong here), flaky-tests (for when our tests only sometimes pass)

Comments

@Zac-HD
Member

Zac-HD commented Mar 17, 2024

e.g. https://github.com/HypothesisWorks/hypothesis/actions/runs/8313487434/job/22749533612?pr=3924#step:6:581


I've also hit a weird pytest crash a couple of times, but it looks like psf/black#4224 just hasn't been released yet.

@Zac-HD Zac-HD added the flaky-tests (for when our tests only sometimes pass) and tests/build/CI (about testing or deployment *of* Hypothesis) labels Mar 17, 2024
@tybug
Member

tybug commented Mar 17, 2024

I haven't tried to bisect this yet, but it's possible this regressed in "implement fake_forced" (#3806). I saw flaky raised internally when I had a bug in my local implementation there. At least we have the strategy definition from this run, though it's a bit of a beast:

while generating 'Draw 1: ' from frozensets(one_of(builds(BaseExceptionGroup, text(), lists(from_type(builtins.BaseException), min_size=1, max_size=5)).filter(_can_hash), none().filter(_can_hash), just(NotImplemented).filter(_can_hash), builds(UnicodeDecodeError, just('unknown encoding'), just(b''), just(0), just(0), just('reason')).filter(_can_hash), builds(UnicodeEncodeError, just('unknown encoding'), text(), just(0), just(0), just('reason')).filter(_can_hash), builds(UnicodeTranslateError, text(), just(0), just(0), just('reason')).filter(_can_hash), builds(classmethod, just(lambda self: self)).filter(_can_hash), functions().filter(_can_hash), iterables(nothing()).filter(_can_hash), builds(dict).map(dict.values).filter(_can_hash), dates().filter(_can_hash), times().filter(_can_hash), one_of(builds(timezone, offset=builds(timedelta, hours=integers(min_value=-23, max_value=23), minutes=integers(min_value=0, max_value=59))).filter(_can_hash), builds(timezone, name=text(alphabet=characters()), offset=builds(timedelta, hours=integers(min_value=-23, max_value=23), minutes=integers(min_value=0, max_value=59))).filter(_can_hash)), just(Ellipsis).filter(_can_hash), builds(frozenset).filter(_can_hash), one_of(tuples(integers(min_value=0, max_value=4294967295), integers(min_value=-32, max_value=0).map(abs)).map(lambda x: ipaddress.IPv4Network(x, strict=False)).filter(_can_hash), sampled_from(('0.0.0.0/8', '10.0.0.0/8', '100.64.0.0/10', '127.0.0.0/8', '169.254.0.0/16', '172.16.0.0/12', '192.0.0.0/24', '192.0.0.0/29', '192.0.0.8/32', '192.0.0.9/32', '192.0.0.10/32', '192.0.0.170/32', '192.0.0.171/32', '192.0.2.0/24', '192.31.196.0/24', '192.52.193.0/24', '192.88.99.0/24', '192.168.0.0/16', '192.175.48.0/24', '198.18.0.0/15', '198.51.100.0/24', '203.0.113.0/24', '240.0.0.0/4', '255.255.255.255/32')).map(IPv4Network).filter(_can_hash)), one_of(tuples(integers(min_value=0, max_value=340282366920938463463374607431768211455), integers(min_value=-128, 
max_value=0).map(abs)).map(lambda x: ipaddress.IPv6Network(x, strict=False)).filter(_can_hash), sampled_from(('::1/128', '::/128', '::ffff:0:0/96', '64:ff9b::/96', '64:ff9b:1::/48', '100::/64', '2001::/23', '2001::/32', '2001:1::1/128', '2001:1::2/128', '2001:2::/48', '2001:3::/32', '2001:4:112::/48', '2001:10::/28', '2001:20::/28', '2001:db8::/32', '2002::/16', '2620:4f:8000::/48', 'fc00::/7', 'fe80::/10')).map(IPv6Network).filter(_can_hash)), binary().map(memoryview).filter(_can_hash), builds(PurePath, text()).filter(_can_hash), builds(property, just(lambda _: None)).filter(_can_hash), randoms().filter(_can_hash), one_of(builds(range, integers(min_value=0)).filter(_can_hash), builds(range, integers(), integers()).filter(_can_hash), builds(range, integers(), integers(), integers().filter(bool)).filter(_can_hash)), text().map(lambda c: re.match(".", c, flags=re.DOTALL)).filter(bool).filter(_can_hash), builds(compile, sampled_from(['', b''])).filter(_can_hash), builds(slice, one_of(none(), integers()), one_of(none(), integers()), one_of(none(), integers())).filter(_can_hash), text().filter(_can_hash), builds(super, from_type(builtins.type)).filter(_can_hash), builds(Bar, integers()).filter(_can_hash), builds(Baz, integers()).filter(_can_hash), builds(tuple).filter(_can_hash), builds(BytesIO, binary()).filter(_can_hash), one_of(booleans().filter(_can_hash), integers().filter(_can_hash), floats().filter(_can_hash), complex_numbers().filter(_can_hash), fractions().filter(_can_hash), decimals().filter(_can_hash), timedeltas().filter(_can_hash)), one_of(booleans().filter(_can_hash), binary().filter(_can_hash), integers(min_value=0, max_value=255).filter(_can_hash), lists(integers(min_value=0, max_value=255)).map(tuple).filter(_can_hash)), one_of(booleans().filter(_can_hash), integers().filter(_can_hash), floats().filter(_can_hash), complex_numbers().filter(_can_hash), decimals().filter(_can_hash), fractions().filter(_can_hash)), one_of(booleans().filter(_can_hash), 
integers().filter(_can_hash), floats().filter(_can_hash), decimals().filter(_can_hash), fractions().filter(_can_hash), floats().map(str).filter(_can_hash)), one_of(integers().filter(_can_hash), booleans().filter(_can_hash)), one_of(booleans().filter(_can_hash), integers().filter(_can_hash), floats().filter(_can_hash), uuids().filter(_can_hash), decimals().filter(_can_hash), from_regex('\\A-?\\d+\\Z').filter(functools.partial(can_cast, int)).filter(_can_hash)), one_of(booleans().filter(_can_hash), integers().filter(_can_hash), floats().filter(_can_hash), decimals().filter(_can_hash), fractions().filter(_can_hash)), builds(StringIO, text()).filter(_can_hash), shared(sampled_from([<class 'NoneType'>, <class 'bool'>, <class 'int'>, <class 'float'>, <class 'str'>, <class 'bytes'>]), key="typevar=<class 'collections.abc.Hashable'>").flatmap(from_type).filter(_can_hash), timezones().filter(_can_hash)))

@Zac-HD
Member Author

Zac-HD commented Mar 18, 2024

test_resolve_typing_module[typing.ChainMap] flaked with this similar-looking error. Cleaned up:

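# while generating 'Draw 1: ' from
from collections import ChainMap
from hypothesis.strategies import dictionaries, from_type, sampled_from, shared

# _can_hash is the filter named in the error output; it is not defined in this snippet
types_strat = sampled_from([type(None), bool, int, float, str, bytes])
dictionaries(
    keys=shared(types_strat, key='typevar=~KT').flatmap(from_type).filter(_can_hash),
    values=shared(types_strat, key='typevar=~VT').flatmap(from_type)
).map(ChainMap)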

It might also be relevant that all of the examples I remember have involved sets or mappings; I'd suspect iteration order but ChainMap wrapping a dict should be entirely deterministic in iteration order. Perhaps an interaction between from_type (which tries to cache) and typevars and/or _can_hash?

@tybug
Member

tybug commented Mar 18, 2024

Manually shrunk to the following:

import hypothesis.strategies as st

st.dictionaries(
    keys=st.floats(),
    values=st.just(None),
)

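Anecdotally, using float seems to be critical, vs say int. At this point I'd suspect -0.0 vs 0.0 or related float issues.

import hypothesis.strategies as st

def f():
    # Dict keys must be unique, so this exercises the uniqueness filtering
    s = st.dictionaries(
        keys=st.floats(),
        values=st.just(None),
    )
    s.example()

# The failure is intermittent, so draw many examples
for i in range(1000):
    print("-" * 25, i, "-" * 25)
    f()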

@tybug
Member

tybug commented Mar 18, 2024

Reduced further:

import hypothesis.strategies as st

st.lists(
    st.floats(allow_infinity=False),
    min_size=0,
    max_size=3,
    unique=True,
)

Something to do with multiple nans in the same list while filtering for uniqueness. I wonder if

import math
from hypothesis.internal.floats import int_to_float

n = 18444492273895866368
assert math.isnan(int_to_float(n))
assert int_to_float(n) not in [int_to_float(n)]

is relevant? (int_to_float destroys identity).
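A minimal sketch of the behaviour that assertion relies on, using only the standard library. `nan_from_bits` is a hypothetical stand-in for `int_to_float`, built on `struct`; it is not Hypothesis's implementation. It shows that containment checks find a nan only when the very same object is already in the collection:

```python
import math
import struct


def nan_from_bits(bits: int) -> float:
    """Build a float from a 64-bit pattern (hypothetical stand-in for int_to_float)."""
    return struct.unpack("!d", struct.pack("!Q", bits))[0]


bits = 0xFFF8000000000000  # same value as n = 18444492273895866368
a = nan_from_bits(bits)
b = nan_from_bits(bits)

assert math.isnan(a) and math.isnan(b)
assert a is not b        # two distinct objects with identical bits
assert a != a            # nan never compares equal, even to itself
assert a in [a]          # identity shortcut: found
assert b not in [a]      # no shared identity, falls back to ==: not found
assert len({a, a}) == 1  # sets apply the same identity shortcut
assert len({a, b}) == 2
```

So whether a second nan counts as a "duplicate" depends on object identity, not on its value.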

@Zac-HD
Member Author

Zac-HD commented Mar 18, 2024

Yep, that would do it!

List containment (and maybe other collections?) uses a is b as an optimization over a == b, only checking the latter if a is not b. Unfortunately this is unsound in the presence of aliased nans, and while that's not usually a problem, here we are. I'd guess the mechanism is that we alias the first time we generate and don't alias the second time, causing the second run to attempt a different subsequent draw.

Probably the correct general solution is to ensure that we return a different float object each time we generate a nan, which will ensure that we get the non-aliasing behavior each time.
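As a rough illustration of that idea (not the actual fix in Hypothesis), the sketch below compares a uniqueness filter fed the same nan object twice against one fed a fresh object each time. Both `fresh_copy` and `unique_append` are hypothetical helpers invented for this example, and the `in`-based filter is an assumption about the general shape of a uniqueness check rather than Hypothesis's real code:

```python
import math
import struct


def fresh_copy(x: float) -> float:
    """Return a float with the same bits as x but a new object (hypothetical helper)."""
    return struct.unpack("!d", struct.pack("!d", x))[0]


def unique_append(seen: list, value: float) -> bool:
    """Toy uniqueness filter built on `in`; returns True if value was kept."""
    if value in seen:
        return False
    seen.append(value)
    return True


nan = float("nan")

# Aliased: the same object is offered twice, so the second one is rejected.
aliased = []
assert unique_append(aliased, nan) is True
assert unique_append(aliased, nan) is False

# Fresh objects: every nan is kept, so the outcome is the same on every run.
fresh = []
assert unique_append(fresh, fresh_copy(nan)) is True
assert unique_append(fresh, fresh_copy(nan)) is True

assert len(aliased) == 1 and len(fresh) == 2
```

If generation aliases on one run and not on the next, the two runs take different branches in the filter, which would match the flakiness described above.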

This trades away a little bit of test power which we had ~by accident; we may well want to bring that back in future but should do so above the IR layer.

@Zac-HD Zac-HD added the bug something is clearly wrong here label Mar 19, 2024
@Zac-HD Zac-HD changed the title from "test_generic_collections_only_use_hashable_elements[FrozenSet] is flaky" to "Internal error: tests using unique collections containing nan are flaky" Mar 19, 2024
@Zac-HD Zac-HD removed the tests/build/CI about testing or deployment *of* Hypothesis label Mar 19, 2024

2 participants