Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for annotated-types constraints in st.from_type(Annotated[T, ...]) #3356

Closed
Zac-HD opened this issue May 22, 2022 · 5 comments · Fixed by #3780
Closed

Check for annotated-types constraints in st.from_type(Annotated[T, ...]) #3356

Zac-HD opened this issue May 22, 2022 · 5 comments · Fixed by #3780
Labels
enhancement it's not broken, but we want it to be better

Comments

@Zac-HD
Copy link
Member

Zac-HD commented May 22, 2022

At PyCon 2022 I helped start https://github.com/annotated-types/annotated-types - so it would be nice to support these new constraints once we finalize the upstream release! The key idea is that we can iterate through the constraints, and derive callables which we then use as filters on the strategy that we resolved for the annotated type.

Filter-rewriting will then make numeric bounds as efficient as is practically possible at runtime (though see #3134 / b4c4161 / #3479 for strings). For example:

>>> from_type(Annotated[int, Gt(10)])
# integers().filter(partial(gt, 10))
integers(min_value=11)

>>> from_type(Annotated[str, Predicate(str.isdigit)])
text().filter(str.isdigit)
# from_regex(r"\d+", fullmatch=True)

@adriangb @samuelcolvin FYI 😁

@Zac-HD Zac-HD added the enhancement it's not broken, but we want it to be better label May 22, 2022
@Zac-HD Zac-HD mentioned this issue Jul 14, 2022
20 tasks
@Zac-HD
Copy link
Member Author

Zac-HD commented Oct 30, 2022

A more detailed breakdown of the tasks involved:

  • Gt, Ge, Lt, Le:
    • translate into filter predicates with partial and the operator module
    • add filter-rewriting support for st.fractions() and st.decimals()
  • MultipleOf:
    • annoyingly difficult to get right, especially for floats! Also ambiguous: modulo-based or division-based semantics?
    • filtering is absurdly inefficient; probably needs special handling plumbed right through; at minimum apply this last
    • can we get away with warning people off for now?
  • Timezone:
    • resolve this to a timezones= strategy for st.times() or st.datetimes()
    • register a callback fn for time and datetime types, which accepts this arg, and plumb it through
    • explicit error if we see this constraint but the resolver func doesn't accept this arg
  • MinLen, MaxLen: as for timezones, but fall back to a warning and filter predicate rather than an error
  • Predicate: use as predicate to .filter() in the obvious way

Other constraints: unpack GroupedMetadata into the component parts, ignore unknown constraint types and log a warning at debug verbosity. The new code will replace logic invoked from:

if is_annotated_type(thing): # pragma: no cover
# This requires Python 3.9+ or the typing_extensions package
annotated_strategy = find_annotated_strategy(thing)
if annotated_strategy is not None:
return annotated_strategy
args = thing.__args__
assert args, "it's impossible to make an annotated type with no args"
annotated_type = args[0]
return st.from_type(annotated_type)

@Viicos
Copy link
Contributor

Viicos commented Oct 27, 2023

I've started an implementation of this, and I have a couple questions:

Gt, Ge, Lt, Le:

  • translate into filter predicates with partial and the operator module
  • add filter-rewriting support for st.fractions() and st.decimals()

I'm don't have much knowledge on how filter rewriting works, but is there a way to check that integers().filter(partial(gt, 10)) is correctly rewritten by Hypothesis to integers(min_value=11)? How about integers().filter(partial(gt, 10)).filter(partial(lt, 20))?
Edit: nevermind, seems to work when accessing the wrapped strategy of the lazy strategy.


How should we handle constraints that are incompatible with each other/with the annotated type? e.g. Annotated[int, Gt(1), Timezone(tz.utc)], Annotated[UUID, Gt(1)]? Should we ignore and let a potential error be raised at some point when the resulting strategy is used?

@Zac-HD
Copy link
Member Author

Zac-HD commented Oct 27, 2023

How to check that filter-rewriting works: trust that https://github.com/HypothesisWorks/hypothesis/blob/master/hypothesis-python/tests/cover/test_filter_rewriting.py would surface any problems 😁

How should we handle constraints that are incompatible with each other/with the annotated type? e.g. Annotated[int, Gt(1), Lt(1)], Annotated[str, Gt(1)]? Should we ignore and let a potential error be raised at some point when the resulting strategy is used?

Hmm, this is somewhat tricky - the decision about whether to return st.nothing() or raise an InvalidArgument exception is made on a case-by-case basis. We prefer to raise an error when we think there's something the user could reasonably do to avoid it; or use nothing() when that would be impractical.

I think in this case I'd just translate to filters, as the easy-to-implement approach. That means:

  • Annotated[str, Gt(1)] will raise a TypeError due to the comparison when an example is drawn. It's somewhat unfortunate that the error message won't mention that this came from an annotated type, but I think not worth the code required to improve on it.
  • Annotated[int, Gt(1), Lt(1)] will return nothing(), because that's equivalent to a strategy where all possible values are filtered out. In this case, I think we can add a few lines of code like if s.is_empty: raise InvalidArgument(f"There are no valid values for type {t!r}")

@Viicos
Copy link
Contributor

Viicos commented Oct 29, 2023

Ok I see, I'll soon open a draft PR so that progress can be tracked.

  • register a callback fn for time and datetime types, which accepts this arg, and plumb it through

By register a callback you mean using this function?

def register(type_, fallback=None, *, module=typing):
if isinstance(type_, str):
# Use the name of generic types which are not available on all
# versions, and the function just won't be added to the registry


Ideally I will try to build strategies in a "smart" way, and avoid using filters when no filter rewriting is available (e.g. text(min_size=20) instead of text().filter(lambda el: len(el) >= 20); applies to list/bytes/probably others). Even Gt/Ge/Lt/Le can apply to types where filter rewriting could be implemented but currently isn't (e.g. datetime), so I need to build the corresponding strategies in a smart way.

What's challenging is that I could manually build the strategies with the correct kwargs (e.g. min_size=..., max_size=...) for each known type, but that would be rewriting what from_type is currently doing with the extra logic from constraints.

@Zac-HD
Copy link
Member Author

Zac-HD commented Nov 1, 2023

Let's start simple, and only include filters in the initial version. If sometimes that means users get Unsatisfiable rather than data which violates their annotation constraints, I think that's still an improvement!

After we've shipped the logically correct version, we can look at improving performance in common cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement it's not broken, but we want it to be better
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants