Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use non-mapped strategies for populating numpy arrays #3858

Merged
merged 9 commits into from
Jan 27, 2024

Conversation

jobh
Copy link
Contributor

@jobh jobh commented Jan 24, 2024

Closes #3857

@jobh

This comment was marked as outdated.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@jobh

This comment was marked as outdated.

@jobh jobh force-pushed the numpy-array-elements-without-map branch from 33f8f88 to 0345288 Compare January 24, 2024 12:43
@Zac-HD
Copy link
Member

Zac-HD commented Jan 25, 2024

Hmm. I somewhat dislike adding a new typed= argument; it feels like we're imposing a burden on the user and a better solution ought to be possible.

  • What's the performance improvement like in the non-unique case? If nontrivial, I'd prefer to pick it up in a way which doesn't require the user to do anything (e.g. have arrays() know how to unwrap strategies created by from_dtype())
  • To support the unique case, what if we instead used sampled_from() for <= 8bit dtypes?

@jobh
Copy link
Contributor Author

jobh commented Jan 25, 2024

The first version (commit) of this PR did your no.1, arrays() just took away any mapping added by from_dtype(). Because we know that the latter only adds type mappings. I changed my mind because it's less maintainable (deep knowledge of another function's implementation), and also because the option may come in handy for users in composite strategies; for use, or even just as a documented performance hint. I can revert to the "unmap" variant if you like.

I'm opposed to no.2 (using sampled_from directly). Bad for repr, but more importantly it is specific for this particular problem instead of protecting against this plus allows it to take part in changed or future similar optimizations.

@jobh
Copy link
Contributor Author

jobh commented Jan 25, 2024

  • What's the performance improvement like in the non-unique case? If nontrivial

I haven't checked. Nonzero but trivial, I expect.

@jobh

This comment was marked as outdated.

jobh added 4 commits January 25, 2024 09:45
@jobh
Copy link
Contributor Author

jobh commented Jan 25, 2024

Note unrelated flaky test (in case it goes away) at https://github.com/HypothesisWorks/hypothesis/actions/runs/7639777846/job/20813516976?pr=3858

FAILED tests/conjecture/test_forced.py::test_forced_floats[True-True] - assert None is not None

@tybug could this be related to your recent work? Just a heads-up, it hasn't been bothering me since.

@jobh jobh force-pushed the numpy-array-elements-without-map branch 2 times, most recently from 6d93eb9 to 141fbe1 Compare January 25, 2024 13:04
@jobh
Copy link
Contributor Author

jobh commented Jan 25, 2024

Advice needed:

It looks like coverage is missing for some invalid inputs, but I don't understand why that starts happening now.

And I don't quite understand the stacks either; fill_for doesn't call check_valid_sizes at all as far as I can see

coverage: commands[2]> python scripts/validate_branch_check.py
Some branches were not properly covered.

The following were always True:
  * check_type at internal/validation.py:103 in check_valid_size at internal/validation.py:125 in check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_type at internal/validation.py:103 in check_valid_size at internal/validation.py:125 in check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
  * check_type at internal/validation.py:103 in check_valid_size at internal/validation.py:126 in check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_type at internal/validation.py:103 in check_valid_size at internal/validation.py:126 in check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
  * check_valid_interval at internal/validation.py:127 in check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_valid_interval at internal/validation.py:127 in check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
  * check_valid_size at internal/validation.py:125 in check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_valid_size at internal/validation.py:125 in check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
  * check_valid_size at internal/validation.py:126 in check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_valid_size at internal/validation.py:126 in check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
  * check_valid_sizes at strategies/_internal/core.py:825 in fill_for at extra/numpy.py:535 passed
  * check_valid_sizes at strategies/_internal/core.py:959 in fill_for at extra/numpy.py:535 passed
coverage: exit 1 (0.02 seconds) /home/runner/work/hypothesis/hypothesis/hypothesis-python> python scripts/validate_branch_check.py pid=7187

[edit: ...and strategies/_internal/core.py:959 is an empty line inside a docstring...?]
https://github.com/jobh/hypothesis/blob/141fbe1a264f9b41313e1fef4181cac5199e9d34/hypothesis-python/src/hypothesis/strategies/_internal/core.py#L825
https://github.com/jobh/hypothesis/blob/141fbe1a264f9b41313e1fef4181cac5199e9d34/hypothesis-python/src/hypothesis/strategies/_internal/core.py#L959

@tybug
Copy link
Member

tybug commented Jan 25, 2024

Note unrelated flaky test (in case it goes away) at https://github.com/HypothesisWorks/hypothesis/actions/runs/7639777846/job/20813516976?pr=3858

FAILED tests/conjecture/test_forced.py::test_forced_floats[True-True] - assert None is not None

@tybug could this be related to your recent work? Just a heads-up, it hasn't been bothering me since.

thanks for the ping — definitely my fault in some way. I haven't seen it in #3818 recently, so it may have been fixed at some point in the interim on that branch (or I may just be unlucky). I'll take a look if it comes up again after that is merged.

@@ -393,7 +399,6 @@ def do_draw(self, data):
return result


@check_function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this started failing, but also don't think it really needs to be a check function... so it's not.

@Zac-HD Zac-HD merged commit 9b67398 into HypothesisWorks:master Jan 27, 2024
47 checks passed
@jobh jobh deleted the numpy-array-elements-without-map branch February 1, 2024 08:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very slow unique sampling of np.int8
3 participants