Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warnings for side effects during import #3837

Merged
merged 14 commits into from
Jan 16, 2024

Conversation

jobh
Copy link
Contributor

@jobh jobh commented Jan 12, 2024

Warn if the storage directory is accessed, or if a lazy strategy is materialized, during import. This is intended to target plugins that are loaded during import of hypothesis through importlib entrypoints, but the scope is expanded to the full duration of the import (because why not).

For pytest plugins, it's not clear that it is possible since we don't control the plugin loading order. We could add side-effect detection at the time of loading the hypothesis pytest plugin, to detect if those running before have done anything unwanted, but I don't see how we can do anything about those loaded after us. Not sure if this kind of flaky detection is better than none at all.

Closes #3836.

I don't know how to test the during-import behaviour beyond superficial "mock to pretend we're still importing"

@jobh
Copy link
Contributor Author

jobh commented Jan 12, 2024

Hm, possibly we could hook into f.x. pytest_sessionstart or pytest_plugin_registered

I don't have any experience with pytest plugins, any advice @Zac-HD ? We'd want to catch things done by the plugins themselves, not by the user in conftest.py etc.

@Zac-HD
Copy link
Member

Zac-HD commented Jan 12, 2024

Hmm, if we're just issuing warnings I think it's reasonable to warn about stuff in conftest.py too? That also adds latency in time-to-starting-first-test, which is equally annoying for feedback loops no matter where it comes from.

I'd actually taken a stab at this myself too, master...Zac-HD:hypothesis:plugin-checks; feel free to crib from that if you want. I do prefer your warnings approach and think deprecations are probably a mistake though; I'd be OK with an eventual error for plugins but not conftest stuff etc.

@jobh
Copy link
Contributor Author

jobh commented Jan 12, 2024

Right! It looks like we're following similar paths, with the major differences being

  • up-front checking for lazy strategies (yours) / try-and-catch (mine) for registration
  • warn-while-checking (yours) / warn-at-callsite (mine) for showing warnings

I'll gladly continue this branch with improvements from yours, or vice versa. For clarity: By "prefer your warnings approach" you're thinking of the warn-at-callsite part of it, not the registration part? And I'm afraid I don't fully comprehend the implication of "deprecations are probably a mistake", either, sorry ;-)

@jobh
Copy link
Contributor Author

jobh commented Jan 12, 2024

FYI: This is mostly feature complete. I'll have to leave it for a little while now, the remaining tasks:

  • Pass tests, formatting, etc.
  • Probably pick the "check for lazy strategy" implementation from your branch, depending on feedback above
  • You'll probably have opinions on the actual messages we show to users ;-)
  • The difficult bit: Try to add test(s)

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll gladly continue this branch with improvements from yours, or vice versa. For clarity: By "prefer your warnings approach" you're thinking of the warn-at-callsite part of it, not the registration part? And I'm afraid I don't fully comprehend the implication of "deprecations are probably a mistake", either, sorry ;-)

  • I'd suggest continuing with your branch, and stealing any inspiration you want from mine
  • I think try/except checking for lazy strategies is better than the way I implemented it; the direct check AFAICT currently works but it's more fragile
  • I'd avoid bringing our pytest plugin into it; people also use Hypothesis with unittest and there's nothing pytest-specific about these checks.
  • I'm actually not too worried about regression tests for this; the environment management is enough of a pain that (at least until we have a regression) I'm OK with leaning more on manual testing before we merge. Just write up a description of what you manually tested + how so we can do it again next time!

Comment on lines 52 to 55
if os.environ.get("HYPOTHESIS_WARN_SIDEEFFECT"):

def sideeffect_should_warn():
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to always warn?

Copy link
Contributor Author

@jobh jobh Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit of a long story, so I'll explain. Pardon the verbosity! I agree there's nothing pytest-specific about the check for plugins that we load ourselves, but pytest plugins (and conftest) are an instance of a wider check: Side effects during initialization, where initialization is not just the initial import.

We can't warn directly about side effects during post-import initialization, but we can detect them. Even at no additional cost, since we can check whether the patching has been done - which happens on the first side-effect-inducing operation after import. And then the pytest plugin (or other test runners) can quite easily check if this is the case at start of session.

Now, it can warn that it has happened, but it can't see who did it or where, at least not easily [*]. That's where this setting comes in:

    "initialization, possibly causing slowdown or creation of files. To pinpoint and explain "
    "the problem, execute with environment 'PYTHONWARNINGS=error HYPOTHESIS_WARN_SIDEEFFECT=1'",

...which will cause the warning to happen, even after import is finished, and show the relevant stack trace.

[*] Yeah, we could record the stack of the first such call, but that's more effort than it's worth IMO

Copy link
Contributor Author

@jobh jobh Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[*] Yeah, we could record the stack of the first such call, but that's more effort than it's worth IMO

Hm, or maybe. I guess it's easy enough to stash the stacktrace and message. Would simplify the wording.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there are basically two ways to go about this:

  1. Warn at the end of initialization, if we can see that a side-effectful operation happened.
  2. Warn on a side-effectful operation, if we're not yet finished initializing.

I generally prefer (2), because the stack trace from -Werror makes it really easy to tell when this is happening.

I guess the problem is how to tell whether we're in post-import initialization, but I think there we could set a global from the earliest possible pytest setup hook, and then unset it at the end? We might still miss a few cases due to the ordering of hooks between our and other pytest plugins, but I think that applies equally to (1) for side effects triggered by pytest plugins.

At that point I suppose there's still a case for (1) to warn if we detect that there was a side-effect between finishing import hypothesis and delivery of the first pytest hook... but that seems concerningly prone to false alarms (c.f. testdir.runpytest_inprocess() test changes!). Let's just skip detection in this case then?

Copy link
Contributor Author

@jobh jobh Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree (2) would be best, and we could do that as soon as possible (I was thinking even plugin-module execution, not wait for the import hook). I haven't done this because I am concerned that there is a significant chance of something happening during that hole in coverage. But hey, maybe it's not that important compared to having a simpler best-effort variant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... if there is a hole, we must check and deal with that anyway to revert the patching, so we may as well warn in that case, too. I'll follow this path then.

hypothesis-python/src/hypothesis/errors.py Outdated Show resolved Hide resolved
@jobh
Copy link
Contributor Author

jobh commented Jan 13, 2024

Thanks for the review and suggestions @Zac-HD ! I'll clean up and complete, and ping you when ready.

Note: I'd like to keep the pytest stuff in, it makes it at least partially testable because we can then check detection of side-effects after import which is fairly simple infrastructure-wise.

@jobh
Copy link
Contributor Author

jobh commented Jan 15, 2024

Right, ready for next round now @Zac-HD . I'm reasonably happy with how it ended up, if you feel it's too complicated it would be possible to simplify a bit at the price of reduced coverage. I did take the liberty of creating a new toplevel module for globals, since I didn't find any existing appropriate module.

Test coverage is actually fairly complete, yay! The earlier "false positives" were a bug, I didn't realize that plugin hooks would run multiple times for runpytest_inprocess. After that was fixed, no existing tests needed to change which is a good sign.

All tests should pass soon, with the exception of the windows memory error, which might be related to #3820 (which I recall saw and provisionally fixed the same failure). It is triggered by generating f"{self!r}" to pass as what into the check; the failing test sees reprs of up to 200+MB. I can work around it by changing the api, not having to generate the repr upfront, but... it's probably something to look into in its own right.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great to me - thanks so much for your work here Joachim!

I think all we have left to do is the two small comments below, and the mysterious-but-repeated failure on 32-bit Windows (and Python 3.11, but I bet it's the first bit causing trouble). Also suspicious that it's a MemoryError...

hypothesis-python/src/_hypothesis_globals.py Show resolved Hide resolved
hypothesis-python/src/hypothesis/errors.py Outdated Show resolved Hide resolved
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
@jobh
Copy link
Contributor Author

jobh commented Jan 16, 2024

I think all we have left to do is the two small comments below, and the mysterious-but-repeated failure on 32-bit Windows (and Python 3.11, but I bet it's the first bit causing trouble). Also suspicious that it's a MemoryError...

I didn't really follow #3820, but I did check this memory error. It was caused by repr becoming so long in this test that it exhausts available memory or address space on the runner. It was fixed in 2c37fb7,
but now that it reappears perhaps it was more papering-over than a real fix.

It does however point out that the repr is potentially expensive, so maybe we should not construct it unless the warning is actually emitted. I.e., input format string and arguments separately, logger-style.

"to get a traceback and show which plugin is responsible." + extra,
HypothesisSideeffectWarning,
stacklevel=3,
)
else:
_first_postinit_what = what
_first_postinit_what = (what, fmt_args)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to stored the formatted what? Slightly concerned about holding onto a potentially mutable object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH we know exactly what comes in, no user will call this function. Thanks for getting to all green 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to help!

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jobh 🥰

@Zac-HD Zac-HD merged commit cb0f2ac into HypothesisWorks:master Jan 16, 2024
47 checks passed
@jobh jobh deleted the import-time-warnings branch January 16, 2024 10:32
@Zac-HD
Copy link
Member

Zac-HD commented Jan 17, 2024

pytest-dev/pytest#11825 (review) - looks like a false alarm inside pytest's own tests, possibly related to the initialization-detection logic in our pytest plugin?

I also noticed that the warning unconditionally mentions "import time", we should be specific about import time / between import and loading the pytest plugin / during pytest initialization.

@jobh
Copy link
Contributor Author

jobh commented Jan 17, 2024

pytest-dev/pytest#11825 (review) - looks like a false alarm inside pytest's own tests, possibly related to the initialization-detection logic in our pytest plugin?

That's weird. It's only reproducible when running the full test suite instead of just the python test folder, and by tracing all increments/decrements I can see that is_initialization ends up at zero. Then it's suddenly 1 again later, thread does not change, globals module id does not change, ... Anyway. No idea yet.

@jobh
Copy link
Contributor Author

jobh commented Jan 17, 2024

Seriously...

I renamed the global to ensure all access is via logged accessors, and see the following sequence:

XXXXX Value incremented to 2 <_MainThread(MainThread, started 140293875918656)> module id 140293872734304
XXXXX Value decremented to 1 <_MainThread(MainThread, started 140293875918656)> module id 140293872734304
XXXXX    Value returned as 1 <_MainThread(MainThread, started 140293875918656)> module id 140293872734304
[...many of these elided, they are from the `is_empty` check in register_]
XXXXX Value decremented to 0 <_MainThread(MainThread, started 140293875918656)> module id 140293872734304
XXXXX    Value returned as 1 <_MainThread(MainThread, started 140293875918656)> module id 140293872734304
E           hypothesis.errors.HypothesisSideeffectWarning: XXXXXXXC 140293841914912 <_MainThread(MainThread, started 140293875918656)> Slow code in plugin: avoid lazy evaluation of text() at import time!  Set PYTHONWARNINGS=error to get a traceback and show which plugin is responsible.

What could possibly cause this?

@jobh
Copy link
Contributor Author

jobh commented Jan 17, 2024

Right. I suspect what happens is that the final plugin decrement happens within a somehow auto-mocked pytester environment, so it is magically reverted afterwards. The minimal test sequence to reproduce is

pytest testing/acceptance_test.py::TestInvocationVariants::test_invoke_plugin_api testing/python/metafunc.py::TestMetafunc::test_idval_hypothesis

This means we can't balance our incs/decs, so the only solution is to accept imbalance like the suggested fail-safe solution for thread safety above.

@jobh jobh mentioned this pull request Jan 17, 2024
@jobh
Copy link
Contributor Author

jobh commented Jan 17, 2024

I submitted a hotfix PR to fix it. It passes pytest's tests, if it passes ours as well then it should be good.

Btw: Impressive early spotting of this problem @Zac-HD !

@jobh
Copy link
Contributor Author

jobh commented Jan 18, 2024

Not that it matters now, but checking this with a clearer head, for the benefit of better understanding:

Right. I suspect what happens is that the final plugin decrement happens within a somehow auto-mocked pytester environment, so it is magically reverted afterwards. [...]

pytest testing/acceptance_test.py::TestInvocationVariants::test_invoke_plugin_api

the real cause for the mystery was much less complicated, it was just the logging output being hidden by capsys. The thing that was hidden was a pytest_configure with no following pytest_sessionstart. Fix is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Folder ".hypothesis" appears after "pytest" run but I don't use hypothesis
2 participants