Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118138: Adds test for multithreaded writes to files. #119107

Conversation

elistevens
Copy link

@elistevens elistevens commented May 17, 2024

Multithreaded writes to files (including standard calls to print() from threads) can result in assertion failures, missed dropped output, and (in non-debug builds) writing uninitialized memory out due to race conditions in _io_TextIOWrapper_write_impl and _textiowrapper_writeflush.

See: #118138

I've provided a change that fixes the original issue for me on python 3.10.12, but my attempt to turn the repro script into a self-contained test case fails with that patch on both 3.10.12 and 3.14.0a0 (main as of today). That patch makes self->pending_bytes always be a list, and calls PyList_SetSlice(pending, 0, pending_size_at_start, NULL) to remove entries after they've been processed. I strongly suspect that the patch only narrows the time windows for race conditions, not actually eliminate them. I also think that it might introduce the possibility of dropped or repeated output should those race conditions trigger, but I am uncertain if that's actually the case (and they don't happen in practice, according to the test script).

I am out of my depth and would appreciate a maintainer taking a look.

Thanks to @sterwill for the original repro script, which the test case here is a descendent of.

Multithreaded writes to files (including standard calls to print() from
threads) can result in assertion failures and potentially missed output due to
race conditions in _io_TextIOWrapper_write_impl and _textiowrapper_writeflush.

See: python#118138
Copy link

cpython-cla-bot bot commented May 17, 2024

The following commit authors need to sign the Contributor License Agreement:

Click the button to sign:
CLA not signed

@bedevere-app
Copy link

bedevere-app bot commented May 17, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@siddhu032d
Copy link

Result: ACCESS

@bedevere-app
Copy link

bedevere-app bot commented May 17, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@elistevens
Copy link
Author

elistevens commented May 17, 2024

I believe that the issue with my patch was due to closing the file accepting output before calling executor.shutdown(wait=True). The modified test now passes with my change.

Edit: I also feel like there's probably a cleaner way to handle pending_bytes_count but given that I'm not even certain that this is the right way to solve the problem, I'm not investing too much thought into it.

@elistevens
Copy link
Author

After deploying this patch at scale internally, I can confirm that there are still race condition issues; the test fails about 14% of the time in our CI cluster (interestingly it always passed while running locally, so it might be a load issue; I haven't investigated the exact nature of the failures yet). 14% is better than the 100% failure rate we were seeing prior, but it's still not fully fixed.

@bedevere-app
Copy link

bedevere-app bot commented May 30, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@elistevens
Copy link
Author

I think that the test provided here might be useful (after being adapted to meet cpython standards), but the actual code change likely is not. On #119507 a much simpler patch achieves the same practical "it doesn't assert any longer" effect for my use case.

The issue is also being discussed here: https://discuss.python.org/t/bug-with-multi-threaded-print-write-causing-dropped-output-output-of-uninitialized-memory-assertion-failures/54538 where @pitrou suggests using a mutex to protect the internal state.

Should we close this PR?

@elistevens elistevens closed this Jun 4, 2024
@ericsnowcurrently
Copy link
Member

@elistevens, thanks for taking the time you did on this PR. While there wasn't much feedback here, that DPO thread definitely served as a proxy, which is probably okay. Regardless, you clearly put a lot of thought into this and demonstrated a healthy dose of determination and skill. I hope there will be more opportunities for you to participate in Python core development. If you have any feedback, please let us know.

@elistevens
Copy link
Author

Thanks, Eric! I'm not worried about the relative quiet on this PR; #119507 had a lively discussion and ultimately resulted in a much cleaner fix than the one I proposed here. I also recognize that maintainers are quite busy and I think should focus on the PRs that are likely to merge.

Thank you for all the work that you all do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants