Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rayoff, fast simple threadpool #7110

Closed
wants to merge 10 commits into from

Conversation

aeyakovenko
Copy link
Member

@aeyakovenko aeyakovenko commented Nov 23, 2019

Problem

Rayon work stealing is busted.

Summary of Changes

Implement a fast thredpool.

Raw bench performance:

test bench_rayoff   ... bench:      10,478 ns/iter (+/- 2,517)
test bench_rayon    ... bench:      12,926 ns/iter (+/- 9,041)

sigverify performance:

running 3 tests
test bench_sigverify        ... bench:   3,973,128 ns/iter (+/- 306,527)
test bench_sigverify_rayoff ... bench:   3,697,677 ns/iter (+/- 738,464)

Fixes #

@codecov
Copy link

codecov bot commented Nov 23, 2019

Codecov Report

Merging #7110 into master will increase coverage by 2.8%.
The diff coverage is 76.3%.

@@           Coverage Diff            @@
##           master   #7110     +/-   ##
========================================
+ Coverage    76.3%   79.1%   +2.8%     
========================================
  Files         230     231      +1     
  Lines       46869   44389   -2480     
========================================
- Hits        35783   35136    -647     
+ Misses      11086    9253   -1833

@garious
Copy link
Contributor

garious commented Nov 23, 2019

Can you PR rayon? I don't think we have a strong enough need for this to justify migrating to a fork of such a popular, well-documented, community-maintained library.

@aeyakovenko
Copy link
Member Author

@garious rayon is to big, and development is sluggish. It doesn’t seem like it’s build for our usecase. Their kind of work stealing is better suited for ai, image processing etc, where a process runs at constant 100% load the whole time.

@aeyakovenko
Copy link
Member Author

@garious Need due to the fact that because their work stealing algorithm isn’t suitable for our usecase we have to use 50% fewer threads.

@garious
Copy link
Contributor

garious commented Nov 23, 2019

So what I see here is a solution to a problem we don't have (we're above 50ktps (on CPU!)), no docs, and unsafe annotations. There's obvious costs here and no benefit that we require for SLP1 or TdS1. Can you instead move this code to a separate git repo and keep it in your back pocket for when we need the TPS boost and it's a good time to take on the churn?

@aeyakovenko
Copy link
Member Author

@garious https://www.cia.gov/news-information/featured-story-archive/2012-featured-story-archive/simple-sabotage.html

Organizations and Conferences: When possible, refer all matters to committees, for "further study and consideration." Attempt to make the committees as large and bureaucratic as possible. Hold conferences when there is more critical work to be done.

@garious
Copy link
Contributor

garious commented Nov 23, 2019

Speak directly please

@aeyakovenko
Copy link
Member Author

So what I see here is a solution to a problem we don't have (we're above 50ktps (on CPU!)

Small number of nodes, on a private network, no packet loss or network failure simulation. Optimizing a pure infrastructure lib with no business logic is low hanging fruit.

Our motto is “optimize every aspect of the system that doesn’t sacrifice performance”, which this clearly does.

There's obvious costs here and no benefit that we require for SLP1 or TdS1.

What’s the obvious cost? Rayon is a huge library that moves slowly in the direction that we want. The CPU hog issue has been open for 3 months with no signs of progress. Rayoff is tiny with no additional dependencies. It’s a one days of work to document the unsafes and add a few more tests and switch it out.

@mvines
Copy link
Member

mvines commented Nov 23, 2019

rayoff - Trolling other open source projects is beneath us. It's not busted, it's just not suited for our workload. If anything we are busted because we picked the wrong tool. Please pick another name, perhaps fast-threadpool.

@garious
Copy link
Contributor

garious commented Nov 23, 2019

Let's focus on the benefit this brings to SLP1 or TdS1 before diving into costs.

The reason an issue may be open for three months could be that it's never been in the critical path of anything important.

Documenting why the unsafe annotations are needed and safe is a hard requirement, sorry.

@stale
Copy link

stale bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Dec 17, 2019
@stale
Copy link

stale bot commented Dec 24, 2019

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Dec 24, 2019
@ryoqun
Copy link
Member

ryoqun commented Aug 12, 2020

@aeyakovenko It seem that rayon is working to fix the idle cpu-usage issue (rayon-rs/rayon#642): rayon-rs/rayon#746. Then we might be able to remove this work-around as well: #5871

I'm just curious how this compares to the new rayon's PR. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale [bot only] Added to stale content; results in auto-close after a week.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants