Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - sync: prioritize peers with higher success rate and low latency #5143

Closed
wants to merge 11 commits into from

Conversation

dshulyak
Copy link
Contributor

@dshulyak dshulyak commented Oct 11, 2023

closes: #5127 #5036

peers that are overwhelmed or generally will not be used for requests. there are two criteria used to select good peer:

  • request success rate . success rates within 0.1 (10%) of each other are treated as equal, and in such case we will use latency
  • latency. hs/1 protocol used to track latency, as it is the most used protocol and objects served in this protocol are of the same size with several exceptions (active sets, list of malfeasence proofs).

related: #4977

limits number of peers to request data for atxs. previously we were requesting data from all peers atleast once.

synced data 2 times in 90m, previous attempt on my computer was 1 week ago and took 12h

@codecov
Copy link

codecov bot commented Oct 12, 2023

Codecov Report

Merging #5143 (dcca7de) into develop (6338651) will increase coverage by 77.7%.
Report is 1 commits behind head on develop.
The diff coverage is 90.0%.

@@            Coverage Diff             @@
##           develop   #5143      +/-   ##
==========================================
+ Coverage         0   77.7%   +77.7%     
==========================================
  Files            0     259     +259     
  Lines            0   30495   +30495     
==========================================
+ Hits             0   23704   +23704     
- Misses           0    5299    +5299     
- Partials         0    1492    +1492     
Files Coverage Δ
fetch/interface.go 100.0% <100.0%> (ø)
fetch/peers/peers.go 100.0% <100.0%> (ø)
syncer/data_fetch.go 76.4% <100.0%> (ø)
syncer/find_fork.go 78.2% <100.0%> (ø)
syncer/syncer.go 92.8% <95.4%> (ø)
fetch/fetch.go 82.0% <91.7%> (ø)
syncer/state_syncer.go 73.9% <57.1%> (ø)

... and 252 files with indirect coverage changes

@dshulyak dshulyak changed the title sync: prioritize responsive peers with low latency sync: prioritize peers with higher success rate and low latency Oct 12, 2023
@dshulyak
Copy link
Contributor Author

  • submit change with formatting changes in touched files

@dshulyak dshulyak marked this pull request as ready for review October 12, 2023 08:37
Copy link
Contributor

@poszu poszu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you benchmark the SelectBest() method? I'm not sure how often it is called and what's the expected total number of peers, but it takes around 3ms to find 10 best peers from a set of 50.

fetch/fetch_test.go Outdated Show resolved Hide resolved
fetch/fetch_test.go Outdated Show resolved Hide resolved
fetch/peers/peers.go Outdated Show resolved Hide resolved
fetch/peers/peers.go Outdated Show resolved Hide resolved
fetch/peers/peers.go Outdated Show resolved Hide resolved
syncer/syncer.go Show resolved Hide resolved
@dshulyak
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Oct 12, 2023
@bors
Copy link

bors bot commented Oct 12, 2023

try

Build failed:

@dshulyak
Copy link
Contributor Author

@poszu i added benchmark. below is SelectBest(10) from 10000 total. how did you get 3 ms?

BenchmarkSelectBest-24 2854 401721 ns/op 240 B/op 2 allocs/op

@dshulyak
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 13, 2023
closes: #5127 #5036

peers that are overwhelmed or generally will not be used for requests. there are two criteria used to select good peer:
- request success rate . success rates within 0.1 (10%) of each other are treated as equal, and in such case we will use latency
- latency. hs/1 protocol used to track latency, as it is the most used protocol and objects served in this protocol are of the same size with several exceptions (active sets, list of malfeasence proofs).

related: #4977

limits number of peers to request data for atxs. previously we were requesting data from all peers atleast once.

synced data 2 times in 90m, previous attempt on my computer was 1 week ago and took 12h
@bors
Copy link

bors bot commented Oct 13, 2023

Build failed:

@dshulyak
Copy link
Contributor Author

bors merge

@dshulyak
Copy link
Contributor Author

bors cancel

@bors
Copy link

bors bot commented Oct 13, 2023

Canceled.

@dshulyak
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Oct 13, 2023
closes: #5127 #5036

peers that are overwhelmed or generally will not be used for requests. there are two criteria used to select good peer:
- request success rate . success rates within 0.1 (10%) of each other are treated as equal, and in such case we will use latency
- latency. hs/1 protocol used to track latency, as it is the most used protocol and objects served in this protocol are of the same size with several exceptions (active sets, list of malfeasence proofs).

related: #4977

limits number of peers to request data for atxs. previously we were requesting data from all peers atleast once.

synced data 2 times in 90m, previous attempt on my computer was 1 week ago and took 12h
@bors
Copy link

bors bot commented Oct 13, 2023

Pull request successfully merged into develop.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title sync: prioritize peers with higher success rate and low latency [Merged by Bors] - sync: prioritize peers with higher success rate and low latency Oct 13, 2023
@bors bors bot closed this Oct 13, 2023
@poszu
Copy link
Contributor

poszu commented Oct 13, 2023

@poszu i added benchmark. below is SelectBest(10) from 10000 total. how did you get 3 ms?

BenchmarkSelectBest-24 2854 401721 ns/op 240 B/op 2 allocs/op

I'm sorry, my bad - I can't do the math 🤦 it was 3µs.

dshulyak added a commit to dshulyak/go-spacemesh that referenced this pull request Oct 13, 2023
…emeshos#5143)

closes: spacemeshos#5127 spacemeshos#5036

peers that are overwhelmed or generally will not be used for requests. there are two criteria used to select good peer:
- request success rate . success rates within 0.1 (10%) of each other are treated as equal, and in such case we will use latency
- latency. hs/1 protocol used to track latency, as it is the most used protocol and objects served in this protocol are of the same size with several exceptions (active sets, list of malfeasence proofs).

related: spacemeshos#4977

limits number of peers to request data for atxs. previously we were requesting data from all peers atleast once.

synced data 2 times in 90m, previous attempt on my computer was 1 week ago and took 12h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sync stuck despite having many peers
2 participants