[Merged by Bors] - sync: new protocol to fetch layer hash and certificate #4846
bors try
Codecov Report
@@ Coverage Diff @@
## develop #4846 +/- ##
=========================================
- Coverage 76.9% 76.9% -0.1%
=========================================
Files 261 261
Lines 29807 30103 +296
=========================================
+ Hits 22951 23166 +215
- Misses 5392 5453 +61
- Partials 1464 1484 +20
try: Build succeeded! The publicly hosted instance of bors-ng is deprecated and will go away soon. If you want to self-host your own instance, instructions are here. If you want to switch to GitHub's built-in merge queue, visit their help page.
could you please add how the json config changes to the pr description. tbh such options should not even be added to the command line, as it will be harder to drop them later. but i assume this is needed for tests
i would consider doing the upgrade without a configured layer; it seems much more convenient for testing on mainnet in this case.
i think in a system like this peers are expected to not respond or to disconnect randomly. printing warnings doesn't make much sense; i would replace that with a prometheus counter, as it is mainly useful for operations
syncer/state_syncer.go
Outdated
}

func (s *Syncer) fetchOpinions(ctx context.Context, lid types.LayerID) ([]*peerOpinion, []*types.Certificate, error) {
	if s.ticker.CurrentLayer() >= types.LayerID(s.cfg.UpdateLayer) {
i think that an upgrade path without a configured layer is more convenient and robust. if there is a regression we will be able to notice it immediately
i am referring to an upgrade path where the node checks if it has any peers with the new protocol and, if so, uses the new protocol
used PeerStore().GetProtocols() to make the upgrade path automatic.
the node will first fetch from v2 peers, and then fetch from peers that don't support v2.
as this is not on the critical path, i did this sequentially instead of concurrently, for simpler code.
fetch/mesh_data.go
Outdated
var peerCert types.Certificate
err = codec.Decode(data, &peerCert)
if err != nil {
	defer close(done)
the consumer reads exactly one event from done and then continues, same for output. it doesn't look good with these defers
thanks! fixed
	done <- err
	return
}
if peerCert.BlockID != bid {
this check is also somewhat inconsistent now, other messages check it in handlers
this is true for generic hash fetches (atx/block/proposal/ballot/tx).
however, the certificate doesn't go through that path. it's requested specifically, as a block certificate doesn't have an ID associated with it. keeping the check and added a comment.
bors try
try: Build succeeded!
bors try
try: Build succeeded!
bors merge
## Motivation
Closes #4674

## Changes
when a node polls layer opinions from peers, it gets back a response from each peer.
```go
type LayerOpinion struct {
	PrevAggHash types.Hash32
	Certified   *types.BlockID
}
```
the node then goes through each response, collects the unique BlockIDs, and fetches the certificate for those blocks.

## before and after
in TestAddNodes, the total amount of data a new node downloaded at layer 39 was reduced from 10_647_927 bytes to 208_435 bytes

before
![Screenshot from 2023-08-18 09-45-36](https://github.com/spacemeshos/go-spacemesh/assets/30611210/92eb3c9a-ce2e-4aee-994a-56504600427d)

after
![Screenshot from 2023-08-18 09-48-18](https://github.com/spacemeshos/go-spacemesh/assets/30611210/e59ebab6-8d38-472e-b4c3-7dbcbc174f77)

## testing
- systests TestAddNodes starts 28 nodes with the new protocol and later adds 2 nodes to sync with the old protocol
- manual sync with mainnet with the new protocol enabled

## note on deployment
- new nodes will fetch opinions with the new protocol from nodes that support it, and with the old protocol from nodes that don't support it.
- in the case of rollback, set the syncer config `UseNewProtocol` to false
```
Sync: syncer.Config{
	...
	UseNewProtocol: false,
},
```
all nodes will then only request/serve the old protocol
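The deduplication step described under Changes can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `BlockID` and `Hash32` are simplified stand-ins for the `types` package (their sizes are assumed), and `uniqueCertified` is a hypothetical helper name.

```go
package main

import "fmt"

// BlockID and Hash32 are simplified stand-ins for the types package.
type BlockID [20]byte
type Hash32 [32]byte

// LayerOpinion mirrors the struct shown in the PR description.
type LayerOpinion struct {
	PrevAggHash Hash32
	Certified   *BlockID
}

// uniqueCertified returns each certified block ID once, in first-seen order.
// Opinions without a certified block are skipped.
func uniqueCertified(opinions []*LayerOpinion) []BlockID {
	seen := make(map[BlockID]struct{})
	var out []BlockID
	for _, op := range opinions {
		if op == nil || op.Certified == nil {
			continue
		}
		if _, ok := seen[*op.Certified]; ok {
			continue
		}
		seen[*op.Certified] = struct{}{}
		out = append(out, *op.Certified)
	}
	return out
}

func main() {
	a, b := BlockID{1}, BlockID{2}
	opinions := []*LayerOpinion{
		{Certified: &a}, {Certified: &b}, {Certified: &a}, {Certified: nil},
	}
	// Four responses, but only two distinct certificates need to be fetched.
	fmt.Println(len(uniqueCertified(opinions)))
}
```

Since many peers typically certify the same block, fetching one certificate per unique block ID instead of one per response is consistent with the download reduction reported above.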
Pull request successfully merged into develop. Build succeeded!