Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - sync: new protocol to fetch layer hash and certificate #4846

Closed

Conversation

countvonzero
Copy link
Contributor

@countvonzero countvonzero commented Aug 14, 2023

Motivation

Closes #4674

Changes

when a node poll layer opinions from peers, it gets back a response from each peer.

type LayerOpinion struct {
	PrevAggHash types.Hash32
	Certified *types.BlockID
}

node then go through each response and collect unique BlockIDs and fetch certificate for those blocks.

before and after

in TestAddNodes, total amount of data a new node downloaded at layer 39 reduced from 10_647_927 bytes to 208_435 bytes

before
Screenshot from 2023-08-18 09-45-36

after
Screenshot from 2023-08-18 09-48-18

testing

  • systests TestAddNodes starts 28 nodes with new protocol and later added 2 nodes to sync with old protocol
  • manually sync with mainnet with new protocol enabled

note on deployment

  • new nodes will fetch opinions with new protocol from nodes that support it. and with old protocol from nodes that don't support it.
  • in the case of rollback, set syncer config UseNewProtocol to false
		Sync: syncer.Config{
                        ...
			UseNewProtocol:   false,
		},

all nodes will then only request/serve the old protocol

@countvonzero
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Aug 14, 2023
@codecov
Copy link

codecov bot commented Aug 14, 2023

Codecov Report

Merging #4846 (01a2145) into develop (bfa734e) will decrease coverage by 0.1%.
The diff coverage is 71.5%.

@@            Coverage Diff            @@
##           develop   #4846     +/-   ##
=========================================
- Coverage     76.9%   76.9%   -0.1%     
=========================================
  Files          261     261             
  Lines        29807   30103    +296     
=========================================
+ Hits         22951   23166    +215     
- Misses        5392    5453     +61     
- Partials      1464    1484     +20     
Files Changed Coverage Δ
fetch/interface.go 100.0% <ø> (ø)
fetch/metrics.go 100.0% <ø> (ø)
p2p/upgrade.go 64.2% <0.0%> (-1.0%) ⬇️
fetch/wire_types.go 63.6% <33.3%> (-8.5%) ⬇️
fetch/handler.go 48.4% <53.6%> (+4.1%) ⬆️
fetch/mesh_data.go 64.6% <63.4%> (-1.1%) ⬇️
fetch/fetch.go 77.8% <71.4%> (-0.3%) ⬇️
sql/certificates/certs.go 76.5% <78.5%> (+0.2%) ⬆️
syncer/state_syncer.go 76.1% <81.2%> (-2.2%) ⬇️
syncer/data_fetch.go 77.0% <84.5%> (+2.9%) ⬆️
... and 2 more

... and 3 files with indirect coverage changes

@bors
Copy link

bors bot commented Aug 14, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

config/mainnet.go Outdated Show resolved Hide resolved
@dshulyak
Copy link
Contributor

in the case of rollback, simply set --sync-update-layer to a big number via config.

could you please add how json config changes in pr description. tbh such options should not be even added to command line as it will be harder to drop them later. but i assume this needed for tests

Copy link
Contributor

@dshulyak dshulyak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would consider doing upgrade without configured layer, it seems much more convenient for testing on mainnet in this case.

i think in system like this peers are expected to not respond or disconnect randomly. printing warnings doesn't make much sense, i would replace that with prometheus counter as it is mainly useful for operations

fetch/wire_types.go Outdated Show resolved Hide resolved
}

func (s *Syncer) fetchOpinions(ctx context.Context, lid types.LayerID) ([]*peerOpinion, []*types.Certificate, error) {
if s.ticker.CurrentLayer() >= types.LayerID(s.cfg.UpdateLayer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that upgrade path without configured layer is more convenient and robust. if there is a regression we will be able to notice it immediately

i am referring to a upgrade path, where node checks if it has any peers with new protocol and if so then uses new

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used PeerStore().GetProtocols() to make upgrade path automatic.
peer will first fetch from v2 peers, and then fetch from peers that don't support v2
as this is not on critical path, i did this sequentially instead of concurrently for simpler code.

syncer/state_syncer.go Outdated Show resolved Hide resolved
var peerCert types.Certificate
err = codec.Decode(data, &peerCert)
if err != nil {
defer close(done)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consumer reads exactly one event from done and then continues, same for output. doesn't look good with this defers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! fixed

fetch/mesh_data.go Show resolved Hide resolved
done <- err
return
}
if peerCert.BlockID != bid {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check also somewhat inconsistent now, other messages check it in handlers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true for generic hash fetches. (atx/block/proposal/ballot/tx).
however, certificate doesn't go through that path. it's specifically requested, as block certificate doesn't have an ID associated with it. keeping it and added comment.

syncer/data_fetch.go Outdated Show resolved Hide resolved
syncer/data_fetch.go Outdated Show resolved Hide resolved
@countvonzero
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Aug 18, 2023
@bors
Copy link

bors bot commented Aug 18, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@countvonzero
Copy link
Contributor Author

bors try

bors bot added a commit that referenced this pull request Aug 18, 2023
@bors
Copy link

bors bot commented Aug 18, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@countvonzero
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Aug 18, 2023
## Motivation
Closes #4674

## Changes
when a node poll layer opinions from peers, it gets back a response from each peer. 
```go
type LayerOpinion struct {
	PrevAggHash types.Hash32
	Certified *types.BlockID
}
```
node then go through each response and collect unique BlockIDs and fetch certificate for those blocks.

## before and after
in TestAddNodes, total amount of data a new node downloaded at layer 39 reduced from 10_647_927 bytes to 208_435 bytes

before
![Screenshot from 2023-08-18 09-45-36](https://github.com/spacemeshos/go-spacemesh/assets/30611210/92eb3c9a-ce2e-4aee-994a-56504600427d)

after
![Screenshot from 2023-08-18 09-48-18](https://github.com/spacemeshos/go-spacemesh/assets/30611210/e59ebab6-8d38-472e-b4c3-7dbcbc174f77)

## testing
- systests TestAddNodes starts 28 nodes with new protocol and later added 2 nodes to sync with old protocol
- manually sync with mainnet with new protocol enabled

## note on deployment
- new nodes will fetch opinions with new protocol from nodes that support it. and with old protocol from nodes that don't support it.
- in the case of rollback, set syncer config `UseNewProtocol` to false
```
		Sync: syncer.Config{
                        ...
			UseNewProtocol:   false,
		},
```
all nodes will then only request/serve the old protocol
@bors
Copy link

bors bot commented Aug 18, 2023

Pull request successfully merged into develop.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title sync: new protocol to fetch layer hash and certificate [Merged by Bors] - sync: new protocol to fetch layer hash and certificate Aug 18, 2023
@bors bors bot closed this Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sync: layer opininons protocol uses too much traffic
2 participants