Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(net): test syncing from geth #623

Merged
merged 27 commits into from Jan 31, 2023
Merged

feat(net): test syncing from geth #623

merged 27 commits into from Jan 31, 2023

Conversation

Rjected
Copy link
Member

@Rjected Rjected commented Dec 26, 2022

reth integration tests

This adds tests which spawn an instance of geth and configure it to produce blocks with cilque. Geth is sent transactions, which gets used to produce blocks. This is then used to test the following situations:

  • Reth full sync test, adding geth as reth's only outbound peer
  • Reth network keepalive test, initializing just reth's networking stack, and making sure we can stay connected to a peered geth instance for 30 seconds.

Depends on:

TODO:

  • Write sync test that sets up the reth pipeline and peers geth
    • Get geth to produce blocks
    • Set up a reth Genesis
    • Rest of pipeline
    • Put transactions in blocks (currently they are empty but we do have a funded account)
  • Consider moving to different folder since we will be using the entire pipeline
  • Figure out how to produce blocks
    • Configures geth in with clique, sending transactions that should get automatically mined.
  • Remove patch
  • Integrate pipeline builder from feat: pipeline builder #1017

@Rjected Rjected added C-test A change that impacts how or what we test A-networking Related to networking in general labels Dec 29, 2022
@gakonst
Copy link
Member

gakonst commented Jan 2, 2023

@Rjected gm - planned next steps here?

@Rjected
Copy link
Member Author

Rjected commented Jan 2, 2023

gm @gakonst

still debugging an invalid sender error, looking through geth to see if I activated EIP155 correctly

Then:

  • Initialize other components
  • Run the test

@gakonst
Copy link
Member

gakonst commented Jan 2, 2023

Cool - pls ping me if you're still blocked on this by eod (if you're online today?) so that we debug together as it's important we get confidence that it's the network and not us causing the peering issues

@Rjected Rjected force-pushed the dan/geth-sync-test branch 5 times, most recently from 920587b to 76cb0f7 Compare January 6, 2023 23:03
@codecov-commenter
Copy link

codecov-commenter commented Jan 6, 2023

Codecov Report

Merging #623 (e6b4844) into main (cc43b72) will increase coverage by 0.89%.
The diff coverage is 73.39%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main     #623      +/-   ##
==========================================
+ Coverage   74.46%   75.35%   +0.89%     
==========================================
  Files         319      321       +2     
  Lines       34844    34949     +105     
==========================================
+ Hits        25945    26336     +391     
+ Misses       8899     8613     -286     
Flag Coverage Δ
unit-tests 75.35% <73.39%> (+0.89%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
crates/staged-sync/src/lib.rs 100.00% <ø> (ø)
crates/staged-sync/src/test_utils/clique.rs 63.93% <63.93%> (ø)
...es/staged-sync/src/test_utils/clique_middleware.rs 85.41% <85.41%> (ø)
crates/metrics/metrics-derive/src/expand.rs 84.36% <0.00%> (-14.19%) ⬇️
crates/metrics/metrics-derive/src/metric.rs 75.00% <0.00%> (-3.13%) ⬇️
bin/reth/src/p2p/mod.rs 0.00% <0.00%> (ø)
bin/reth/src/node/mod.rs 0.00% <0.00%> (ø)
bin/reth/src/stage/mod.rs 0.00% <0.00%> (ø)
crates/net/network/src/peers/manager.rs 83.84% <0.00%> (+0.18%) ⬆️
crates/stages/src/stages/bodies.rs 93.34% <0.00%> (+0.22%) ⬆️
... and 32 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great!
incredibly useful.

should we go over this after the call Monday to assess what's missing/blockers etc?

@Rjected
Copy link
Member Author

Rjected commented Jan 9, 2023

should we go over this after the call Monday to assess what's missing/blockers etc?

yeah that would be very useful, lets do it @mattsse

@Rjected
Copy link
Member Author

Rjected commented Jan 10, 2023

big refactor:

  • network test utils are now in a new crate reth-net-test-utils. I am thinking of bringing this out into a module reth_network::test_utils (like reth_provider::test_utils), as long as it doesn't add unnecessary dependencies to the network crate. additional dev-deps are fine
  • the geth sync test now lives in reth-tests (in crates/tests), which will contain other structs and utils for writing integration tests with geth. we can also import other parts of reth from here

@Rjected
Copy link
Member Author

Rjected commented Jan 12, 2023

almost done, just noting down things left TODO:

  • Create conversions from ethers_core::utils::ChainConfig => reth_consensus::Config, then use that to initialize BeaconConsensus and pass it into the RethBuilder
  • Create conversion from ethers_core::utils::Genesis => reth_cli_utils::Genesis, also pass it into RethBuilder
  • Fill blocks with transactions and change period to 0 so blocks seal instantly
  • Assert that we get blocks, headers, etc

We should have enough support code once this is done s.t. we can write all sorts of tests with geth, for example testing that reth passes the geth snap checkpoint. We would need to configure geth's checkpoint for this

@Rjected Rjected force-pushed the dan/geth-sync-test branch 2 times, most recently from 2b0e05d to 659c95a Compare January 13, 2023 04:59
Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this landed. A big nitpick but I think we need some refactors and splitting in smaller PRs because it's a lot. I also feel we don't need dedicated crates for these and should be able to roll most in the existing ones?

crates/primitives/src/proofs.rs Outdated Show resolved Hide resolved
crates/primitives/src/account.rs Outdated Show resolved Hide resolved
crates/net/network/tests/it/connect.rs Outdated Show resolved Hide resolved
crates/net/network/Cargo.toml Outdated Show resolved Hide resolved
crates/net/test-utils/src/testnet.rs Outdated Show resolved Hide resolved
crates/tests/src/clique.rs Outdated Show resolved Hide resolved
crates/tests/src/clique.rs Outdated Show resolved Hide resolved
crates/tests/src/sync.rs Outdated Show resolved Hide resolved
crates/tests/src/sync.rs Outdated Show resolved Hide resolved
crates/tests/src/sync.rs Outdated Show resolved Hide resolved
@Rjected
Copy link
Member Author

Rjected commented Jan 18, 2023

squashed the commits and rebased on main - this now only contains code related to crates/tests

The only thing blocking this PR is a CliqueConsensus implementation, because BeaconConsensus enforces that extraData cannot be longer than 32 bytes, and clique always produces extraData longer than 32 bytes, which causes the geth peer to be kicked after failing header validation.
As a result, this integration test would not be covering BeaconConsenus validation.

@Rjected Rjected force-pushed the dan/geth-sync-test branch 3 times, most recently from 77e2a39 to afe1e75 Compare January 20, 2023 15:26
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After tons of ether PRs and debugging, this will be very useful.
great work!

only smol nit.
and we need to discuss where to put this exactly.

Comment on lines 181 to 184
fs::remove_dir_all(dir_path).await.unwrap();
})
.await
.unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be unnecessary because tempdir already does this on drop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I had to do this because I was using into_path, which requires manual deletion, but switching to path works and does not require doing this

Comment on lines 306 to 309
fs::remove_dir_all(dir_path).await.unwrap();
})
.await
.unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment on lines +107 to +117
tokio::spawn(async move {
loop {
if let (Ok(line), _line_str) = {
let mut buf = String::new();
(err_reader.read_line(&mut buf), buf.clone())
} {
if line == 0 {
break
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just for debugging, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print_logs is for debugging, but prevent_blocking is necessary to prevent the stderr pipe buffer from filling up and causing the test to hang. Using Drop doesn't seem to work here because it causes geth to crash (probably due to EPIPE, but I'm not entirely sure). I'm not sure if there's a better way to prevent the stderr pipe buffer from filling though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we send to /dev/null instead and not bother with this? either way let's do in a followup

Copy link
Member Author

@Rjected Rjected Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't do it by default because Geth captures the stderr to make sure rpc, p2p, etc are up when spawn is called. So have to first pipe the Command's output with Stdio::piped(). Maybe we could spawn cat or something, with geth's stderr as cat's stdin, and pipe cat to /dev/null

Comment on lines +5 to +8

#[cfg(any(test, feature = "test-utils"))]
/// Common helpers for integration testing.
pub mod test_utils;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if staged-sync is the right crate here, I guess so we can use the pipeline and test certain parts of it against a GethInstance?

perhaps it would be appropriate to move to its own crate?

wdyt @onbjerg ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer it here, or in any node-named packages (ref #1079 )

Comment on lines +107 to +117
tokio::spawn(async move {
loop {
if let (Ok(line), _line_str) = {
let mut buf = String::new();
(err_reader.read_line(&mut buf), buf.clone())
} {
if line == 0 {
break
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we send to /dev/null instead and not bother with this? either way let's do in a followup

Comment on lines +5 to +8

#[cfg(any(test, feature = "test-utils"))]
/// Common helpers for integration testing.
pub mod test_utils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer it here, or in any node-named packages (ref #1079 )

Comment on lines 77 to 79
let remote_genesis = SealedHeader::from(provider.remote_genesis_block().await.unwrap());

let mut local_genesis_header: Header = chainspec.genesis().clone().into();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't clique.0.genesis() load Geth's genesis file? how is remote_genesis_block() relevant here?

Comment on lines 88 to 89
let local_genesis = local_genesis_header.seal();
assert_eq!(local_genesis, remote_genesis, "genesis blocks should match, we computed {local_genesis:#?} but geth computed {remote_genesis:#?}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a redundant sanity check right? aren't these always going to be the same since they're both from geth?

Comment on lines 201 to 272
reth_tracing::init_test_tracing();
tokio::time::timeout(GETH_TIMEOUT, async move {
// first create a signer that we will fund so we can make transactions
let chain_id = 13337u64;
let data_dir = tempfile::tempdir().expect("should be able to create temp geth datadir");
let dir_path = data_dir.path();
tracing::info!(
data_dir=?dir_path,
"initializing geth instance"
);

// this creates a funded geth
let clique_geth = Geth::new()
.chain_id(chain_id)
.p2p_port(unused_port())
.data_dir(dir_path.to_str().unwrap());

// build the funded geth
let (mut clique, provider) = CliqueGethInstance::new(clique_geth, None).await;
let geth_p2p_port = clique.0.p2p_port().expect("geth should be configured with a p2p port");
tracing::info!(
p2p_port=%geth_p2p_port,
rpc_port=%clique.0.port(),
"configured clique geth instance in keepalive test"
);

// don't print logs, but drain the stderr
clique.prevent_blocking().await;

// get geth to start producing blocks - use a blank password
let clique_private_key = clique.0.clique_private_key().clone().expect("clique should be configured with a private key");
provider.enable_mining(clique_private_key, "".into()).await.unwrap();

// === check that we have the same genesis hash ===

// get the chainspec from the genesis we configured for geth
let mut chainspec: ChainSpec = clique.0.genesis().clone().expect("clique should be configured with a genesis").into();
let remote_genesis = SealedHeader::from(provider.remote_genesis_block().await.unwrap());

let mut local_genesis_header = Header::from(chainspec.genesis().clone());

let hardforks = chainspec.hardforks();

// set initial base fee depending on eip-1559
if let Some(0) = hardforks.get(&Hardfork::London) {
local_genesis_header.base_fee_per_gas = Some(EIP1559_INITIAL_BASE_FEE);
}

let local_genesis = local_genesis_header.seal();
assert_eq!(local_genesis, remote_genesis, "genesis blocks should match, we computed {local_genesis:#?} but geth computed {remote_genesis:#?}");

// set the chainspec genesis hash
chainspec.genesis_hash = local_genesis.hash();

// === create many blocks ===

let nonces = 0..1000u64;
let txs = nonces
.map(|nonce| {
// create a tx that just sends to the zero addr
TypedTransaction::Eip1559(Eip1559TransactionRequest::new()
.to(H160::zero())
.value(1u64)
.nonce(nonce))
});
tracing::info!("generated transactions for blocks");

// finally send the txs to geth
provider.send_requests(txs).await.unwrap();

let block = provider.get_block_number().await.unwrap();
assert!(block > U64::zero());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this boilerplate should go to a helper function, since it's duped with the code above

Comment on lines 305 to 306
// wait for the session to be established
let _peer_id = events.peer_added_and_established().await.unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we add a assert_eq!(geth_peer_id, peer_id) here? what is the purpose of the keepalive stuff below?

Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally :) this exposed a lot of things to improve, so im glad we did this.

@gakonst gakonst merged commit f771e23 into main Jan 31, 2023
@gakonst gakonst deleted the dan/geth-sync-test branch January 31, 2023 05:35
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 6, 2023
Co-authored-by: Georgios Konstantopoulos <me@gakonst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-networking Related to networking in general C-test A change that impacts how or what we test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants