
Create infrastructure for the Holesky network #152

Closed
zah opened this issue Aug 17, 2023 · 43 comments

zah commented Aug 17, 2023

Nimbus will run 100,000 validators on the new Holesky testnet.

We should plan to add at least 30 new servers to our fleet in order to operate such a large number of validators on a heavier-to-process network.

As usual, our validator keys are available here:
https://github.com/status-im/nimbus-private/tree/master/holesky_deposits


jakubgs commented Aug 17, 2023

Related PR: eth-clients/holesky#41


jakubgs commented Aug 17, 2023

Here are some questions:

  • What is the purpose of this testnet?
  • Is it going to be a long-lived testnet?
  • What's the deadline for this? Or more exactly, when is genesis?
  • What are the host requirements? Same as Prater? Smaller? Bigger?
  • Any idea how big this testnet will be in terms of storage? As small as Sepolia? As big as Prater/Goerli? Bigger?
  • What layout will we want for this fleet? Do we need only Linux hosts? Mixed?


zah commented Aug 17, 2023

This network is going to replace Prater, so we would like to continue all of the practices that we follow on Prater.
The purpose of the network is to be significantly bigger than mainnet, so performance issues in the clients are discovered there first. This may or may not mean that we need bigger hosts than Prater (we don't have many actual performance numbers yet). Over time, it will surely consume more disk space than Prater.


zah commented Aug 17, 2023

The planned launch date is September 15, 2023, 14:00 UTC.


jakubgs commented Aug 18, 2023

The current hosts we use from InnovaHosting are like this:

  • nimbus.mainnet - Xeon CPU E5-2690 v2 @ 3.00GHz (10 cores)
  • nimbus.prater - Xeon CPU E5-2667 v3 @ 3.20GHz (8 cores)
  • nimbus.sepolia - Xeon CPU E5-2667 v3 @ 3.20GHz (8 cores)

All hosts have 64 GB of RAM, and disk layout can be customized.

Do you think any of those processors would fit, or would you want me to ask IH sales about something different?


jakubgs commented Aug 22, 2023

I have opened a ticket with the Innova Hosting sales team:

Hello,

We will need 30 servers for a new Ethereum testnet called Holesky, which has genesis on September 15, 2023, 14:00 UTC.

Sorry for the short notice but I was only informed about this need recently.

I think the Xeon CPU E5-2667 v3 @ 3.20GHz CPUs will be fine, unless you have something stronger in terms of single-core performance available or purchasable in time before the 15th.

As for storage: is the 1.6 TB NVMe the upper limit of the SSD sizes you have available? If so, we'll need the same setup as on the other hosts: one small SSD for the OS (at least 100 GB) and two big SSDs for data. If possible it would be nice if the big ones were 2 TB or bigger, but if not, 1.6 TB will probably be fine.

Cheers!

https://client.innovahosting.net/viewticket.php?tid=774553&c=wQeslgJE


jakubgs commented Aug 22, 2023

Their sales rep responded with:

We do not have 30 servers at the moment. I think we can activate about 10 servers now and the rest of the servers until October 1st.

I will check once again with our CTO and tell you in 1-2 days how many servers we can activate now and how many days we need to order and receive the rest of the servers.

And I added:

The testnet genesis is on the 15th of September, and since this is a completely new testnet that has not yet started, there is nothing to sync. So as long as I can get the hosts on the 13th or even the 14th, I can have them ready for genesis on the 15th.

If all 30 are not possible in time for the 15th, then we can probably make do with fewer, but at least half would be good to have. As for CPUs, any chance for anything with higher single-core performance, or not really?


jakubgs commented Aug 29, 2023

We had a bit of back-and-forth:

We can activate about 10 servers quickly.
For the other 20 servers we are out of 1.6TB SSDs at the moment.
We will receive the next package of 50x 1.6TB SSDs in about one month.
You will have to wait for the 1.6TB drives to activate the next 20 servers, or we can make a quick order for 800GB SSDs or a few SSDs of 3.84 TB.

I said that 800 GB is fine, so they said:

OK, to make sure

Server HP DL360 Gen9 with 2 PSU
CPU1: E5-2667 v3
CPU2: No CPU
RAM: 4x 16GB DDR4
SSD1: 400GB for OS. NO RAID
SSD2: 800GB (after September 20 we will receive 1.6TB SAS-SSDs; we will insert one in the server so you can move the data from the 800GB to the 1.6TB drive, and the 800GB will be removed afterwards). NO RAID
SSD3: 800GB (same arrangement as SSD2). NO RAID

All disks mentioned above are write intensive

We will also receive up to 20x 3.84TB SanDisk, read intensive. Will you need this kind of SSD?

Tell me if everything here is OK and I will calculate the price.

But then it turned out they can actually do the 1.6 TB ones after all:

I have just received a confirmation from our vendor that the SSDs can be delivered 7 days earlier, so I guess we can install all servers with 1.6TB by September 15th.

You will need to decide fast so we can place the order for the components.

I responded to them via email while on holiday, but it turns out they don't receive those responses.

I confirmed today that this is fine and that I need a quote as soon as possible.


jakubgs commented Aug 29, 2023

Here's the quote:

Regarding the server configuration that we were discussing in the ticket.

Server HP DL360 Gen9 with 2 PSU
CPU1: E5-2667 v3
CPU2: No CPU
RAM: 4x 16GB DDR4
Raid controller HP P440ar/2GB
SSD1: 400GB for OS. NO RAID
SSD2: 1.6 TB no raid configured.
SSD3: 1.6 TB no raid configured.
Internet connection 1Gbps

The price for one server is 173 EUR/month.

5% discount can be applied if you pay for one year.

Additional 5% discount can be applied if you pay by crypto.

Which adds up to a total of 56,052 EUR per year after discounts (30 servers × 173 EUR × 12 months, minus the combined 10%).


jakubgs commented Aug 29, 2023

I have opened process ID 3968 in Spiff. Blocked on approval from Johannes.

https://prod.mod.spiff.status.im/admin/process-instances/for-me/manage-procurement:procurement:requisition-order-management:request-goods-services/3968

We really need some short links...


jakubgs commented Aug 31, 2023

Because I selected the "Infrastructure" project, the request was rejected, and I have no way of editing it.
I created a new one that uses "ETH2" as the project, because that's apparently the Nimbus team:

https://prod.mod.spiff.status.im/admin/process-instances/for-me/manage-procurement:procurement:requisition-order-management:request-goods-services/3981


jakubgs commented Aug 31, 2023

I have asked them to verify whether they had purchased the SSDs, and they confirmed:

We have ordered the following SSDs:

400GB SAS-SSD HGST (Write Intensive)
800GB SAS-SSD HGST (Write Intensive)
3.84TB Toshiba (Read Intensive)

These have been shipped and are expected to arrive in 5 days.

We have also ordered:

1.6TB SAS-SSD HGST (Write Intensive)
7.68TB SAS-SSD HGST (Write Intensive)

These will be shipped on Monday and we expect to receive them within 7 days after shipping.

As for payment, we are comfortable accepting USDT on either the ERC20 or TRC20 network. We can also accept ETH and Bitcoin, although we prefer USDT.

They have also informed us that they prefer payment in USDT.


jakubgs commented Sep 12, 2023

I requested an update on the server setup and got back:

The technical team is now installing the servers; we hope to manage to activate all of them by September 15.

I have informed them that the sooner I can get a few initial servers, the sooner I can work on the new setup and get it done in time.


jakubgs commented Sep 12, 2023

And it looks like a go-ethereum release with support for the Holesky testnet came out 2 hours ago:

It also apparently ships stable support for the new database model with proper pruning:

Still, just to quickly recap, Geth v1.13.0 finally ships a new database model which supports proper, full pruning of historical states; meaning you will never need to take your node offline again to resync or to manually prune. The new database model is optional for now (you need to enable it via --state.scheme=path) and does require resyncing the state, since we need to store it completely different (you can keep your ancients, no need to resync the chain too).
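
For reference, here's a minimal sketch of how a node could be launched with the new database model (the flag names are taken from that release; the data directory is just a placeholder, not our actual service configuration):

# Opt into the new path-based state scheme on a fresh Holesky Geth node.
# /data/holesky-geth is a placeholder path.
geth --holesky --state.scheme=path --datadir /data/holesky-geth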


jakubgs commented Sep 12, 2023


jakubgs commented Sep 12, 2023

I've upgraded all three roles to versions that support Holesky.

I will roll out the new Geth to the other fleets as well.


jakubgs commented Sep 12, 2023

Also decided to add the version to Consul service metadata, which should make it easier to track versions across fleets.
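
Since the version now lives in the service metadata, it can be read back with a simple catalog query; a sketch, assuming the beacon nodes are registered as Consul services (the service name below is an example, not necessarily the real one):

# Print node name and the version stored in ServiceMeta for a given service.
# "beacon-node-holesky-stable" is an example service name.
curl -s http://localhost:8500/v1/catalog/service/beacon-node-holesky-stable \
  | jq -r '.[] | "\(.Node)\t\(.ServiceMeta.version)"'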


jakubgs commented Sep 14, 2023

Got an email from Dan Popusoi about some servers being close to becoming available:

Jakub, I am out of office right now and can't reply by ticket.

Today our technical team will send you the login credentials to 10-15 servers.

These servers are installed in our new server room. We are waiting for our electrician to connect the server room to power.

So hopefully we should have half the fleet up and running today. We'll see.


jakubgs commented Sep 14, 2023

Also, apparently:

You will be the first client in the new server room

So that's cool... I guess. We'll be testing their new setup :D. Hopefully it won't explode.

jakubgs added a commit that referenced this issue Sep 14, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>

jakubgs commented Sep 14, 2023

Looks like we will receive the servers tomorrow morning, which means we will have little time for setup.
Luckily there's nothing to sync, so we just need to have the nodes up and running with validators in time.

I have prepared a new layout for this fleet in a branch: https://github.com/status-im/infra-nimbus/tree/holesky-testnet

nodes_layout:
  # Geth -----------------------------------------------------------------------
  'linux-01.ih-eu-mda1.nimbus.holesky': # 0 each
    - { branch: 'stable', el: 'geth', vc: true }
    - { branch: 'testing', el: 'geth', vc: false }
    - { branch: 'unstable', el: 'geth', vc: false }
    - { branch: 'libp2p', el: 'geth', vc: false }
  'linux-02.ih-eu-mda1.nimbus.holesky': # 1 each
    - { branch: 'stable', start: 0, end: 1, el: 'geth', vc: true }
    - { branch: 'testing', start: 1, end: 2, el: 'geth', vc: false }
    - { branch: 'unstable', start: 2, end: 3, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 3, end: 4, el: 'geth', vc: false }
  'linux-03.ih-eu-mda1.nimbus.holesky': # 5 each
    - { branch: 'stable', start: 4, end: 9, el: 'geth', vc: true }
    - { branch: 'testing', start: 9, end: 14, el: 'geth', vc: false }
    - { branch: 'unstable', start: 14, end: 19, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 19, end: 24, el: 'geth', vc: false }
  'linux-04.ih-eu-mda1.nimbus.holesky': # 14 each
    - { branch: 'stable', start: 24, end: 38, el: 'geth', vc: true }
    - { branch: 'testing', start: 38, end: 52, el: 'geth', vc: false }
    - { branch: 'unstable', start: 52, end: 66, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 66, end: 80, el: 'geth', vc: false }
  'linux-05.ih-eu-mda1.nimbus.holesky': # 20 each
    - { branch: 'stable', start: 80, end: 100, el: 'geth', vc: true }
    - { branch: 'testing', start: 100, end: 120, el: 'geth', vc: false }
    - { branch: 'unstable', start: 120, end: 140, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 140, end: 160, el: 'geth', vc: false }
  'linux-06.ih-eu-mda1.nimbus.holesky': # 110 each
    - { branch: 'stable', start: 160, end: 270, el: 'geth', vc: true }
    - { branch: 'testing', start: 270, end: 380, el: 'geth', vc: false }
    - { branch: 'unstable', start: 380, end: 490, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 490, end: 600, el: 'geth', vc: false }
  'linux-07.ih-eu-mda1.nimbus.holesky': # 400 each
    - { branch: 'stable', start: 600, end: 1000, el: 'geth', vc: true }
    - { branch: 'testing', start: 1000, end: 1400, el: 'geth', vc: false }
    - { branch: 'unstable', start: 1400, end: 1800, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 1800, end: 2200, el: 'geth', vc: false }
  'linux-08.ih-eu-mda1.nimbus.holesky': # 700 each
    - { branch: 'stable', start: 2200, end: 2900, el: 'geth', vc: true }
    - { branch: 'testing', start: 2900, end: 3600, el: 'geth', vc: false }
    - { branch: 'unstable', start: 3600, end: 4300, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 4300, end: 5000, el: 'geth', vc: false }
  'linux-09.ih-eu-mda1.nimbus.holesky': # 2000 each
    - { branch: 'stable', start: 5000, end: 7000, el: 'geth', vc: true }
    - { branch: 'testing', start: 7000, end: 9000, el: 'geth', vc: false }
    - { branch: 'unstable', start: 9000, end: 11000, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 11000, end: 13000, el: 'geth', vc: false }
  'linux-10.ih-eu-mda1.nimbus.holesky': # 5000 each
    - { branch: 'stable', start: 13000, end: 18000, el: 'geth', vc: true }
    - { branch: 'testing', start: 18000, end: 23000, el: 'geth', vc: false }
    - { branch: 'unstable', start: 23000, end: 28000, el: 'geth', vc: false }
    - { branch: 'libp2p', start: 28000, end: 33000, el: 'geth', vc: false }
  # Erigon ---------------------------------------------------------------------
  'linux-11.ih-eu-mda1.nimbus.holesky': # 0 each
    - { branch: 'stable', el: 'erigon', vc: false }
    - { branch: 'testing', el: 'erigon', vc: true }
    - { branch: 'unstable', el: 'erigon', vc: false }
    - { branch: 'libp2p', el: 'erigon', vc: false }
  'linux-12.ih-eu-mda1.nimbus.holesky': # 1 each
    - { branch: 'stable', start: 33000, end: 33001, el: 'erigon', vc: false }
    - { branch: 'testing', start: 33001, end: 33002, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 33002, end: 33003, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 33003, end: 33004, el: 'erigon', vc: false }
  'linux-13.ih-eu-mda1.nimbus.holesky': # 5 each
    - { branch: 'stable', start: 33004, end: 33009, el: 'erigon', vc: false }
    - { branch: 'testing', start: 33009, end: 33014, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 33014, end: 33019, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 33019, end: 33024, el: 'erigon', vc: false }
  'linux-14.ih-eu-mda1.nimbus.holesky': # 14 each
    - { branch: 'stable', start: 33024, end: 33038, el: 'erigon', vc: false }
    - { branch: 'testing', start: 33038, end: 33052, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 33052, end: 33066, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 33066, end: 33080, el: 'erigon', vc: false }
  'linux-15.ih-eu-mda1.nimbus.holesky': # 20 each
    - { branch: 'stable', start: 33080, end: 33100, el: 'erigon', vc: false }
    - { branch: 'testing', start: 33100, end: 33120, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 33120, end: 33140, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 33140, end: 33160, el: 'erigon', vc: false }
  'linux-16.ih-eu-mda1.nimbus.holesky': # 110 each
    - { branch: 'stable', start: 33160, end: 33270, el: 'erigon', vc: false }
    - { branch: 'testing', start: 33270, end: 33380, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 33380, end: 33490, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 33490, end: 33600, el: 'erigon', vc: false }
  'linux-17.ih-eu-mda1.nimbus.holesky': # 400 each
    - { branch: 'stable', start: 33600, end: 34000, el: 'erigon', vc: false }
    - { branch: 'testing', start: 34000, end: 34400, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 34400, end: 34800, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 34800, end: 35200, el: 'erigon', vc: false }
  'linux-18.ih-eu-mda1.nimbus.holesky': # 700 each
    - { branch: 'stable', start: 35200, end: 35900, el: 'erigon', vc: false }
    - { branch: 'testing', start: 35900, end: 36600, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 36600, end: 37300, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 37300, end: 38000, el: 'erigon', vc: false }
  'linux-19.ih-eu-mda1.nimbus.holesky': # 2000 each
    - { branch: 'stable', start: 38000, end: 40000, el: 'erigon', vc: false }
    - { branch: 'testing', start: 40000, end: 42000, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 42000, end: 44000, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 44000, end: 46000, el: 'erigon', vc: false }
  'linux-20.ih-eu-mda1.nimbus.holesky': # 5000 each
    - { branch: 'stable', start: 46000, end: 51000, el: 'erigon', vc: false }
    - { branch: 'testing', start: 51000, end: 56000, el: 'erigon', vc: true }
    - { branch: 'unstable', start: 56000, end: 61000, el: 'erigon', vc: false }
    - { branch: 'libp2p', start: 61000, end: 66000, el: 'erigon', vc: false }
  # Nethermind -----------------------------------------------------------------
  'linux-21.ih-eu-mda1.nimbus.holesky': # 0 each
    - { branch: 'stable', el: 'nethermind', vc: false }
    - { branch: 'testing', el: 'nethermind', vc: false }
    - { branch: 'unstable', el: 'nethermind', vc: true }
    - { branch: 'libp2p', el: 'nethermind', vc: false }
  'linux-22.ih-eu-mda1.nimbus.holesky': # 1 each
    - { branch: 'stable', start: 66000, end: 66001, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 66001, end: 66002, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 66002, end: 66003, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 66003, end: 66004, el: 'nethermind', vc: false }
  'linux-23.ih-eu-mda1.nimbus.holesky': # 5 each
    - { branch: 'stable', start: 66004, end: 66009, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 66009, end: 66014, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 66014, end: 66019, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 66019, end: 66024, el: 'nethermind', vc: false }
  'linux-24.ih-eu-mda1.nimbus.holesky': # 14 each
    - { branch: 'stable', start: 66024, end: 66038, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 66038, end: 66052, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 66052, end: 66066, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 66066, end: 66080, el: 'nethermind', vc: false }
  'linux-25.ih-eu-mda1.nimbus.holesky': # 20 each
    - { branch: 'stable', start: 66080, end: 66100, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 66100, end: 66120, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 66120, end: 66140, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 66140, end: 66160, el: 'nethermind', vc: false }
  'linux-26.ih-eu-mda1.nimbus.holesky': # 110 each
    - { branch: 'stable', start: 66160, end: 66270, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 66270, end: 66380, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 66380, end: 66490, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 66490, end: 66600, el: 'nethermind', vc: false }
  'linux-27.ih-eu-mda1.nimbus.holesky': # 400 each
    - { branch: 'stable', start: 66600, end: 67000, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 67000, end: 67400, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 67400, end: 67800, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 67800, end: 68200, el: 'nethermind', vc: false }
  'linux-28.ih-eu-mda1.nimbus.holesky': # 700 each
    - { branch: 'stable', start: 68200, end: 68900, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 68900, end: 69600, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 69600, end: 70300, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 70300, end: 71000, el: 'nethermind', vc: false }
  'linux-29.ih-eu-mda1.nimbus.holesky': # 2000 each
    - { branch: 'stable', start: 71000, end: 73000, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 73000, end: 75000, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 75000, end: 77000, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 77000, end: 79000, el: 'nethermind', vc: false }
  'linux-30.ih-eu-mda1.nimbus.holesky': # 5000 each
    - { branch: 'stable', start: 79000, end: 84000, el: 'nethermind', vc: false }
    - { branch: 'testing', start: 84000, end: 89000, el: 'nethermind', vc: false }
    - { branch: 'unstable', start: 89000, end: 94000, el: 'nethermind', vc: true }
    - { branch: 'libp2p', start: 94000, end: 100000, el: 'nethermind', vc: false }

The general idea is that we have 30 hosts split into 3 groups:

  • Hosts 01-10 running Geth, 4 nodes on each host, ~33000 validators in total. Validator clients on stable nodes.
  • Hosts 11-20 running Erigon, 4 nodes on each host, ~33000 validators in total. Validator clients on testing nodes.
  • Hosts 21-30 running Nethermind, 4 nodes on each host, ~33000+ validators in total. Validator clients on unstable nodes.

The idea is to observe how the different execution-layer clients behave across different branches and validator counts. Also, validator clients are enabled on a different branch in each group (stable on the Geth hosts, testing on Erigon, unstable on Nethermind).
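
Since the start/end indices are easy to get wrong, a quick sanity check is useful; here's a sketch that just sums the ranges to confirm they add up to the planned 100,000 validators (the layout file path is hypothetical):

# Sum (end - start) over all entries in the layout; the path is hypothetical.
grep -oE 'start: *[0-9]+, *end: *[0-9]+' ansible/group_vars/nimbus.holesky.yml \
  | awk -F'[ ,:]+' '{ total += $4 - $2 } END { print total, "validators assigned" }'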

jakubgs added a commit to status-im/infra-role-beacon-node-linux that referenced this issue Sep 15, 2023
status-im/infra-nimbus#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit to status-im/infra-role-beacon-node-macos that referenced this issue Sep 15, 2023
status-im/infra-nimbus#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit to status-im/infra-role-beacon-node-windows that referenced this issue Sep 15, 2023
status-im/infra-nimbus#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit that referenced this issue Sep 15, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>

jakubgs commented Sep 15, 2023

The new hosts have started trickling in:

image


jakubgs commented Sep 15, 2023


jakubgs commented Sep 15, 2023

We got the first batch of hosts, but there's some variation in the CPUs:

Hostname CPU Model
linux-01.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-02.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-03.ih-eu-mda1.nimbus.holesky Xeon E5-2698 v4 @ 2.20GHz
linux-04.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-05.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-06.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-07.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-08.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-09.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-10.ih-eu-mda1.nimbus.holesky Xeon E5-2698 v4 @ 2.20GHz
linux-11.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-12.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-13.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-14.ih-eu-mda1.nimbus.holesky Xeon E5-2667 v3 @ 3.20GHz
linux-15.ih-eu-mda1.nimbus.holesky Xeon E5-2690 v4 @ 2.60GHz

I will ask support about it.


jakubgs commented Sep 15, 2023

Major issues with GitHub timeouts made it impossible to deploy all nodes on time; we were only about 50% ready:

image

The issues were reported to Innova, but nothing improved in time. Lots of timeouts when checking out repos:

fatal: unable to access 'https://github.com/status-im/nim-kzg4844.git/': Failed to connect to github.com port 443 after 130761 ms: Connection timed out
fatal: unable to access 'https://github.com/gnosischain/configs.git/': Failed to connect to github.com port 443 after 129404 ms: Connection timed out
fatal: unable to access 'https://github.com/eth-clients/eth2-networks.git/': Failed to connect to github.com port 443 after 129505 ms: Connection timed out

This is essentially the same issue we've been having with the InnovaHosting macOS hosts in CI.
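
A crude way to demonstrate the problem to their support is to repeatedly attempt a TCP connection to github.com:443 from an affected host; just a sketch:

# Count how often a plain TCP connection to GitHub fails or times out.
for i in $(seq 1 20); do
  if timeout 10 bash -c 'exec 3<>/dev/tcp/github.com/443' 2>/dev/null; then
    echo "attempt $i: ok"
  else
    echo "attempt $i: failed/timed out"
  fi
  sleep 2
done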


jakubgs commented Sep 15, 2023

Here are the config changes:

We'll need to re-deploy validators to use the proper layout once we have all 30 hosts.

jakubgs added a commit that referenced this issue Sep 15, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit that referenced this issue Sep 15, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit that referenced this issue Sep 15, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>
jakubgs added a commit that referenced this issue Sep 16, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>

jakubgs commented Sep 16, 2023

Added two more hosts that were bootstrapped:

Also fixed installation of Netdata by using their official APT repository:


jakubgs commented Sep 16, 2023

Also found a bug in the new Erigon version where the metric TYPE line contains labels:

Example:

admin@linux-05.ih-eu-mda1.nimbus.prater:~ % curl -s http://localhost:7063/debug/metrics/prometheus | grep TYPE | grep rpc
# TYPE rpc_duration_seconds{method="engine_exchangeTransitionConfigurationV1",success="success"} summary
# TYPE rpc_duration_seconds{method="engine_forkchoiceUpdatedV2",success="success"} summary
# TYPE rpc_duration_seconds{method="engine_newPayloadV2",success="success"} summary
# TYPE rpc_duration_seconds{method="eth_syncing",success="success"} summary
# TYPE rpc_failure counter
# TYPE rpc_total counter
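
Those TYPE lines are invalid Prometheus exposition format (a # TYPE line must name a bare metric family, without labels), which is easy to confirm if promtool happens to be installed on the host:

# The broken TYPE lines should make the parser/linter complain.
curl -s http://localhost:7063/debug/metrics/prometheus | promtool check metrics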

jakubgs added a commit that referenced this issue Sep 16, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>

jakubgs commented Sep 25, 2023

I checked up on the EL clients and their readiness for the new Holesky genesis on the 28th of September 2023, 12:00 UTC:

Nethermind has the update and a release, Erigon has the update but no release, while Go-Ethereum seems to have neither:

	return &Genesis{
		Config:     params.HoleskyChainConfig,
		Nonce:      0x1234,
		ExtraData:  hexutil.MustDecode("0x686f77206d7563682069732074686520666973683f"),
		GasLimit:   0x17d7840,
		Difficulty: big.NewInt(0x01),
		Timestamp:  1694786100,
		Alloc:      decodePrealloc(holeskyAllocData),
	}

https://github.com/ethereum/go-ethereum/blob/82ec555d709e5a3a2e0d22430f2ac70ebe814e88/core/genesis.go#L587-L595

The garbage extra data is still there, and the timestamp is still the old value.
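
For the record, the "garbage" ExtraData is just a hex-encoded joke string, easy to check with a one-liner:

# Decode the ExtraData bytes from the genesis above.
echo 686f77206d7563682069732074686520666973683f | xxd -r -p; echo
# prints: how much is the fish?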


jakubgs commented Sep 27, 2023

I have bootstrapped the remaining hosts; we now have all 30:

  • e40017a2 - holesky.tf: add remaining hosts to the fleet
  • 39ee763a - nimbus.holesky: switch to correct validators layout

And I'm going to purge all node data on all hosts to start anew.
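
Roughly what the purge amounts to, as an ad-hoc sketch only (the unit glob and data paths here are illustrative; the real ones come from our Ansible roles):

# Hypothetical example: stop all Holesky-related services and wipe their data dirs.
ansible nimbus.holesky --become -m shell \
  -a 'systemctl stop "*holesky*.service" && rm -rf /data/*holesky*/data'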

jakubgs added a commit that referenced this issue Sep 27, 2023
#152

Signed-off-by: Jakub Sokołowski <jakub@status.im>

jakubgs commented Sep 27, 2023

I had to use unofficial Docker images for Go-Ethereum and Erigon to support the new Holesky genesis:

This change is temporary until they create proper releases:

# Temporary fix for lack of releases for new Holesky genesis.
geth_cont_image: 'ethpandaops/geth:master-614804b'
erigon_cont_image: 'ethpandaops/erigon:devel-f99f326'

Deploying now.
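
Once the deploy finishes, a quick check on a host confirms the override images are actually in use; a sketch, assuming the EL clients run as Docker containers as the image variables above imply:

# List running containers and their images, filtered to the EL clients.
docker ps --format '{{.Names}}\t{{.Image}}' | grep -E 'geth|erigon'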


jakubgs commented Sep 27, 2023

Actually, Erigon just created a release: https://github.com/ledgerwatch/erigon/releases/tag/v2.49.3


jakubgs commented Sep 28, 2023

Looks like Go-Ethereum also had a release 2 hours ago: https://github.com/ethereum/go-ethereum/releases/tag/v1.13.2

Might as well use it.


jakubgs commented Sep 28, 2023

I discovered a major issue while checking nodes:

admin@linux-01.ih-eu-mda1.nimbus.holesky:/data/beacon-node-holesky-stable % for port in $(seq 9301 9304); do c 0:$port/eth/v1/node/syncing | jq -c; done
{"data":{"head_slot":"0","sync_distance":"0","is_syncing":false,"is_optimistic":true}}
{"data":{"head_slot":"0","sync_distance":"92428","is_syncing":true,"is_optimistic":true,"el_offline":false}}
{"data":{"head_slot":"0","sync_distance":"92428","is_syncing":true,"is_optimistic":false,"el_offline":false}}
{"data":{"head_slot":"0","sync_distance":"92428","is_syncing":true,"is_optimistic":false,"el_offline":false}}

Some nodes were showing a non-zero sync_distance, which is wrong, since this is a fresh genesis.

It turns out that @zah applied the Holesky fix only to the stable branch:

image

But testing, unstable, and nim-libp2p-auto-bump-unstable did not include that fix. Made a PR:

The PR is based on the 23.9.1 release commit: status-im/nimbus-eth2@cfa0268


jakubgs commented Sep 28, 2023

I also applied the status-im/nimbus-eth2@cfa0268 commit directly to the testing branch.
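
In practice that was just a cherry-pick of the commit referenced above onto testing, roughly:

git fetch origin
git checkout testing
git cherry-pick cfa0268   # the 23.9.1-based Holesky fix commit
git push origin testing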


jakubgs commented Sep 28, 2023

Looks like Zahary instead merged stable into unstable: status-im/nimbus-eth2@77d6bc5


jakubgs commented Sep 28, 2023

Some nodes were not online for epoch 0 because I had to purge the data folders for all nodes and then restart them, which took ages with Ansible due to endless GitHub timeouts on the hosts:

fatal: unable to access 'https://github.com/status-im/nim-blscurve.git/': Failed to connect to github.com port 443 after 130791 ms: Connection timed out
fatal: run_command returned non-zero status for vendor/nim-blscurve

Innova Hosting really need to do something about this.


jakubgs commented Sep 28, 2023

Also found and fixed a bug in the ports configuration for the EL nodes:


jakubgs commented Sep 28, 2023

A set of missed proposals was identified:

Slot Time UTC Host Node
https://holesky.beaconcha.in/slot/310 13:02:00 linux-20.ih-eu-mda1.nimbus.holesky libp2p
https://holesky.beaconcha.in/slot/314 13:02:48 linux-19.ih-eu-mda1.nimbus.holesky stable
https://holesky.beaconcha.in/slot/384 13:16:48 linux-20.ih-eu-mda1.nimbus.holesky stable
https://holesky.beaconcha.in/slot/389 13:17:48 linux-30.ih-eu-mda1.nimbus.holesky stable
https://holesky.beaconcha.in/slot/390 13:18:00 linux-20.ih-eu-mda1.nimbus.holesky stable

Predictably, they cluster around the hosts with the most validators.


jakubgs commented Sep 28, 2023

All of the misses above are most probably explained by the port mismatch that was fixed in adc1a061.


jakubgs commented Sep 28, 2023

For some reason the Erigon nodes have no peers and refuse to connect to anything at all. I opened an issue about it:

I tried static-nodes.json, trusted-nodes.json, and the admin_addPeer RPC method, but nothing has worked so far.
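
For reference, the RPC attempt looked roughly like this (the enode URL is a placeholder, and it assumes the node's HTTP RPC with the admin namespace enabled is reachable locally on the default port):

# Try to add a peer manually over JSON-RPC; the enode URL is a placeholder.
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"admin_addPeer","params":["enode://<pubkey>@<peer-ip>:30303"],"id":1}' \
  http://localhost:8545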


jakubgs commented Sep 29, 2023

Found another issue with Erigon: this time it's ignoring the --port flag and listening on 30303 anyway:
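
The symptom is easy to see on the host itself; a sketch, where 30403 stands in for whatever non-default port was configured:

# Despite --port being set (30403 here is just an example), the p2p listener still shows up on 30303.
ss -lntup | grep -E ':(30303|30403)'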


jakubgs commented Oct 4, 2023

I believe this is now done.

jakubgs closed this as completed Oct 4, 2023

jakubgs commented Oct 9, 2023

Looks like all the servers finally have the same CPU: Xeon E5-2667 v3

 > a nimbus.holesky --become -o -a 'cat /proc/cpuinfo | grep "model name" | uniq' | sort
linux-01.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-02.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-03.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-04.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-05.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-06.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-07.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-08.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-09.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-10.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-11.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-12.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-13.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-14.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-15.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-16.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-17.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-18.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-19.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-20.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-21.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-22.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-23.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-24.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-25.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-26.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-27.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-28.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-29.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
linux-30.ih-eu-mda1.nimbus.holesky | CHANGED | rc=0 | (stdout) model name	: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz

jakubgs self-assigned this Nov 17, 2023