Uptime Performance: Addressing Missed Blocks
Overview
This discussion focuses on the configuration issues faced by validators during the early Initia testnet phase, particularly when they encountered high transaction volumes that led to performance degradation. We will explore the potential causes and offer solutions to mitigate these problems.
Potential Causes & Solutions
Low Hardware Specifications
One of the primary factors contributing to uptime degradation is inadequate hardware. Validators must ensure their systems meet the recommended specifications to handle high transaction volumes. Metrics such as iowait, network congestion, disk read/write speed, and temperature should be monitored and managed.
Recommended Hardware Specifications:
CPU: 16 cores
Memory: 32GB RAM
Disk: 2 TB NVMe/SSD Storage with Write Throughput > 1000 MiBps
Bandwidth: 100 Mbps
These specifications can be checked using tools like htop, glances, ifconfig, and vnstat.
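As a rough self-check, the sketch below compares a Linux host against the recommended figures above. `RECOMMENDED`, `find_shortfalls`, and `local_specs` are hypothetical names. Only CPU cores and RAM are read automatically. Disk size, disk write throughput, and bandwidth have to be measured separately and filled in by hand, and any value left out is reported as missing.

```python
import os

# Recommended minimums from the list above.
RECOMMENDED = {
    "cpu_cores": 16,
    "memory_gb": 32,
    "disk_tb": 2,
    "disk_write_mib_s": 1000,  # write throughput in MiB/s
    "bandwidth_mbps": 100,
}


def find_shortfalls(specs: dict) -> list:
    """Return the metrics that are missing from `specs` or below the recommended minimum."""
    return [
        name
        for name, minimum in RECOMMENDED.items()
        if specs.get(name) is None or specs[name] < minimum
    ]


def local_specs() -> dict:
    """Read CPU cores and total RAM on Linux; disk and network figures must be added by hand."""
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_cores": os.cpu_count(),
        # Decimal GB (1e9 bytes) keeps a nominal 32 GiB machine above 32
        # even after the kernel reserves part of its memory.
        "memory_gb": total_bytes / 1e9,
    }


if __name__ == "__main__":
    print("Missing or below recommendation:", find_shortfalls(local_specs()))
```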
Reliable Server/VPS Providers
Choosing reliable server or VPS providers is crucial. Ensure your provider offers:
Comprehensive monitoring dashboards
Quick response times to mitigate disconnections
Transparent maintenance schedules
Uninterrupted connections
Geolocation
A report by A41 indicates that server location significantly affects uptime performance, with internet connection quality and workload distribution as the key factors. Nodejumper's decentralization map shows that most validators operate in Europe, particularly in Germany. Relocating servers to these areas may improve performance, but it also reduces regional decentralization and can increase security risks.
Configurations
The configurations below are based on our own experience and on opinions from trusted validators who operate on various Cosmos-based networks. We suggest making these changes when the network is under stress.
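Most of the changes below are edits to the node's config.toml (CometBFT) and app.toml (Cosmos SDK). Python's standard `tomllib` can read TOML but not write it, so the sketch below uses a hypothetical helper, `set_toml_value`, that rewrites one existing key in place with line-based matching, preserving comments and layout. It assumes one `key = value` per line and simple `[section]` headers, which matches the default files as far as we know. Keep a backup and restart the node after editing.

```python
import re
from typing import Optional

_HEADER = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def set_toml_value(text: str, section: Optional[str], key: str, value: str) -> str:
    """Return `text` with an existing `key` in `section` set to `value` (a TOML literal).

    `section=None` means top-level keys that appear before the first [table].
    Raises KeyError if the key is not found, so typos do not pass silently.
    """
    key_line = re.compile(rf"^(\s*){re.escape(key)}\s*=.*?(\r?\n)?$")
    lines = text.splitlines(keepends=True)
    current = None  # table we are currently inside; None = top level
    for i, line in enumerate(lines):
        header = _HEADER.match(line)
        if header:
            current = header.group(1).strip()
            continue
        if current == section:
            match = key_line.match(line)
            if match:
                # Keep indentation and line ending; an inline comment on this line is dropped.
                lines[i] = f"{match.group(1)}{key} = {value}{match.group(2) or ''}"
                return "".join(lines)
    raise KeyError(f"{key!r} not found in section {section!r}")
```

For example, `set_toml_value(text, None, "filter_peers", "true")` flips the top-level filter_peers flag, and `set_toml_value(text, "tx_index", "indexer", '"null"')` disables the indexer; note that string values must be passed with their quotes.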
Main Base Config (default → suggested)
db_backend: goleveldb → goleveldb
filter_peers: false → true
While PebbleDB is known for consuming less disk space and providing a faster block synchronization rate, Chorus One reported that it did not improve uptime performance and quickly used up disk space, so GoLevelDB is the better option. In the early stages, many bad peers degraded overall network performance; turning on filter_peers can help disconnect bad peers and retain good ones.
RPC Server Configuration (default → suggested)
grpc_max_open_connections: 900 → 10
max_open_connections: 900 → 10
max_subscription_clients: 100 → 10
Lowering these values helps protect the node from DDoS attacks. Public RPC endpoints are easy to find, since tools such as Notional Labs' RPC Crawler can crawl the RPC endpoints of a given blockchain. ValidatorVN suffered from RPC spam during Phases 2 and 3, and this adjustment helped mitigate the problem.
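A minimal audit of the [rpc] section against the suggested limits might look like the sketch below. It works on the already-parsed table (for example from Python 3.11's `tomllib`). `SUGGESTED_RPC`, `ZERO_MEANS_UNLIMITED`, and `rpc_limits_to_lower` are hypothetical names, and treating 0 as "unlimited" for the two connection caps follows the comments in CometBFT's default config.toml as we understand them.

```python
import sys

# Suggested [rpc] limits from the table above (node defaults: 900, 900 and 100).
SUGGESTED_RPC = {
    "grpc_max_open_connections": 10,
    "max_open_connections": 10,
    "max_subscription_clients": 10,
}
# For these two caps, CometBFT's config comments describe 0 as "unlimited".
ZERO_MEANS_UNLIMITED = {"grpc_max_open_connections", "max_open_connections"}


def rpc_limits_to_lower(rpc: dict) -> dict:
    """Map each key that is missing, unlimited or above its suggestion to (current, suggested)."""
    changes = {}
    for key, suggested in SUGGESTED_RPC.items():
        current = rpc.get(key)
        unlimited = current == 0 and key in ZERO_MEANS_UNLIMITED
        if current is None or unlimited or current > suggested:
            changes[key] = (current, suggested)
    return changes


if __name__ == "__main__" and len(sys.argv) > 1:
    import tomllib  # standard library since Python 3.11

    with open(sys.argv[1], "rb") as f:  # path to the node's config.toml
        config = tomllib.load(f)
    for key, (current, suggested) in rpc_limits_to_lower(config.get("rpc", {})).items():
        print(f"[rpc] {key}: {current} -> {suggested}")
```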
P2P Configuration (default → suggested)
max_num_inbound_peers: 40 → 50
max_num_outbound_peers: 10 → 50
pex: true → false
Inbound and outbound peer limits should be raised above the default values so that your validator node connects to more peers; we suggest staying between 50 and 100 peers. PEX is disabled to avoid bad peers, so your node will not try to connect to random peers on the network. Private peers should be chosen selectively from validators you trust.
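To check where a node stands against the 50 to 100 peer target, the sketch below counts inbound and outbound peers from CometBFT's `/net_info` RPC endpoint. It assumes the RPC server is reachable at the default local address `http://127.0.0.1:26657` and that each peer entry carries an `is_outbound` flag, as in the CometBFT versions we have used; `count_peers`, `peer_status`, and `fetch_net_info` are hypothetical helpers.

```python
import json
import urllib.request

TARGET_MIN, TARGET_MAX = 50, 100  # peer range suggested above


def count_peers(net_info: dict) -> tuple:
    """Return (inbound, outbound) peer counts from a /net_info JSON-RPC response."""
    peers = net_info["result"]["peers"]
    outbound = sum(1 for peer in peers if peer.get("is_outbound"))
    return len(peers) - outbound, outbound


def peer_status(net_info: dict) -> str:
    """Summarize the peer count against the 50-100 target."""
    inbound, outbound = count_peers(net_info)
    total = inbound + outbound
    if total < TARGET_MIN:
        verdict = "below target"
    elif total > TARGET_MAX:
        verdict = "above target"
    else:
        verdict = "within target"
    return f"{total} peers ({inbound} in, {outbound} out): {verdict}"


def fetch_net_info(rpc_url: str = "http://127.0.0.1:26657") -> dict:
    """Query the node's RPC; assumes it listens on the default local address."""
    with urllib.request.urlopen(f"{rpc_url}/net_info", timeout=5) as resp:
        return json.load(resp)
```

On the validator host, `print(peer_status(fetch_net_info()))` prints a one-line summary such as the peer total, the inbound/outbound split, and whether it falls inside the target range.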
seeds, persistent_peers, private_peer_ids (default: none)
Validators in a network should share their seeds and peers with each other and connect to one another. Unreliable persistent_peers can make your node lag behind the network; OriginStake provides a tool that measures the connectivity and latency between your validator node and a list of peers, which helps in choosing them.
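The idea behind such a measurement can be sketched with plain TCP handshakes: parse each `node_id@host:port` peer string, time a connection to it, and rank the results. This is not OriginStake's tool; `parse_peer`, `tcp_latency_ms`, and `rank_peers` are hypothetical helpers, a TCP handshake is only a proxy for P2P quality, and bracketed IPv6 addresses are not handled.

```python
import socket
import time
from typing import Optional


def parse_peer(peer: str) -> tuple:
    """Split a 'node_id@host:port' peer string into (node_id, host, port)."""
    node_id, _, address = peer.partition("@")
    host, _, port = address.rpartition(":")
    return node_id, host, int(port)


def tcp_latency_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time a TCP handshake to host:port in milliseconds; None if it fails or times out."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return None


def rank_peers(peers: list, timeout: float = 2.0) -> list:
    """Return (peer, latency_ms) pairs, fastest first; unreachable peers go last."""
    results = []
    for peer in peers:
        _, host, port = parse_peer(peer)
        results.append((peer, tcp_latency_ms(host, port, timeout)))
    return sorted(results, key=lambda r: (r[1] is None, r[1] or 0.0))
```

Peers that consistently rank near the top are better candidates for persistent_peers than ones that time out.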
Mempool Configuration (default → suggested)
size: 5000 → 1000
max_txs_bytes: 1073741824 → 20000000
Normal transactions are small compared with abusive (spam) transactions, so reducing the mempool's size in bytes (max_txs_bytes) is the more effective way to resolve these issues.
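The arithmetic behind this is simple: the pool holds at most `size` transactions or `max_txs_bytes` bytes, whichever limit is reached first. The sketch below compares the default and suggested limits for two transaction sizes, 1 kB and 100 kB, chosen for illustration rather than measured on Initia; `mempool_capacity` is a hypothetical helper.

```python
def mempool_capacity(size: int, max_txs_bytes: int, tx_bytes: int) -> int:
    """Max transactions the mempool can hold when every tx is `tx_bytes` long.

    Whichever limit is hit first (transaction count or total bytes) wins.
    """
    return min(size, max_txs_bytes // tx_bytes)


DEFAULT = (5000, 1_073_741_824)
SUGGESTED = (1000, 20_000_000)

if __name__ == "__main__":
    for label, (size, max_bytes) in (("default", DEFAULT), ("suggested", SUGGESTED)):
        for tx_bytes in (1_000, 100_000):  # illustrative sizes, not measured on Initia
            n = mempool_capacity(size, max_bytes, tx_bytes)
            print(f"{label:9} tx={tx_bytes:>7} B -> {n:>5} txs, {n * tx_bytes / 1e6:>6.1f} MB held")
```

With 100 kB transactions, the default limits admit up to 5000 of them (500 MB held in memory), while the suggested limits stop at 200 (20 MB). For 1 kB transactions the count limit is what binds in both cases.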
Consensus Configuration (default → suggested)
timeout_propose: 1.8s → 3s
timeout_propose_delta: 300ms → 500ms
timeout_prevote: 600ms → 1s
timeout_prevote_delta: 300ms → 500ms
timeout_precommit: 600ms → 1s
timeout_precommit_delta: 300ms → 500ms
timeout_commit: 3s → 1s
These values were suggested by Initia's developers themselves: they increase the propose, prevote, and precommit timeouts to mitigate missed blocks, while reducing timeout_commit results in faster block completion (from ~5.5s to 1.5s). However, some validators did not apply this change, and the resulting inconsistent timeout_commit values caused other validators to miss specific blocks. We suggest penalizing deviating validators or hard-coding these parameters to ensure network stability.
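To see what the suggested values change, note that in CometBFT each step's timeout grows with the round number as timeout + delta × round, while timeout_commit is applied once per height after a block commits and does not grow. The sketch below, with hypothetical helper names, sums the propose, prevote, and precommit timeouts for the first three rounds under both sets of values from the table above. It ignores the time actually spent gossiping proposals and votes.

```python
# (base_ms, delta_ms) per step, taken from the consensus table above.
DEFAULT = {"propose": (1800, 300), "prevote": (600, 300), "precommit": (600, 300)}
SUGGESTED = {"propose": (3000, 500), "prevote": (1000, 500), "precommit": (1000, 500)}


def step_timeout_ms(base_ms: int, delta_ms: int, round_: int) -> int:
    """Step timeout for a given round: base + delta * round (round 0 is the first attempt)."""
    return base_ms + delta_ms * round_


def round_timeout_budget_ms(params: dict, round_: int) -> int:
    """Sum of the three step timeouts in one round (network time spent gathering votes excluded)."""
    return sum(step_timeout_ms(base, delta, round_) for base, delta in params.values())


if __name__ == "__main__":
    for round_ in range(3):
        print(round_, round_timeout_budget_ms(DEFAULT, round_), round_timeout_budget_ms(SUGGESTED, round_))
```

For round 0 the sum rises from 3000 ms to 5000 ms, and by round 2 from 4800 ms to 8000 ms, which gives slower validators more time to get their votes in before the network moves to the next round.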
Transaction Indexer Configuration (default → suggested)
indexer: kv → null
Validators should disable the indexer, because transaction spam can break the indexer and degrade uptime performance.
Base Configuration (default → suggested)
minimum-gas-prices: 0uinit → 0.15uinit,0.01uusdc
This is the developers' default suggestion. Although it is optional, we suggest that the active validator set confirm a common value so that everyone is on the same page. This helps (1) prevent 0-tx blocks and (2) prevent transaction spam.
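To make the effect of minimum-gas-prices concrete, the sketch below parses the suggested value and checks whether a transaction's fee would clear it. It follows our understanding of the Cosmos SDK's default mempool (CheckTx) fee check: the required fee per denom is price × gas limit rounded up, meeting it in any one listed denom is enough, and an all-zero setting disables the check. Initia may customize fee handling, and `parse_gas_prices`, `required_fees`, and `fee_is_sufficient` are hypothetical names.

```python
import re
from decimal import ROUND_CEILING, Decimal

# Amount followed by a Cosmos-style denom (a letter, then 2-127 allowed characters).
_COIN = re.compile(r"^(\d+(?:\.\d+)?)([a-zA-Z][a-zA-Z0-9/:._-]{2,127})$")


def parse_gas_prices(text: str) -> dict:
    """Parse a minimum-gas-prices string such as '0.15uinit,0.01uusdc' into {denom: Decimal}."""
    prices = {}
    for part in filter(None, (p.strip() for p in text.split(","))):
        match = _COIN.match(part)
        if match is None:
            raise ValueError(f"invalid gas price: {part!r}")
        prices[match.group(2)] = Decimal(match.group(1))
    return prices


def required_fees(prices: dict, gas_limit: int) -> dict:
    """Fee needed in each denom: price * gas limit, rounded up to a whole unit."""
    return {
        denom: int((price * gas_limit).to_integral_value(rounding=ROUND_CEILING))
        for denom, price in prices.items()
    }


def fee_is_sufficient(fee: dict, prices: dict, gas_limit: int) -> bool:
    """True if the fee meets a non-zero requirement in at least one denom."""
    if all(price == 0 for price in prices.values()):
        return True  # e.g. the default "0uinit": no minimum is enforced
    needed = required_fees(prices, gas_limit)
    return any(fee.get(denom, 0) >= amount for denom, amount in needed.items() if amount > 0)
```

For example, under the suggested prices a 200,000-gas transaction would need at least 30000uinit or 2000uusdc.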
pruning: default → nothing
Confirmed and tested by us and by various validators on the Initia network. Be careful when using pruning = custom, default, or everything.
iavl-disable-fastnode: false → true
This is optional; it helps the node sync to the current network height more quickly.
gRPC Configuration (default → suggested)
enable: true → false
Validators should not open unnecessary ports to the world, unless it's required.
gRPC Web Configuration (default → suggested)
enable: true → false
Validators should not open unnecessary ports to the world, unless it's required.
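Whether gRPC or gRPC-Web is actually reachable is easiest to confirm from a machine outside your network with a plain TCP probe, as in the sketch below. The port numbers are assumptions (9090 and 9091 are commonly seen defaults for gRPC and gRPC-Web on Cosmos SDK chains); confirm them against the address fields in your app.toml. `SERVICE_PORTS`, `is_port_open`, and `exposed_services` are hypothetical names.

```python
import socket

# Assumed default ports; replace them with the addresses configured in your app.toml.
SERVICE_PORTS = {"grpc": 9090, "grpc-web": 9091}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exposed_services(host: str, ports: dict = SERVICE_PORTS) -> list:
    """Names of services reachable on `host`; run this from a machine outside your network."""
    return [name for name, port in ports.items() if is_port_open(host, port)]
```

Once both services are disabled or firewalled, `exposed_services("<your validator's public IP>")` should return an empty list.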
Suggestions & Feedback
We suggest reproducing these conditions to prepare for mainnet, since the same load could occur during the Token Generation Event. A tool from Somatic Labs or a simple script that spams transactions (redelegate, undelegate, delegate, etc.) could be helpful.
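For such a rehearsal, the core of a spam script is firing requests at a steady rate. The sketch below paces calls to a caller-supplied `submit` function. `run_paced` and `submit` are hypothetical, and what `submit` does (for example, broadcasting a delegate, undelegate, or redelegate transaction through your node's CLI) is left to you, since exact commands differ between chains and versions. Run it only against a testnet or a local network.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_paced(submit: Callable[[int], T], total: int, rate_per_s: float) -> List[T]:
    """Call submit(0), ..., submit(total - 1) at a fixed rate and collect the results.

    `submit` is a hypothetical hook that wraps your own transaction broadcast.
    """
    interval = 1.0 / rate_per_s
    start = time.monotonic()
    results = []
    for i in range(total):
        # Sleep until this call's scheduled slot, so a slow call does not shift later ones.
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        results.append(submit(i))
    return results
```

Scheduling against a fixed start time keeps the average rate steady: if one broadcast is slow, the following calls run back to back until the schedule catches up.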
In phase 2 of the testnet, selected validators were required to open their RPC ports for performance evaluation, which exposed them to security risks such as DDoS attacks. The team should warn validators to adjust their RPC Server Configuration Options to mitigate the problem to some extent. In addition, inactive validators could always keep up with the network height by "cheating": setting timeout_propose_delta, timeout_prevote_delta, and timeout_precommit_delta to 0ms.
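Both this trick and the inconsistent timeout_commit problem described in the consensus section show up in the [consensus] section of config.toml, so a validator (or a team reviewing configs shared by validators) could lint it as in the sketch below. It assumes single-unit duration strings such as "1.8s" or "300ms", plus a bare "0", which is what the default config uses as far as we know. `parse_duration_ms` and `lint_consensus` are hypothetical, and the `agreed` mapping stands for whatever values the validator set has agreed on.

```python
import re

_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s)$")
DELTA_KEYS = ("timeout_propose_delta", "timeout_prevote_delta", "timeout_precommit_delta")


def parse_duration_ms(text: str) -> int:
    """Convert a single-unit duration like '1.8s' or '300ms' to milliseconds."""
    text = text.strip()
    if text == "0":  # Go duration strings also allow a bare zero
        return 0
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")  # e.g. composite '1m30s'
    value = float(match.group(1))
    return round(value * 1000 if match.group(2) == "s" else value)


def lint_consensus(section: dict, agreed: dict) -> list:
    """Flag zero round deltas and any value that differs from the agreed settings."""
    problems = [f"{key} is 0" for key in DELTA_KEYS
                if key in section and parse_duration_ms(section[key]) == 0]
    for key, expected in agreed.items():
        if key in section and parse_duration_ms(section[key]) != parse_duration_ms(expected):
            problems.append(f"{key} = {section[key]} (agreed: {expected})")
    return problems
```

A zero delta is reported twice when the agreed value is non-zero (once as zero, once as a mismatch), which is intentional: both are problems worth seeing.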
Other than these suggestions, the Initia team did a very good job. This milestone is a testament to your hard work, dedication, and innovative spirit. As Initia steps into this new chapter, may the network thrive, the community grow, and the vision for a decentralized future become a reality. Wishing you great success and a seamless journey ahead!