What’s new in the Flow Mainnet 20 Spork?
This Flow mainnet spork (November 2nd, 2022) brings several important updates that improve the network's resiliency and scalability and further support decentralization.
BFT protection for Permissionless Access Nodes
Earlier in August, we announced the path to permissionless node operation and released the observer node v1 as part of the last spork. Continuing that journey of decentralization, this spork includes several major updates to the peer-to-peer networking layer that improve a node's resilience to byzantine nodes on the network. These updates are prerequisites for the launch of permissionless access nodes.
Notable updates include:
Improved authorization of messages sent over the staked peer-to-peer network
Added a rate limiter to unicast connections
onflow:master
← onflow:khalil/1716-unicast-rate-limiter
opened 05:04PM - 21 Jul 22 UTC
This PR defines 2 rate limiters for messages sent via unicast. Both rate limiters are configurable on p2p middleware, and are used in the unicast stream handler.
- bandwidth rate limiter - rate limits unicast message bandwidth allowed per some configured interval
- streams rate limiter - rate limits amount of streams that can be created per some configured interval
Each rate limiter keeps track of current rate limited peers and the rate limiter interface provides a func IsRateLimited to check if a peer is currently being rate limited.
This PR also improves the connection gater by adding the ability to configure multiple peer filters for outgoing and incoming intercepted connections. This allows us to add the IsRateLimited func as a peer filter on the connection gater. The peer manager peer provider func topologyPeers on the middleware was also updated to filter topology peers through configured peerManagerFilters before returning the list of peers to the peer manager. Both of these improvements allow us to provide connection gating and disconnects for rate limited peers in the correct lifecycle of a connection respective to the networking layer.
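To give a rough idea of the mechanism (a simplified sketch, not the actual flow-go implementation), a bandwidth-style unicast rate limiter can be built around a per-peer token bucket, with peers that exceed their allowance remembered so that `IsRateLimited` can be used as a peer filter by the connection gater. The `PeerID` type and `BandwidthRateLimiter` name below are illustrative placeholders:
```
package p2p

import (
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// PeerID is a placeholder for the libp2p peer identifier type.
type PeerID string

// BandwidthRateLimiter is a simplified sketch of a per-peer bandwidth limiter;
// it is not the actual flow-go implementation.
type BandwidthRateLimiter struct {
	mu       sync.Mutex
	limiters map[PeerID]*rate.Limiter
	limited  map[PeerID]time.Time // peers currently rate limited

	limit rate.Limit // bytes per second allowed per peer
	burst int        // maximum burst size in bytes (>= largest message size)
}

func NewBandwidthRateLimiter(bytesPerSec rate.Limit, burst int) *BandwidthRateLimiter {
	return &BandwidthRateLimiter{
		limiters: make(map[PeerID]*rate.Limiter),
		limited:  make(map[PeerID]time.Time),
		limit:    bytesPerSec,
		burst:    burst,
	}
}

// Allow reports whether a unicast message of msgSize bytes from pid fits within
// the configured bandwidth; peers that exceed it are marked as rate limited.
func (l *BandwidthRateLimiter) Allow(pid PeerID, msgSize int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	lim, ok := l.limiters[pid]
	if !ok {
		lim = rate.NewLimiter(l.limit, l.burst)
		l.limiters[pid] = lim
	}
	if !lim.AllowN(time.Now(), msgSize) {
		l.limited[pid] = time.Now()
		return false
	}
	return true
}

// IsRateLimited can be plugged into the connection gater as a peer filter,
// so connections from rate-limited peers are gated and disconnected.
func (l *BandwidthRateLimiter) IsRateLimited(pid PeerID) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	_, limited := l.limited[pid]
	return limited
}
```
A streams rate limiter follows the same shape, counting stream creations per interval instead of bytes.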
Enabled the peer scoring mechanism on the gossip network to eliminate invalid subscriptions
onflow:master
← onflow:yahya/6921-gossipsub-authorized-subscription-part-2
opened 08:50PM - 27 Sep 22 UTC
This PR enables the peer scoring mechanism of GossipSub for the Flow blockchain. At this phase, the peer scoring mechanism only contains a subscription validator. When a peer discovers invalid subscriptions of other peers (based on their roles), it drops their local score to `-Inf`. With this approach, honest peers collectively refrain from routing the pubsub messages of (as well as to and from) malicious peers who conduct an invalid subscription. In other words, malicious peers who conducted an invalid subscription are excluded from the GossipSub mesh of the honest peers.
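Conceptually, the subscription validator feeds into GossipSub's application-specific score: a peer whose subscriptions do not match the topics allowed for its role is scored `-Inf`, so honest peers prune it from their mesh. The sketch below is illustrative only; the role lookup and allowed-topic sets are hypothetical stand-ins for the actual identity and channel logic in flow-go.
```
package scoring

import "math"

// Role and the role-to-allowed-topics mapping below are hypothetical
// stand-ins for flow-go's identity provider and channel definitions.
type Role string

var allowedTopics = map[Role]map[string]bool{
	"consensus": {"consensus-committee": true, "sync-committee": true},
	"access":    {"push-receipts": true, "sync-committee": true},
}

// roleOf would normally come from the protocol state / identity provider.
func roleOf(peerID string) (Role, bool) {
	// ... lookup elided in this sketch
	return "access", true
}

// AppSpecificScore is the kind of function GossipSub peer scoring accepts:
// peers with subscriptions that are invalid for their role get -Inf, so
// honest peers collectively prune them from the mesh.
func AppSpecificScore(peerID string, subscribedTopics []string) float64 {
	role, known := roleOf(peerID)
	if !known {
		return math.Inf(-1) // unknown (unstaked) peer
	}
	for _, topic := range subscribedTopics {
		if !allowedTopics[role][topic] {
			return math.Inf(-1) // invalid subscription for this role
		}
	}
	return 0 // neutral score; other scoring components omitted
}
```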
onflow:master
← onflow:khalil/1789-duplicate-entity-ids
opened 07:46PM - 22 Sep 22 UTC
This PR makes the following updates to debug logs and engine metrics:
- Add originID to the labels for messages received/handled
- Add debug logs that log some request and response info in the entity provider/requester engines
Execution Storage Improvements
Flow's protocol architecture is designed to benefit from having a few powerful execution nodes for fast execution without sacrificing the integrity and safety of the network. Consequently, execution node data storage has been designed so that access to the execution state is as fast as possible: the data is kept in memory, while a disk-backed write-ahead log (WAL) system provides reliability and recoverability. To expedite recovery, a separate process periodically generates checkpoints so that only a small number of WALs have to be replayed on top of the last checkpoint.
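As a rough illustration of that recovery flow (placeholder types and functions, not flow-go's actual ledger API), startup loads the most recent checkpoint into memory and then replays only the WAL segments written after it; the more frequently checkpoints are taken, the smaller the replay window and the faster the restart:
```
package recovery

import "fmt"

// Update is a single write-ahead-log entry; State is the in-memory trie state.
// Both types and the loader/replayer functions are illustrative placeholders.
type Update struct{ Key, Value []byte }
type State map[string][]byte

func loadCheckpoint(path string) (State, int, error) {
	// ... decode the checkpoint file into an in-memory state,
	// returning the WAL segment number it covers up to.
	return State{}, 0, nil
}

func walSegmentsAfter(segment int) ([][]Update, error) {
	// ... list and decode WAL segments newer than the checkpoint.
	return nil, nil
}

// Recover rebuilds the in-memory execution state: load the latest checkpoint,
// then replay only the WAL segments written after it. Frequent checkpoints
// keep this replay window (and therefore startup time) small.
func Recover(checkpointPath string) (State, error) {
	state, lastSegment, err := loadCheckpoint(checkpointPath)
	if err != nil {
		return nil, fmt.Errorf("could not load checkpoint: %w", err)
	}
	segments, err := walSegmentsAfter(lastSegment)
	if err != nil {
		return nil, fmt.Errorf("could not read WAL segments: %w", err)
	}
	for _, seg := range segments {
		for _, u := range seg {
			state[string(u.Key)] = u.Value // apply each logged update
		}
	}
	return state, nil
}
```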
There has been an ongoing effort spanning several sporks to improve both the performance of the execution data storage and the checkpointing process. Below is a list of some of the most recent improvements.
Storage Optimization
To keep up with the growth in the number of accounts and Cadence-stored objects on the network, we continue to improve the efficiency of the execution storage based on its different access patterns, reducing the overall memory requirements of an execution node. In this spork, we reduced the storage footprint by changing the way an account's metadata is stored. This resulted in a 9% reduction in the execution node's memory usage and also reduced the rate at which that usage grows as more accounts are created.
Checkpointer Redesign
The original design and implementation of the checkpointing process couldn't scale as the network grew, and it started causing spikes in memory usage of up to 180% while generating a checkpoint. Execution node operators reported their nodes running out of memory and restarting during times of high load on the system. Recovering from a checkpoint during a restart also took a long time. In this and the previous sporks, we redesigned the checkpointing process to address both memory usage and recovery time. Memory usage during checkpointing now goes up by only 2-5%, and the checkpointing process itself is almost five times faster. Execution node startup time from any given checkpoint has been reduced by up to 80% (from 4 minutes to 1 minute on Testnet). The faster reads and writes of the checkpoint files also help reduce the total downtime needed during the spork operation.
Notable updates include:
onflow:master
← onflow:fxamacker/optimize-checkpoint
opened 04:47PM - 03 Feb 22 UTC
## Description
Optimize checkpoint creating (includes loading):
- 47x speedup (11.7 hours to 15 mins), avoid 431 GB alloc/op, avoid 7.6 billion allocs/op
- 171x speedup (11.4 hours to 4 mins) in MTrie traversal+flattening+writing phase
- Reduce long-held data in RAM by 116+ GB (lowers hardware requirements)
- Reduce checkpoint file size by 6.9+ GB (another 4.4+ GB reduction planned in separate PR) without using compression
Most of the optimizations were proposed in [comments](https://github.com/onflow/flow-go/issues/1750#issuecomment-1004870851) to issue #1750. I'm moving all remaining optimizations like concurrency and/or compression, etc. to separate PRs so this is ready for review as-is.
Increased interim + leaf node counts are causing checkpoint creation to take hours. This PR sacrifices some readability and simplicity as tradeoffs and gains speed, memory efficiency, and storage efficiency.
I limited scope of PR to optimizations that don't require performance tradeoffs or overhead (like adding processes, IPC).
Big thanks to @ramtinms for opening #1750 to point out that trie flattening can have big optimizations. 👍
Closes #1750
Closes #1884
Updates #1744, #1746, https://github.com/dapperlabs/flow-go/issues/6114
## Impact on Execution Nodes
:warning: Unoptimized checkpoint creation reaches 248+GB RAM within the first 30 minutes and can run for about 15-17+ hours on EN3. This duration during heavy load was long enough on EN3 to accumulate enough WAL files to trigger another checkpoint immediately after the current one finishes. So the 248 GB RAM is held again with 590 GB alloc/op and 9.8 billion allocs/op.
### EN Startup Time
This PR speeds up EN startup time in several ways:
- Checkpoint loading and WAL replaying will be optimized for speed (see benchmarks).
- Checkpoint creation will be fast enough to run multiple times per day, which will reduce WAL segments that need to be replayed during startup.
- Checkpoint creation finishing in minutes rather than 15+ hours reduces risk of being interrupted by shutdown, etc. which can cause extra WAL segments to be replayed on next EN startup.
See issue #1884 for more info about extra WAL segments causing EN startup delays.
### EN Memory Use and System Requirements
This PR can reduce long-held data in RAM (15+ hours on EN3) by up to 116 GB. Additionally, eliminating 431 GB alloc/op and 7.6 billion allocs/op will reduce load on the Go garbage collector.
## Benchmark Comparisons
#### Unoptimized Checkpoint Creation
After the first 30 minutes and for next 11+ hours (15-17+ hours on EN3):
(memory usage chart)
#### Optimized Checkpoint Creation
Finishes in 15+ minutes and peaks in the last minute at:
(memory usage chart)
### Preliminary Results (WIP) Without Adding Concurrency Yet
#### MTrie Checkpoint Load+Create v3 (old) vs v4 (WIP)
```
Input: checkpoint.00003443 + 41 WAL files
Platform: Go 1.16, benchnet (the big one)
name old time/op new time/op delta
NewCheckpoint-48 42052s ± 0% 886s ± 0% -97.89%
name old alloc/op new alloc/op delta
NewCheckpoint-48 590GB ± 0% 159GB ± 0% -73.04%
name old allocs/op new allocs/op delta
NewCheckpoint-48 9.80G ± 0% 2.19G ± 0% -77.67%
DISCLAIMERS: not done yet, didn't add concurrency yet, n=1 due to duration,
file system cache can affect results.
```
UPDATE: on March 1, optimized checkpoint creation speed (v4 -> v4 with 41 WALs) varied by 63 seconds between the first 2 runs (all 3 used same input files to create same output):
* 926 secs (first run right after OS booted, maybe didn't wait long enough)
* 863 secs (second run without rebooting OS, maybe file system cache helped)
* 879 secs (third run after other activities without rebooting OS)
#### Load Checkpoint File + replay 41 WALs v3 (old) vs v4 (WIP)
```
Input: checkpoint.00003443 + 41 WAL files
Platform: Go 1.16, benchnet (the big one)
name old time/op new time/op delta
LoadCheckpointAndWALs-48 989s ± 0% 676s ± 0% -31.64%
name old alloc/op new alloc/op delta
LoadCheckpointAndWALs-48 297GB ± 0% 136GB ± 0% -54.35%
name old allocs/op new allocs/op delta
LoadCheckpointAndWALs-48 5.98G ± 0% 2.17G ± 0% -63.67%
DISCLAIMERS: not done yet, didn't add concurrency yet, n=1,
file system cache affects speed so delta can be -28% to -32%.
```
### Changes include:
- [x] Create checkpoint file v4 and replace v3, while retaining ability to load older versions. (v4 is not yet finalized). First, [Reduce checkpoint file size by 5.8+GB](https://github.com/onflow/flow-go/commit/422b75b1c56fdcb968569fde82bfdfd5f726f2c3). Next, [reduce checkpoint file size by 1.1+GB](https://github.com/onflow/flow-go/pull/1944/commits/dcca74efe8e79b27f4e1ac3919e118fac481156d) by removing encoded hash size and path size. Further reduction of 4.4+GB is planned for 10.2 GB combined reduction compared to v3. These file size reductions don't use compression.
- [x] [Use stream encoding and writing](https://github.com/onflow/flow-go/commit/8b03a22c12f51cb1bcb41b2d6e97554b7025be5d) for checkpoint file creation. This reduces RAM use by avoiding the creation of a ~400 million element slice containing all nodes and creation of 400 million objects. Savings will be about 43.2+ GB plus more from other changes in this PR.
- [x] [Add NewUniqueNodeIterator() to skip shared nodes](https://github.com/onflow/flow-go/commit/04e5c08b0e6fe35d31646addd5a8c30c0baeae20). NewUniqueNodeIterator() can be used to optimize node iteration for forest. It skips shared sub-tries that were visited and only iterates unique nodes.
- [x] [Optimize reading checkpoint file by reusing buffer](https://github.com/onflow/flow-go/pull/1944/commits/6c5ad37be8b14e7d5aefa6265270a9d7dd44c781). Reduce allocs by using a 4096 byte scratch buffer to reduce another 400+ million allocs during checkpoint reading. Since checkpoint creation requires reading checkpoint, this optimization benefits both.
- [x] [Optimize creating checkpoint by reusing buffer](https://github.com/onflow/flow-go/pull/1944/commits/c9a8f145835d7addbc4fbacb98ba091c6892aca6). Reduce allocs by using a 4096 byte scratch buffer to reduce another 400+ million allocs during checkpoint writing (a simplified sketch of this buffer-reuse pattern follows after this list).
- [x] [Skip StorableNode/StorableTrie when creating checkpoint](https://github.com/onflow/flow-go/commit/f185fd93e24bf1323a2feb8661039cca03d8a629)
- Merge FlattenForest() with StoreCheckpoint() to iterate and serialize nodes without creating intermediate StorableNode/StorableTrie objects.
- Stream encode nodes to avoid creating 400+ million element slice holding 400 million StorableNode objects.
- Change checkpoint file format (v4) to store node count and trie count at the footer (instead of header) required for stream encoding.
- Support previous checkpoint formats (v1, v3).
- [x] [Skip StorableNode/Trie when reading checkpoint](https://github.com/onflow/flow-go/commit/8b03a22c12f51cb1bcb41b2d6e97554b7025be5d)
- Merge RebuildTries() with LoadCheckpoint() to deserialize data to nodes without creating intermediate StorableNode/StorableTrie objects.
- Avoid creating 400+ million element slice holding all StorableNodes read from checkpoint file.
- DiskWal.Replay*() APIs are changed. checkpointFn receives []*trie.MTrie instead of FlattenedForest.
- [x] Add [flattening encoding tests](https://github.com/onflow/flow-go/pull/1944/commits/802347fc627b345cf5b9150549dc91d5be04cc2e), add [checkpoint v3 decoding tests](https://github.com/onflow/flow-go/pull/1944/commits/2b9d1aeca4a45cadfefaf32ee26a3b94f2b56ecf), add [more validation](https://github.com/onflow/flow-go/pull/1944/commits/aa1549eb65ca1e74d740ba17e55b0da42b9b2539), add comments, [refactor code for readability](https://github.com/onflow/flow-go/pull/1944/commits/6618948bb1d290087d306cedd6e8d8fdbe0ab9ef), etc.
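The buffer-reuse pattern mentioned above can be sketched roughly as follows: rather than materializing a huge intermediate slice of node objects, each node is encoded into a single reused scratch buffer and streamed straight to the file. The node layout here is hypothetical; only the allocation pattern mirrors the PR.
```
package wal

import (
	"bufio"
	"encoding/binary"
	"io"
)

// node is a stand-in for a trie node; the real encoding differs.
type node struct {
	Hash    [32]byte
	Payload []byte
}

// storeNodes stream-encodes nodes one at a time into a reused scratch buffer,
// avoiding both a giant intermediate slice and a fresh allocation per node.
func storeNodes(w io.Writer, nodes <-chan *node) error {
	bw := bufio.NewWriter(w)
	scratch := make([]byte, 0, 4096) // reused for every node
	var lenBuf [4]byte

	for n := range nodes {
		scratch = scratch[:0]
		scratch = append(scratch, n.Hash[:]...)
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(n.Payload)))
		scratch = append(scratch, lenBuf[:]...)
		scratch = append(scratch, n.Payload...)
		if _, err := bw.Write(scratch); err != nil {
			return err
		}
	}
	return bw.Flush()
}
```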
### TODO
- [ ] update benchmark comparisons using latest results
#### Additional TODOs that will probably be wrapped up in a separate PR
- maybe add zeroCopy flag and more tests for these functions: `DecodeKey()`, `DecodeKeyPart()`, and `DecodePayload()`. Not high priority because these functions appear to be unused.
- further reduce data written to checkpoint part 2 (e.g. encoded payload value size can be uint32 instead of uint64 but changing this affects code outside checkpoint creation)
- micro optimizations (depends on speedup vs readability tradeoff)
- add concurrency
- maybe add file compression or payload compression (only if concurrency is added)
- maybe replace CRC32 with BLAKE3 or BLAKE2 since checkpoint file is >60GB
- maybe encode integers using variable length to reduce space (possibly not needed if/when we use file compression)
- ~~maybe split checkpoint file into 3 files (metadata, nodes, and payload file).~~ I synced with Ramtin and his preference is to keep the checkpoint as one file for this PR.
onflow:master
← onflow:fxamacker/move-regcount-regsize-to-trie
opened 05:29PM - 09 Mar 22 UTC
### Description
- Remove `Node.regCount` and `Node.maxDepth` to reduce file s… ize and memory
- -10 bytes per node file size reduction
- -16 bytes per node memory reduction -- this memory reduction extends beyond checkpoint. 🎉
- Optimize MTrie update() to reduce heap allocations
- Fix Payload.Equals() crash bug when p is nil
- Fix Payload.Equals() bug when comparing nil payload with empty payload
Closes #1747
Closes #1748
Closes #2125
Updates #1744
Updates #1884 https://github.com/dapperlabs/flow-go/issues/6114
Big thanks to @ramtinms for opening issues #1747, #1748, and for clarifying things. 👍
### Changes include:
- remove Node.regCount (8 bytes) and Node.maxDepth (2 bytes)
- remove regCount (8 bytes) and maxDepth (2 bytes) from checkpoint node serialization
- add MTrie.regCount (8 bytes) and MTrie.regSize (8 bytes)
- add regCount (8 bytes) and regSize (8 bytes) to checkpoint trie serialization
- modify trie update() to return regCountDelta, regSizeDelta, and lowestHeightTouched so that NewTrieWithUpdatedRegisters() can compute regCount, regSize, and maxDepthTouched for the updated trie
- fix Payload.Equals() crash bug when p is nil
- fix Payload.Equals() bug when comparing nil payload with empty payload
- optimize MTrie update() to reduce heap allocations
### Note
lowestHeightTouched returned by update() is the lowest height reached during recursive update. Unlike maxDepth, lowestHeightTouched isn't affected by prune flag. It's mostly new/updated node height. It can also be height of new node created from compact leaf at a higher height.
### Preliminary Results
#### Unoptimized Checkpoint Creation
After the first 30 minutes and for next 11+ hours on benchnet (15-17+ hours on EN3):
![Memory usage chart](https://user-images.githubusercontent.com/33205765/153464063-a8013083-1f05-4ef9-b6a9-a6dcae58720c.png)
#### Optimized Checkpoint Creation (PR #1944)
Finishes in about 15 minutes and peaks in the last minute at:
![Memory usage chart](https://user-images.githubusercontent.com/33205765/153721893-9f29db07-8e8f-4bdc-a38b-ee2bb7263ea3.png)
#### Optimized Checkpoint Creation (PR #1944 + PR #2126)
Finishes in 14 - 15 minutes and peaks in the last minute at:
![Memory usage chart](https://user-images.githubusercontent.com/33205765/157785194-95cc2166-e821-4b62-b21a-d05f007d1d54.png)
### New Checkpoint (Load Old + Replay 41 WALs + Create New)
#### PR #1944 vs PR #2126
```
> benchstat v4_to_v4.txt v41_to_v41.txt
name old time/op new time/op delta
NewCheckpoint-48 886s ± 0% 868s ± 2% -2.10%
name old alloc/op new alloc/op delta
NewCheckpoint-48 159GB ± 0% 150GB ± 0% -5.61%
name old allocs/op new allocs/op delta
NewCheckpoint-48 2.19G ± 0% 2.08G ± 0% -4.91%
```
#### Mainnet vs (PR #1944 + PR #2126)
```
name old time/op new time/op delta
NewCheckpoint-48 42052s ± 0% 868s ± 2% -97.94%
name old alloc/op new alloc/op delta
NewCheckpoint-48 590GB ± 0% 150GB ± 0% -74.55%
name old allocs/op new allocs/op delta
NewCheckpoint-48 9.80G ± 0% 2.08G ± 0% -78.76%
```
Go 1.16 on benchnet (the big one)
NOTE: File system cache can make the duration fluctuate. The first run after OS reboot can take 20-60 secs longer due to cache and other reasons (e.g. not waiting long enough after OS boots).
onflow:master
← onflow:fxamacker/reuse-mtrie-state-for-checkpointing-2
opened 03:36PM - 13 Jul 22 UTC
EDIT: When deployed on August 24, 2022, this PR reduced peak RAM use by over 250 GB (out of over 300GB total reduction). Initial estimate of -150GB was based on old, smaller checkpoint file. By August, checkpoint file grew substantially so memory savings were better. Duration is about 14 minutes today (Sep 12), it was 46-58 minutes in mid-August, and it was 12-17 hours in Dec 2021 depending on system load.
Avoid creating separate ledger state during checkpointing.
Closes #2286
Closes #2378 because this PR avoids reading 160GB checkpoint file (except during EN startup).
Updates #1744
### Impact
Based on EN2 logs (July 8, 2022), this will
- Reduce operational memory (peak RAM) by (very roughly) about 150GB on Grafana EN Memory Usage chart. Other charts showing smaller peaks will show relatively smaller reduction.
- Reduce checkpoint duration by 24 mins (from 45 mins).
- Reduce disk i/o by 160GB by not reading checkpoint file (except during EN startup).
- Reduce memory allocations (TBD)
Ledger state is continuously growing larger, so memory & duration savings will be better than listed as each day passes.
### Context
Recent increase in transactions is causing WAL files to get created more frequently, which causes checkpoints to happen more frequently, increases checkpoint file size, and increases ledger state size in memory.
| | File Size | Checkpoint Frequency |
| ------------- | ---: | --- |
| Early 2022 | 53 GB | 0-2 times per day |
| July 8, 2022 | 126 GB | every 2 hours |
Without PR #1944 the system checkpointing would currently be:
- taking well over 20-30 hours each time, making it impossible to complete every 2 hours
- requiring more operational RAM, making OOM crashes very frequent
- creating billions more allocations and gc pressure, consuming CPU cycles and slowing down EN
After PR #1944 reduced Mtrie flattening and serialization phase to under 5 minutes (which sometimes took 17 hours on mainnet16), creating a separate MTrie state currently accounts for most of the duration and memory used by checkpointing. This opens up new possibilities such as reusing ledger state to significantly reduce duration and operational RAM of checkpointing again.
### Design
Design goal is to reduce operational RAM, reduce allocations, and speed up checkpointing by not creating a separate ledger state.
To achieve these goals, this PR:
- reuses tries from the main ledger state for checkpointing
- avoids blocking with time-consuming tasks such as creating checkpoint
The `Compactor`:
- receives trie updates and new trie created from the update
- saves encoded updates in WAL segment
- tracks active segment number returned by `RecordUpdate()`
- starts checkpointing async when enough finalized segments are accumulated
NOTE: To reuse ledger tries for checkpointing, new tries must match their corresponding WAL updates.
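A highly simplified sketch of that flow (the types and method names are illustrative, not the actual Compactor API): each update is recorded to the active WAL segment, and once enough finalized segments have accumulated, checkpointing is kicked off asynchronously against the tries the ledger already holds, instead of rebuilding a separate state.
```
package compactor

import (
	"log"
	"sync"
)

// TrieUpdate and Trie are placeholders for flow-go's ledger types; this whole
// file is a simplified sketch, not the actual Compactor implementation.
type TrieUpdate struct{}
type Trie struct{}

type WAL interface {
	// RecordUpdate writes the encoded update and returns the active segment number.
	RecordUpdate(update *TrieUpdate) (segmentNum int, err error)
}

type Compactor struct {
	mu              sync.Mutex
	wal             WAL
	tries           []*Trie // tries matching the recorded WAL updates
	lastCheckpoint  int     // highest segment covered by a checkpoint
	segmentsPerCkpt int
	running         bool
}

// OnUpdate receives each trie update together with the new trie created from
// it, saves the update to the WAL, and triggers checkpointing asynchronously
// once enough finalized segments have accumulated.
func (c *Compactor) OnUpdate(update *TrieUpdate, newTrie *Trie) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	seg, err := c.wal.RecordUpdate(update)
	if err != nil {
		return err
	}
	c.tries = append(c.tries, newTrie)

	if seg-c.lastCheckpoint < c.segmentsPerCkpt || c.running {
		return nil
	}
	c.running = true
	tries := append([]*Trie(nil), c.tries...) // reuse existing ledger tries

	go func(upTo int) {
		// Checkpoint in the background so trie updates are never blocked.
		if err := writeCheckpoint(tries, upTo); err != nil {
			log.Printf("checkpoint failed: %v", err)
		}
		c.mu.Lock()
		c.lastCheckpoint = upTo
		c.running = false
		c.mu.Unlock()
	}(seg)
	return nil
}

func writeCheckpoint(tries []*Trie, upToSegment int) error {
	// ... serialize tries covering WAL segments up to upToSegment (elided).
	return nil
}
```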
### TODO:
- [x] add more tests
- [x] handle known edge cases
- [x] handle errors
- [x] add more tests related to concurrency (longer duration test runs on benchnet)
- [x] preliminary review of draft PR by @zhangchiqing
- [x] remove draft status early to get @m4ksio and @ramtinms reviews
- [x] preliminary review and approval of PR by @m4ksio
- [x] preliminary review and approval of PR by @ramtinms
- [x] incorporate feedback from preliminary reviews
- [x] add more tests
- [ ] awesome suggestion by @ramtinms which should not be shortchanged "... run tests on benchnet by turning nodes off to make sure the logic is safe and no unknown case is there."
- [ ] continue looking for edge cases we may have missed
- [x] cleanup code and follow best practices (e.g. document errors returned from functions, etc.)
- [ ] request code reviews for non-urgent items (micro optimizations, improved design, etc.)
See TODOs listed in the source code for more details.
onflow:master
← onflow:ramtin/fvm-merge-account-meta-registers
opened 06:55PM - 13 Jul 22 UTC
This PR
- closes #2702
- merges `storage_used`, `storage_index`, and `public_key_count` into the account status register, which removes about 30M registers from Mainnet and reduces proof sizes accordingly
- updates the previous account status migration to cover these registers besides the `frozen` and `exists` registers
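For illustration, the merged value can be thought of as a single fixed-layout register that packs the existing flags together with the previously separate counters; the byte layout below is hypothetical, not the canonical flow-go encoding.
```
package environment

import "encoding/binary"

// accountStatus sketches a single "account status" register that packs what
// used to be separate registers (exists/frozen flags, storage_used,
// storage_index, public_key_count). The byte layout is hypothetical.
type accountStatus struct {
	Flags          byte   // bit 0: exists, bit 1: frozen
	StorageUsed    uint64 // bytes of storage used by the account
	StorageIndex   uint64 // next storage slab index
	PublicKeyCount uint64 // number of keys registered on the account
}

const accountStatusSize = 1 + 8 + 8 + 8

// Encode packs the status into one register value instead of four.
func (a accountStatus) Encode() []byte {
	buf := make([]byte, accountStatusSize)
	buf[0] = a.Flags
	binary.BigEndian.PutUint64(buf[1:9], a.StorageUsed)
	binary.BigEndian.PutUint64(buf[9:17], a.StorageIndex)
	binary.BigEndian.PutUint64(buf[17:25], a.PublicKeyCount)
	return buf
}

func decodeAccountStatus(b []byte) (accountStatus, bool) {
	if len(b) != accountStatusSize {
		return accountStatus{}, false
	}
	return accountStatus{
		Flags:          b[0],
		StorageUsed:    binary.BigEndian.Uint64(b[1:9]),
		StorageIndex:   binary.BigEndian.Uint64(b[9:17]),
		PublicKeyCount: binary.BigEndian.Uint64(b[17:25]),
	}, true
}
```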
onflow:master
← onflow:fxamacker/use-encoded-key-in-payload
opened 05:26PM - 04 Aug 22 UTC
This PR replaces decoded payload key with encoded key buffer because decoded payload key is only used for migration and reports. This change made ledger.Payload immutable.
Closes #2569
Closes #2248 (together with changes in PR #2560)
Updates #1744
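The essence of the change, in sketch form: the payload keeps only the encoded key bytes and decodes them on the rare occasions (migrations, reports) that a structured key is needed, which removes one decoded-key allocation per payload held in mtrie. The types and decoder below are illustrative placeholders.
```
package ledger

// Key is a placeholder for ledger.Key; DecodeKey stands in for the real key
// decoder. Only the shape of the change is illustrated here.
type Key struct{ KeyParts [][]byte }

func DecodeKey(encoded []byte) (Key, error) {
	// ... real decoding elided in this sketch
	return Key{}, nil
}

// Payload stores the encoded key bytes instead of a decoded Key struct.
// Keeping fields unexported makes the payload effectively immutable and
// avoids per-payload decoded-key allocations in mtrie.
type Payload struct {
	encKey []byte // encoded key, stored as-is
	value  []byte
}

func NewPayload(encodedKey, value []byte) *Payload {
	return &Payload{encKey: encodedKey, value: value}
}

// Key decodes the key on demand; only migrations and reporting need this.
func (p *Payload) Key() (Key, error) { return DecodeKey(p.encKey) }

// Value returns the payload value without any decoding cost.
func (p *Payload) Value() []byte { return p.value }
```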
### Goals
- (main goal) reduce memory required by EN (operational RAM).
- reduce number of allocations on the heap.
- have zero negative performance tradeoffs.
### Impact
- operational RAM should be reduced by dozens of GB (very roughly 25-50GB initially and more as data grows)
- eliminate over 1+ billion heap allocations for mtrie in memory (about 1+ billion allocs when mtrie is created and additional savings during activities that update mtrie)
- as positive side-effects, speedup
- ledger update (see TrieUpdate benchstats)
- EN startup
- checkpoint (de)serialization
- TrieProof (de)serialization, and etc.
### Example positive side-effect beyond operational RAM reduction
Benchstats are only for ledger update. Other improvements are not benchmarked yet.
```
name old time/op new time/op delta
TrieUpdate-4 439ms ± 2% 409ms ± 1% -6.94% (p=0.000 n=18+20)
name old alloc/op new alloc/op delta
TrieUpdate-4 73.5MB ± 0% 34.1MB ± 0% -53.60% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
TrieUpdate-4 187k ± 0% 147k ± 0% -21.44% (p=0.000 n=20+20)
```
### Caveats
- Updating benchmark results or adding missing benchmarks can be done at a later date.
- Custom functions for CBOR and JSON serialization were added because immutable fields are no longer exported, but the memory reduction and speedups in other ops outweigh this.
### Reviewers
This PR is fairly simple because the most important changes are in the small commit e35bd9449eae87af76ecbdb6422c2b5c5ed44ba9.
The large number of lines changed by other commit is to:
- commit b2517548083f005fb8321d6ba820c7ac885c957b - make ledger.Payload immutable
- commit 74ee21ddaacb4cdc200b7441ee9a6f49d53d0b9e - eliminate circular dependencies (moving ledger/common/encoding/encodiing.go to ledger/trie_encoder.go, split common/utils/testutils.go into 2 packages)
onflow:master
← onflow:fxamacker/reduce-checkpoint-serialization-memory
opened 04:36PM - 22 Aug 22 UTC
Primary goal is to reduce operational RAM in checkpoint v5. Secondary goals include speeding up checkpointing and redesign to simplify concurrency in the next PR.
UPDATE: 🚀 Full checkpointing v5 finishes in 12-13 minutes on EN4 and reduced peak memory use more than expected. This PR was merged on Aug 23 and deployed to EN4.mainnet19 on Oct 7, 2022.
This PR replaces largest data structure used for checkpoint serialization. During serialization, this change processes subtries instead of entire tries at once. Changes also focused on preallocations to increase memory savings.
Serializing data in parallel is made easier (because this PR splits mtrie into multiple subtries), but adding parallelism is outside the scope of this PR. Issue #3075 should be used to determine if parallelism is worthwhile (at this time) before implementing it because parallelism has tradeoffs such as consuming more RAM, etc.
Closes #2964
Updates #1744
Updates #3075
## Preliminary Results Using Level 4 (16 Subtries)
Using August 12 mainnet checkpoint file:
- -37GB peak RAM (`top` command), -23GB RAM (go bench B/op)
- -19.6 million (-50%) allocs/op in serialization phase
- -2.7 minutes duration
```
Before: 625746 ms 88320868048 B/op 39291999 allocs/op
After: 461937 ms 64978613264 B/op 19671410 allocs/op
```
Root is at Level 0.
Benchmark used Go 1.18.5 on `benchnet-dev-004`.
No benchstat comparisons yet (n=5+) due to duration and memory required.
## Tests
This PR passed unit tests and round-trip tests before it was merged to master on August 23, 2022:
- On Sunday, August 21, 2022, I confirmed it passed round-trip tests using a 150GB checkpoint file (August 12 checkpoint file from mainnet). The final 150GB output exactly matched expected results (`b2sum` of 150GB files matched).
- On Wednesday, August 31, 2022 another person mentioned in standup meeting a different test (comparing file size) also produced expected results.
NOTE: As of Sept 13, 2022 this PR has not been merged to mainnet.
EDIT: Added more details after reading PR review questions.
Clarified root is at level 0 and we're using level 4 (16 subtries).
Mentioned tests, including round-trip tests on Aug 21 that passed before merging PR to master on Aug 23.
Mention issue #3075 to replace "issue will be opened" about adding parallelism made easier by this PR.
Make it more clear this PR is not deployed yet to mainnet.
onflow:master
← onflow:feature-checkpoint-v6
opened 09:16PM - 23 Sep 22 UTC
Closes #3075
This PR implements checkpointing V6.
Checkpointing Version 6 splits the single checkpoint file into 18 files in total. The main benefits are:
- Splitting the checkpoint file supports concurrent writes to multiple sub-files, which speeds up checkpoint generation, and concurrent reads, which speeds up loading the checkpoint (a simplified sketch of this pattern follows below).
- V6 builds on V5, which encodes the sub-trees first, laying the groundwork for concurrent processing.
See complete design in this doc:
https://www.notion.so/dapperlabs/Checkpoint-V6-8c7b97937da54c5b9e6c18b5b4598f2e
Comparison between V5 and V6 using latest mainnet19 data snapshot:
- checkpoint writing is reduced from `16mins` to `3mins`, 5.3 times faster
- checkpoint reading is reduced from `12mins` to `2mins`, 6 times faster
As comparison with V5 checkpointing:
- speeds up spork state extraction (deserialize) by 10.75 mins (was 13.5 mins, will be 2.75 mins)
- speeds up spork state extraction (serialize) by 7 mins (was 11 mins, will be 3 mins), 4 times faster.
- speeds up EN startup (deserialize) by 10.75 mins. Exact same speedup as the prior deserialize line but a different scenario not limited to sporks (the tradeoff is a one-time cost of using more memory (90GB) during startup, which is OK).
- slows down regular checkpointing by 1 minute (was 11 mins, will be 12 mins).
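A rough sketch of the concurrent write pattern the file split enables (the part naming and encoding are illustrative, not the exact V6 format): each group of sub-tries is serialized into its own file by its own goroutine, and a top-level file tying the parts together is written once all of them succeed.
```
package checkpoint

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sync/errgroup"
)

// subTrie stands in for one group of sub-tries below the root; the encoding
// and file layout here are illustrative only, not the actual V6 format.
type subTrie struct{ nodes []byte }

func (s subTrie) encode() []byte { return s.nodes }

// writeCheckpointV6 writes each sub-trie part to its own file concurrently,
// which is what allows V6 to cut checkpoint writing from ~16 to ~3 minutes.
func writeCheckpointV6(dir string, parts []subTrie) error {
	var g errgroup.Group
	for i, part := range parts {
		i, part := i, part // capture loop variables for the goroutine
		g.Go(func() error {
			name := filepath.Join(dir, fmt.Sprintf("checkpoint.part.%03d", i))
			return os.WriteFile(name, part.encode(), 0o644)
		})
	}
	// A top-level file with the part count and checksums would be written
	// after all parts succeed (elided here).
	return g.Wait()
}
```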
Rolling upgrade support
Reducing planned downtime and ultimately achieving zero-downtime network upgrades is high on the priority list for Flow. In this spork, the first step towards that goal is being taken by adding support for height-coordinated execution and verification node restarts. This feature enables the nodes to restart at the exact same predetermined block height, removing the risk of execution forks and making the network upgrade easier to coordinate across all operators.
Notable updates include:
onflow:master
← onflow:m4ksio/3045-stop-en-at-height
opened 04:03PM - 12 Sep 22 UTC
This PR adds the ability to issue an admin command to the EN to stop executing blocks above a certain height, and optionally crash.
Also, the VN now has an additional command line parameter for the same procedure.
Closes https://github.com/onflow/flow-go/issues/3045
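Conceptually, the feature reduces to a guard in the block execution path: once a stop height has been set (via the admin command on the EN, or the VN's command-line flag), blocks at or above that height are not executed, and the node can optionally crash so operators restart it on the new version. The sketch below is illustrative, not the flow-go implementation.
```
package stopcontrol

import (
	"fmt"
	"os"
	"sync"
)

// StopControl holds a stop height set at runtime (e.g. via an admin command).
// This is a simplified sketch of the height-coordinated restart mechanism.
type StopControl struct {
	mu         sync.Mutex
	stopHeight uint64 // 0 means "no stop requested"
	crash      bool   // whether to exit the process once the height is reached
}

// SetStopHeight is what an admin command handler would call.
func (s *StopControl) SetStopHeight(height uint64, crash bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stopHeight = height
	s.crash = crash
}

// ShouldExecuteBlock is consulted before executing each block; all nodes
// configured with the same height stop at exactly the same block, which
// avoids execution forks during an upgrade.
func (s *StopControl) ShouldExecuteBlock(height uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stopHeight == 0 || height < s.stopHeight {
		return true
	}
	if s.crash {
		fmt.Fprintf(os.Stderr, "stop height %d reached, exiting for upgrade\n", s.stopHeight)
		os.Exit(1)
	}
	return false
}
```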
FVM Performance improvements
Performance improvements in this release deliver both speed and memory usage improvements for the FVM (Flow Virtual Machine).
Notable updates include:
Optimized FVM so that it reuses the Cadence runtime environment where possible
onflow:master
← onflow:patrick/reusable-cadence-runtime
opened 02:37PM - 26 Aug 22 UTC
Issue: #3098
Note that I'm not reusing the runtime environment in script execution since that breaks a bunch of tests.
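One common way to express "reuse the runtime environment where possible" is a pool of pre-configured runtime environments that transaction execution borrows from and returns to, instead of constructing a fresh environment for every transaction. The pool below is a generic illustration under that assumption, not the actual FVM code.
```
package fvm

import "sync"

// runtimeEnv stands in for a configured Cadence runtime environment, which is
// expensive to construct; the pool below illustrates reuse only and is not
// the actual FVM implementation.
type runtimeEnv struct{ /* interpreter, configs, caches ... */ }

func newRuntimeEnv() *runtimeEnv { return &runtimeEnv{} }

// reset clears per-transaction state so the environment can be reused safely.
func (e *runtimeEnv) reset() { /* ... */ }

var runtimePool = sync.Pool{
	New: func() any { return newRuntimeEnv() },
}

// executeTransaction borrows a runtime environment, uses it, resets it, and
// returns it to the pool so the next transaction avoids the setup cost.
func executeTransaction(txScript []byte) error {
	env := runtimePool.Get().(*runtimeEnv)
	defer func() {
		env.reset()
		runtimePool.Put(env)
	}()
	// ... run txScript against env (elided)
	return nil
}
```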
onflow:master
← onflow:patrick/reuse-env-on-error
opened 08:06PM - 09 Sep 22 UTC
closes https://github.com/dapperlabs/flow-internal/issues/1550
Cadence
This release includes Cadence v0.28.0. More on this here.
Other improvements
Besides these, there are several other updates and bug fixes, listed in the release notes.