Context
The Core Protocol BFT team is working on a mechanism to upgrade the Protocol State without requiring a spork. The goal of this post is to briefly describe Flow’s existing version upgrade systems (spork, height-coordinated upgrade) and start a discussion about:
- How the protocol state upgrade mechanism should function.
- Whether (and if so, how) we can consolidate existing mechanisms (mainly the NodeVersionBeacon smart contract and StopControl tooling in the Execution Node).
Terminology
Version Directive - A tuple containing (1) a version identifier and (2) either activation view or activation height, originating from a trusted source.
Software Version - The version identifier of a binary distribution of Flow Node software. By convention, it is specified as a semver-ish tag in Git and Docker releases.
Component Version - The version identifier of a specific component within Flow. This document will propose that we exclusively use integer component version identifiers.
Protocol State Machine - The state machine operating on the Protocol State (epochs, identity table, protocol parameters, etc.): LastProtocolState + Block -> NextProtocolState.
Execution State Machine - The state machine operating on the Execution State (smart contracts, user accounts, tokens, NFTs, etc.): LastExecutionState + Transaction -> NextExecutionState.
Safety Threshold - A buffer of block views or heights that must exist between the point at which a Version Directive is processed by the Protocol and when it comes into effect. This threshold is set so that it is overwhelmingly likely that a block is finalized for any view/height range with the threshold size. (This is referred to as versionBoundaryFreezePeriod in EN HCU terminology; it is referred to as EpochCommitSafetyThreshold in the protocol parameters for a similar purpose.)
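To make the Version Directive and Safety Threshold definitions concrete, here is a minimal sketch in Python. The class, field, and function names are hypothetical and chosen only for illustration; they are not taken from the Flow codebase. It assumes a directive carries exactly one of an activation view or an activation height, and that the threshold is measured in the same unit as the activation point.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class VersionDirective:
    """Hypothetical model of a Version Directive (names are illustrative)."""
    version: Union[int, str]                 # integer component version or semver-ish tag
    activation_view: Optional[int] = None    # set for view-based activation
    activation_height: Optional[int] = None  # set for height-based activation

    def __post_init__(self) -> None:
        # A directive activates at a view OR at a height, never both or neither.
        has_view = self.activation_view is not None
        has_height = self.activation_height is not None
        if has_view == has_height:
            raise ValueError("set exactly one of activation_view or activation_height")

    @property
    def activation(self) -> int:
        """The activation point, in whichever unit this directive uses."""
        if self.activation_view is not None:
            return self.activation_view
        return self.activation_height


def respects_safety_threshold(directive: VersionDirective,
                              processed_at: int,
                              safety_threshold: int) -> bool:
    """True if the directive was processed at least `safety_threshold`
    views/heights before it comes into effect."""
    return directive.activation - processed_at >= safety_threshold
```

For example, a directive activating at view 1000 that is processed at view 500 satisfies a threshold of 500, while one processed at view 501 does not.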
Existing Software Upgrade Mechanisms
Height-Coordinated Upgrade (HCU)
In a Height-Coordinated Upgrade, the Governance committee publishes a Version Directive (tuple of block height and semver software version) by submitting an admin transaction to the NodeVersionBeacon smart contract. This causes a VersionBeacon service event to be emitted. The StopControl component in Execution Nodes ingests this service event and arranges to stop at the block height when the new software version takes effect.
- Operators manually configure their Execution Node with a new software version
- Verification Nodes don’t currently automatically stop; rather, we disable --require-result-approvals, then wait for them to be manually updated before re-enabling the flag.
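The stop-height decision described above can be sketched roughly as follows. This is not the actual StopControl implementation: the function names are hypothetical, versions are assumed to be already parsed into (major, minor, patch) tuples, and the rule "stop at the lowest boundary whose version is newer than the running binary" is an assumption made for illustration.

```python
from typing import Optional, Sequence, Tuple

# (major, minor, patch); assumed to be parsed from the semver tag elsewhere.
SemVer = Tuple[int, int, int]
# (block height, software version that takes effect at that height)
VersionBoundary = Tuple[int, SemVer]


def stop_height(node_version: SemVer,
                boundaries: Sequence[VersionBoundary]) -> Optional[int]:
    """Return the lowest boundary height whose version is newer than the
    running node's version, or None if the node never needs to stop."""
    incompatible = [height for height, version in boundaries
                    if version > node_version]
    return min(incompatible) if incompatible else None


def should_execute(height: int, stop_at: Optional[int]) -> bool:
    """Assumption: the old binary executes blocks strictly below the stop
    height; the block at the stop height belongs to the new version."""
    return stop_at is None or height < stop_at
```

With boundaries at heights 100, 200, and 300 for versions 0.33.0, 0.34.0, and 0.35.0, a node running 0.33.0 would stop at height 200, while a node running 0.35.0 would not stop at all.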
Challenges
- Conceptually, an HCU implements a change in Component Version (a breaking change to the Execution State Machine), but it specifies a Software Version.
- As a side effect, versions are specified as semver, however:
- Semver is useful to differentiate between different kinds of breaking and non-breaking changes, but here we only care about breaking changes.
- Semver is relatively complex (compared to e.g. an integer), and this increases complexity in version upgrade-related components (including the NodeVersionBeacon smart contract)
- The NodeVersionBeacon smart contract is used to direct changes to the Execution State Machine, but the use of a semver software version and the documentation of the contract and surrounding components imply that it refers to a global protocol version.
- The Execution Node processes the VersionBeacon service event only when it processes the sealing block. If an EN has not processed the last VersionBeacon-containing block (for example, due to bootstrapping timing), then it could miss a version upgrade and produce incorrect Execution Receipts. (Version Directives are not reliably persisted.)
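To illustrate the complexity argument, the sketch below contrasts detecting a breaking change with semver tags against doing so with integer versions. The regular expression is a simplified approximation of the semver grammar, and the rule "a minor bump is breaking while the major version is 0" is an assumed convention, not something specified by NodeVersionBeacon. All function names are hypothetical.

```python
import re
from typing import Tuple

# Simplified semver-ish pattern: optional "v", three numeric parts,
# optional pre-release suffix, optional build metadata.
_SEMVER = re.compile(
    r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def parse_semver(tag: str) -> Tuple[int, int, int, str]:
    """Parse a tag into (major, minor, patch, pre_release)."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a semver-ish tag: {tag!r}")
    major, minor, patch, pre = match.groups()
    return int(major), int(minor), int(patch), pre or ""


def semver_is_breaking(old: str, new: str) -> bool:
    """Assumed convention: a major bump is breaking, and while the major
    version is 0 a minor bump is treated as breaking as well."""
    o, n = parse_semver(old), parse_semver(new)
    if o[0] != n[0]:
        return True
    return o[0] == 0 and o[1] != n[1]


def integer_is_breaking(old: int, new: int) -> bool:
    """With integer component versions, every change is a breaking change."""
    return new != old
```

The semver path needs parsing, error handling, and a policy for pre-release tags and 0.x versions; the integer path is a single comparison.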
Spork
In a spork, the entire network is halted for an extended period. Both the execution state and protocol state databases are re-instantiated from a snapshot. Essentially any backward-incompatible changes are possible during a spork. Included here for completeness.
Dynamic Protocol State Upgrade
The Dynamic Protocol State adds the ability for:
- Protocol State commitments to be included in every block
- Flexible changes to the Protocol State on a block-by-block basis
The Protocol State Key Value Store is a versioned data structure representing the Protocol State (basically, the thing that is committed to in every block).
High-Level Design
- Protocol State Component Version is defined as an integer, incremented for breaking changes
- A particular Software Version may support one or more Component Versions (see #5371)
- A service event communicates Version Directives to the Protocol State. (See #5428 and #411 for WIP implementations.)
- New versions become active at an “activation view” rather than height.
- Service events must be processed at least SafetyThreshold many views in advance of the activation view, as is required for Execution State version upgrades.
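As a rough illustration of the design points above, the following sketch shows how a node could decide which Protocol State Component Version is active at a given view, and whether its binary supports that version. The names (VersionUpgrade, active_version, can_process) are hypothetical, and the rule that the new version applies from the activation view onward (inclusive) is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VersionUpgrade:
    """Hypothetical pending upgrade: integer version plus activation view."""
    new_version: int
    activation_view: int


def active_version(current: int,
                   pending: Optional[VersionUpgrade],
                   view: int) -> int:
    """Version in effect at `view` (assumes activation is inclusive)."""
    if pending is not None and view >= pending.activation_view:
        return pending.new_version
    return current


def can_process(supported: FrozenSet[int],
                current: int,
                pending: Optional[VersionUpgrade],
                view: int) -> bool:
    """A Software Version may support several Component Versions; it can
    process a block only if the version active at that view is among them."""
    return active_version(current, pending, view) in supported
```

A binary supporting versions {1, 2} can follow an upgrade from 1 to 2 without a restart, whereas a binary supporting only {1} would have to halt once the activation view is reached.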
In general:
- Component Versions are specified as integers and only increment for breaking changes. There may be several, each specific to one component; there isn’t one global version for the whole network. Component Versions are part of a component’s implementation and are defined at compile time (not tagged after the fact).
- Version Directives are communicated using service events, and pending directives are persisted in the Protocol State, so all nodes have access to them regardless of bootstrapping timing.
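The persistence property can be sketched as a small state transition function in the spirit of LastProtocolState + Block -> NextProtocolState. This is a simplified model under stated assumptions, not the proposed implementation: the types, the SAFETY_THRESHOLD value, and the behavior of silently ignoring a directive that violates the threshold are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

SAFETY_THRESHOLD = 100  # hypothetical value, in views

# (new_version, activation_view)
Directive = Tuple[int, int]


@dataclass(frozen=True)
class Block:
    view: int
    directive: Optional[Directive] = None  # carried by a service event, if any


@dataclass(frozen=True)
class ProtocolState:
    version: int
    pending: Optional[Directive] = None  # persisted as part of the state


def evolve(state: ProtocolState, block: Block) -> ProtocolState:
    """Compute the next Protocol State from the last state and a block."""
    # Activate a pending upgrade once its activation view is reached.
    if state.pending is not None and block.view >= state.pending[1]:
        state = ProtocolState(version=state.pending[0], pending=None)
    # Record a new directive only if it leaves enough lead time.
    if block.directive is not None:
        _, activation_view = block.directive
        if activation_view - block.view >= SAFETY_THRESHOLD:
            state = replace(state, pending=block.directive)
    return state
```

Because the pending directive lives inside the state itself, a node that bootstraps from any state snapshot taken between the service event and the activation view still sees the upcoming upgrade, unlike a node that depends on having observed the original event.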
Request for Feedback/Discussion
Thank you for taking the time to read! Here are some questions I’m hoping to answer:
- Are there any reasons to continue using semver versions (or software binary versions) in the version upgrade mechanisms?
- How can we best consolidate existing HCU mechanisms with the Protocol State-driven upgrade mechanism, or at least have them co-exist peacefully?
- Is it indeed preferable to version components individually (as we do in practice with current HCUs, and plan to with Protocol State version upgrades), rather than using one global version (as implied by existing NodeVersionBeacon documentation)?
- Is it desirable to include a bypass for version requirements as an “escape hatch” flag (see PR comment)?
- Any disagreements with, or inaccuracies in, the post above.