Dynamic Protocol State as building block for Byzantine Fault Tolerance [BFT] and an autonomously-operating Flow-network

The core-protocol team is starting work on ‘Flow’s Dynamic Protocol State’. In a nutshell, this work is a prerequisite for the Flow network to autonomously defend itself against malicious nodes within the network. On a technical level, the Dynamic Protocol State is the foundation for countermeasures such as slashing misbehaving nodes or entirely revoking their authorization to participate.

Below, you’ll find broader context on the Dynamic Protocol State and specific technical goals for the work. We will follow up shortly with a more complete proposal, as we are still working through the technical details. Community contributions are warmly welcome, including ideas, feedback, change requests, and PRs.

Context

The Protocol State maintains the necessary information about the operation of the Flow network protocol, including:
• identity table listing all staked nodes, their public keys, roles, etc. (for the current epoch and, once available, the upcoming epoch)
• blocks produced by nodes in the Flow network
• finality and sealing status of blocks

At the moment, the identity table is fixed for an entire epoch, which is largely an engineering shortcut. As a result, the following important features are currently blocked by the limitations of the protocol state:

  • Ability for node operators to revoke their node’s keys if they suspect that the keys were compromised (so-called self-ejection).
  • Slashing nodes for protocol violations (incl. ejection from the network).
  • Efficient retrieval of the identity table at a certain block, without the need to locally reconstruct the state from cross-epoch history. (In principle this is possible today, but it would require a very convoluted, hacky approach.)

Note: The Protocol State interface (→ code) is already mostly in its mature form. It is only the implementation backing the interface that has many shortcuts.

Goal

The protocol state supports tracking and updating the Identity Table throughout the epoch:

  • Identity Table is updated in a fork-aware manner
  • method of applying updates needs to be BFT
  • root hash of the identity table should be included in each block (which allows retrieving node identity information about the upcoming epoch once it is incorporated into the protocol state)
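
To make the last goal more concrete, here is a minimal sketch of how a block could commit to the identity table. Everything in it is an assumption for illustration: the `Identity`, `IdentityTable` and `Header` types, their fields, and the choice of a plain SHA-256 digest over sorted entries (a stand-in for whatever root-hash construction the final design uses). It is not taken from the Flow code base.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"hash"
	"sort"
)

// Identity is a hypothetical, simplified identity-table entry.
type Identity struct {
	NodeID    string
	Role      string
	PublicKey []byte
	Weight    uint64
	Ejected   bool
}

// IdentityTable is a hypothetical snapshot of all identities as of one block.
type IdentityTable []Identity

// Header is a hypothetical block header that commits to the identity table.
type Header struct {
	ParentID        string
	IdentityTableID [32]byte
}

// writeField feeds a length-prefixed byte string into the hash, so that
// adjacent fields cannot be confused with one another.
func writeField(h hash.Hash, b []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(b)))
	h.Write(n[:])
	h.Write(b)
}

// Commitment returns a SHA-256 digest over the entries sorted by NodeID,
// so the result does not depend on the order of the slice.
func (t IdentityTable) Commitment() [32]byte {
	sorted := make(IdentityTable, len(t))
	copy(sorted, t)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].NodeID < sorted[j].NodeID })

	h := sha256.New()
	for _, id := range sorted {
		writeField(h, []byte(id.NodeID))
		writeField(h, []byte(id.Role))
		writeField(h, id.PublicKey)
		var w [8]byte
		binary.BigEndian.PutUint64(w[:], id.Weight)
		h.Write(w[:])
		if id.Ejected {
			h.Write([]byte{1})
		} else {
			h.Write([]byte{0})
		}
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	table := IdentityTable{
		{NodeID: "n1", Role: "consensus", Weight: 100},
		{NodeID: "n2", Role: "execution", Weight: 100},
	}
	// Every block carries the commitment of the table as of that block.
	parent := Header{ParentID: "genesis", IdentityTableID: table.Commitment()}

	// A child block on some fork ejects n2; only this fork sees the change.
	updated := append(IdentityTable(nil), table...)
	updated[1].Ejected = true
	child := Header{ParentID: "parent", IdentityTableID: updated.Commitment()}

	fmt.Println("commitment changed:", parent.IdentityTableID != child.IdentityTableID)
}
```

Because the commitment lives in the header, two competing forks can carry different identity tables without interfering with each other, and any node can check a table it received against the block it belongs to.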

Scope

  • Design of mature implementation with specific emphasis on BFT.
    • Investigate whether it is helpful to include the GitHub issue #3668 in this work stream.
  • Functionality to track and persist Protocol State on a block-by-block basis (fork aware).
    • We want to restrict our attention to the components of the Protocol State that stay constant on the happy path and are node-independent:

      The resulting data structure changes rarely throughout an epoch. Let’s refer to this information set as ProtocolStateSubstrate.

    • Therefore, we want to de-duplicate identical ProtocolStateSubstrate instances in the database.

  • Root hash of the ProtocolStateSubstrate should be included in each block.
  • Interface for applying updates to the ProtocolStateSubstrate (so-called ‘identity-changing operations’)
    • Happy-path for applying EpochSetup and EpochCommit service events
    • verification logic (consensus nodes) for checking correctness of proposed updates
    • API should be general purpose, i.e. it supports applying any updates resulting from slashing adjudications in the future.
  • We have considered structuring node Identity into an immutable and a mutable part. Including this cleanup work here is probably a good idea (see #6232 for further details).
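
The scope items on de-duplication, per-block tracking and verification can be sketched together. The following is only a rough illustration under stated assumptions: `Substrate`, `StateID`, `Update`, `CommitEpoch` and `Store` are hypothetical names, the snapshot is reduced to two toy fields, and an in-memory map stands in for the database. The idea shown is content addressing: each block maps to the hash of its snapshot, identical snapshots are stored once, and a consensus node checks a proposed update by re-applying it to the parent's snapshot.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// StateID identifies one snapshot by the hash of its content.
type StateID [32]byte

// Substrate is a hypothetical, heavily reduced ProtocolStateSubstrate.
type Substrate struct {
	Epoch          uint64
	NextEpochReady bool // hypothetical: set once the next epoch is committed
}

// ID hashes a simple canonical encoding of the snapshot.
func (s Substrate) ID() StateID {
	return sha256.Sum256([]byte(fmt.Sprintf("%d|%t", s.Epoch, s.NextEpochReady)))
}

// Update is a hypothetical identity-changing operation, modeled as a pure
// function from the parent snapshot to the child snapshot.
type Update func(parent Substrate) (Substrate, error)

// CommitEpoch is a toy stand-in for processing an EpochCommit service event.
func CommitEpoch(parent Substrate) (Substrate, error) {
	if parent.NextEpochReady {
		return parent, errors.New("next epoch already committed")
	}
	parent.NextEpochReady = true
	return parent, nil
}

// Store keeps each distinct snapshot once and maps every block to its snapshot.
type Store struct {
	snapshots map[StateID]Substrate
	byBlock   map[string]StateID
}

func NewStore(rootBlock string, root Substrate) *Store {
	return &Store{
		snapshots: map[StateID]Substrate{root.ID(): root},
		byBlock:   map[string]StateID{rootBlock: root.ID()},
	}
}

// evolve applies the updates to the snapshot of the given parent block.
func (s *Store) evolve(parentBlock string, updates []Update) (Substrate, error) {
	parentID, ok := s.byBlock[parentBlock]
	if !ok {
		return Substrate{}, errors.New("unknown parent block")
	}
	state := s.snapshots[parentID]
	for _, apply := range updates {
		next, err := apply(state)
		if err != nil {
			return Substrate{}, err
		}
		state = next
	}
	return state, nil
}

// Extend records the snapshot for childBlock. It is fork-aware because the
// result depends only on the child's own parent, not on a global latest state.
func (s *Store) Extend(parentBlock, childBlock string, updates ...Update) (StateID, error) {
	state, err := s.evolve(parentBlock, updates)
	if err != nil {
		return StateID{}, err
	}
	id := state.ID()
	s.snapshots[id] = state // identical snapshots share one entry
	s.byBlock[childBlock] = id
	return id, nil
}

// Verify re-applies the updates and compares against the ID a proposal claims.
func (s *Store) Verify(parentBlock string, claimed StateID, updates ...Update) error {
	state, err := s.evolve(parentBlock, updates)
	if err != nil {
		return err
	}
	if state.ID() != claimed {
		return errors.New("claimed state ID does not match recomputed state")
	}
	return nil
}

func main() {
	st := NewStore("A", Substrate{Epoch: 1})
	idB, _ := st.Extend("A", "B")              // happy path: nothing changes
	idC, _ := st.Extend("A", "C", CommitEpoch) // competing fork with an update
	fmt.Println("B reuses root snapshot:", idB == st.byBlock["A"])
	fmt.Println("distinct snapshots stored:", len(st.snapshots))
	fmt.Println("proposal for C verifies:", st.Verify("A", idC, CommitEpoch) == nil)
}
```

Modeling updates as pure functions keeps the API general purpose: a future slashing adjudication would be just another `Update`, and verification stays a matter of recomputing the result and comparing hashes.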

Out of scope

  • Beyond EpochSetup and EpochCommit, we do not implement any other identity-changing operations.

This means that the ‘Dynamic Protocol State’ work stream provides low-level primitives for implementing slashing later.

Timeline & Milestones

The following is a rough outline for the entire work stream.

  1. Completion of Design and scoping

  2. MVP (not fit for mainnet)

    Seeing the light at the end of the tunnel; might still miss some features necessary for mainnet, but it already covers the core changes. At this point, we will have a more reliable estimate of when the first mainnet deployment will be possible.

  3. First version suitable for mainnet deployment

    contains all features necessary for mainnet (allowed to still contain notable technical debt; integration testing not included)

  4. Integration Testing

    at the end of this milestone, we will have a first version deployable to mainnet (actual deployment not included)

  5. First production version

    the resulting version cleans up all significant technical debt

Riskiest Assumptions

Conceptually, the problem is well understood and we have high confidence that the outlined direction will lead to a mature solution.

However, there are many details that still need to be worked through. This work stream has a notable research component. The implementation will probably be intricate, with many requirements to consider. In all likelihood, we will encounter implementation challenges and technical debt that needs to be cleaned up. Hence, by the nature of the work stream, timelines have high uncertainty. I recommend applying a factor of 3x when translating the scoped work into projected timelines.

Key & Secondary Metrics

  • Primary metric: milestones completed
  • Secondary metric: number of story points completed toward the next milestone

Anti-Goals

  • slashing

The Core-Protocol team has prepared an ‘Implementation Proposal’ detailing how to implement the Dynamic Protocol State in the existing Flow architecture. This document provides comprehensive instructions on our approach to the implementation, including low-level technical details. Feedback and questions are warmly welcomed. You can find the document in our public Notion space using the following link: Dynamic Protocol State Implementation Proposal.


After reviewing the first two PRs and Jordan’s suggestions (most of them summarized in issue #4649), I decided that I needed to consolidate my and Jordan’s suggestions for design revisions in a written format. After all, there are a lot of requirements to be considered. Furthermore, it wasn’t quite clear to me how we would (safely) incorporate slashing into the design.

All things considered, it took me quite a while to write all my thoughts down in this Notion doc: [Proposal] Detailed specification for framework functionality to support future slashing. Although it was not specifically my goal when I started writing, I ended up with:

  • detailed list of requirements, which the design has to specify
  • systematic analysis of the software design space to solve the requirements
  • formal proof showing that we need some kind of deferred operation to satisfy the requirements (I wasn’t aware of this when I started writing, but it is a super important point shaping the entire design)

When going one level deeper, trying to figure out what designs are possible and simple:

  • I ended up with something very close to Yurii’s implementation (great job Yurii)
  • Nevertheless, there is some subtle implication that we really need to make sure to satisfy (:point_right: requirement 5)
  • I ended up exhaustively defining the state machine for the Identity Table
    • hope that most of this will survive revisions, so we can use it as documentation later
  • For some of the Identity properties (including weight, ejected flag, and a node’s lifecycle classifier):
    • we can compute them
    • but we can’t store them directly in the identity table (or we need to add significant complexity to the framework, which I think is not worth it)

Progress Update:

Development of the first mature version of the Dynamic Protocol State is complete :tada:
