FLIP-1057: Automated Slot Assignment

Status: Draft
FLIP: FLIP-1057
Editor: @okjose
Author: @pgpg
FLIP Document: https://github.com/onflow/flips/blob/main/protocol/20220719-automated-slot-assignment.md


Flow would benefit from an automated method for allowing node operators to signal their intention to run a node and to then have their node included in the staking table in a subsequent epoch. The idea of staking slots is to build an automated process for including new nodes in the staking table, while managing the max number of nodes per node type. This process can be extended in the future with an auction mechanism for selecting node operators in a fully permissionless way.

@bluesign @Redallica, here’s the proposal I said I was drafting in the “FLIP-GOV-1: Maintain Staking Rewards” thread (post #19).

Also, tagging @flowjosh since he wrote the original staking contracts and will likely have feedback on how this implementation should go!

It mostly makes sense to me. Implementation might be difficult given Cadence’s upgrade rules, but we’ll probably be able to figure it out. One thing I’m still unsure about: are all slots available every epoch? Once a node gets in for one epoch, is it in for all subsequent epochs where it is staked and approved, or will slot assignments be re-randomized every epoch for everyone? There are well-behaved node operators who have staked a lot of tokens and have thousands of delegators (Blocto, for example) who could just get unlucky and not get a slot. That would unstake them and thousands of their users, who expect to receive rewards every week without having to check that their tokens are still staked.

Great question!

This proposal is meant for stable slot assignment. That is mentioned specifically at the bottom of the FLIP:

Items this proposal doesn’t address:

Automated Slashing - Under this proposal, nodes will only be removed from the staking list when they are manually slashed by the Flow service account, or when that operator removes their stake. Research into opportunities for automated slashing conditions will come in the future.

I do think there’s some design space in the future implementation of an auction mechanism where total stake + delegation is taken into account in a weekly auction.

Perhaps I can make this more clear in the FLIP.

I wasn’t talking about slashing. I was talking about slot assignment. If there are only 5 slots available and 10 node operators who want a slot, and slot assignment is randomized each epoch, then a node operator, and therefore all their delegators, will never have any assurance that they will stay staked each epoch. Staking auctions will solve this, because if you want that security you can just stake more FLOW to ensure you get a slot, but this proposal with only random slot assignment would make that impossible, right?

Yeah, so my conception here is that those slots are assigned and then occupied until they’re unstaked or slashed.

Each epoch new slots are released based on the configuration, and those slots are filled. Only new slots each epoch will be filled by the randomized selection.
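To illustrate the behavior described above, here is a minimal Python sketch (not the staking contract, which is written in Cadence). The function name `assign_slots` and its parameters are hypothetical, and refunding unselected candidates is assumed from the current staking behavior mentioned later in this thread.

```python
import random


def assign_slots(occupied, candidates, max_slots, new_slots_per_epoch, rng):
    """Fill only newly released slots; nodes already holding a slot keep it.

    All names and parameters here are illustrative assumptions.
    Returns (new list of occupied slots, candidates that were not selected).
    """
    # Slots released this epoch, capped by the per-node-type maximum.
    open_slots = max(0, min(new_slots_per_epoch, max_slots - len(occupied)))

    # Only nodes that do not already hold a slot compete for the open ones.
    pool = [c for c in candidates if c not in occupied]

    if len(pool) <= open_slots:
        selected = list(pool)                    # everyone fits, no randomness needed
    else:
        selected = rng.sample(pool, open_slots)  # uniform random selection

    not_selected = [c for c in pool if c not in selected]
    return occupied + selected, not_selected


if __name__ == "__main__":
    rng = random.Random(2022)
    staked, refunded = assign_slots(["a", "b"], ["c", "d", "e"], 4, 1, rng)
    print(staked, refunded)
```

Existing operators `a` and `b` are never part of the draw, so they and their delegators stay staked across epochs; only the single new slot is contested.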

This will end up like IPv4 addresses: there is no chance someone will give up a node assignment voluntarily (if Flow is successful).

If we auction spots, I think that will be the best solution (Blocto and others with a lot of delegators can beat the competition in an auction). Competition also means fewer rewards are paid out and less inflation.

But as a general comment, I think the correct way to design this system is with an overview first, then starting from the most problematic pieces. For me slot assignment is not a problematic piece; working on this now instead of, for example, the auction system is a loss of time.

@pgpg if we can make an overview, or dependency tree, I think it would be more clear.

e.g. Reward Rate / Formula ← automatic slot assignment ← slot auctions ← BFT of nodes

(I couldn’t draw a tree here.)

But it feels like there are a lot of missing pieces here to decide on anything, as Josh pointed out.

  • What will happen if some big node with delegators (e.g. Blocto) loses an auction?
  • When will nodes be BFT? Next year? Next decade?
  • Why do we have scaling problems with nodes? When will they be solved?

I think it would be better to give some timelines, so we don’t spend everyone’s valuable time here if automatic slot assignment will only be useful in 2030.

Of course. Randomized selection (only) isn’t the long-term solution. In a real auction process, people could lose slots and would have to stay competitive with their bids to keep them; that is the long-term goal.

Specifically, it will be possible to run a fully permissionless access node within 2022. Once that feature is technically feasible, it would be beneficial to have a mechanism where people can automatically stake for an access node and get assigned a spot without any governance intervention. The automated slot assignment makes that possible.

This FLIP doesn’t propose an auction, and no mechanism for automatically removing an approved node has been proposed yet.

Access Nodes: This year
Consensus Nodes: Next year.

There is a reason why the FLIP proposes a method for nodes to be allow-listed, but still randomly selected to be in the actual staking list. This allows us to move the process of node assignment toward permissionlessness as the software begins to support it.

ALSO: In terms of pure engineering, the FLIP proposes a path toward updated versions of the staking code that can support per-node-type auctions in the future. Specifically, an auction selection mechanism can be added to replace the random selection, keeping the rest of the code and process the same.

I’ve been going into implementation thinking around this proposal and chatting with @flowjosh regarding what it would take.

Currently, the implementation of FlowIDTableStaking could stand improvement. Specifically, the architecture isn’t particularly efficient in terms of execution effort, and the code is organized around a global allow list and doesn’t have per-node-type configuration.

When looking toward building out a new structure, @flowjosh is suggesting that we also need to think through automated slashing, since it would be preferable not to re-architect twice.

Given this discussion, I have proposed a simpler stepping stone between what we have now and FLIP-1057, which will allow the allowlist to be disabled for access nodes and add a per-node-type configuration for the maximum number of nodes.
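As a rough picture of what such a stepping stone could look like, here is a small Python sketch of a per-node-type configuration. The names `NodeTypeConfig`, `CONFIG` and `may_join` are hypothetical, and the limits are placeholder numbers, not values from the FLIP or the staking contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTypeConfig:
    max_nodes: int           # maximum number of staked nodes of this type
    allowlist_enabled: bool  # whether a node must be allow-listed to join


# Placeholder values for illustration only (assumptions, not real limits).
CONFIG = {
    "access": NodeTypeConfig(max_nodes=100, allowlist_enabled=False),
    "consensus": NodeTypeConfig(max_nodes=50, allowlist_enabled=True),
}


def may_join(node_type, node_id, current_count, allowlist):
    """Return True if a node of this type could be added to the staking table."""
    cfg = CONFIG[node_type]
    if current_count >= cfg.max_nodes:
        return False  # node type is already at its configured maximum
    if cfg.allowlist_enabled and node_id not in allowlist:
        return False  # allowlist still applies to this node type
    return True


if __name__ == "__main__":
    print(may_join("access", "node-1", 10, set()))     # allowlist disabled
    print(may_join("consensus", "node-1", 10, set()))  # allowlist enforced
```

The point of the shape is that the allowlist check becomes a per-type switch rather than a global rule, which is what lets access nodes open up before the other node types.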

Do we know what the most efficient node architecture is today for optimizing latency and throughput? In the first Flow technical whitepaper, experiment 1, with 2 fast nodes for execution, showed the best results. Is that still the case? Is the whole network expected to become BFT?

The goal is 100% to have the whole network be BFT.

Re efficiency: The answer here is highly implementation dependent. Specifically, HotStuff is sensitive to a lot of nodes participating in consensus because of the communication overhead.

Collection nodes get around this by scaling collection clusters horizontally. Each cluster can have a reasonable number of participants in consensus, and more participants can be scaled out across more clusters.

For consensus nodes there’s a bunch of optimizations left to do, but they only deal with block production w/ collections, not individual transactions, so the problem is more constrained. There’s work to do to find out where all the scaling bottlenecks are with the consensus nodes, but the last discussion I was a part of indicated that another ~15 consensus nodes could be added without degrading block production performance.

Lastly, there are execution nodes, which is where the current limitations seem to be mostly concentrated. This isn’t an architectural problem as much as an engineering one of making the system work fast enough. This includes issues like DB write speed, proof generation, etc. Technically speaking, the BFT version of the network only needs one truthful execution node to proceed. I don’t know if there’s a model of how adding more execution nodes would degrade performance, but my hunch is that it would mostly affect verification nodes’ network traffic. Perhaps someone more directly involved in that part of the protocol could chime in.

To speak specifically on throughput + latency:
As of right now I believe execution time of transactions has the largest impact on both throughput and latency. If execution is slow, transactions will wait to get executed after being included in a block, increasing their latency. Of course there’s some base latency involved in getting added to a collection, and that collection to a block.

Since execution isn’t parallelized (at this point) the throughput of the system is constrained by how many transactions can be executed once blocks have been constructed, not in constructing blocks. I believe the current architecture is well suited for increasing the max throughput of the entire system, but there’s engineering work to get that done.

Thanks @pgpg for your detailed answer. This helps in understanding the current state of network development and highlights the challenges ahead.
When you say:

Are there any plans to introduce parallelized execution in the future? It seems to be one of the viable options for optimizing efficiency even further in the long run.

There’s been a lot of talk of how to do it.

The general approach could be similar to what happens during parallel EVM execution, or even optimistic parallel compute on a CPU. The runtime would need to execute transactions in parallel optimistically, then detect any shared state updates and re-execute, in a batch, the transactions that updated shared state.

This is a somewhat naive approach and could work, except that in Flow every transaction modifies the vault of the Flow fees resource. This effectively means that all transactions share a register update. We do know exactly how that register is updated, though: the fee is added to its balance.

So, for the naive approach above to work well we would also likely want to implement an approach where registers can be modified as long as the operations happening to the register are all commutative.
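Here is a toy Python sketch of that idea, only to make the conflict rule concrete. It is not how the Flow execution node works; `TxResult`, `merge_optimistic` and the register names are hypothetical. Each transaction is assumed to have been executed in parallel against the same snapshot, recording its reads, its plain writes, and its commutative additions separately. The conflict check is deliberately conservative.

```python
from dataclasses import dataclass, field


@dataclass
class TxResult:
    tx_id: str
    reads: set = field(default_factory=set)      # registers read from the snapshot
    writes: dict = field(default_factory=dict)   # register -> new value (order matters)
    adds: dict = field(default_factory=dict)     # register -> delta (commutative)


def merge_optimistic(snapshot, results):
    """Merge results in block order; return (new state, tx ids to re-execute)."""
    state = dict(snapshot)
    written = set()  # registers overwritten by accepted transactions
    added = set()    # registers incremented by accepted transactions
    retry = []
    for r in results:
        stale_read = r.reads & (written | added)
        clobber = set(r.writes) & (written | added)
        add_on_written = set(r.adds) & written
        if stale_read or clobber or add_on_written:
            retry.append(r.tx_id)  # must be re-executed against the merged state
            continue
        for reg, value in r.writes.items():
            state[reg] = value
        for reg, delta in r.adds.items():
            # Additions commute, so several transactions may add to one register.
            state[reg] = state.get(reg, 0) + delta
        written |= set(r.writes)
        added |= set(r.adds)
    return state, retry


if __name__ == "__main__":
    snapshot = {"fees": 0, "alice": 10, "bob": 5}
    results = [
        TxResult("tx1", reads={"alice"}, writes={"alice": 9}, adds={"fees": 1}),
        TxResult("tx2", reads={"bob"}, writes={"bob": 4}, adds={"fees": 1}),
        TxResult("tx3", reads={"alice"}, writes={"alice": 8}, adds={"fees": 1}),
    ]
    print(merge_optimistic(snapshot, results))
```

In this example `tx1` and `tx2` both add to the `fees` register without conflicting, while `tx3` is sent for re-execution because it read `alice` after `tx1` changed it. If the fee update were recorded as a plain write instead, every transaction after the first would conflict.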

Cross post from GitHub:

Btw, we never discussed this, but why random selection?

Why not put up some FLOW (preferably an amount that is neither too small nor too big) to hold your position in a queue (without any rewards)? First come, first served.
You stay in the queue.
You can leave the queue by taking your FLOW back (but if you rejoin later, you join at the end).

You can see how many people are ahead of you; when a spot opens, whoever has waited longest gets it first.

Random selection is open to manipulation. I think it is better to give access to normal people than to rich people.
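The queue idea above could be sketched roughly like this in Python. The class `SlotQueue`, its methods, and the deposit amount are hypothetical names and values used only to illustrate the suggestion.

```python
from collections import OrderedDict


class SlotQueue:
    """First come, first served waiting list backed by a locked deposit."""

    def __init__(self, deposit):
        self.deposit = deposit
        self._waiting = OrderedDict()  # operator -> locked deposit, in join order

    def join(self, operator):
        if operator in self._waiting:
            raise ValueError("operator is already in the queue")
        self._waiting[operator] = self.deposit  # joins at the end

    def leave(self, operator):
        # Taking the deposit back gives up the position in the queue.
        return self._waiting.pop(operator)

    def ahead_of(self, operator):
        # Number of operators who have been waiting longer.
        return list(self._waiting).index(operator)

    def fill(self, open_slots):
        # Whoever has waited longest gets the first open slot.
        admitted = []
        while open_slots > 0 and self._waiting:
            operator, _ = self._waiting.popitem(last=False)
            admitted.append(operator)
            open_slots -= 1
        return admitted


if __name__ == "__main__":
    q = SlotQueue(deposit=500)
    for op in ("a", "b", "c"):
        q.join(op)
    q.leave("a")
    q.join("a")       # rejoining puts "a" at the end
    print(q.fill(2))
```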

Here’s my rationale for proposing randomization:

  1. In the situation where there are fewer proposed new operators than available slots, it is effectively the same as first come, first served.

  2. When there are more potential operators than available slots, randomization stops an attacker from simply packing slots and stopping anyone else from being involved in the network. This is particularly important for access nodes.

As for why we refund: that is the current behavior of the staking contracts and keeping that behavior leaves the staking UX in place. Other options could include allowing folks to easily restake (this may be possible already), having it auto restake, or having it auto delegate.

A major issue for node types with high staking minimums is that orgs likely won’t want to sacrifice the time value of money (TVOM) of their stake for multiple epochs while waiting for a slot. While this is less of an issue for access nodes with a low stake (e.g. 500 FLOW), keeping the UX consistent across node types seems worth it.

This is wrong: access nodes have no responsibility to serve the public. Probably not a single one except Dapper’s and Alchemy’s is serving the public.

Also, this allows someone with big funds to guarantee AN access, which is not fair.

I might wait a year, committing 500 every month, and then someone else comes and puts in 50000. What chance do I have?

Your argument being that if a rich entity wants to more or less guarantee their slots they would just attempt to stake a bunch of nodes, putting probabilities in their favor – correct?
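To put rough numbers on that concern, here is a small Python calculation. It assumes slots are drawn uniformly at random without replacement from all candidate nodes, which is one possible reading of the random selection; the function names and the example counts are made up for illustration.

```python
from math import comb


def p_at_least_one(total_candidates, open_slots, own_nodes):
    """Chance that at least one of own_nodes is picked in a uniform draw."""
    none_picked = comb(total_candidates - own_nodes, open_slots)
    return 1 - none_picked / comb(total_candidates, open_slots)


def expected_slots(total_candidates, open_slots, own_nodes):
    """Expected number of slots won (mean of a hypergeometric draw)."""
    return open_slots * own_nodes / total_candidates


if __name__ == "__main__":
    # One small operator among 10 candidates competing for 5 slots.
    print(p_at_least_one(10, 5, 1))
    # The same operator after one entity adds 50 more candidate nodes.
    print(p_at_least_one(60, 5, 1))
    # The large entity's expected share of the 5 slots.
    print(expected_slots(60, 5, 50))
```

Under these assumptions the small operator’s chance falls from 1/2 to 1/12 once the 50 extra nodes are staked, and the large entity expects to win about 4 of the 5 slots.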

stake a bunch of nodes, putting probabilities in their favor

Yes, I think ANs should be unlimited, but until then, better not to favor the rich.

If it is going to be random, it could be something like: I lock 500 FLOW for 1 year, and whether I win or lose, it stays locked up.

First come, first served also favors the rich.

How about instead we keep a backstop governance process where operators can submit a FLIP-GOV proposal to get their key added in the short term while the available AN slots open slowly?

First come, first served doesn’t favor the rich that much, because there is an unknown factor (when a slot will open).

we keep a backstop governance process where operators can submit a FLIP-GOV

Oh, this is not so bad, but considering we don’t have a functioning democracy, it is a bit hard.

Another option, I think:

We can separate AN applications, public-serving vs. private for example, and give the public-serving ones priority.