Any blockchain protocol must be designed to prevent the double execution of a transaction. (to read more about the problem checkout here)
Currently, execution nodes use sequence numbers to prevent double execution. Here is my proposal of how we could achieve the goal without using sequence numbers.
has a limited expiry time(on flow through referencing a block, bounded by a max number of blocks in the past)
has a unique way of reference (i.e. flow transaction ID is protected against malleability)
As part of the execution state on execution nodes, we could add a secondary on-chain Merkle trie (beside account states) to capture transaction inclusion. Every time a transaction is about to be executed EN nodes add the transaction to the merkle tree and if it already exist it rejects execution of the transaction
Just like the execution state, we would to only keep this state on the execution nodes and provides a secondary level of proofs to verification nodes to verify inclusion or non-inclusion of a transaction. (inclusion proof of transaction before execution shows why execution was rejected).
Clearly, this trie would keep growing and we need pruning and thats might look challenging initially. One way to think about this is we only need to keep executed but not expired transaction in the trie, if an expired transaction is included, then it won’t be executed and would make the collection invalid which already Flow protocol handles if it is not expired but executed in the past we need to check the trie and make sure its the first time we are executing this transaction.
But how do we prune this trie and remove transactions that have already expired without having this tracking on verification nodes?
Our proposal
We have designed a method that could work using two tricks
special consideration for path construction of a transactions
branch pruning with proofs
Let’s consider for inserting a transaction into the trie, we don’t use the transaction ID and we use a longer variation of path, concatenation of block reference and transaction ID. Note the block reference here is different from the block that the transaction is getting executed, it just simply the value that is set on the transaction for expiry. This way when we are inserting transactions into the trie, it would be automatically update sub-tries that has to be pruned after a block is not valid for referencing anymore (or all transaction in that subtrie gets expired).
This way we don’t need to track transactions in a secondary location and trie it’s encapsulating that information within. This pruning by subpath or path prefix (block hash) could be done on system chunk and be verified by verification nodes. the proof here would be simply replacing an interim node with a default node or an empty node, thus we don’t need to provide a batch of proof anymore.
How about proof sizes?
Our calculation shows the proof size even on fully matured Flow (millions of transactions) is still manageable and also since this trie is on a separate subtrie than the execution node, these proofs could be constructed in parallel to collection execution (pipelined) and could be send as a secondary package for data beside chunk data packs for verification.
About the trie?
We use a special variation of a Merkle-trie optimized for this problem and provides set commitment over the list of non-expired transaction executed recently and going to be executed in the current collection.
This special data structure
supports variable length paths and interim node compactions
is optimized for batch insertion of non-existing values and returning errors if any value already existed
is optimized for easy dropping the values on a sub-trie given a sub-path (with prune-proof)
is optimized for batch proof of non-inclusion of values (in practice batch proof of inclusion of empty values helps the verification nodes to have the full structure of the partial trie)
What about transaction ordering?
I believe transaction ordering is still possible but with explicit hints on transaction, a transaction could also reference another unexpired transaction to tell collection nodes and execution nodes to only include if the previous one has been executed. This only requires the trie to hold on to transactions double the max size of expiry in the trie.
If the objective is getting rid of sequence number, it is a solid idea.
But I think need to consider about benefits we lose in the process which sequence number provides. I thought about this before, I came up with few benefits of sequence number ( I maybe wrong )
detecting key leak; one can keep sequence number for keys on client side and then detect if it is compromised.
Most HSMs have counter for keys used for signing operations, losing sync on counter can mean key leak or at least something fishy going on.
generating entropy ( not sure the right word ) for transaction ID; even if I sign the same transaction, as sequence number changes txid changes.
Currently if I keep sequence number on the client side, same transaction run many times ( lets say minting some NFTs in advance ) just requires occasional reference blockid update.
Btw can you explain a bit more on need to trie here? Why do we need a proof inclusion?
Thats right, objective is to not deal with the issues of the sequence number but you’re right we need to find solutions to address the side effect benefits of sequence number or nonce: tx ordering, detecting key leaks and source of entropy as you mentioned.
For key leaks, I’m not a big fan of using seq number check as it’s a passive way of knowing compromises on keys, probably when you realized that a sequence number is different, it has already beyond the window time of action to make an account safe.
I think we need to solve this problem somewhere else, I have actually another idea though on an account level, if we restructure the trie to use the account address as subpath (which I think is part of the road map for storage) we could benefit from providing the state proof for an account showing that nothing has changed about that account and any 3rd party can keep track of this as a service to notify users.
For the source of entropy, you’re right, I believe that was the main reason initially the sequence number of EVM was named nonce, cause it meant to act as a random number, not a sequence number. we might still benefit from having a nonce or a random number to differentiate two transactions with exact functionality. we just don’t need to enforce ordering and deduplication with it.
About the trie, I’m preparing a doc about the proofs and everything else