Forward the Original TitleâEpochs and slots all the way down: ways to give Ethereum users faster transaction confirmation timesâ
One of the important properties of a good blockchain user experience is fast transaction confirmation times. Today, Ethereum has already improved a lot compared to five years ago. Thanks to the combination of EIP-1559 and steady block times after the Merge, transactions sent by users on L1 reliably confirm within 5-20 seconds. This is roughly competitive with the experience of paying with a credit card. However, there is value in improving user experience further, and there are some applications that outright require latencies on the order of hundreds of milliseconds or even less. This post will go over some of the practical options that Ethereum has.
Today, Ethereumâs Gasper consensus uses a slot and epoch architecture. Every 12-second slot, a subset of validators publish a vote on the head of the chain, and over the course of 32 slots (6.4 min), all validators get a chance to vote once. These votes are then re-interpreted as being messages in a vaguely PBFT-like consensus algorithm, which after two epochs (12.8 min) gives a very hard economic assurance called finality.
Over the last couple of years, weâve become more and more uncomfortable with the current approach. The key reasons are that (i) itâs complicated and there are many interaction bugs between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism, and (ii) 12.8 minutes is way too long and nobody cares to wait that long.
Single-slot finality replaces this architecture by a mechanism much more similar to Tendermint consensus, in which block N is finalized before block N+1 is made. The main deviation from Tendermint is that we keep the âinactivity leakâ mechanism, which allows the chain to keep going and recover if more than 1/3 of validators go offline.
A diagram of the leading proposed single-slot finality design_
The main challenge with SSF is that naively, it seems to imply that every single Ethereum staker would need to publish two messages every 12 seconds, which would be a lot of load for the chain to handle. There are clever ideas for how to mitigate this, including the very recent Orbit SSF proposal. But even still, while this improves UX significantly by making âfinalityâ come faster, it doesnât change the fact that users need to wait 5-20 seconds.
For the last few years, Ethereum has been following a rollup-centric roadmap, designing the Ethereum base layer (the âL1â) around supporting data availability and other functionalities that can then be used by layer 2 protocols like rollups (but also validiums and plasmas) that can give users the same level of security as Ethereum, but at much higher scale.
This creates a separation-of-concerns within the Ethereum ecosystem: the Ethereum L1 can focus on being censorship resistant, dependable, stable, and maintaining and improving a certain base-level core of functionality, and L2s can focus on more directly reaching out to users - both through different cultural and technological tradeoffs. But if you go down this path, one inevitable issue comes up: L2s want to serve users who want confirmations much faster than 5-20 seconds.
So far, at least in the rhetoric, it has been L2sâ responsibility to create their own âdecentralized sequencingâ networks. A smaller group of validators would sign off on blocks, perhaps once every few hundred milliseconds, and they would put their âstakeâ behind those blocks. Eventually, headers of these L2 blocks get published to L1.
L2 validator sets could cheat: they could first sign block B1, and then later sign a conflicting block B2 and commit it onto the chain before B1. But if they do this, they would get caught and lose their deposits. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencing networks. And you can argue that demanding L2s all do decentralized sequencing is an unfair deal: weâre asking rollups to basically do most of the same work as creating an entire new L1. For this reason and others, Justin Drake has been promoting a way to give all L2s (as well as L1) access to a shared Ethereum-wide preconfirmation mechanism: based preconfirmations.
The based preconfirmation approach assumes that Ethereum proposers will become highly sophisticated actors for MEV-related reasons (see here for my explanation of MEV, and see also the execution tickets line of proposals). The based preconfirmation approach takes advantage of this sophistication by incentivizing these sophisticated proposers to accept the responsibility of offering preconfirmations-as-a-service.
The basic idea is to create a standardized protocol by which a user can offer an additional fee in exchange for an immediate guarantee that the transaction will be included in the next block, along with possibly a claim about the results of executing that transaction. If the proposer violates any promise that they make to any user, they can get slashed.
As described, based preconfirmations provide guarantees to L1 transactions. If rollups are âbasedâ, then all L2 blocks are L1 transactions, and so the same mechanism can be used to provide preconfirmations for any L2.
Suppose that we implement single slot finality. We use Orbit-like techniques to reduce the number of validators signing per slot, but not too much, so that we can also make progress on the key goal of reducing the 32 ETH staking minimum. As a result, perhaps the slot time creeps upward, to 16 sec. We then use either rollup preconfirmations, or based preconfirmations, to give users faster assurances. What do we have now? An epoch-and-slot architecture.
The âtheyâre the same pictureâ meme is getting quite overused at this point, so Iâll just put an old diagram I drew years ago to describe Gasperâs slot-and-epoch architecture and a diagram of L2 preconfirmations beside each other, and hopefully that will get the point across.
There is a deep philosophical reason why epoch-and-slot architectures seem to be so hard to avoid: it inherently takes less time to come to approximate agreement on something, than to come to maximally-hardened âeconomic finalityâ agreement on it.
One simple reason why is number of nodes. While the old linear @VitalikButerin/parametrizing-casper-the-decentralization-finality-time-overhead-tradeoff-3f2011672735">decentralization / finality time / overhead tradeoff is looking milder now due to hyper-optimized BLS aggregation and in the near future ZK-STARKs, itâs still fundamentally true that:
In Ethereum today, a 12-second slot is divided into three sub-slots, for (i) block publication and distribution, (ii) attestation, and (iii) attestation aggregation. If the attester count was much lower, we could drop to two sub-slots and have an 8-second slot time. Another, and realistically bigger, factor, is âqualityâ of nodes. If we could also rely on a professionalized subset of nodes to do approximate agreements (and still use the full validators set for finality), we could plausibly drop that to ~2 seconds.
Hence, it feels to me that (i) slot-and-epoch architectures are obviously correct, but also (ii) not all slot-and-epoch architectures are created equal, and thereâs value in more fully exploring the design space. In particular, itâs worth exploring options that are not tightly interwoven like Gasper, and where instead thereâs stronger separation of concerns between the two mechanisms.
In my view, there are three reasonable strategies for L2s to take at the moment:
For some applications, (eg. ENS, keystores), some payments), a 12-second block time is enough. For those applications that are not, the only solution is a slot-and-epoch architecture. In all three cases, the âepochsâ are Ethereumâs SSF (perhaps we can retcon that acronym into meaning something other than âsingle slotâ, eg. it could be âSecure Speedy Finalityâ). But the âslotsâ are something different in each of the above three cases:
A key question is, how good can we make something in category (1)? In particular, if it gets really good, then it feels like category (3) ceases to have as much meaning. Category (2) will always exist, at the very least because anything âbasedâ doesnât work for off-chain-data L2s such as plasmas and validiums. But if an Ethereum-native slot-and-epoch architecture can get down to 1-second âslotâ (ie. pre-confirmation) times, then the space for category (3) becomes quite a bit smaller.
Today, weâre far from having final answers to these questions. A key question - just how sophisticated are block proposers going to become - remains an area where there is quite a bit of uncertainty. Designs like Orbit SSF are very recent, suggesting that the design space of slot-and-epoch designs where something like Orbit SSF is the epoch is still quite under-explored. The more options we have, the better we can do for users both on L1 and on L2s, and the more we can simplify the job of L2 developers.
Forward the Original TitleâEpochs and slots all the way down: ways to give Ethereum users faster transaction confirmation timesâ
One of the important properties of a good blockchain user experience is fast transaction confirmation times. Today, Ethereum has already improved a lot compared to five years ago. Thanks to the combination of EIP-1559 and steady block times after the Merge, transactions sent by users on L1 reliably confirm within 5-20 seconds. This is roughly competitive with the experience of paying with a credit card. However, there is value in improving user experience further, and there are some applications that outright require latencies on the order of hundreds of milliseconds or even less. This post will go over some of the practical options that Ethereum has.
Today, Ethereumâs Gasper consensus uses a slot and epoch architecture. Every 12-second slot, a subset of validators publish a vote on the head of the chain, and over the course of 32 slots (6.4 min), all validators get a chance to vote once. These votes are then re-interpreted as being messages in a vaguely PBFT-like consensus algorithm, which after two epochs (12.8 min) gives a very hard economic assurance called finality.
Over the last couple of years, weâve become more and more uncomfortable with the current approach. The key reasons are that (i) itâs complicated and there are many interaction bugs between the slot-by-slot voting mechanism and the epoch-by-epoch finality mechanism, and (ii) 12.8 minutes is way too long and nobody cares to wait that long.
Single-slot finality replaces this architecture by a mechanism much more similar to Tendermint consensus, in which block N is finalized before block N+1 is made. The main deviation from Tendermint is that we keep the âinactivity leakâ mechanism, which allows the chain to keep going and recover if more than 1/3 of validators go offline.
A diagram of the leading proposed single-slot finality design_
The main challenge with SSF is that naively, it seems to imply that every single Ethereum staker would need to publish two messages every 12 seconds, which would be a lot of load for the chain to handle. There are clever ideas for how to mitigate this, including the very recent Orbit SSF proposal. But even still, while this improves UX significantly by making âfinalityâ come faster, it doesnât change the fact that users need to wait 5-20 seconds.
For the last few years, Ethereum has been following a rollup-centric roadmap, designing the Ethereum base layer (the âL1â) around supporting data availability and other functionalities that can then be used by layer 2 protocols like rollups (but also validiums and plasmas) that can give users the same level of security as Ethereum, but at much higher scale.
This creates a separation-of-concerns within the Ethereum ecosystem: the Ethereum L1 can focus on being censorship resistant, dependable, stable, and maintaining and improving a certain base-level core of functionality, and L2s can focus on more directly reaching out to users - both through different cultural and technological tradeoffs. But if you go down this path, one inevitable issue comes up: L2s want to serve users who want confirmations much faster than 5-20 seconds.
So far, at least in the rhetoric, it has been L2sâ responsibility to create their own âdecentralized sequencingâ networks. A smaller group of validators would sign off on blocks, perhaps once every few hundred milliseconds, and they would put their âstakeâ behind those blocks. Eventually, headers of these L2 blocks get published to L1.
L2 validator sets could cheat: they could first sign block B1, and then later sign a conflicting block B2 and commit it onto the chain before B1. But if they do this, they would get caught and lose their deposits. In practice, we have seen centralized versions of this, but rollups have been slow to develop decentralized sequencing networks. And you can argue that demanding L2s all do decentralized sequencing is an unfair deal: weâre asking rollups to basically do most of the same work as creating an entire new L1. For this reason and others, Justin Drake has been promoting a way to give all L2s (as well as L1) access to a shared Ethereum-wide preconfirmation mechanism: based preconfirmations.
The based preconfirmation approach assumes that Ethereum proposers will become highly sophisticated actors for MEV-related reasons (see here for my explanation of MEV, and see also the execution tickets line of proposals). The based preconfirmation approach takes advantage of this sophistication by incentivizing these sophisticated proposers to accept the responsibility of offering preconfirmations-as-a-service.
The basic idea is to create a standardized protocol by which a user can offer an additional fee in exchange for an immediate guarantee that the transaction will be included in the next block, along with possibly a claim about the results of executing that transaction. If the proposer violates any promise that they make to any user, they can get slashed.
As described, based preconfirmations provide guarantees to L1 transactions. If rollups are âbasedâ, then all L2 blocks are L1 transactions, and so the same mechanism can be used to provide preconfirmations for any L2.
Suppose that we implement single slot finality. We use Orbit-like techniques to reduce the number of validators signing per slot, but not too much, so that we can also make progress on the key goal of reducing the 32 ETH staking minimum. As a result, perhaps the slot time creeps upward, to 16 sec. We then use either rollup preconfirmations, or based preconfirmations, to give users faster assurances. What do we have now? An epoch-and-slot architecture.
The âtheyâre the same pictureâ meme is getting quite overused at this point, so Iâll just put an old diagram I drew years ago to describe Gasperâs slot-and-epoch architecture and a diagram of L2 preconfirmations beside each other, and hopefully that will get the point across.
There is a deep philosophical reason why epoch-and-slot architectures seem to be so hard to avoid: it inherently takes less time to come to approximate agreement on something, than to come to maximally-hardened âeconomic finalityâ agreement on it.
One simple reason why is number of nodes. While the old linear @VitalikButerin/parametrizing-casper-the-decentralization-finality-time-overhead-tradeoff-3f2011672735">decentralization / finality time / overhead tradeoff is looking milder now due to hyper-optimized BLS aggregation and in the near future ZK-STARKs, itâs still fundamentally true that:
In Ethereum today, a 12-second slot is divided into three sub-slots, for (i) block publication and distribution, (ii) attestation, and (iii) attestation aggregation. If the attester count was much lower, we could drop to two sub-slots and have an 8-second slot time. Another, and realistically bigger, factor, is âqualityâ of nodes. If we could also rely on a professionalized subset of nodes to do approximate agreements (and still use the full validators set for finality), we could plausibly drop that to ~2 seconds.
Hence, it feels to me that (i) slot-and-epoch architectures are obviously correct, but also (ii) not all slot-and-epoch architectures are created equal, and thereâs value in more fully exploring the design space. In particular, itâs worth exploring options that are not tightly interwoven like Gasper, and where instead thereâs stronger separation of concerns between the two mechanisms.
In my view, there are three reasonable strategies for L2s to take at the moment:
For some applications, (eg. ENS, keystores), some payments), a 12-second block time is enough. For those applications that are not, the only solution is a slot-and-epoch architecture. In all three cases, the âepochsâ are Ethereumâs SSF (perhaps we can retcon that acronym into meaning something other than âsingle slotâ, eg. it could be âSecure Speedy Finalityâ). But the âslotsâ are something different in each of the above three cases:
A key question is, how good can we make something in category (1)? In particular, if it gets really good, then it feels like category (3) ceases to have as much meaning. Category (2) will always exist, at the very least because anything âbasedâ doesnât work for off-chain-data L2s such as plasmas and validiums. But if an Ethereum-native slot-and-epoch architecture can get down to 1-second âslotâ (ie. pre-confirmation) times, then the space for category (3) becomes quite a bit smaller.
Today, weâre far from having final answers to these questions. A key question - just how sophisticated are block proposers going to become - remains an area where there is quite a bit of uncertainty. Designs like Orbit SSF are very recent, suggesting that the design space of slot-and-epoch designs where something like Orbit SSF is the epoch is still quite under-explored. The more options we have, the better we can do for users both on L1 and on L2s, and the more we can simplify the job of L2 developers.