Two and a half years ago, in my "Endgame" article, I argued that the different possible development paths for a blockchain look very similar from a technical standpoint. In both cases, there are a large number of transactions on-chain, and processing these transactions requires: (1) a lot of computation; (2) a lot of data bandwidth.
A regular Ethereum node (such as the 2 TB reth archive node running on my computer now) cannot directly verify that much data and computation, even with strong software engineering and Verkle trees. Instead, in both "L1 sharding" and rollup-centric designs, ZK-SNARKs are used to verify computation, and DAS (data availability sampling) is used to verify data availability. Whether it's L1 sharding or rollups, the DAS is the same, and the ZK-SNARK technology is the same. The only difference is that in one case it is smart contract code, and in the other it is a feature of the protocol. In a very real technical sense, Ethereum is doing sharding, and rollups are the shards.
This naturally raises a question: what is the difference between the two? One difference is the consequence of code bugs: in a rollup, tokens get stolen; in sharding, consensus breaks down. However, I expect that as protocols stabilize and formal verification technology improves, code bugs will matter less and less. So, what other differences between these two approaches might persist in the long term?
Diversity of Execution Environments
In 2019, we discussed the idea of execution environments in Ethereum. Essentially, Ethereum would have different "zones" that could set their own rules for how accounts work (including completely different models such as UTXOs), how the virtual machine works, and other features. The goal was to enable a diversity of approaches in parts of the stack where diversity would be hard to achieve if Ethereum tried to do everything itself.
Ultimately, we abandoned some of the more ambitious plans and kept only the EVM. However, Ethereum L2s (including rollups, validiums, and Plasmas) have arguably ended up serving as execution environments. Today we usually focus on EVM-equivalent L2s, but this overlooks the diversity offered by many alternative approaches:
- Arbitrum Stylus, which adds a second, WASM-based virtual machine alongside the EVM;
- Fuel, which uses a UTXO-based architecture similar to Bitcoin (but more functional);
- Aztec, which introduces a new language and programming paradigm centered around privacy-preserving smart contracts based on ZK-SNARKs.
We could try to make the EVM a super virtual machine that covers every possible paradigm, but each of them would then be implemented far less effectively. It is better to let these platforms specialize.
Security Trade-offs: Scalability and Transaction Speed
Ethereum L1 provides very strong security guarantees. If some data is included in a block that is finalized on L1, the entire consensus (including social consensus in extreme cases) works to ensure that the data cannot be altered, that any execution triggered by the data cannot be reverted, and that the data remains accessible. To achieve these guarantees, Ethereum L1 is willing to accept high costs.
At the time of writing, transaction fees are relatively low: Layer 2 fees are less than one cent per transaction, and even on L1, basic ETH transfers are less than a dollar. If technological progress is rapid enough and the growth in available block space can keep up with demand, these fees may remain relatively low in the future, but they might not. For many non-financial applications (such as social media or gaming), even a transaction fee of 0.01 USD is too high.
However, social media and gaming do not need the same security model as L1. If someone can spend a million dollars to revert the record of a game they lost, or to make a tweet appear three days later than it actually did, that is acceptable. Therefore, these applications should not have to bear the same security costs. L2s make this possible by supporting a range of data availability approaches, from rollups to plasma to validiums.
Another trade-off concerns asset transfers from one L2 to another. I predict that in the next 5 to 10 years, all rollups will be ZK rollups, and ultra-efficient proof systems like Binius and Circle STARKs with lookups, combined with proof aggregation layers, will make it possible for L2s to provide finalized state roots in every slot.
But for now, we have a complicated mix of optimistic rollups and ZK rollups with different proof time windows. If we had implemented execution sharding in 2021, the security model for keeping shards honest would have been optimistic rather than ZK, so L1 would have had to manage complex on-chain fraud-proof logic, and moving assets between shards would have meant withdrawal times of up to a week. But as with code bugs, I think this issue is ultimately temporary.
Transaction speed is a third aspect of the security trade-off, and a more lasting one. Ethereum produces a block every 12 seconds and is unwilling to go faster, because doing so would centralize the network too much. Many L2s, however, are exploring block times of a few hundred milliseconds. Twelve seconds isn't too bad: on average, users wait about 6-7 seconds for their transaction to be included in a block (not exactly 6 seconds, because the next block may not include it). This is comparable to the time I wait when paying with a credit card. But many applications need faster speeds, and L2s can provide them.
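The 6-7 second figure can be reproduced with a small back-of-the-envelope model. This is only a sketch under assumptions that are not part of the argument above: a transaction arrives at a uniformly random moment within a slot, and each successive block independently includes it with probability `p_include` (a hypothetical parameter, as is the function name).

```python
def expected_wait_seconds(slot_time: float = 12.0, p_include: float = 1.0) -> float:
    """Expected time until inclusion under a simple geometric model."""
    if not 0.0 < p_include <= 1.0:
        raise ValueError("p_include must be in (0, 1]")
    # Arriving at a uniformly random moment, the next block is on average
    # half a slot away.
    wait_for_next_block = slot_time / 2
    # If each block includes the transaction with probability p, the expected
    # number of blocks that skip it is (1 - p) / p (geometric distribution).
    expected_missed_blocks = (1.0 - p_include) / p_include
    return wait_for_next_block + expected_missed_blocks * slot_time


if __name__ == "__main__":
    # Illustrative inclusion probabilities only; these are not measured values.
    for p in (1.0, 0.95, 0.9):
        print(f"p_include={p:.2f}: {expected_wait_seconds(p_include=p):.2f} s")
```

With guaranteed inclusion in the next block the model gives exactly 6 seconds; any chance of being skipped adds a fraction of a further 12-second slot, which is why the average sits somewhat above 6.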
To achieve this speed, L2s use a pre-confirmation mechanism: the L2's own validators digitally sign a promise to include a transaction at a specific time, and they are penalized if the transaction is not included. A mechanism called StakeSure extends this idea further.
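The promise-and-penalty logic can be sketched in a few lines. This is a toy model, not any L2's actual protocol: the class and function names are hypothetical, the promised time is treated as a deadline slot, and an HMAC from the standard library stands in for a real public-key signature so the example stays self-contained.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Preconfirmation:
    tx_hash: str
    deadline_slot: int
    signature: bytes  # stand-in for a real digital signature


class Validator:
    """Hypothetical L2 validator that posts a stake and signs inclusion promises."""

    def __init__(self, secret_key: bytes, stake: int) -> None:
        self._secret_key = secret_key
        self.stake = stake

    def _sign(self, tx_hash: str, deadline_slot: int) -> bytes:
        # ASSUMPTION: HMAC replaces a public-key signature scheme for brevity.
        message = f"{tx_hash}:{deadline_slot}".encode()
        return hmac.new(self._secret_key, message, hashlib.sha256).digest()

    def preconfirm(self, tx_hash: str, deadline_slot: int) -> Preconfirmation:
        return Preconfirmation(tx_hash, deadline_slot, self._sign(tx_hash, deadline_slot))

    def is_authentic(self, promise: Preconfirmation) -> bool:
        expected = self._sign(promise.tx_hash, promise.deadline_slot)
        return hmac.compare_digest(expected, promise.signature)


def settle(validator: Validator, promise: Preconfirmation,
           included_at_slot: Mapping[str, int], penalty: int) -> bool:
    """Return True if the promise was kept; otherwise deduct the penalty from the stake."""
    if not validator.is_authentic(promise):
        raise ValueError("promise was not signed by this validator")
    slot = included_at_slot.get(promise.tx_hash)
    kept = slot is not None and slot <= promise.deadline_slot
    if not kept:
        validator.stake -= min(penalty, validator.stake)
    return kept


if __name__ == "__main__":
    v = Validator(secret_key=b"demo-key", stake=100)
    promise = v.preconfirm("0xabc", deadline_slot=10)
    print(settle(v, promise, {"0xabc": 9}, penalty=30), v.stake)
    print(settle(v, promise, {}, penalty=30), v.stake)
```

The user can act on the signed promise immediately; the stake at risk is what makes the promise credible before the transaction is actually included.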
Now, we could try to implement all these features on L1. L1 could include a "fast pre-confirmation" and "slow final confirmation" system. It could include different shards with different security levels. However, this would increase the protocol's complexity. Additionally, doing everything on L1 risks overloading consensus, as many larger-scale or higher-throughput methods pose higher centralization risks or require stronger forms of "governance," which, if done on L1, would impact other parts of the protocol. By providing trade-offs through L2s, Ethereum can largely avoid these risks.
Benefits of Layer 2 for Organization and Culture
Imagine a country split into two, one half becoming a capitalist state and the other half a highly government-led state (unlike real-world scenarios, suppose this is not the result of any traumatic war but just a natural border appearing one day).
In the capitalist part, restaurants are run by various combinations of decentralized ownership, chains, and franchises. In the government-led part, they are all branches of the government, like police stations. On the first day, not much will change. People will largely follow their existing habits, and what works and what doesn't will depend on technical realities like labor skills and infrastructure. A year later, however, you will see huge changes, because different incentives and control structures lead to large changes in behavior, affecting who comes and goes, what is built, what is maintained, and what is abandoned.
Industrial organization theory has a lot to say about such differences: it covers not only the differences between government-run and capitalist economies but also the differences between economies dominated by large franchise corporations and economies where each supermarket is run by an independent entrepreneur. I think the difference between an L1-centric ecosystem and an L2-centric ecosystem is similar.
I think the main advantages of Ethereum being an L2-centric ecosystem are as follows:
- Since Ethereum is an L2-centric ecosystem, you can independently build a sub-ecosystem with unique features while still being part of the larger Ethereum ecosystem.
- If you are only building an Ethereum client, you are part of the larger Ethereum, and although you have some room for innovation, it is far less than an L2 has. If instead you are building a completely independent chain, your room for creativity is vast, but you lose the benefits of shared security and network effects. An L2 is a good balance.
- An L2 not only provides the opportunity to try new execution environments and security trade-offs in pursuit of scalability, flexibility, and speed, but also offers an incentive mechanism that encourages developers to build and maintain it and the community to support it.
The fact that each L2 is isolated also means that deploying new approaches is permissionless: you do not need to convince all the core developers that your new approach is "safe" for the rest of the chain. If your L2 fails, that is your responsibility. Anyone can come up with wild ideas (e.g., Intmax's approach to Plasma), and even if Ethereum core developers ignore them completely, they can keep building and eventually deploy.
L1 features and precompiles are not like this. Even in Ethereum, the success or failure of L1 development often ends up depending on politics more than we would like. Regardless of what can theoretically be built, the different incentives created by L1-centric and L2-centric ecosystems end up heavily affecting what actually gets built, at what level of quality, and in what order.
What Challenges Does Ethereum's L2-Centric Ecosystem Face?
This L2-centric approach faces a key challenge that an L1-centric ecosystem faces to a much lesser extent: coordination. In other words, while Ethereum has many L2s, the challenge is to make the whole still feel like "Ethereum" and retain Ethereum's network effects, rather than becoming N independent chains. Today, the situation is unsatisfactory in many ways:
- Moving tokens from one L2 to another usually requires a centralized bridge, which is complicated for ordinary users. If you have tokens on Optimism, you cannot just paste someone else's Arbitrum address into your wallet and send them funds.
- Cross-chain support for smart contract wallets, both personal and organizational (including DAOs), is poor. If you change a key on one L2, you need to change it on every other L2 as well.
- Decentralized verification infrastructure is often lacking. Ethereum is finally starting to have decent light clients, such as Helios. But that means little if all activity happens on L2s that each rely on their own centralized RPC. In principle, once you have the Ethereum block headers, building a light client for an L2 is not difficult; in practice, this has received too little emphasis.
The community is working to improve in these three areas. For cross-chain token exchange, the ERC-7683 standard is a new proposal that, unlike existing "centralized bridges," has no fixed centralized nodes, tokens, or governance. For cross-chain accounts, most wallets take the approach of using cross-chain replayable messages to update keys in the short term and keystore rollups in the long term. Light clients for L2s are beginning to appear, such as Beerus for Starknet. Additionally, recent improvements in user experience with next-generation wallets have solved more basic issues, like allowing users to access DApps without manually switching networks.
However, it must be acknowledged that an L2-centric ecosystem does face real coordination challenges, because an individual L2 has no natural economic incentive to build coordination infrastructure: small L2s won't do it because they would capture only a small share of the benefit, and large L2s won't either because they can gain just as much or more by strengthening their own local network effects. If every L2 only looks after itself and no one thinks about how it fits into the broader Ethereum system, we will fail.
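The incentive argument can be made concrete with a toy payoff comparison. Everything here is an illustrative assumption rather than data: the function name, the idea that an L2 captures a fixed `share` of the ecosystem-wide benefit, and all of the numbers.

```python
def best_choice(share: float, total_benefit: float, cost: float,
                local_return: float) -> str:
    """Pick the option with the highest private payoff for a single L2.

    share         -- fraction of the ecosystem-wide benefit this L2 captures
    total_benefit -- benefit to the whole ecosystem if the infrastructure exists
    cost          -- cost of building it (the same budget could be spent locally)
    local_return  -- return from spending that budget on the L2's own network effects
    """
    payoffs = {
        "do nothing": 0.0,
        "build shared infrastructure": share * total_benefit - cost,
        "invest in own network effects": local_return - cost,
    }
    # max() returns the first key with the highest payoff (ties favor earlier keys).
    return max(payoffs, key=payoffs.get)


if __name__ == "__main__":
    # Hypothetical numbers: the infrastructure is worth 100 to the ecosystem
    # and costs 10, so building it is clearly worthwhile collectively.
    print("small L2:", best_choice(share=0.02, total_benefit=100, cost=10, local_return=5))
    print("large L2:", best_choice(share=0.30, total_benefit=100, cost=10, local_return=35))
```

In this sketch the infrastructure is worth building for the ecosystem as a whole, yet neither the small nor the large L2 chooses to build it, which is the public-goods gap the paragraph above describes.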
It's hard to say there's a perfect solution to this problem. I can only say that the ecosystem needs to more fully recognize that cross-L2 infrastructure is as much a type of Ethereum infrastructure as L1 clients, development tools, and programming languages and should be valued and funded. We have the Protocol Guild; maybe we need a Basic Infrastructure Guild.
Conclusion
In public discussions, "L2" and "sharding" are often seen as two opposing strategies for scaling a blockchain. But when you study the underlying technology, you find a puzzle: the actual underlying approaches to scaling are exactly the same. Both involve some form of data sharding, fraud provers or ZK-SNARK provers, and solutions for cross-{rollup, shard} communication. The main difference is: who is responsible for building and updating these components, and how much autonomy do they have?
In a very real technical sense, an L2-centric ecosystem is sharding, but it is sharding in which you can build your own shard with your own rules. This is very powerful and enables a great deal of creativity and independent innovation. But it also brings key challenges, especially around coordination. For an L2-centric ecosystem like Ethereum to succeed, it must understand these challenges and tackle them head-on, so that it captures as many of the benefits of an L1-centric ecosystem as possible and comes as close as it can to the best of both worlds.