Author: Vitalik, founder of Ethereum; Translation: 0xjs@黄金财经
Two and a half years ago, in my article "Endgame", I pointed out that from a technical perspective, the different paths for the future development of blockchains look very similar. In both cases, there are a large number of transactions on chain, and processing them requires (i) a lot of computation and (ii) a lot of data bandwidth. Regular Ethereum nodes, like the 2 TB reth archive node running on the laptop I am using to write this article, are not powerful enough to directly verify such a huge amount of data and computation, even with great software engineering work and Verkle trees.
In both the "L1 sharding" and rollup-centric worlds, ZK-SNARKs are used to verify computation, and DAS (Data Availability Sampling) is used to verify data availability. The DAS is the same in both cases. The ZK-SNARK technology is also the same; the difference is that in one case it is smart contract code and in the other it is a built-in feature of the protocol. From a technical perspective, Ethereum is in fact doing sharding, and rollups are the shards.
This leads to a natural question: what is the difference between these two worlds? One answer is that the consequences of a code bug are different: in the rollup world, tokens are lost, while in the sharded chain world, there is a consensus failure. But I expect that as the protocol solidifies and formal verification techniques improve, the importance of such bugs will decrease. So what are the long-term differences we can expect between these two visions?
Diversity of execution environments
One idea we briefly experimented with on Ethereum in 2019 was execution environments. Essentially, Ethereum would have different “zones” that could have different rules for how accounts work (including completely different approaches like UTXO), how the VM works, and other features. This would enable some approaches that would be difficult to achieve if Ethereum did everything on its own.
Ultimately, we abandoned some of the more ambitious plans and kept only the EVM. However, Ethereum's L2s (including rollups, validiums, and Plasmas) have arguably ended up serving as execution environments. Today, we often focus on EVM-equivalent L2s, but this ignores the diversity of many alternative approaches:
Arbitrum Stylus, which adds a second VM based on WASM in addition to the EVM.
Fuel, which uses a UTXO architecture similar to Bitcoin (but more complete).
Aztec, which introduces a new language and programming paradigm designed around privacy-preserving smart contracts with ZK-SNARKs.
Fuel's UTXO architecture
We can try to turn the EVM into a super virtual machine that covers all possible paradigms, but this will result in a much less effective implementation of each concept than if platforms like these focus on their respective areas.
Security tradeoffs: scale and speed
Ethereum L1 provides very strong security guarantees. If some data is in a block that is finalized on L1, the entire consensus (including social consensus in extreme cases) ensures that the data cannot be edited in a way that violates the rules of the application that put it there, that any execution triggered by the data cannot be undone, and that the data will remain accessible. To achieve these guarantees, Ethereum L1 is willing to accept high costs. At the time of writing, transaction fees are relatively low: Layer 2 networks charge less than a penny per transaction, and even basic ETH transfers on L1 cost less than $1. If technology advances quickly enough that the growth of available block space keeps up with demand, these costs may remain low, but they may not. And even $0.01 per transaction is too high for many non-financial applications, such as social media or games.
But social media and games don't need the same security model as L1. If someone can pay a million dollars to undo the record of a chess game they lost, or to make your tweet look like it was posted three days after it was actually posted, that's acceptable. Therefore, these applications should not have to pay the same security cost. An L2-centric approach makes this possible by supporting a spectrum of data availability approaches, from rollups to Plasma to validiums.
Different use cases, different L2 types
Another security tradeoff arises when passing assets from L2 to L2. I expect that in 5-10 years, all rollups will be ZK rollups, and super-efficient proof systems like Binius and Circle STARKs, combined with lookups and proof aggregation layers, will enable L2s to provide finalized state roots at every slot. Currently, however, we have a complex mix of optimistic rollups and ZK rollups, with various proof time windows. If we had implemented execution sharding in 2021, the security model for keeping the shards honest would have been optimistic rollups, not ZK, so L1 would have had to manage the system's complex fraud proof logic, and assets would have faced a one-week waiting period to move from shard to shard. But I think this problem is ultimately temporary.
The third, and once again more lasting, dimension of the security tradeoff is transaction speed. Ethereum produces a block every 12 seconds and is reluctant to go faster because that would over-centralize the network. However, many L2s are exploring block times of a few hundred milliseconds. 12 seconds isn't too bad: on average, users submitting transactions have to wait about 6-7 seconds for them to be included in a block (not exactly 6 seconds, because there's a chance that the next block won't include them). That's about the same time I wait when I pay with my credit card. But many applications need more speed, and L2s provide that.
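The 6-7 second figure can be checked with a little arithmetic. What follows is a minimal sketch, assuming that a transaction arrives at a uniformly random moment within a slot and that each subsequent block independently includes it with probability `p_include`. Both the model and the function name are illustrative assumptions, not anything specified by the protocol.

```python
SLOT_SECONDS = 12.0  # Ethereum L1 block time cited in the text


def expected_inclusion_wait(p_include: float, slot_seconds: float = SLOT_SECONDS) -> float:
    """Expected seconds until a transaction is included in a block.

    Assumptions (illustrative only): the transaction arrives at a uniformly
    random point within a slot, and every following block includes it
    independently with probability p_include.
    """
    if not 0.0 < p_include <= 1.0:
        raise ValueError("p_include must be in (0, 1]")
    # Average time until the next block is produced.
    wait_for_next_block = slot_seconds / 2
    # Mean number of blocks that skip the transaction (geometric distribution).
    expected_missed_blocks = (1 - p_include) / p_include
    return wait_for_next_block + expected_missed_blocks * slot_seconds


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.9):
        print(f"p={p:.2f}: {expected_inclusion_wait(p):.2f} s")
```

Under these assumptions, inclusion probabilities between 0.9 and 1 give expected waits between 6 and roughly 7.3 seconds, which is consistent with the estimate above.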
To provide that higher speed, L2s rely on pre-confirmation mechanisms: the L2's own validators digitally sign a promise to include a transaction by a specific time, and they can be penalized if the transaction is not included. A mechanism called StakeSure generalizes this further.
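The promise-and-penalty logic can be sketched in a few lines. This is only a toy model under stated assumptions: every name here (`PreConfirmation`, `sign_promise`, `should_penalize`) is hypothetical, and an HMAC with a shared key stands in for the public-key signature a real L2 would use. It does not describe any particular L2's implementation.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class PreConfirmation:
    tx_hash: str         # transaction the validator promises to include
    deadline_block: int  # latest L2 block height at which it must appear
    validator: str       # identifier of the promising validator
    tag: bytes           # placeholder for a digital signature


def _message(validator: str, tx_hash: str, deadline_block: int) -> bytes:
    return f"{validator}|{tx_hash}|{deadline_block}".encode()


def sign_promise(key: bytes, validator: str, tx_hash: str, deadline_block: int) -> PreConfirmation:
    # ASSUMPTION: HMAC-SHA256 is a stand-in; a real system would use a
    # public-key signature so that anyone can verify the promise.
    tag = hmac.new(key, _message(validator, tx_hash, deadline_block), hashlib.sha256).digest()
    return PreConfirmation(tx_hash, deadline_block, validator, tag)


def is_authentic(key: bytes, promise: PreConfirmation) -> bool:
    msg = _message(promise.validator, promise.tx_hash, promise.deadline_block)
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, promise.tag)


def should_penalize(key: bytes, promise: PreConfirmation,
                    blocks: dict[int, set[str]], latest_block: int) -> bool:
    """Return True if the validator broke an authentic promise.

    `blocks` maps block height to the set of transaction hashes it contains;
    `latest_block` is the height of the most recently produced block.
    """
    if not is_authentic(key, promise):
        return False  # a forged promise cannot be held against the validator
    if latest_block < promise.deadline_block:
        return False  # the deadline has not passed yet
    included = any(
        promise.tx_hash in txs
        for height, txs in blocks.items()
        if height <= promise.deadline_block
    )
    return not included
```

For example, a promise to include `0xabc` by block 5 triggers a penalty once block 5 exists without that transaction, but not if the transaction appears at or before block 5.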
L2 Pre-Confirmation
We could try to do all of this on L1. L1 could combine "fast pre-confirmation" and "slow final confirmation" systems. It could combine shards with different security levels. However, this would add a lot of complexity to the protocol. In addition, doing it all on L1 would risk overloading consensus, because many higher-scale or faster-throughput approaches have higher centralization risks or require stronger forms of "governance", and if done on L1, the impact of these stronger requirements would ripple through to other parts of the protocol. By offering these tradeoffs through L2, Ethereum can mostly avoid these risks.
Organizational and Cultural Advantages of L2
Imagine a country that is split in half, with one half becoming capitalist and the other becoming a highly government-dominated society (unlike real-world cases, assume that in this thought experiment the split is not the result of any kind of traumatic war; rather, a border magically appears one day and that's it). In the capitalist part, restaurants are run by various combinations of independent owners, chains, and franchises. In the government-dominated part, they are all branches of the government, like the police department. On the first day, not much will change. People generally follow existing habits, and what works and what doesn't depends on technical realities, such as labor skills and infrastructure. A year later, however, you'd expect to see big changes, as different incentive and control structures lead to big changes in behavior, affecting who comes, who stays, who leaves, what is built, what is maintained, and what is abandoned.
Industrial organization theory covers many of these distinctions: it talks not only about the difference between a government-run economy and a capitalist economy, but also about the difference between an economy dominated by large franchises and an economy where, for example, each supermarket is run by an independent entrepreneur. I think the distinction between an L1-centric ecosystem and an L2-centric ecosystem is similar.
The "core staff runs everything" architecture will have big problems
The key benefit of Ethereum being an L2-centric ecosystem can be stated as follows:
Because Ethereum is an L2-centric ecosystem, you can freely and independently build a sub-ecosystem that belongs to you, with your own unique characteristics, while still being part of the larger Ethereum.
If you are just building an Ethereum client, you are part of the larger Ethereum, and although you have some room for creativity, it is much less than you would have with an L2. If you're building a completely independent chain, you have the most room for creativity, but you lose the benefits of shared security and shared network effects. L2s form a happy middle ground.
L2s not only create a technical opportunity to experiment with new execution environments and security tradeoffs to achieve scale, flexibility, and speed: they also create incentives for developers to build and maintain them, and for communities to form around them and support them.
The fact that each L2 is isolated means that deploying new approaches is permissionless: there's no need to convince all the core developers that your new approach is "safe" for the rest of the chain. If your L2 fails, that's on you. Anyone can work on completely weird ideas (such as Intmax's approach to Plasma), and even if they're completely ignored by Ethereum core developers, they can continue to build and eventually deploy. This is not the case with L1 features and precompiles, and even in Ethereum, what succeeds and what fails in L1 development often depends on politics more than we'd like. Regardless of what can be built in theory, the different incentives created by L1-centric and L2-centric ecosystems will ultimately have a large effect on what is actually built, at what quality, and in what order.
Challenges facing Ethereum's L2-centric ecosystem
The L1 + L2 architecture will also have problems
This L2-centric approach faces a key challenge, coordination, that an L1-centric ecosystem hardly has to face at all. In other words, while Ethereum branches out, the challenge is to keep it all feeling like "Ethereum" and retaining the network effects of being Ethereum rather than N separate chains. The situation today is not ideal in many ways:
Moving tokens from one L2 to another typically requires a centralized bridge platform and is complicated for the average user. If you have tokens on Optimism, you can’t just paste someone else’s Arbitrum address into your wallet and send funds.
Cross-chain smart contract wallet support is poor — both for personal smart contract wallets and organizational wallets (including DAOs). If you change your keys on one L2, you also need to change your keys on every other L2.
Decentralized validation infrastructure is often lacking. Ethereum is finally starting to have excellent light clients like Helios. However, this is of little use if all the activity happens on L2s that each require their own centralized RPC. In principle, it's not hard to make light clients for L2s once you have the Ethereum header chain; in practice, too few people make it a priority.
There are efforts to improve all three aspects. For cross-chain token swaps, the ERC-7683 standard is an emerging option that, unlike existing "centralized bridges," does not have any fixed central operator, token, or governance. For cross-chain accounts, the approach taken by most wallets is to use cross-chain replayable messages to update keys in the short term and keystore rollups in the long term. Light clients for L2s are starting to appear, such as Beerus for Starknet. Additionally, recent improvements to the user experience through next-generation wallets have solved many of the more basic issues, such as eliminating the need for users to manually switch to the correct network to access dapps.
Rabby displays a comprehensive view of asset balances across multiple chains. In the dark ages not so long ago, wallets didn't have this!
But it's important to recognize that an L2-centric ecosystem is indeed swimming upstream to some extent when it comes to coordination. Individual L2s have no natural economic incentive to build coordination infrastructure: small ones have none, because they would only see a small share of the benefit of their contribution, and large ones have none, because they would benefit more from strengthening their own local network effects. If each L2 optimizes its individual parts in isolation, and no one considers how each part fits into the greater whole, we get the urbanized dystopia shown in the picture a few paragraphs above.
I don't claim to have a magical perfect solution to this problem. The best I can suggest is that the ecosystem needs to more fully recognize that cross-L2 infrastructure is a form of Ethereum infrastructure that should be valued and funded in the same way as L1 clients, development tools, and programming languages. We have the Protocol Guild; maybe we also need an Infrastructure Guild.
Conclusion
"L2" and "sharding" are often described as two opposing blockchain scaling strategies. But when you look at the underlying technology, there is a puzzle: the actual underlying scaling methods are exactly the same. You have some kind of data sharding. You have fraud provers or ZK-SNARK provers. You have solutions for cross-{rollup, shard} communication. The main difference is: who is responsible for building and updating these parts, and how much autonomy do they have?
From a technical perspective, an ecosystem centered around L2 is sharding, and you can create your own shard with your own rules. This kind of sharding is powerful and can inspire creativity and autonomous innovation. But it also faces key challenges, especially around coordination. For an L2-centric ecosystem like Ethereum to succeed, it needs to understand these challenges and confront them head-on to gain as many benefits of an L1-centric ecosystem as possible and get as close as possible to having the best of both worlds.