Original article by Vitalik Buterin; translated and edited by DeFi.
Special thanks to Georgios Konstantopoulos, Karl Floersch and the Starkware team for their feedback and review.
A recurring theme in discussions of layer 2 (L2) scaling is the concept of "layer 3" (L3). If we can build an L2 protocol that anchors to L1 for security and adds scalability on top, then surely we can scale even further by building an L3 protocol that anchors to L2 for security and adds even more scalability on top of that?
A simple version of this idea goes: if you have a scheme that gives you quadratic scaling, can you stack the scheme on top of itself and get exponential scaling? Ideas like this include my 2015 scalability paper and the multi-layer scaling ideas in the Plasma paper. Unfortunately, such simple conceptions of L3 rarely work out that easily. There is always something in the design that is not stackable and can only give you a one-time scalability boost: data availability limits, reliance on L1 bandwidth for emergency withdrawals, or many other issues.
The newer ideas around L3, such as the framework proposed by Starkware, are more sophisticated: they do not just stack the same thing on top of itself, they assign L2 and L3 different purposes. Some form of this approach may well be a good idea, if done in the right way. This post will go into detail on what may and may not make sense in a three-layer architecture.
Rollups (see my longer article on them) are a scaling technology that combines several techniques to address the two main scaling bottlenecks of running a blockchain: computation and data. Computation is addressed by fraud proofs or SNARKs, which rely on a very small number of actors to process and verify each block, requiring everyone else to perform only a small amount of computation to check that the proving process was done correctly. These schemes, especially SNARKs, can scale almost without limit; you really can keep making "SNARKs of many SNARKs" to reduce even more computation down to a single proof.
Data is different. Rollups use a collection of compression tricks to reduce the amount of data a transaction needs to store on-chain: a simple currency transfer goes from ~100 bytes down to ~16 bytes, an ERC20 transfer on an EVM-compatible chain from ~180 bytes down to ~23 bytes, and a privacy-preserving ZK-SNARK transaction can be compressed from ~600 bytes down to ~80 bytes. About 8x compression in all cases. But a rollup still needs to make the data available on-chain, in a medium that guarantees users can access and verify it, so that users can independently compute the state of the rollup and join as provers if the existing provers go offline. Data can be compressed once, but it cannot be compressed again: if it could, there would generally be a way to fold the logic of the second compressor into the first and get the same benefit by compressing once. Hence, "rollups on top of rollups" do not actually offer significant gains in scalability, though, as we will see below, the pattern can be useful for other purposes.
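As a quick sanity check, here is the arithmetic behind the "about 8x" claim, using the approximate byte counts cited above:

```python
# Approximate on-chain data sizes (bytes) before and after rollup compression,
# taken from the figures cited in the text above.
sizes = {
    "simple transfer": (100, 16),
    "ERC20 transfer": (180, 23),
    "ZK-SNARK private tx": (600, 80),
}
for name, (l1_bytes, rollup_bytes) in sizes.items():
    print(f"{name}: {l1_bytes / rollup_bytes:.1f}x compression")
```

The ratios come out between roughly 6x and 8x, matching the "about 8x" figure.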
So, let's look at what Starkware advocates in their post on L3s. Starkware is made up of very smart cryptographers who are entirely reasonable, so if they advocate for L3s, their version will be much more sophisticated than "if rollups compress data 8x, then obviously rollups on top of rollups will compress data 64x".
This is the chart in Starkware's post:
To quote a few parts:

Figure 1 depicts an example of such an ecosystem. Its L3s include:

We can distill the article's main points into three visions of "L3":
In my view, all three visions are fundamentally reasonable. The claim that specialized data compression needs its own platform is probably the weakest: it would be quite easy to design an L2 with a generalized base-layer compression scheme that users can automatically extend with application-specific sub-compressors. But the use cases themselves are legitimate. That still leaves the big question: is a three-layer structure the right way to achieve these goals? What is the point of anchoring validiums, privacy systems, and customized environments to L2 rather than just to L1? The answer turns out to be quite complicated.
Which is actually better?
One possible argument for the three-layer model over the two-layer model is that the three-layer model lets an entire sub-ecosystem live inside a single rollup, allowing cross-domain operations within that ecosystem to happen very cheaply, without needing to go through the expensive layer 1.
But as it turns out, you can deposit and withdraw cheaply even between two L2s (or even L3s) that commit to the same L1! The key realization is that tokens and other assets do not have to be issued on the root chain. That is, you can have an ERC20 token on Arbitrum, create a wrapper of it on Optimism, and move back and forth between the two without any L1 transactions!
Let's walk through how such a system works. There are two smart contracts: the base contract on Arbitrum, and the wrapped-token contract on Optimism. To move from Arbitrum to Optimism, you send the token to the base contract, which generates a receipt. Once Arbitrum finalizes, you take a Merkle proof of that receipt, rooted in the L1 state, and send it to the wrapped-token contract on Optimism, which verifies it and issues you a wrapped token. To move the token back, you do the same thing in reverse.
Even though the Merkle path needed to prove the deposit on Arbitrum passes through the L1 state, Optimism only needs to read the L1 state root to process the deposit; no L1 transactions are required. Note that because data on rollups is the scarcest resource, a practical implementation of this scheme would use a SNARK or a KZG proof, rather than a Merkle proof directly, to save space.
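The deposit flow above can be sketched in a few lines. This is a simplified model, not contract code: the hash function, receipt format, and tree shape are all illustrative assumptions, and a real bridge would prove the receipt against the L1 state root rather than against a bare receipt tree.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Hash helper standing in for the rollup's commitment hash."""
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    """Build a binary Merkle tree over the receipts and return its root."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # duplicate the odd leaf out
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves, index):
    """Collect the sibling hashes proving leaves[index] against the root."""
    layer = [h(leaf) for leaf in leaves]
    branch = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        branch.append(layer[index ^ 1])
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify_branch(leaf, index, branch, root):
    """What the wrapped-token contract on the destination L2 would check."""
    node = h(leaf)
    for sibling in branch:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root

# The base contract on the source L2 records deposit receipts ...
receipts = [b"receipt:alice:100", b"receipt:bob:42"]
root = merkle_root(receipts)  # ... whose root ends up reflected in L1 state
branch = merkle_branch(receipts, 0)
assert verify_branch(b"receipt:alice:100", 0, branch, root)      # mint wrapped token
assert not verify_branch(b"receipt:alice:999", 0, branch, root)  # forged receipt rejected
```

The key point this illustrates is that the destination chain only ever reads a root; the receipt and its proof travel off-chain with the user.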
Compared with L1-based tokens, this scheme has one key weakness, at least on optimistic rollups: deposits also have to wait out the fraud proof window. If the token is rooted on L1, withdrawing from Arbitrum or Optimism to L1 takes a week, but deposits are instant. In this scheme, however, both deposits and withdrawals take a week. That said, it is not clear that a three-layer architecture on optimistic rollups is better either: there is a lot of technical complexity in ensuring that a fraud proof game running inside a system that itself runs on a fraud proof game is safe.
Fortunately, none of these issues apply to ZK rollups. ZK rollups do not need a week-long waiting window for security, but they still need short windows (perhaps 12 hours with first-generation technology) for two other reasons. First, more complex general-purpose ZK-EVM rollups need a longer, non-parallelizable stretch of computation time to prove a block. Second, for economic reasons, proofs should be submitted rarely, to minimize the fixed costs associated with proof transactions. Next-generation ZK-EVM technology, including specialized hardware, will solve the first problem, while better-architected batch verification can solve the second. It is this problem of optimizing and batching proof submission that we turn to next.
Rollups are very cheap per transaction: only 16-60 bytes of data, depending on the application. But rollups must also pay a high fixed cost every time they submit a batch of transactions to the chain: 21,000 L1 gas per batch for optimistic rollups, and more than 400,000 gas for ZK rollups (millions of gas if you want quantum-safe setups using only STARKs).
A rollup could, of course, simply choose to wait until there are 10 million gas worth of L2 transactions before submitting a batch, but this would give it very long batch intervals, forcing users to wait much longer for a high-security confirmation. Hence the tradeoff: long batch intervals and optimal costs, or shorter batch intervals and greatly increased costs.
To put some concrete numbers on this, consider a ZK rollup with a cost of 600,000 gas per batch that processes fully optimized ERC20 transfers (23 bytes), costing 368 gas per transaction. Suppose this rollup is in early-to-mid adoption, averaging 5 TPS. The gas per transaction is then 368 plus 600,000 divided by the number of transactions per batch (5 times the batch interval in seconds): batches every minute add about 2,000 gas of fixed cost per transaction, while hourly batches amortize the fixed cost down to about 33 gas.
If we enter a world with many customized validiums and application-specific environments, many of them will have throughput far below 5 TPS, and the tradeoff between confirmation time and cost starts to matter a great deal. And indeed, the "L3" paradigm does solve this! A ZK rollup inside a ZK rollup has a fixed cost of only about 8,000 layer-1 gas (500 bytes per proof), even in a naive implementation, which cuts the fixed cost per transaction by a factor of 75.
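The amortization is easy to compute from the numbers above (600,000 gas per batch posted directly to L1, ~8,000 gas per rollup when proofs are aggregated, 368 gas of data per transfer, 5 TPS). The batch intervals below are illustrative choices:

```python
# Amortized L1 gas per transaction for a rollup that pays a fixed cost per
# batch plus 368 gas of data per ERC20 transfer, at 5 TPS.
TPS = 5
PER_TX_GAS = 368

def gas_per_tx(fixed_batch_cost: int, batch_interval_s: int) -> float:
    txs_per_batch = TPS * batch_interval_s
    return PER_TX_GAS + fixed_batch_cost / txs_per_batch

for interval in (12, 60, 600, 3600):  # illustrative batch intervals (seconds)
    direct = gas_per_tx(600_000, interval)   # proof posted straight to L1
    batched = gas_per_tx(8_000, interval)    # ~8k gas share via aggregation
    print(f"{interval:>5}s batches: {direct:8.0f} gas direct, {batched:6.0f} gas aggregated")
```

With one-minute batches, for example, the direct scheme costs 368 + 600,000/300 = 2,368 gas per transaction, while the aggregated scheme costs about 395.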
This basically solves the problem. So are L3s good? Maybe. But it is worth noting that, inspired by ERC-4337 aggregate verification, there is a different way to approach this problem.
The strategy is as follows. Today, each ZK rollup or validium accepts a state root if it receives a proof of a statement of the form: "this new state root is the result of correctly processing this transaction data or state delta on top of the old state root". In this new scheme, the ZK rollup would instead accept a message from a batch verifier contract saying that it has verified a batch of proofs of statements, where each statement is of that form. Such a batch proof could be constructed via a recursive SNARK scheme or Halo aggregation.

This would be an open protocol: any ZK rollup could join, and any batch prover could aggregate proofs from any compatible ZK rollups and get compensated by the aggregator for the transaction fees. The batcher contract would verify the proof once and then pass each rollup a message containing the (old state root, new state root, proof) triple for that rollup; the fact that the triple came from the batcher contract would be evidence that the transition is valid.
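The dispatch half of this scheme can be sketched as follows. This is a toy model, not contract code: `verify_aggregate` is a stand-in for a recursive-SNARK verifier, and all names and data shapes here are illustrative assumptions rather than a real protocol.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    rollup: str       # which rollup this state transition belongs to
    old_root: bytes
    new_root: bytes

def verify_aggregate(statements, aggregate_proof) -> bool:
    """Placeholder for one on-chain verification of the batched proof."""
    return aggregate_proof == b"valid"  # stand-in check only

class Rollup:
    def __init__(self, genesis_root: bytes):
        self.state_root = genesis_root

    def accept_transition(self, old_root: bytes, new_root: bytes):
        # Trusting the batcher's one verification replaces running a full
        # SNARK verifier (~400k gas) inside each rollup's own update.
        assert old_root == self.state_root, "stale or forked state root"
        self.state_root = new_root

class BatcherContract:
    """Verifies one aggregate proof, then notifies each member rollup."""
    def __init__(self, rollups):
        self.rollups = rollups  # rollup name -> rollup contract

    def submit_batch(self, statements, aggregate_proof):
        if not verify_aggregate(statements, aggregate_proof):
            raise ValueError("invalid aggregate proof")
        for s in statements:  # one cheap message per rollup
            self.rollups[s.rollup].accept_transition(s.old_root, s.new_root)

rollups = {"A": Rollup(b"a0"), "B": Rollup(b"b0")}
batcher = BatcherContract(rollups)
batcher.submit_batch(
    [Statement("A", b"a0", b"a1"), Statement("B", b"b0", b"b1")],
    aggregate_proof=b"valid",
)
assert rollups["A"].state_root == b"a1"
assert rollups["B"].state_root == b"b1"
```

The point of the design is that the expensive verification happens exactly once per batch, while each participating rollup only pays for receiving its own triple.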
With proper optimization, the per-rollup cost of this scheme could approach 8,000 gas: 5,000 for the state write adding the new update, 1,280 for the old and new roots, and an extra 1,720 for miscellaneous data processing. So it would give us the same savings. Starkware actually has something like this already, called SHARP, though it is not (yet) a permissionless open protocol.
One response to this approach might be: isn't this actually just another L3 scheme? Instead of base layer ← rollup ← validium, you would have base layer ← batching mechanism ← rollup or validium. From some philosophical-architecture perspective, that may be true. But there is an important difference: the middle layer is not a complex full EVM system but a simplified, highly specialized object. As a result, it is far more likely to be secure, far more likely to be built without yet another dedicated token, and far more likely to be governance-minimized and unchanging over time.
A three-layer scaling architecture built by stacking the same scaling scheme on top of itself generally does not work well. Rollups on top of rollups, where both layers use the same technology, certainly do not. A three-layer architecture where the second and third layers serve different purposes, however, can work. Validiums on top of rollups really do make sense, even if it is not certain they will be the best way to do things in the long run.
But once we start getting into the details of which architectures make sense, we run into the philosophical question: what is a "layer" and what is not? Base layer ← batching mechanism ← rollup or validium does the same job as base layer ← rollup ← rollup or validium. Yet in the way it works, the proof aggregation layer looks more like ERC-4337 than like a rollup, and we do not usually call ERC-4337 a "layer 2". Similarly, we would not call Tornado Cash a "layer 2", so if we are being consistent, we would not call a privacy-focused subsystem sitting on top of a layer 2 a "layer 3". Hence there is an unresolved semantic debate over what deserves to be called a "layer" in the first place.
There are many possible schools of thought here. My personal preference is to restrict the term "L2" to things with the following properties:
On this view, optimistic rollups and ZK rollups are layer 2s (L2s), but validiums, proof aggregation schemes, ERC-4337, on-chain privacy systems, and Solidity are all something else. It may make sense to call some of them layer 3s (L3s), but probably not all of them; in any case, it seems too early to settle on definitions, as the architecture of the multi-rollup ecosystem is far from fixed and most of the discussion is still happening only in theory.
That said, the language debate matters less than the technical question of which structures actually make the most sense. Clearly, some kind of "layer" serving non-scaling needs like privacy has a role to play, and the important function of proof aggregation needs to be filled somehow, preferably by an open protocol. At the same time, there are good technical reasons to keep the middle layer connecting user-facing environments to layer 1 as simple as possible; a "glue layer" that is a full EVM rollup may well not be the right approach in many cases. I suspect that both the more complex (and the simpler) structures described in this post will start playing a bigger role as the layer 2 (L2) scaling ecosystem matures.