Written by: PREDA; Translated by: ChainFeeds Research
Content and Purpose of this Article
The design of parallel execution models is complex in both traditional databases and blockchain technology, because multiple dimensions must be weighed together, and the choice made along each dimension profoundly affects the overall performance and scalability of the system. This article explores the most representative parallel architectures of the blockchain execution layer and presents the results of the performance and scalability experiments we ran on them.
The blockchain field has continuously pursued high performance and high scalability. Even after the emergence of multi-chain and Layer 2 systems, the execution capability of each smart contract has remained limited to that of a single virtual machine (VM). Parallel VMs remove this limitation: they allow the transactions of a single smart contract to be executed on multiple EVMs/VMs simultaneously, using more CPU cores to improve performance.
We believe that among the many high-performance blockchain systems that support parallel VMs, Sei (V2), Aptos, Sui, Crystality, and PREDA are the most representative, and each system has unique design advantages.
We begin with the first set of experimental results. The figure below shows the absolute number of transactions per second (TPS) achieved by Sei, Aptos, Sui, Crystality, and PREDA when executing the same ERC20 smart contract on a 128-core machine. In this experiment, the PREDA model shows a significant advantage in both TPS and scalability among the five parallel execution systems.
We will elaborate on other experimental data and analysis later.
Below, we will explain in detail the specific methods and operations in our experiment:
We first compare the TPS values, i.e., throughput, of the five systems. The same transaction volume was used in the TPS comparison experiments on different chains.
Because the systems use different programming languages and underlying virtual machines, a throughput comparison alone cannot fully explain the strengths and weaknesses of each system. We therefore also compared the relative speedup, namely the Speedup Ratio: the throughput achieved when the same set of transactions is executed on multiple VMs, relative to the throughput on one VM. In Sui, Aptos, Crystality, and PREDA, each thread is assigned a dedicated CPU core.
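The speedup ratio described above can be written out as a short calculation. This is a minimal sketch: the function names and the numbers are hypothetical and are not taken from the experimental report.

```python
def tps(num_transactions: int, elapsed_seconds: float) -> float:
    """Throughput in transactions per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_transactions / elapsed_seconds


def speedup_ratio(tps_single_vm: float, tps_multi_vm: float) -> float:
    """Throughput on several VMs relative to one VM of the same system."""
    if tps_single_vm <= 0:
        raise ValueError("single-VM throughput must be positive")
    return tps_multi_vm / tps_single_vm


# Hypothetical timings for illustration only (not measured results):
# the same 100,000 transactions executed on one VM and on several VMs.
baseline = tps(100_000, 50.0)
parallel = tps(100_000, 5.0)
print(speedup_ratio(baseline, parallel))
```

Because the ratio is taken within one system, it factors out differences in language and virtual machine that make absolute TPS hard to compare across chains.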
For all detailed experimental data, including absolute TPS values and speedup ratios, please refer to the full experimental report.
The following table shows the data sources, implementation process, and evaluation methods used in the experiment.
Overview of Parallel Execution Models
Both Aptos and Sui are derived from Diem, the failed blockchain project by Meta (formerly Facebook). Both projects were founded by former Meta engineers: Aptos by Avery Ching and Sui by Sam Blackshear. The two subsequently followed different technical paths: Aptos strictly follows the original Move programming language developed for Diem, while Sui made extensive modifications to Move.
Next, we will explore the differences in the parallelization models of Aptos and Sui, analyze how their different approaches affect performance, and highlight the advantages of each.
Aptos: High-Performance Layer 1 with Optimistic Parallelism
Aptos is a Layer 1 that achieves high performance by executing smart contracts in parallel through an optimistic parallelization mechanism. In optimistic parallelization, transactions are initially assumed to be conflict-free and are executed in parallel. After execution, the system checks for conflicts and resolves them by rolling back the conflicting transactions and either serializing them or re-executing them under a different schedule. This speculative execution method assumes that most transactions will not conflict, thereby maximizing the benefits of parallel execution while providing a fallback mechanism for handling conflicts.
Advantages of optimistic parallelism: (1) No need to modify the program: It can be easily implemented without making changes to existing code. (2) Efficiency in scenarios with a low to medium percentage of conflicts: By allowing many transactions to proceed concurrently and handling conflicts when they arise, throughput is maximized. In many real-world scenarios, conflicts are relatively rare.
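The execute, validate, and re-execute cycle described above can be illustrated with a small single-threaded simulation. This is a simplified sketch under our own assumptions, not Aptos's actual scheduler: the `Recorder` class, `transfer` helper, and `execute_optimistically` function are hypothetical names, and the speculative phase is written as a plain loop where a real system would use worker threads.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]


class Recorder:
    """View over a state snapshot that records reads and buffers writes."""

    def __init__(self, base: State) -> None:
        self.base = base
        self.reads: Set[str] = set()
        self.writes: Dict[str, int] = {}

    def get(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        self.reads.add(key)
        return self.base.get(key, 0)

    def set(self, key: str, value: int) -> None:
        self.writes[key] = value


Txn = Callable[[Recorder], None]


def transfer(src: str, dst: str, amount: int) -> Txn:
    def run(view: Recorder) -> None:
        if view.get(src) >= amount:
            view.set(src, view.get(src) - amount)
            view.set(dst, view.get(dst) + amount)
    return run


def execute_optimistically(state: State, block: List[Txn]) -> Tuple[State, int]:
    snapshot = dict(state)
    # Phase 1: speculative execution against the same snapshot. The runs are
    # independent of each other, so they could be handed to parallel workers.
    speculative = []
    for txn in block:
        view = Recorder(snapshot)
        txn(view)
        speculative.append(view)
    # Phase 2: validate and commit in block order. A transaction that read a
    # key written by an earlier transaction is rolled back and re-executed.
    committed = dict(state)
    dirty: Set[str] = set()
    reexecuted = 0
    for txn, view in zip(block, speculative):
        if view.reads & dirty:
            view = Recorder(committed)
            txn(view)
            reexecuted += 1
        committed.update(view.writes)
        dirty.update(view.writes)
    return committed, reexecuted


final, retries = execute_optimistically(
    {"A": 100, "B": 0, "C": 0, "D": 50, "E": 0},
    [transfer("A", "B", 10), transfer("A", "C", 20), transfer("D", "E", 5)],
)
print(final, retries)
```

In this example only the second transfer is re-executed, because it reads the balance of A that the first transfer already changed; the third transfer touches unrelated state and commits its speculative result directly.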
Aptos uses the Move programming language for smart contract development and the Aptos Move virtual machine in its system implementation.
Sui: High-performance Layer 1 with pessimistic parallelism
Sui adopts a pessimistic parallelization strategy. In pessimistic parallelism, the system pre-checks whether transactions may cause resource contention before execution. The programmer needs to specify the resources (i.e., state) that each transaction needs to access. The system pre-checks each received transaction to detect potential conflicts. Only transactions that do not involve resource contention with currently executing transactions are sent to the execution engine for parallel execution.
Benefits of pessimistic parallelization: (1) Avoiding rollbacks: By identifying and avoiding conflicts before execution, this approach minimizes the need for rollbacks and re-executions, resulting in more predictable performance. (2) Efficiency in high-conflict scenarios: Very effective in high-contention environments, ensuring that only non-conflicting transactions are executed in parallel, reducing the overhead of conflict resolution.
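The pre-check described above can be sketched as a scheduler that looks only at the resources each transaction declares. This is an illustrative simplification under our own assumptions, not Sui's actual implementation: `DeclaredTxn` and `schedule_batches` are hypothetical names, and execution is modeled as consecutive batches rather than a continuously running engine.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class DeclaredTxn:
    name: str
    resources: FrozenSet[str]  # state the transaction declares it will access


def schedule_batches(txns: List[DeclaredTxn]) -> List[List[str]]:
    """Group transactions into batches with pairwise disjoint resources.

    Transactions in one batch can run in parallel; batches run in order,
    so conflicting transactions keep their original relative order.
    """
    batches: List[Tuple[Set[str], List[str]]] = []
    for txn in txns:
        # The transaction must run after the last batch it conflicts with.
        slot = 0
        for index, (locked, _) in enumerate(batches):
            if locked & txn.resources:
                slot = index + 1
        if slot == len(batches):
            batches.append((set(), []))
        locked, names = batches[slot]
        locked |= txn.resources
        names.append(txn.name)
    return [names for _, names in batches]


plan = schedule_batches([
    DeclaredTxn("t0", frozenset({"A", "B"})),
    DeclaredTxn("t1", frozenset({"A", "C"})),
    DeclaredTxn("t2", frozenset({"D", "E"})),
    DeclaredTxn("t3", frozenset({"C", "F"})),
])
print(plan)
```

No transaction is ever rolled back in this scheme, but every transaction pays for the conflict check before it runs, which is the overhead discussed later in this article.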
Sui also uses the Move programming language, but with its own Sui Move extension, and uses the Sui Move virtual machine in its system implementation.
Sei: Optimistic parallelization compatible with Solidity and EVM
When Sei first launched its public chain, it was positioned as a transactional application chain built on the Cosmos SDK, and has now been upgraded to the first parallelized EVM chain. In terms of parallel execution, Sei adopts an approach similar to the Aptos model, which we call optimistic parallelism.
Sei (V2) differs from Aptos in that its optimistic parallelism uses the Solidity programming language and the standard Ethereum Virtual Machine (EVM), ensuring EVM and Solidity compatibility.
Crystality and PREDA: Parallel Relay-Execution Architecture
Both Crystality and PREDA support the Parallel Relay-Execution Distributed Architecture. PREDA is specifically designed for parallelizing general smart contracts in multi-EVM blockchain architectures. Crystality is a programming language for parallel EVMs/GPUs that is based on the PREDA model. From a system perspective, PREDA makes it possible for the first time in the blockchain field to fully parallelize contract functions, thereby maximizing the concurrency of a set of transactions. This keeps all EVM instances efficiently utilized and achieves the best performance and scalability for a given hardware configuration.
Unlike the sequential execution of Solidity and Move and their Shared Everything architecture, the PREDA model adopts a Shared Nothing architecture for the first time to break the state dependency in parallel execution and ensure that different EVM instances never access the same piece of contract state, thus almost completely avoiding write conflicts.
In PREDA, contract functions are decomposed into multiple ordered steps, each of which depends on a parallelizable and conflict-free part of the state. A transaction initiated by a user is first sent to the EVM that holds the state of the user's address. During execution, the execution flow can be switched, by issuing a relay transaction, from that EVM to another EVM holding the contract state required for the current step. The data therefore stays in place, while the execution flow moves between EVMs according to data dependencies.
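The relay mechanism can be illustrated with a small simulation in which each VM owns a disjoint part of the balances and a token transfer is split into a withdraw step and a relayed deposit step. This is a sketch under our own assumptions, not the PREDA runtime: `NUM_VMS`, `home_vm`, the `VM` class, and `run_block` are hypothetical, and the VMs are drained in a loop where a real system would run them concurrently.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

NUM_VMS = 4  # hypothetical number of VM instances

# (kind, address, relay_to, amount); relay_to is only set on "withdraw" steps
Message = Tuple[str, str, Optional[str], int]


def home_vm(address: str) -> int:
    """Hypothetical placement rule: the VM that owns the state of `address`."""
    return sum(address.encode()) % NUM_VMS


class VM:
    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}  # state owned by this VM only
        self.inbox: Deque[Message] = deque()

    def drain(self) -> List[Message]:
        """Run every queued step on local state and return relay messages."""
        relays: List[Message] = []
        while self.inbox:
            kind, address, relay_to, amount = self.inbox.popleft()
            balance = self.balances.get(address, 0)
            if kind == "withdraw":
                if relay_to is not None and balance >= amount:
                    self.balances[address] = balance - amount
                    # Continue the transaction where the recipient's state lives.
                    relays.append(("deposit", relay_to, None, amount))
            else:  # "deposit"
                self.balances[address] = balance + amount
        return relays


def run_block(balances: Dict[str, int],
              transfers: List[Tuple[str, str, int]]) -> Dict[str, int]:
    vms = [VM() for _ in range(NUM_VMS)]
    for address, balance in balances.items():
        vms[home_vm(address)].balances[address] = balance
    # A user transaction first goes to the VM holding the sender's state.
    for sender, recipient, amount in transfers:
        vms[home_vm(sender)].inbox.append(("withdraw", sender, recipient, amount))
    while any(vm.inbox for vm in vms):
        # Each VM touches only its own state, so the drains are independent.
        relays = [msg for vm in vms for msg in vm.drain()]
        for msg in relays:
            vms[home_vm(msg[1])].inbox.append(msg)
    merged: Dict[str, int] = {}
    for vm in vms:
        merged.update(vm.balances)
    return merged


print(run_block({"A": 100, "B": 0, "C": 0}, [("A", "B", 10), ("A", "C", 20)]))
```

No balance ever leaves the VM that owns it; only the small relay message travels, which is what lets the VMs run without locks or rollbacks in this model.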
Experimental data of five representative contracts
In our evaluation, we tested five widely used smart contracts (ETH TokenTransfer, Voting, Airdrop, CryptoKitties, and MillionPixel) in addition to the MyToken (ERC20) contract. These contracts were executed on the blockchain systems Sei, Aptos, Sui, Crystality, and PREDA. We conducted detailed experiments to compare the performance of the different parallel execution systems, focusing on transactions per second (TPS) and on speedup, which measures the relative performance improvement when executing on multiple virtual machines versus a single virtual machine on each system.
All detailed experimental data, including absolute TPS values and speedup, can be found in the full experimental report.
ETH TokenTransfer Contract: The experiment uses the same standard ERC20 smart contract, but replays actual historical ETH transactions instead of randomly generated ones.
Voting Contract: The Voting Contract is an excellent example of how the PREDA model simplifies parallel voting algorithms. It leverages the data splitting, relaying, and execution mechanisms of Crystality and PREDA, outperforming both the optimistic (Aptos) and pessimistic (Sui) parallelization methods in absolute TPS and in speedup. The algorithm, originally sequential in Solidity, is restructured so that votes are cast in parallel across virtual machines and the final result is aggregated from temporary arrays.
AirDrop: This contract triggers multiple token or NFT transfers from one address to multiple addresses, so it has a one-to-many state change pattern. In this case, two transactions in Sei, Aptos, or Sui cannot be executed in parallel. Only the PREDA model, with its finer parallel granularity, can process these transactions in parallel in a pipelined fashion.
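The idea of voting in parallel and then aggregating temporary arrays can be sketched as follows. This is a minimal illustration under our own assumptions rather than the Crystality contract itself: `NUM_VMS`, `vm_for`, and `parallel_tally` are hypothetical names, and the per-VM work is written as a single loop.

```python
from typing import List, Set, Tuple

NUM_VMS = 4  # hypothetical number of VM instances


def vm_for(voter: str) -> int:
    """Hypothetical placement rule: the VM that holds the voter's state."""
    return sum(voter.encode()) % NUM_VMS


def parallel_tally(votes: List[Tuple[str, int]], num_proposals: int) -> List[int]:
    # One temporary array of counts per VM; no VM writes to another VM's array.
    partial = [[0] * num_proposals for _ in range(NUM_VMS)]
    voted: List[Set[str]] = [set() for _ in range(NUM_VMS)]
    for voter, proposal in votes:
        vm = vm_for(voter)
        if voter in voted[vm]:
            continue  # a voter's repeated vote is ignored
        voted[vm].add(voter)
        partial[vm][proposal] += 1
    # Aggregation step: sum the temporary arrays into the final result.
    return [sum(counts[i] for counts in partial) for i in range(num_proposals)]


print(parallel_tally([("alice", 0), ("bob", 1), ("carol", 1), ("alice", 1)], 2))
```

Because a given voter always maps to the same VM, the double-vote check needs only local state, and the only cross-VM work is the final summation.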
CryptoKitties: This contract is a popular game contract on Ethereum that involves breeding offspring cats based on the genes of parent cats. Unlike the aforementioned contracts, this contract needs to access multiple address states, including "father cat", "mother cat", and "newborn cat" when processing user-initiated transactions. This contract also involves more complex calculations than the previous contract when calculating the genes of a newborn cat from the genes of its parents.
MillionPixel: In this game contract on Ethereum, users compete to mark coordinates on a map. This smart contract is used to demonstrate the flexibility of the PREDA model. In addition to partitioning the contract state by address, programmers can also customize the partition key, such as switching from address type to uint32 type in this case.
To help readers digest this large amount of data, the following analysis focuses on two particularly representative contracts.
ETH Token Transfer Contract: When replaying historical ETH transaction data, the absolute throughput and speedup ratios of the five systems decreased compared to the ERC20 experiment. This is because repeated addresses in the historical transactions cause state contention (read-write or write-write conflicts), which hinders the concurrent execution of these transactions on parallel EVMs.
Voting Contract: On Sei, the contract executes almost entirely sequentially, so running multiple EVMs yields no speedup. Other systems show similar results unless the algorithm is transformed into a parallel one. For the parallel implementations on Aptos and Sui, multiple resources must be initialized at different addresses to hold the temporary results of the "proposal" variable. The parallel implementation must also schedule transactions manually by voter address, directing each voter's transactions to a particular virtual machine, where they access the temporary results for parallel execution.
Insights from the Experimental Results
We learned the following from the experimental results:
Comparison of optimistic and pessimistic parallel approaches
Aptos and Sui each perform best in different scenarios. In the ERC20 transfer case, Aptos outperforms Sui because each transaction uses randomly generated addresses, resulting in very few conflicts. In the ETH test case, by contrast, Sui outperforms Aptos because replaying historical ETH transactions produces a large number of conflicts.
Time analysis of Aptos execution
The following table shows the performance analysis data of Aptos when running these two contracts (using the same smart contract, but the transaction data is randomly generated or historical transaction data). Because performance analysis is very time-consuming, the number of parallel virtual machines used in the test is limited to 64.
Task management time consists of two parts: locking and waiting. Comparing the two charts shows that task management time accounts for a significantly larger share of the total execution time in the Voting test than in the ETH Token transfer test. This is because in the Voting test, access to shared objects requires locking and waiting to avoid conflicts, making the task management time 2 to 4 orders of magnitude longer than the function execution time and queuing time. In the ETH Token transfer test, by contrast, only owned objects were used, so concurrency control was bypassed and the task management time was much shorter.
Limitations of Aptos and Sui
In summary, Aptos uses optimistic parallelization, which allows transactions to execute in parallel even in the presence of conflicts. This approach, based on optimistic concurrency control (OCC), is very effective for read-dominant workloads, which are more common in databases and big data systems, where write requests are scarce. In blockchain systems, however, on-chain execution incurs gas fees, so this approach may carry a large gas overhead. In practice, users usually send read-only requests (such as historical transaction or block queries) to off-chain databases like Etherscan and reserve on-chain execution for write requests. Under such write-heavy workloads, OCC systems like Aptos frequently suspend and stall transactions, reducing the overall performance of the parallel virtual machines.
In contrast, Sui adopts pessimistic parallelization: it strictly verifies the state dependencies between transactions and prevents conflicts during execution through a locking mechanism. This approach, based on pessimistic concurrency control (PCC), is better suited to computationally intensive workloads, in which the PCC-related overhead is negligible. For logically simple operations, however, the PCC-related overhead can easily become a performance bottleneck. In the real world, many transactions performed on blockchain systems, such as ERC20 token transfers, Move token transfers, or NFT transfers, involve relatively simple operations. An ERC20 token transfer typically subtracts a certain amount from one address and adds it to another. Similarly, a Move token transfer or NFT transfer moves a resource or object from one address to another. Even with additional checks such as ownership verification, these operations are very fast, so the PCC-related overhead becomes the limiting factor in the performance of the parallel system.
To address these challenges, PREDA proposes a system that almost completely avoids PCC overhead and the need for OCC re-execution. This approach achieves almost conflict-free parallel execution by efficiently splitting the on-chain state.
Performance of Crystality and PREDA
In all contract tests, Crystality and PREDA significantly outperformed Sei, Aptos, and Sui, with PREDA being particularly outstanding because it executes in native binary mode rather than WASM. This high performance is due to almost conflict-free parallel execution. PREDA was designed with the following two key aspects in mind:
Defining different contract state scopes, according to which the system splits and maintains the state.
Switching the execution flow of a transaction from one virtual machine to another.
The core of PREDA is the introduction of Programmable Contract Scopes, which splits the contract state into non-overlapping, parallelizable fine-grained fragments; and introduces Asynchronous Functional Relay, which is used to describe the execution flow switching between different EVMs.
Let's look more closely at what these concepts mean. In PREDA, a contract function is decomposed into multiple ordered steps, each of which depends on a single, parallelizable state fragment and therefore does not conflict with steps on other fragments.
For example, a token transfer usually involves two steps: a withdrawal step, which accesses the state of the Sender and withdraws the specified number of tokens, and a deposit step, which accesses the state of the Recipient and deposits the corresponding number of tokens. The parallel mechanisms currently implemented by Sei, Aptos, and Sui attempt to execute all steps of each transaction synchronously. If two transactions access state that is shared between them and updated, for example when they have the same Sender or Recipient, the two transactions cannot be executed in parallel.
However, PREDA adopts a splittable and asynchronous mechanism, in which the individual steps of the transaction are decomposed according to their data access dependencies, enabling each step to be executed asynchronously independently of other steps. Access to the same state is strictly serialized in the order determined in the original transaction block and guaranteed by the consensus algorithm, that is, sorted by the block creator.
For example, Token transfer transactions Txn 0 (transferring tokens from address state A to state B) and Txn 1 (transferring from state A to state C) can access A twice in sequence (for Txn 0 and Txn 1 respectively), and then access B and C in parallel.
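The access pattern in this example can be made explicit by listing, for each piece of state, the steps that touch it in block order. This is an illustrative sketch under our own assumptions: `per_state_queues` is a hypothetical helper, and it simply enqueues both steps of a transfer at the transaction's position in the block, ignoring the time a relay takes to arrive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, str]  # (txn_id, sender_state, recipient_state)
Step = Tuple[str, str]           # (txn_id, "withdraw" or "deposit")


def per_state_queues(block: List[Transfer]) -> Dict[str, List[Step]]:
    """Steps that access each state, in the order fixed by the block."""
    queues: Dict[str, List[Step]] = defaultdict(list)
    for txn_id, sender, recipient in block:
        queues[sender].append((txn_id, "withdraw"))
        queues[recipient].append((txn_id, "deposit"))
    return dict(queues)


block = [("Txn 0", "A", "B"), ("Txn 1", "A", "C")]
for state, steps in per_state_queues(block).items():
    print(state, steps)
```

State A has a queue of two steps that must run one after the other, while B and C each have a single step and share no state, so their deposits do not need to wait for each other.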
Figure: Aptos, Sei and PREDA
Limitations of PREDA and Crystality
Although PREDA and Crystality can empower blockchain systems and provide significant performance advantages, their limitations are also reflected in the following aspects.
Unbalanced workload between parallel EVMs
Crystality's data splitting and execution flow redirection mechanism may cause load imbalance problems in parallel EVMs during runtime. We observed this problem when replaying historical ETH Token transfer transactions with the MyToken contract.
To evaluate the load distribution, we counted the number of transactions executed on each EVM, including original transactions and relay transactions, and then calculated the range and standard deviation of these counts. The results show that the range of the per-EVM transaction counts on 64 EVMs is comparable to the range on 2 EVMs, which indicates a hotspot problem: the historical transactions concentrate on a small subset of addresses, and therefore on the EVMs that hold them. Further investigation of the ETH dataset found that each hotspot address was involved in more than 4,000 transactions. It must be pointed out that, as far as we know, Aptos and Sui are also unable to parallelize execution in this case.
Our test data shows that the standard deviation decreases as the number of EVMs increases, which means that adding more EVMs helps alleviate the load imbalance problem.
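The two load-balance metrics used here, range and standard deviation of per-EVM transaction counts, can be computed as in the sketch below. The function names and sample data are hypothetical, and we assume the population standard deviation, since the report does not say which variant was used.

```python
from collections import Counter
from statistics import pstdev
from typing import List, Tuple


def transactions_per_vm(executed_on: List[int], num_vms: int) -> List[int]:
    """Count executed transactions (original and relay) for each VM index."""
    counts = Counter(executed_on)
    return [counts[vm] for vm in range(num_vms)]


def load_imbalance(tx_counts: List[int]) -> Tuple[int, float]:
    """Return (range, population standard deviation) of per-VM counts."""
    return max(tx_counts) - min(tx_counts), pstdev(tx_counts)


# Hypothetical log: the VM index on which each transaction was executed.
counts = transactions_per_vm([0, 0, 1, 3, 3, 3], num_vms=4)
spread, deviation = load_imbalance(counts)
print(counts, spread, round(deviation, 3))
```

A large range with a small standard deviation points to a few overloaded VMs among many evenly loaded ones, which is the signature of hotspot addresses.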
To solve the hotspot problem on the blockchain, a feasible solution is to use multiple addresses instead of a single address to send or receive tokens. If the load imbalance is caused by several non-hotspot addresses mapping to the same virtual machine, existing methods in sharding blockchains, such as data migration, may help.
Program Rewriting
Another significant limitation of PREDA and Crystality is that developers need to rewrite smart contracts using directives. A tool that automatically translates existing smart contracts written in Solidity, Move, or Rust into equivalent Crystality smart contracts would greatly improve the developer experience. Prior work suggests that this is feasible: some studies have explored translation between different languages, such as from Solidity to Move and from Python to Solidity.
Technical advances in natural language processing have greatly enhanced the potential of automatic code generation. These advances combined with rule- and pattern-based compiler translation techniques (such as SQL to MapReduce translation for big data and computational graph to matrix calculation translation for deep learning) can fully provide support for the development of automated smart contract translation tools.
Conclusion
The performance comparison between Sei, Aptos, Sui and Crystality/PREDA highlights the continuous evolution of the blockchain parallelization field. Aptos (with Sei) and Sui respectively demonstrate the potential of optimistic and pessimistic parallelization mechanisms, each showing advantages in different scenarios. However, the significant performance improvements of Crystality and PREDA suggest that more advanced parallelization models may be the key to unlocking higher levels of scalability and efficiency.
To summarize our exploration and observations of the three main parallelization methods in the blockchain field, we have compiled a table. If you take away one thing from this article, it should be the contents of this table.