Author: Nickqiao & Wuyue, Geekweb3
In April this year, Vitalik attended the Hong Kong Blockchain Summit and delivered a speech entitled "Reaching the Limits of Protocol Design", in which he once again highlighted the potential of ZK-SNARKs in the Ethereum Danksharding roadmap and expressed his expectation that ASIC chips would greatly help ZK acceleration.
Previously, Scroll co-founder Zhang Ye also pointed out that ZK's application space in traditional fields may be larger than that in Web3. There is a huge demand for ZK in trusted computing, databases, verifiable hardware, content anti-counterfeiting and zkML. If real-time ZK proof generation can be achieved, Web3 and traditional industries are expected to usher in a paradigm-level change. From the perspective of efficiency and economic cost, however, ZK is still a long way from large-scale adoption.
In fact, as early as 2022, top venture capital firms a16z and Paradigm publicly published reports, clearly expressing their emphasis on ZK hardware acceleration. Paradigm even asserted that in the future, ZK miners' income may be comparable to that of Bitcoin or Ethereum miners, and hardware acceleration solutions based on GPU, FPGA, and ASIC will have huge market space. Since then, with the popularity of mainstream ZK Rollups such as Scroll and Starknet, hardware acceleration has become a hot concept sought after by the market, and this popularity has become more intense as projects such as Cysic are about to go online.
We have reason to believe that based on the huge demand for ZK, the ZK mining pool and the SaaS model of real-time ZKP generation can open up a brand-new industrial chain. In this new world with great potential, ZK hardware manufacturers with strong support and first-mover advantage are likely to become the next generation of Bitmain and dominate the fertile ground for hardware acceleration.
In the field of hardware acceleration, Cysic may be one of the most watched teams. The team has won important awards from the well-known ZKP technology competition platform ZPrize and served as a mentor for ZPrize in 2023. The ToB ZK mining pool and ToC ZK-DePIN hardware included in its roadmap have attracted the attention of top VCs such as Polychain, ABCDE, OKX Ventures and Hashkey, and the team has raised nearly US$20 million in total.
With the upcoming launch of the Cysic testnet at the end of July and the imminent opening of its ZK mining pool, discussions about Cysic in major communities are becoming increasingly heated. This article aims to help more people understand Cysic's product principles and business model, and to give an accessible introduction to the principles of ZK hardware acceleration. In the following, we briefly summarize the relevant background on Cysic to lower the threshold of understanding.
Understanding the ZK Proof System from the Workflow
The ZK proof system is actually very complex, but if you want to have a simple understanding of its general structure, you can decompose it from the perspective of functions and workflows. For a system that ZK-izes ordinary computing, its core process is summarized as follows:
First, we need to interact with the ZK system through the front end and submit the content to be proved to it. The front end will convert the format of these contents to facilitate processing by the ZK proof system. After that, the system will generate ZK Proof through a specific proof system or framework (such as Halo2, Plonk, etc.). This process can be divided into the following steps:
1. Problem setting: First, we need to determine what the content to be proved is. For example, the Prover declares that he has/knows certain data, "I know a solution N of the equation F(x)=w", but he does not want people to see the value of N.
2. Arithmetization and CSP: After the Prover submits the content to be proved, the system will establish a special mathematical model/program to equivalently express the content to be proved, and then convert the format to facilitate processing by the proof system. Specifically, the aforementioned statement "I know a solution N to the equation F(x)=w" will be transformed from the original mathematical equation into the form of logic gate circuits and polynomials.
3. After that, the system will select a suitable proof system such as Halo2, Plonk, etc., and compile the content generated in the previous steps into a usable ZKP program. The Prover uses the ZKP program to generate a proof and submits it to the Verifier for verification.
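To make step 2 more concrete, here is a minimal sketch of arithmetization written as R1CS-style constraints, one common way to express gates. The statement "I know x with x^3 + x + 5 = 35", the field modulus, and the witness layout are illustrative assumptions and are not taken from any particular proof system.

```python
# Toy arithmetization of the claim "I know x with x**3 + x + 5 == 35".
# The modulus and the variable layout are assumptions made for this demo.
P = 2**31 - 1  # a small prime field, chosen only for illustration

def build_witness(x):
    """Flatten the computation into gates and record every intermediate value.
    Witness layout: [one, x, out, sym1, y, sym2]."""
    sym1 = x * x % P          # gate 1: sym1 = x * x
    y = sym1 * x % P          # gate 2: y = sym1 * x
    sym2 = (y + x) % P        # gate 3: sym2 = y + x
    out = (sym2 + 5) % P      # gate 4: out = sym2 + 5
    return [1, x % P, out, sym1, y, sym2]

# Each constraint is (A, B, C) and requires <A,w> * <B,w> == <C,w> (mod P).
CONSTRAINTS = [
    ([0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]),  # x * x = sym1
    ([0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0]),  # sym1 * x = y
    ([0, 1, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]),  # (x + y) * 1 = sym2
    ([5, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]),  # (5 + sym2) * 1 = out
]

def dot(row, w):
    return sum(r * v for r, v in zip(row, w)) % P

def satisfies(w, public_out):
    """Check that the witness is consistent with every gate and the public output."""
    if w[0] != 1 or w[2] != public_out % P:
        return False
    return all(dot(a, w) * dot(b, w) % P == dot(c, w) for a, b, c in CONSTRAINTS)
```

Here `satisfies(build_witness(3), 35)` holds, while a witness built from any other x, or one with a tampered intermediate value, fails at least one constraint. A real proof system would then prove knowledge of such a witness without revealing it; this sketch only shows the constraint form.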
ZK systems such as zkEVM, which are frequently used on Ethereum Layer 2, essentially compile smart contracts into the underlying opcodes of the EVM, convert each opcode into the form of logic gate circuits/polynomial constraints, and then hand the result over to the back-end ZK proof system for further processing.
It is worth mentioning that the ZKP technical solutions currently widely used in blockchains are mainly zk-SNARKs (zero-knowledge succinct non-interactive arguments of knowledge), and most ZK Rollups rely on the succinctness of SNARKs rather than on zero-knowledge. Succinctness means that the proof takes up very little space: a large amount of content can be compressed to a few hundred bytes, and the verification cost is very low.
In this way, the workload between Prover and Verifier is asymmetric. The cost of Prover generating ZKP is very high, but the verification cost of Verifier is very low. As long as this asymmetry is utilized, ZK can be used in the scenario of "single Prover, multiple Verifiers" to concentrate the overall cost on the Prover side, greatly reducing the cost of Verifier. This model is extremely beneficial to decentralized verification, and this is the idea of Ethereum Layer 2.
However, this model of shifting the verification cost to the proof generation side is not a silver bullet. For ZK Rollup projects, the high cost of generating ZKPs will inevitably be passed on to users through worse UX and higher transaction fees, which is not conducive to the long-term development of ZK Rollups.
Although ZK has a great role to play in the scenarios of trustlessness and decentralized verification, due to the bottleneck of generation time, neither zkEVM nor zkVM nor ZK Rollup nor ZK Bridge currently have the economic basis for large-scale adoption.
In response to this, ZK acceleration projects represented by Cysic, Ingonyama, Irreducible, etc. have emerged, trying to reduce the generation cost of ZKP from different directions. In the following, we will briefly introduce the main costs and acceleration methods of ZKP generation from a technical perspective, and why Cysic has great potential in the ZK acceleration track.
Computational cost: MSM and NTT
Many people know that it is very time-consuming for a ZKP Prover to generate proofs. In ZK-SNARK protocols, it is common for the Verifier to need only one second to verify a proof, while generating it may take the Prover half a day or even a full day. To run computations under a ZKP efficiently, the computation must first be converted from a classical program into a ZK-friendly format.
There are currently two ways to do this: one is to use some proof system frameworks to write circuits, such as Halo2; the other is to use domain-specific languages (DSLs), such as Cairo or Circom, to convert the calculations into intermediate expressions for subsequent submission to the proof system. The proof system generates ZK proofs based on the written circuits or the intermediate expressions compiled by the DSL.
The more complex the program operation, the longer it takes to generate the proof. In addition, some operations are inherently unfriendly to ZK, and implementing them requires additional work. For example, SHA or Keccak hash functions are ZKP-unfriendly, and using these functions will result in longer proof generation time. Even operations that are very cheap to execute on classical computers may be ZKP-unfriendly.
Apart from the computational tasks that are not friendly to ZK, although the ZK proof generation process may vary depending on the selected proof system, its bottlenecks are essentially similar. In the generation of ZK proofs, there are two computational tasks that consume the most computing resources: MSM (Multi-Scalar Multiplication) and NTT (Number Theoretic Transform). These two computational tasks can account for 80-95% of the proof generation time, depending on the commitment scheme and specific implementation of the ZKP.
MSM mainly handles multi-scalar multiplication on elliptic curves, while NTT is FFT (Fast Fourier Transform) on finite fields, which is used to accelerate the processing of polynomial multiplication. Using different scheme combinations will result in different FFT/MSM load ratios.
Take Stark as an example. Its PCS (Polynomial Commitment Scheme) uses FRI, a hash-based commitment, instead of elliptic-curve-based schemes such as KZG or IPA, so there is no MSM calculation at all. In the accompanying table, the higher a scheme is listed, the more FFT operations it requires, and the lower it is listed, the more MSM operations it requires.
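As a rough illustration of how NTT accelerates polynomial multiplication, the sketch below multiplies two polynomials over a finite field using a recursive radix-2 NTT. The prime 998244353 and primitive root 3 are a commonly used NTT-friendly pair chosen here for convenience; real proof systems use much larger fields, so treat the parameters as assumptions.

```python
MOD = 998244353  # = 119 * 2**23 + 1, an NTT-friendly prime (demo assumption)
GEN = 3          # a primitive root modulo MOD

def _ntt(a, root):
    """Recursive radix-2 transform; len(a) must be a power of two and
    root must be a primitive len(a)-th root of unity modulo MOD."""
    n = len(a)
    if n == 1:
        return a[:]
    even = _ntt(a[0::2], root * root % MOD)
    odd = _ntt(a[1::2], root * root % MOD)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * root % MOD
    return out

def ntt(a, invert=False):
    n = len(a)
    root = pow(GEN, (MOD - 1) // n, MOD)
    if invert:
        root = pow(root, MOD - 2, MOD)  # inverse root for the inverse transform
    out = _ntt(a, root)
    if invert:
        inv_n = pow(n, MOD - 2, MOD)
        out = [v * inv_n % MOD for v in out]
    return out

def poly_mul(f, g):
    """Multiply coefficient lists f and g modulo MOD in O(n log n)."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    # Pointwise product in the evaluation domain, then transform back.
    prod = ntt([x * y % MOD for x, y in zip(fa, ga)], invert=True)
    return prod[:result_len]
```

For example, `poly_mul([1, 2], [3, 4])` corresponds to (1 + 2x)(3 + 4x). The schoolbook method needs a number of multiplications proportional to the product of the two lengths, which is why the transform matters for the very large polynomials that appear in proof generation.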
Optimization Solution
MSM operations involve predictable memory access, so they can be massively parallelized, but they require a lot of memory. MSM also faces scalability challenges: even when parallelized, it can still be slow. Therefore, although MSM can be accelerated in hardware, it demands large amounts of memory and parallel computing resources.
NTTs often involve random memory access, which makes them hardware-unfriendly and difficult to handle on distributed infrastructure. If an NTT runs in a distributed environment, its random access pattern means that a node will inevitably have to fetch data from other nodes, and once network interactions are involved, performance drops sharply.
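One widely used way to compute MSM is the bucket (Pippenger-style) method, sketched below. Real MSM works on elliptic curve points; here the additive group of integers modulo Q stands in for the curve so that the result can be checked directly. That stand-in group and the names `bucket_msm`, `window_bits` and `scalar_bits` are assumptions made for illustration.

```python
Q = 2**61 - 1  # stand-in group: integers modulo Q under addition (assumption)

def add(p1, p2):
    """Group addition; on a real curve this would be elliptic curve point addition."""
    return (p1 + p2) % Q

def bucket_msm(scalars, points, window_bits=4, scalar_bits=16):
    """Compute sum(scalars[i] * points[i]) using only group additions.
    Scalars must be smaller than 2**scalar_bits."""
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = 0  # identity element of the stand-in group
    for w in reversed(range(num_windows)):
        # Shift the accumulated result left by one window (repeated doubling).
        for _ in range(window_bits):
            result = add(result, result)
        # One bucket per possible digit value: this is where the memory goes.
        buckets = [0] * (1 << window_bits)
        for s, p in zip(scalars, points):
            digit = (s >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], p)
        # Running-sum trick: computes sum(d * buckets[d]) with additions only.
        running = 0
        window_sum = 0
        for d in range(len(buckets) - 1, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result
```

Each window's bucket accumulation is independent of the others, which is what makes the method parallel-friendly, while the bucket arrays (multiplied by the number of windows and parallel workers) account for the memory pressure described above.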
Therefore, access to stored data and data movement become a major bottleneck, limiting the ability to parallelize NTT operations. Most of the work to accelerate NTT is focused on managing how computation interacts with memory.
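The memory access issue can be seen by listing which array positions an in-place radix-2 NTT combines at each stage. The sketch below is a simplified model that ignores the bit-reversal permutation and assumes the array is split into equal contiguous chunks across nodes; the function names are hypothetical.

```python
def butterfly_pairs(n):
    """For an in-place radix-2 NTT of size n (a power of two), return one list
    per stage with the (i, j) index pairs combined by each butterfly."""
    stages = []
    half = 1
    while half < n:
        pairs = []
        for start in range(0, n, 2 * half):
            for k in range(half):
                pairs.append((start + k, start + k + half))
        stages.append(pairs)
        half *= 2  # the distance between paired elements doubles every stage
    return stages

def cross_node_pairs(n, nodes):
    """Count, per stage, the butterflies whose two inputs live on different
    nodes when the array is split into `nodes` equal contiguous chunks."""
    chunk = n // nodes
    return [
        sum(1 for i, j in pairs if i // chunk != j // chunk)
        for pairs in butterfly_pairs(n)
    ]
```

With 16 elements spread over 4 nodes, `cross_node_pairs(16, 4)` shows that the first two stages stay local, while in the last two stages every one of the 8 butterflies needs data held by another node. At realistic sizes such as 2^20 or more elements, these long-stride stages are where data movement dominates.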
In fact, the simplest way to solve the MSM and NTT efficiency bottlenecks is to eliminate these operations entirely. Some newly proposed algorithms, such as Hyperplonk, modify Plonk to eliminate NTT operations. This makes Hyperplonk easier to accelerate, but introduces new bottlenecks, such as the sumcheck protocol, which has a higher computational cost. Similarly, the STARK algorithm does not require MSM, but its FRI protocol introduces a large number of hash calculations.
ZK Hardware Acceleration and the Ultimate Goal of Cysic
While optimizations at the software and algorithm levels are important and valuable, there are clear limitations. In order to fully optimize the efficiency of ZKP generation, hardware acceleration must be used, just as ASICs and GPUs eventually dominated the BTC and ETH mining markets.
Then the question is: what is the best hardware for accelerating ZKP generation? Currently, several types of hardware can be used for ZK acceleration, such as GPUs, FPGAs and ASICs, and each has its own advantages and disadvantages.
We can compare these types of hardware:
First, let's use a simple example to illustrate their differences at the development level. For example, now we want to implement a simple parallel multiplication:
On the GPU, using the API provided by the CUDA SDK, we can develop like writing native code, so as to obtain the ability of parallel computing;
On the FPGA, we need to relearn the hardware description language and use this language to control the connection at the hardware level to implement parallel algorithms;
On the ASIC, the connection arrangement of transistors is directly fixed at the hardware level during the chip design stage, and it cannot be modified afterwards.
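The "simple parallel multiplication" in this comparison can be pictured as the same multiply applied independently to many element pairs. The sketch below shows only that data-parallel decomposition, in Python, with worker threads standing in for GPU threads or hardware multiplier units. It is not CUDA code, CPython threads will not give a real speedup here, and the modulus and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

P = 2**61 - 1  # illustrative prime modulus (assumption)

def mul_chunk(job):
    """Work done by one worker: multiply its own slice element by element."""
    xs, ys = job
    return [x * y % P for x, y in zip(xs, ys)]

def parallel_mul(xs, ys, workers=4):
    """Split the inputs into contiguous chunks and multiply them independently."""
    chunk = max(1, (len(xs) + workers - 1) // workers)
    jobs = [(xs[i:i + chunk], ys[i:i + chunk]) for i in range(0, len(xs), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(mul_chunk, jobs))  # map keeps the input order
    return [v for part in parts for v in part]
```

Because no element depends on any other, the work can be split across as many units as the hardware offers. The three platforms differ mainly in how that split is expressed: a kernel launched over many threads on a GPU, a replicated multiplier pipeline described in an HDL on an FPGA, or fixed circuitry on an ASIC.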
These solutions have their own advantages and disadvantages, and are suitable for different development stages of the ZK track. Cysic is committed to becoming the ultimate solution for ZK hardware acceleration, and its step-by-step strategy is:
Develop an SDK based on GPU to provide solutions for ZK applications and integrate GPU resources across the network;
Use the flexibility and balanced characteristics of FPGA to quickly implement customized ZK hardware acceleration;
Independently develop ASIC-based ZK DePIN hardware;
Cysic Network will integrate all the computing power of ZK DePIN devices and GPUs as a SaaS platform/mining pool, providing computing power and verification solutions for the entire ZK industry.
Below, we look at several of these sub-tracks in turn to understand the differences between ZK acceleration solutions and Cysic's development approach.
ZK Mining Pool and SaaS Platform: Cysic Network
In fact, well-known ZK Rollups such as Scroll and Polygon zkEVM have clearly proposed the concept of a "decentralized Prover" in their roadmaps, which in effect means building a ZK mining pool. This market-oriented approach can reduce the burden on ZK Rollup project teams and encourage miners and mining pool operators to continuously optimize their ZK acceleration solutions.
In Cysic's roadmap, a ZK mining pool and SaaS platform plan called Cysic Network has been clearly proposed. It will not only integrate Cysic's own computing power, but will also absorb third-party computing power resources through mining incentives, including idle GPUs and ZK DePIN devices in the hands of ordinary users.
The diagram of the entire verification workflow is as follows:
The ZK project team submits the proof generation task to an agent, whose job is to forward the proof task to the verification network. These agents will be run officially by Cysic at the beginning, and staking will be introduced later so that anyone can become an agent;
The Prover accepts proof tasks and generates ZK proofs using hardware. Provers need to stake tokens to take on proof tasks, and receive rewards after completing them;
The Validator Committee is responsible for checking the validity of the proof generated by the Prover and voting on it. When a certain number of votes is reached, the proof is considered valid. Validators join the committee by staking tokens, participate in voting and receive rewards. This process can be combined with the AVS concept of EigenLayer and reuse existing restaking facilities.
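The three roles above can be sketched as a small simulation. Everything specific in it is a hypothetical placeholder: the minimum stake, the two-thirds vote threshold, the reward amounts, and the names `Participant` and `process_task` are not taken from Cysic's design, which the source describes only at the level of "a certain number of votes".

```python
from dataclasses import dataclass

MIN_STAKE = 100  # hypothetical minimum stake for provers and validators

@dataclass
class Participant:
    name: str
    stake: int
    rewards: int = 0

def process_task(task, prover, committee, prove, verify, reward=10):
    """Agent-side flow: forward the task to a staked prover, collect the proof,
    then let staked committee members vote on its validity."""
    if prover.stake < MIN_STAKE:
        raise ValueError("prover must stake tokens before taking tasks")
    members = [m for m in committee if m.stake >= MIN_STAKE]
    if not members:
        raise ValueError("no staked validators available")
    proof = prove(task)
    yes_votes = sum(1 for _ in members if verify(task, proof))
    # Hypothetical rule: the proof is accepted with at least 2/3 of the votes.
    accepted = 3 * yes_votes >= 2 * len(members)
    if accepted:
        prover.rewards += reward
    for m in members:
        m.rewards += 1  # validators are rewarded for taking part in the vote
    return accepted
```

In a toy run where the "proof" of a task n is simply n * n, an honest prover is accepted and rewarded, while a prover returning a wrong value is rejected and receives nothing. A real network would add slashing, timeouts and competition between provers, none of which is modeled here.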
The detailed interaction process is as follows:
One point in the above process deserves attention: staking, incentive distribution and the submission of computing tasks all need to rely on a dedicated platform, which calls for a blockchain as the underlying infrastructure.
For this purpose, Cysic Network has built a dedicated public chain and adopted a unique consensus algorithm called Proof of Compute (PoC). Its basic principle is to select the block producer using a VRF together with the historical performance of each Prover, such as the availability of its equipment, the number of proofs submitted and the correctness of those proofs (Note: the blocks here are presumably used to record the information of each device and to distribute token incentives).
Of course, in addition to the ZK mining pool and SaaS platform, Cysic has also invested heavily in ZK acceleration solutions based on different hardware. Next, let's take a look at its achievements along the three routes of GPU, FPGA and ASIC.
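The source does not specify how PoC combines the VRF with historical performance. One plausible reading is a lottery in which better-performing Provers hold more tickets. The sketch below follows that assumption only: the scoring formula is invented for illustration, and a SHA-256 hash of a public seed stands in for a real VRF output.

```python
import hashlib

def score(prover):
    """Hypothetical performance score: correct proofs scaled by uptime percent."""
    return prover["correct_proofs"] * prover["uptime_percent"]

def select_producer(provers, seed):
    """Pick a block producer with probability proportional to its score.
    `seed` is bytes; its hash stands in for a verifiable random value."""
    weights = [score(p) for p in provers]
    total = sum(weights)
    if total == 0:
        raise ValueError("no eligible prover")
    draw = int.from_bytes(hashlib.sha256(seed).digest(), "big") % total
    for prover, weight in zip(provers, weights):
        if draw < weight:
            return prover["name"]
        draw -= weight
```

Under this reading, a Prover with no correct proofs can never be selected, and the same seed always yields the same producer, so every node can recompute the choice. An actual VRF additionally lets the selected party prove that the random value was derived honestly, which a plain hash does not provide.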
GPU, FPGA and ASIC
The core of ZK hardware acceleration is to parallelize some key operations as much as possible. From the functional characteristics of hardware, in order to achieve maximum flexibility and versatility, a large part of the area of the CPU chip is used to provide control functions and caches at all levels, which leads to its weak parallel computing capabilities.
In GPUs, the proportion of chip area used for computing is much larger, which enables them to support large-scale parallel processing. GPUs are now very widely available, and platforms such as Nvidia CUDA help developers take advantage of GPU parallelism without having to understand the underlying hardware. With the CUDA SDK, a CUDA ZK library can be built to accelerate MSM and NTT operations.
FPGAs, on the other hand, consist of an array of a large number of small processing units. To program an FPGA, you need to use a specialized hardware description language and then compile it into a combination of transistor circuits. So FPGAs actually implement specific algorithms directly with transistor circuits without the need for compilation of an instruction system. This customization and flexibility are far superior to GPUs.
Currently, FPGAs are only about one-third the price of GPUs, and their energy efficiency can be more than ten times higher than that of GPUs. This significant energy efficiency advantage is partly due to the fact that GPUs need to be connected to host devices, which usually consume a lot of power. It can be said that FPGAs can add more computing modules to meet the needs of MSM and NTT without increasing energy consumption. This makes FPGAs particularly suitable for computationally intensive ZK proof scenarios that require high data throughput and low response time.
However, the biggest problem with FPGAs is that few developers have programming experience. For the ZK project, it is extremely difficult to organize a team with both cryptography expertise and FPGA engineering expertise.
ASIC is equivalent to using hardware to implement a program. Once the design is completed, the hardware cannot be changed. Accordingly, the program that ASIC can execute cannot be changed and can only be used for specific tasks. ASIC also has the hardware acceleration advantages of FPGA in MSM and NTT mentioned above. And because it is a dedicated circuit design, ASIC has the highest efficiency and the lowest energy consumption among all solutions.
For the current mainstream ZK circuits, Cysic hopes to bring the proof time down to 1-5 seconds. Only ASICs can achieve this goal.
Although these advantages sound very attractive, ZK technology is developing rapidly, and the design and production cycle of ASIC usually takes 1-2 years and costs up to 10-20 million US dollars. Therefore, it is necessary to wait until ZK technology is stable enough before it can be put into large-scale production to avoid the chips produced becoming obsolete quickly.
In this regard, Cysic has made thorough preparations in all three fields of GPU, FPGA and ASIC.
At the GPU acceleration solution level, as various new ZK proof systems have emerged, Cysic has adapted to them with its self-developed CUDA acceleration SDK, and by gathering community resources, it has linked hundreds of thousands of top computing graphics cards into Cysic's GPU computing network. At the same time, the Cysic CUDA SDK is 50%-80% faster, or even more, than the latest open source frameworks.
On FPGA, Cysic has completed the implementation of the world's fastest MSM, NTT, Poseidon Merkle tree and other modules through its self-developed solutions, covering the most important part of ZK computing, and the solution has been prototyped by multiple top ZK projects.
Cysic's self-developed SolarMSM can complete 2^30 MSM calculations in 0.195 seconds, and SolarNTT can complete 2^30 NTT calculations in 0.218 seconds, which is the highest performance among all publicly available FPGA hardware acceleration results.
In the ASIC field, although there is still some distance from the large-scale application of ZK ASIC, Cysic has already laid out this track in advance and launched its self-developed ZK DePIN chips and equipment.
In order to attract C-end users and meet the performance and cost requirements of different ZK project parties, Cysic will launch two ZK hardware products: ZK Air and ZK Pro.
ZK Air is similar in size to a power bank or laptop power supply. Ordinary users can directly connect it to a laptop, iPad or even a mobile phone through the Type-C interface to provide computing power support for specific ZK projects and receive rewards. At present, the computing power of ZK Air still exceeds that of consumer-grade graphics cards, which can accelerate small-scale ZK proof generation tasks.
ZK Pro is similar to a traditional mining machine, and its computing power is comparable to that of a GPU server with multiple top-tier consumer-grade graphics cards interconnected. It can greatly accelerate the generation of ZK proofs and is suitable for large-scale ZK projects such as ZK-Rollup and ZKML (zero-knowledge machine learning).
Through these two devices, Cysic will eventually build a stable and reliable ZK-DePIN network. Currently, these two devices are still under development and are expected to be launched in 2025.
In addition, through Cysic Network, C-end users can join the zk hardware acceleration market with a very low threshold. Coupled with the ZK project's huge demand for computing power, this may once again set off a wave of enthusiasm in the market like Bitcoin mining, and the market size of the ZK computing field may once again usher in explosive growth.
References
https://medium.com/amber-group/need-for-speed-zero-knowledge-1e29d4a82fcd
https://figmentcapital.medium.com/accelerating-zero-knowledge-proofs-cfc806de611b