Interpretation of Arweave’s 17th edition white paper: Storing complete data copies is king

2024/04/01 12:55

Author: Gerry Wang @ Arweave Oasis, originally published on @ArweaveOasis Twitter

In Interpretation (III), we demonstrated the feasibility of #SPoRes through mathematical derivation. Bob and Alice in the article participated in this proof game together. In #Arweave mining, the protocol deploys a modified version of this SPoRes game. During mining, the protocol plays the role of Bob, and all miners in the network play the role of Alice together. Each valid proof of the SPoRes game is used to create the next block of Arweave. Specifically, the generation of Arweave blocks is related to the following parameters:

Where:

BI = Block Index of the Arweave network;

800*n_p = Each checkpoint unlocks a maximum of 800 hashes per partition, n_p is the number of partitions of 3.6 TB stored by the miner, and the multiplication of the two is the maximum number of hash operations that the miner can try per second.

d = The difficulty of the network.

A successful and valid proof is one that is greater than the difficulty value, and this difficulty value will be adjusted over time to ensure that an average of one block is mined every 120 seconds. If the time difference between block i and block (i+10) is t, then the adjustment from the old difficulty d_i to the new difficulty d_{i+10} is calculated as follows:

Where:

Formula Notes:From the above two formulas, it can be seen that the adjustment of network difficulty mainly depends on the parameter r, and r means the offset parameter of the actual time required for block generation relative to the standard time of 120 seconds per block expected by the system.

The newly calculated difficulty determines the probability of successful mining of a block based on each generated SPoA proof, as follows:

Formula Notes:After the above derivation, it can be obtained that the probability of successful mining under the new difficulty is the probability of success under the old difficulty multiplied by the parameter r.

Similarly, the difficulty of the VDF will be recalculated in order to keep the checkpoint cycle from occurring once per second in time.

Incentive mechanism for complete copies

Arweave generates each block through the SPoRes mechanism based on the following assumption:

Under incentives, both individual miners and group cooperative miners will use the maintenance of complete data copies as the best strategy for mining.

In the SPoRes game introduced earlier, the number of SPoA hashes released by two copies of the same part of the data set is the same as the full copy of the entire data set, which leaves the possibility of speculative behavior for miners. So Arweave made some modifications to this mechanism when it was actually deployed. The protocol divides the number of SPoA challenges unlocked per second into two parts:

One part specifies a partition in the partition stored by the miner to release a certain number of SPoA challenges;
The other part randomly specifies a partition in all Arweave data partitions to release SPoA challenges. If the miner does not store a copy of this partition, the number of challenges in this part will be lost.

You may feel a little confused here. What is the relationship between SPoA and SPoRes? The consensus mechanism is SPoRes, but why is the challenge released SPoA? In fact, there is a subordinate relationship between them. SPoRes is the general name for this consensus mechanism, which includes a series of SPoA proof challenges that miners need to do.

To understand this, we will examine how the VDF described in the previous section is used to unlock the SPoA challenge.

The above code details how to unlock the backtracking range consisting of a certain number of SPoA in the storage partition through VDF (encrypted clock).

Approximately every second, the VDF hash chain will output a checkpoint (Check);
This checkpoint Check will be used together with the mining address (addr), partition index (index(p)), and original VDF seed (seed) to calculate a hash value H0 using the RandomX algorithm. The hash value is a 256-bit number;
C1 is the backtracking offset, which is the remainder obtained by dividing H0 by the partition size size(p). It will be the starting offset of the first backtracking range;
The 400 256 KB data blocks within a continuous 100 MB range starting from this starting offset are the first backtracking range SPoA challenge that has been unlocked.
C2 is the starting offset of the second backtracking range, which is the remainder of H0 divided by the sum of all partition sizes, and it also unlocks 400 SPoA challenges for the second backtracking range.
The constraint of these challenges is that the SPoA challenges in the second range need to have SPoA challenges in the corresponding positions of the first range.

Performance of each packed partition

The performance of each packed partition refers to the number of SPoA challenges generated by each partition at each VDF checkpoint. When miners store Unique Replicas of a partition, the number of SPoA challenges will be greater than the number when miners store multiple backup Copies of the same data.

The concept of "unique copy" here is very different from the concept of "backup". For details, you can read the past article "Arweave 2.6 may be more in line with Satoshi Nakamoto's vision".

If the miner only stores the unique copy data of the partition, then each packaged partition will generate all the challenges of the first traceback range, and then generate the second traceback range that falls within the partition according to the number of stored partition copies. If there are m partitions in the entire Arweave weaving network, and the miner stores the only copy of n of them, then the performance of each packaged partition is:

When the partitions stored by the miner are backups of the same data, each packaged partition will still generate all the challenges of the first traceback range. But only in 1/m cases will the second backtracking range be within this partition. This imposes a significant performance penalty on this storage strategy behavior, with the ratio of the number of SPoA challenges generated being only:

Figure 1: Performance of a given partition improves as a miner (or a group of cooperating miners) finishes packaging their datasets.

The blue line in Figure 1 is the performance of storing the unique copy of a partition, perf_{unique}(n,m), which intuitively shows that when miners store only a few copies of a partition, the mining efficiency of each partition is only 50%. When all data set parts are stored and maintained, that is, n=m, the mining efficiency is maximized to 1.

Total Hash Rate

The total hash rate (see Figure 2) is given by the following equation, which is obtained by multiplying the value of each partition by n:

The above formula shows that as the size of the Weave network grows, if the unique copy of the data is not stored, the penalty function (Penalty Function) increases quadratically with the increase in the number of storage partitions.

Figure 2: Total mining hash rate of unique and backup datasets

Marginal partition efficiency

Based on this framework, we explore the decision problem faced by miners when adding new partitions, that is, whether to choose to copy a partition they already have, or to obtain new data from other miners and package it into a unique copy. When they already store unique copies of n partitions from the maximum possible m partitions, their mining hashrate is proportional:

So the additional benefit of adding a unique copy of a new partition is:

and the (smaller) benefit of copying an already packed partition is:

Dividing the first number by the second, we get the miner's relative marginal partition efficiency:

Figure 3: Miners are incentivized to build a complete replica (option 1) rather than making additional copies of the data they already have (option 2)

The rmpe value can be viewed as a penalty for miners to replicate existing partitions when adding new data. In this expression, we can treat m as approaching infinity and then consider the efficiency tradeoffs for different values of n:

The reward for completing a replica is highest when the miner has a nearly complete copy of the dataset. Because if n approaches m and m approaches infinity, the value of rmpe is 3. This means that, when close to a complete copy, finding new data is 3 times more efficient than repackaging existing data.
When miners store half of the Weave network, for example, when n= 1/2 m, rmpe is 2. This means that miners who find new data earn twice as much as those who copy existing data.

For lower values of n, the rmpe value tends to but is always greater than 1. This means that the benefits of storing a unique copy are always greater than the benefits of copying existing data.

As the network grows (m tends to infinity), miners' motivation to build a complete copy will increase. This promotes the creation of cooperative mining groups that jointly store a complete copy of at least one data set.

This article mainly introduces the details of the Arweave consensus protocol construction, of course, this is just the beginning of this core content. From the mechanism introduction and code, we can very intuitively understand the specific details of the protocol. I hope it can help everyone understand.

Gain a broader understanding of the crypto industry through informative reports, and engage in in-depth discussions with other like-minded authors and readers. You are welcome to join us in our growing Coinlive community:https://t.me/CoinliveSG

Add Comment

LoginLeave your comments

0 Comments

Earliest

Load more comments

More news about a5제곱+b5제곱

Apr 17
a16z、5つの新ファンドで72億ドル調達
Bullish
Bearish
Jan 04
Grayscale files GBTC Form 8-A12B with SEC
Bullish
Bearish
Dec 21
AI Training Dataset LAION-5B Withdrawn Over Discovery of Child Abuse Material
Bullish
Bearish
Dec 01
Milkomeda A1 (Algorand) validator will be shut down in late May next year
Bullish2
Bearish
Sep 22
Bitdeer plans to issue and sell up to $150 million of Class A common stock to B. Riley
Bullish
Bearish
Sep 20
Database company MotherDuck completed US$52.5 million in Series B financing, with participation from a16z and others
Bullish
Bearish
Aug 09
Bitdeer signs up to $150 million common stock purchase agreement with B. Riley
Bullish
Bearish
Jun 22
AI Web3 search engine Kaito raises $5.5M Series A led by Superscrypt and Spartan
Bullish
Bearish
Sep 20
Fraud detection platform Sardine completes $51.5 million in Series B funding, with investments from Visa, ConsenSys and a16z
Bullish
Bearish
Aug 23
Metaverse Avatar platform Ready Player Me completes $56 million in Series B financing, led by a16z
Bullish
Bearish

Interpretation of Arweave’s 17th edition white paper: Storing complete data copies is king

More news about a5제곱+b5제곱

More news about a5제곱+b5제곱

French Crypto Market Maker Flowdesk Skyrockets to $250M Valuation in Series B Triumph

Perplexity Skyrockets with $73M Series B: AI Search Engine Valued at $520M Leads the Charge

早报 - 5月5日2023

早报 - 5月5日2023

Microstrategy BTC Bet Loss Grows To $1.5b, Stocks Down After $6m BTC Buy

5 Must-Reads in the Evening | A16Z: 7 Basic Elements for Building the Metaverse

Optimism Saves Users $1B in Fees, Raises $150M in Series B Funding

Ethereum L2 Boba Network valued at $1.5B following Series A

Binance.US completes $200M seed round to hit $4.5B valuation

Ava Labs raises $350M at $5.25B valuation: Report

Interpretation of Arweave’s 17th edition white paper: Storing complete data copies is king

More news about a5제곱+b5제곱

More news about a5제곱+b5제곱

French Crypto Market Maker Flowdesk Skyrockets to $250M Valuation in Series B Triumph

Perplexity Skyrockets with $73M Series B: AI Search Engine Valued at $520M Leads the Charge

早报 - 5月5日2023

早报 - 5月5日2023

Microstrategy BTC Bet Loss Grows To $1.5b, Stocks Down After $6m BTC Buy

5 Must-Reads in the Evening | A16Z: 7 Basic Elements for Building the Metaverse

Optimism Saves Users $1B in Fees, Raises $150M in Series B Funding

Ethereum L2 Boba Network valued at $1.5B following Series A

Binance​.US completes $200M seed round to hit $4.5B valuation

Ava Labs raises $350M at $5.25B valuation: Report

Binance.US completes $200M seed round to hit $4.5B valuation