Author: Vitalik Buterin, @vitalik.eth; Compiled by: Songxue, Golden Finance
One strategy for using in-protocol incentives to support better decentralization is to penalize correlation: if one participant misbehaves (even accidentally), the penalty should be larger the more other participants (weighted by total ETH) misbehave at the same time. The theory is that if you are a single large actor, any mistake you make is more likely to be replicated across every "identity" you control, even if you spread your stake across many nominally separate accounts.
This technique is already used in Ethereum's slashing (and arguably its inactivity-leak) mechanism. However, edge-case incentives that only arise in highly exceptional attack scenarios may never materialize in practice, and may not be enough to incentivize decentralization.
This article proposes extending similar anti-correlation incentives to more "mundane" failure cases, such as missing an attestation, which nearly every validator does at least occasionally. The theory is that larger stakers, whether wealthy individuals or staking pools, will run many validators on the same internet connection or even the same physical computer, and this will cause a disproportionate number of correlated failures. Such stakers could always build a separate physical setup for each node, but if they end up doing that, it means we have entirely eliminated the economies of scale of staking.
Sanity check: are failures of different validators in the same "cluster" actually more likely to be correlated?
We can check this by combining two datasets: (i) attestation data for some recent epochs, showing which validators were supposed to attest in each slot and which actually did, and (ii) data mapping validator IDs to publicly known clusters containing many validators (e.g. "Lido", "Coinbase", "Vitalik Buterin"). Public dumps of both datasets are available.
We then run a script to count the total number of co-failures: cases where two validators in the same cluster were both assigned to attest in the same slot and both failed in that slot.
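As a minimal sketch of what such a counting script might look like (not the exact script used in the analysis; the data structures here are assumptions), for a single slot:

```python
from collections import defaultdict

def count_co_failures(failed_validators, cluster_of):
    """Count co-failures in one slot: pairs of validators in the same cluster
    that were both assigned to attest in this slot and both failed.

    failed_validators: iterable of validator IDs that failed in this slot.
    cluster_of: dict mapping validator ID -> cluster label (e.g. "Lido");
                validators not in the dict are treated as unclustered.
    """
    failures_by_cluster = defaultdict(int)
    for v in failed_validators:
        if v in cluster_of:
            failures_by_cluster[cluster_of[v]] += 1
    # k failures in the same cluster produce k*(k-1)/2 co-failure pairs
    return sum(k * (k - 1) // 2 for k in failures_by_cluster.values())

# Toy usage: validators 2 and 4 share a cluster and both failed,
# validator 9 failed but is unclustered -> one co-failure.
clusters = {1: "A", 2: "A", 3: "A", 4: "A"}
print(count_co_failures([2, 4, 9], clusters))  # 1
```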
We also compute the expected number of co-failures: the number of co-failures that "should have happened" if failures were purely the result of random chance.
For example, suppose there are 10 validators, of which 4 are in one cluster and the rest are independent, and 3 validators fail: two inside the cluster and one outside it.
There is one co-failure here: the second and fourth validators in the cluster. If all four validators in the cluster had failed, there would have been six co-failures, one for each of the six possible pairs.
But how many co-failures "should" there have been? This is a tricky philosophical question. A few ways to answer (a small worked sketch follows this list):
For each failure, count expected co-failures equal to the failure rate of the other validators in that slot, multiplied by the number of other validators in the same cluster, halved to compensate for double-counting. For the example above, this gives 2/3.
Compute the global failure rate, square it, and multiply by [n*(n-1)]/2 for each cluster of size n. For the example above, this gives [(3/10)^2]*6 = 0.54.
Randomly redistribute each validator's failures throughout its history.
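To make the first two estimators concrete, here is a small worked sketch applying them to the toy example above. Method 1 is written here using the number of other validators in the cluster, which reproduces the 2/3 figure; the exact formulas in the original analysis scripts may differ slightly.

```python
# Toy example: 10 validators, one cluster of size 4 (validators 1-4),
# and failures in this slot are validators 2, 4 (in the cluster) and 9.
cluster_of = {1: "A", 2: "A", 3: "A", 4: "A"}
assigned = list(range(1, 11))
failed = {2, 4, 9}

# Method 1: for each in-cluster failure, take the failure rate of the other
# validators in the slot times the number of other cluster members, then
# halve the total to compensate for double-counting.
expected_1 = 0.0
for v in failed:
    if v in cluster_of:
        other_rate = (len(failed) - 1) / (len(assigned) - 1)            # 2/9
        others_in_cluster = sum(1 for u in cluster_of
                                if cluster_of[u] == cluster_of[v]) - 1  # 3
        expected_1 += other_rate * others_in_cluster
expected_1 /= 2
print(expected_1)                           # ~0.667 (= 2/3)

# Method 2: global failure rate squared, times n*(n-1)/2 for each cluster.
rate = len(failed) / len(assigned)          # 3/10
n = 4                                       # cluster size
print(rate ** 2 * n * (n - 1) / 2)          # ~0.54
```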
None of these methods is perfect. The first two fail to account for different clusters having setups of different quality, while the last fails to account for correlations arising from different slots having different inherent difficulty: for example, slot 8103681 has a large number of attestations that were not included within a single slot, probably because the block was published unusually late.
See "10216 ssfumbles" in this python output.
I ended up implementing three approaches: the first two methods above, plus a more convoluted approach in which I compare "actual co-failures" against "fake-cluster co-failures", where each cluster member is replaced by a (pseudo)randomly chosen validator with a similar failure rate.
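A minimal sketch of how such fake clusters could be built, assuming failure rates are bucketed coarsely (the bucketing granularity and sampling details are assumptions, not the exact procedure used):

```python
import random

def build_fake_clusters(clusters, failure_rate, buckets=20, seed=0):
    """For each real cluster, build a fake cluster of the same size whose
    members are (pseudo)randomly chosen validators with similar failure rates.

    clusters: dict cluster label -> list of validator IDs.
    failure_rate: dict validator ID -> observed failure rate in [0, 1].
    """
    rng = random.Random(seed)

    def bucket(v):
        return min(int(failure_rate[v] * buckets), buckets - 1)

    # Group all validators by coarsened failure rate.
    by_bucket = {}
    for v in failure_rate:
        by_bucket.setdefault(bucket(v), []).append(v)

    # Replace each cluster member with a random validator from the same bucket.
    return {
        label: [rng.choice(by_bucket[bucket(v)]) for v in members]
        for label, members in clusters.items()
    }
```

Co-failures can then be counted over the fake clusters with the same routine as for the real clusters, and the two totals compared.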
I also draw a clear distinction between fumbles and misses, which I define as follows:
Fumble: the validator missed an attestation in the current epoch, but attested correctly in the previous epoch;
Miss: the validator missed an attestation in the current epoch and also missed one in the previous epoch.
The goal is to distinguish between two very different phenomena: (i) network hiccups during otherwise normal operation, and (ii) being offline or suffering a long-term malfunction.
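Given a per-epoch participation table, this classification reduces to a comparison with the previous epoch. A minimal sketch, assuming participated[epoch][validator] is a boolean indicating whether the attestation was included:

```python
def classify_failure(participated, validator, epoch):
    """Return "fumble", "miss", or None for a validator in a given epoch.

    participated: dict epoch -> dict validator -> bool.
    """
    if participated[epoch][validator]:
        return None      # attested successfully: no failure this epoch
    if participated[epoch - 1][validator]:
        return "fumble"  # failed now, but attested correctly last epoch
    return "miss"        # failed now and also failed last epoch
```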
I also ran the analysis on two datasets simultaneously: max-deadline and single-slot-deadline. The first dataset treats a validator as having failed in an epoch only if its attestation was not included at all; the second treats a validator as having failed if its attestation was not included within a single slot.
Here are my results for the first two methods of computing expected co-failures. SSfumbles and SSmisses refer to fumbles and misses in the single-slot dataset.
For the first method, the "actual" rows differ because, for efficiency, a more restricted dataset was used:
The "Expected" and "Fake cluster" columns show how many common faults "should" be within the cluster if the clusters are not related, based on the above technique. The "Actual" column shows how many common faults there actually are. Consistently, we see strong evidence of “excessive correlated failures” within a cluster: two validators in the same cluster are significantly more likely to miss attestations at the same time than two validators in different clusters.
How do we apply this to penalty rules?
Here is a simple strawman proposal: in each slot, let p equal the number of missed attestations in that slot divided by the average number of missed attestations over the last 32 slots.
The penalty for missing an attestation in that slot should be proportional to p. In other words, the penalty for failing to attest in a slot should be proportional to how many validators failed in that slot, relative to other recent slots.
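As a sketch of how p and the resulting per-validator penalty could be computed (the 32-slot window comes from the text; whether the current slot is included in the average and how the first slots are handled are assumptions here):

```python
def excess_penalties(misses_per_slot, window=32):
    """For each slot, compute p = misses in that slot divided by the average
    number of misses over the previous `window` slots. Every validator that
    missed in that slot is charged p penalty points."""
    per_slot_penalty = []
    for i, misses in enumerate(misses_per_slot):
        recent = misses_per_slot[max(0, i - window):i]
        if not recent:
            per_slot_penalty.append(0.0)   # no history yet (sketch choice)
            continue
        avg = sum(recent) / len(recent)
        per_slot_penalty.append(misses / avg if avg > 0 else 0.0)
    return per_slot_penalty

# A slot with 3x the recent average of misses costs each missing validator
# 3 penalty points instead of 1.
print(excess_penalties([10] * 32 + [30])[-1])  # 3.0
```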
A nice property of this mechanism is that it is not easy to attack: in no case does failing reduce your penalty, and manipulating the average enough to have an impact requires producing a large number of failures yourself.
Now, let's actually try running it. Here are the total penalties incurred by large clusters, medium clusters, small clusters, and all validators (including those outside any cluster) under four penalty schemes (a rough tally sketch follows the list):
basic: one penalty point per miss (i.e. similar to the status quo)
basic_ss: the same, but requiring single-slot inclusion for an attestation not to count as a miss
excess: penalize p points per miss, with p computed as above
excess_ss: penalize p points per miss, with p computed as above, requiring single-slot inclusion for an attestation not to count as a miss
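To make the comparison concrete, here is a rough sketch of how the per-cluster totals could be tallied under the "basic" and "excess" schemes (the "_ss" variants simply swap in the single-slot dataset; the data format is an assumption):

```python
def tally_penalties(slots, cluster_of, window=32):
    """Return (basic, excess): total penalty points per cluster under the
    one-point-per-miss scheme and the p-points-per-miss scheme.

    slots: list of sets, each containing the validators that missed in that slot.
    cluster_of: dict validator ID -> cluster label.
    """
    basic, excess, history = {}, {}, []
    for failed in slots:
        recent = history[-window:]
        avg = sum(recent) / len(recent) if recent else 0.0
        p = len(failed) / avg if avg > 0 else 1.0   # fall back to 1 early on
        for v in failed:
            c = cluster_of.get(v, "unclustered")
            basic[c] = basic.get(c, 0) + 1
            excess[c] = excess.get(c, 0) + p
        history.append(len(failed))
    return basic, excess
```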
Use the "basic" solution, the larger solution is better than the smaller solution There is an advantage of about 1.4x (about 1.2x in the single-slot data set). Using the "extra" scenario, this value drops to about 1.3x (about 1.1x in the single-slot dataset). Through several other iterations, using slightly different data sets, the excess penalty scheme uniformly narrowed the advantage of the "big guys" over the "little guys."
What's going on?
The number of failures in each slot is small: usually only a few dozen. That is much smaller than pretty much any "large staker". In fact, it is smaller than the number of validators a large staker has active in any single slot (i.e. 1/32 of their total holdings). If a large staker runs many nodes on the same physical computer or the same internet connection, any outage is likely to affect all of their validators.
This means that when a large staker has an attestation-inclusion failure, they single-handedly move the current slot's failure rate, which in turn increases their own penalty. Small stakers do not.
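For illustration only, with made-up numbers rather than figures from the dataset: if about 30 validators normally miss per slot, a correlated outage among a large staker's validators raises p for every one of their misses, while a lone small-staker miss barely moves it:

```python
avg_recent_misses = 30          # assumed background level of misses per slot

# A large staker's outage takes out 100 of its validators in one slot:
p_large = (avg_recent_misses + 100) / avg_recent_misses   # ~4.3 points per miss
# ...so ~100 misses at ~4.3 points each, versus ~100 points under "basic".

# A small staker misses a single attestation in an otherwise normal slot:
p_small = (avg_recent_misses + 1) / avg_recent_misses     # ~1.03 points

print(round(p_large, 2), round(p_small, 2))  # 4.33 1.03
```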
In principle, large stakers could evade this penalty scheme by putting each validator on a separate internet connection. But that would sacrifice the economy of scale that large stakers get from being able to reuse the same physical infrastructure.
Further analysis
Find other strategies to confirm the magnitude of this effect, whereby validators in the same cluster are more likely to fail their attestations at the same time.
Try to find the ideal (but still simple, so it neither overfits nor becomes exploitable) reward/penalty scheme that minimizes the average advantage of large stakers over small ones.
Try to prove security properties for this class of incentive schemes, ideally identifying a "design-space region" within which the risk of strange attacks (e.g. strategically going offline at specific times to manipulate the average) is too costly to be worth it.
Cluster by geographic location. This could determine whether the mechanism also incentivizes geographic decentralization.
Cluster by (execution and beacon) client software. This could determine whether the mechanism also incentivizes the use of minority clients.