Author: Kerman Kohli | Source: Substack | Translation: Shan Ouba, Golden Finance
It's 2024, and you would think that getting crypto data would be easy: with Etherscan, Dune, and Nansen, you can look up whatever data you want at any time. On the surface, it does seem that way. In practice, though, it is much harder than it looks, for a few reasons.
Scale
You see, in the normal web2 world, when your company has 10 employees and 100,000 customers, the amount of data you generate probably doesn't exceed 100 GB (at the upper end). That scale is small enough that a single iPhone could store everything and answer any question you have about it. However, once you have 1,000 employees and 100,000,000 customers, the amount of data you process may be hundreds of terabytes, or even petabytes.
This is a fundamentally different challenge, because at that scale there is far more to think about. To process hundreds of terabytes of data, you need a distributed cluster of machines to send jobs to. When sending those jobs, you have to consider the following (sketched in code after this list):
What happens if a worker fails partway through its job?
What happens if one worker takes much longer than the others?
How do you decide which job to assign to which worker?
How do you merge all the results together and make sure the calculations are correct?
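To make these concerns concrete, here is a minimal, hypothetical sketch in Python. The "job" (summing transfer values per chunk), the failure rate, and the retry policy are all invented for illustration; real pipelines use frameworks like Spark, but they wrestle with exactly the same questions.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

def sum_chunk(chunk):
    """One 'worker': aggregate a chunk of transfer values."""
    if random.random() < 0.1:
        raise RuntimeError("worker died")        # concern 1: a worker fails
    time.sleep(random.choice([0.1, 0.1, 1.5]))   # concern 2: the occasional straggler
    return sum(chunk)

def run_job(chunks, retries=3):
    results = []
    attempts = {i: 0 for i in range(len(chunks))}
    with ProcessPoolExecutor(max_workers=4) as pool:          # concern 3: scheduling
        futures = {pool.submit(sum_chunk, c): i for i, c in enumerate(chunks)}
        while futures:
            for fut in as_completed(list(futures)):
                i = futures.pop(fut)
                try:
                    results.append(fut.result())
                except RuntimeError:
                    attempts[i] += 1
                    if attempts[i] >= retries:
                        raise                                  # the whole job fails
                    # reschedule the failed chunk on another worker
                    futures[pool.submit(sum_chunk, chunks[i])] = i
    # concern 4: merging; only trivially correct because addition is associative
    return sum(results)

if __name__ == "__main__":
    data = [[random.randint(1, 100) for _ in range(1_000)] for _ in range(20)]
    print(run_job(data))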
These are all things you have to think about when running big-data computations across many machines. Scale creates problems that are invisible to anyone who doesn't work at it: the larger the data, the more infrastructure you need just to manage it correctly. To handle this scale, you face further challenges:
Extremely specialized talent that knows how to operate machines at this scale
The cost of storing and computing all that data
Forward planning and architecture to ensure your needs can be supported
Interestingly, in web2 everyone wanted data to be public. In web3 it finally is, but few people know how to do the work needed to make sense of it. The deceptive part is that, with some help, you can pull your own dataset out of the global dataset fairly easily. In other words, "local" data is easy, but "global" data (the stuff about everyone and everything) is hard to get.
Fragmentation
As if the scale you have to deal with weren't challenging enough, there is another dimension that makes crypto data hard: the market's economic incentives constantly fragment it. For example:
The rise of new blockchains. There are nearly 50 L2s live, another 50 known to be coming, and hundreds more in the pipeline. Each L2 is effectively a new database that needs to be indexed and configured. Hopefully they are standardized, but you can't always be sure!
The rise of new kinds of VMs. The EVM is just one corner of the market; the SVM, Move VM, and countless others are coming. Each new kind of VM means a whole new data schema that has to be understood from first principles. How many VMs will there eventually be? Nobody knows, and investors are incentivizing new ones with billions of dollars!
The rise of new account primitives. Smart contract wallets, custodial wallets, and account abstraction introduce new complexity into how you interpret data. The sender address may not be the real user, because the transaction was submitted by a relayer; the actual user is buried somewhere inside it (if you look closely), as the sketch below illustrates.
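Here is a deliberately simplified, hypothetical illustration of that attribution problem for ERC-4337-style relayed transactions. The data structures are toy stand-ins rather than a real decoder; in practice you would decode the EntryPoint's handleOps calldata or its UserOperationEvent logs to recover each operation's sender.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class UserOperation:
    sender: str        # the smart-contract wallet that actually acted
    call_data: bytes   # what the wallet was asked to do

@dataclass
class RelayedTx:
    tx_from: str                    # the bundler/relayer that paid for inclusion
    to: str                         # the EntryPoint contract
    user_ops: list[UserOperation]   # the operations bundled inside

def naive_users(txs):
    """What a naive dashboard counts: one 'user' per tx.from."""
    return {t.tx_from for t in txs}

def actual_users(txs):
    """The wallets that actually initiated activity."""
    return {op.sender for t in txs for op in t.user_ops}

bundle = RelayedTx(
    tx_from="0xBundler",
    to="0xEntryPoint",
    user_ops=[UserOperation("0xAliceWallet", b""), UserOperation("0xBobWallet", b"")],
)
print(naive_users([bundle]))   # {'0xBundler'}: one apparent "user"
print(actual_users([bundle]))  # {'0xAliceWallet', '0xBobWallet'}: the two real users
```

Any dashboard that counts tx.from as the user reports the bundler once instead of the two wallets behind it.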
Fragmentation is especially hard because you can't quantify what you don't know. You will never know every L2 in existence, or how many VMs there will eventually be. Once they reach sufficient scale you can catch up with them, but that's another story.
Open, but not interoperable
The last problem, and I think the one that surprises the most people, is that the data is open but not easily interoperable. You see, all the smart contracts that teams have pieced together are like small databases inside one large database; I like to think of them as schemas. All the data is out there, but generally only the team that developed a smart contract knows how to piece it together. You can spend the time to figure it out yourself if you want, but you would have to do that hundreds of times over for all the potential schemas, and how do you do that without spending a fortune, unless there is a buyer on the other side of the transaction?
If this seems too abstract, let me give you an example. You ask: "How often does this user use a bridge?" While this sounds like one question, there are many questions nested inside it. Let's break it down:
First, you need to know every bridge that exists on the chains you care about. If that means all chains, we have already covered above why that is challenging.
Then, for each bridge, you need to understand how its smart contracts work.
Once you understand all the permutations, you need to reason your way to a model that can unify all of these individual schemas.
Each of these steps is hard to get right and requires a lot of resources. The sketch below shows roughly what the unification step looks like for just two bridges.
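A hypothetical Python sketch, assuming two made-up bridges with invented event fields. Every real bridge has its own contracts and event schemas that you would have to study and decode separately before it could be mapped into a unified model like this one.

```python
from dataclasses import dataclass

@dataclass
class BridgeTransfer:        # the unified model that every bridge must map into
    user: str
    src_chain: str
    dst_chain: str
    token: str
    amount: int

def parse_bridge_a(log):
    # Bridge A emits something like Deposit(user, token, amount, destinationChain)
    return BridgeTransfer(log["user"], "ethereum", log["destinationChain"],
                          log["token"], log["amount"])

def parse_bridge_b(log):
    # Bridge B emits something like TokensSent(sender, asset, value) to a fixed L2
    return BridgeTransfer(log["sender"], "ethereum", "some-l2",
                          log["asset"], log["value"])

# one hand-written parser per bridge contract you know about
PARSERS = {"0xBridgeA": parse_bridge_a, "0xBridgeB": parse_bridge_b}

def bridge_uses(logs, user):
    """Answer 'how often does this user use a bridge?' across the bridges we know."""
    transfers = [PARSERS[log["address"]](log) for log in logs if log["address"] in PARSERS]
    return sum(1 for t in transfers if t.user == user)

logs = [
    {"address": "0xBridgeA", "user": "0xAlice", "token": "USDC",
     "amount": 500, "destinationChain": "arbitrum"},
    {"address": "0xBridgeB", "sender": "0xAlice", "asset": "ETH", "value": 1},
]
print(bridge_uses(logs, "0xAlice"))  # 2
```

Multiply the per-bridge parser by every bridge on every chain, and the cost of answering the original question becomes clear.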
The Result
So where does all this lead? Well, the state of our ecosystem today is…
No one in the ecosystem truly knows what is going on. There are only vague notions of activity that are hard to quantify properly.
User numbers are inflated and Sybil attacks are hard to detect. Metrics become irrelevant and untrustworthy! Real and fake don't even matter to market participants, because the two look the same.
This is the main obstacle to making on-chain identity real. Accurate data is essential if you want identity to mean anything; otherwise identities will be misrepresented!
I hope this article helped you understand the realities of the crypto data landscape.