Author: Michael O’Rourke | Source: Cointelegraph | Translation: Shan Ou Ba, Golden Finance
To realize the full potential of open data, including low-cost large language model (LLM) training, convenient research data sharing, and unstoppable DApp hosting, we must move open data from centralized infrastructure to decentralized architecture.
Open data is already a major driving force in the emerging global technology economy, with a market valued at more than US$350 billion. Yet many open data sources still rely on centralized infrastructure, which runs counter to Web3's principles of autonomy and censorship resistance.
To unlock open data's full potential, a shift to decentralized infrastructure is necessary. Once the open data ecosystem moves to a decentralized, open architecture, several vulnerabilities in user-facing applications will be eliminated.
Decentralized infrastructure has a wide range of application scenarios, including:
• Hosting decentralized applications (DApps)
• Running trading bots
• Sharing research data
• LLM training and inference
A closer look at these use cases shows that, compared with centralized infrastructure, decentralized architecture puts open data to work more efficiently and practically.
Lower LLM Training and Inference Costs
The release of DeepSeek's open-source AI model, which briefly wiped about $1 trillion off the US technology market, demonstrated the power of open-source protocols. It is a wake-up call: we should pay attention to the new global economy taking shape around open data.
Closed, centralized AI models are expensive to train, and those costs limit how capable and accurate the resulting LLMs can be. DeepSeek R1's final training run cost only about $5.5 million; by contrast, training OpenAI's GPT-4 cost more than $100 million. Even so, the emerging AI industry still depends on centralized infrastructure platforms, such as LLM API providers, which contradicts the spirit of open-source innovation.
In fact, hosting open source LLMs such as Llama 2 and DeepSeek R1 is both simple and cheap. Unlike stateful blockchains, which require constant synchronization, LLMs are stateless and only require periodic updates.
Although LLMs are relatively simple to run, performing inference on open-source models is still computationally expensive, because node operators need GPU computing power. Notably, though, these models do not need real-time synchronization, which yields significant cost savings over the long run.
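To make that concrete, here is a minimal sketch of self-hosting an open-weight model for inference with the Hugging Face transformers library. The model ID and generation settings are illustrative assumptions (the Llama 2 checkpoint is gated, so access is assumed), and device_map="auto" additionally requires the accelerate package and, in practice, a GPU.

```python
# Minimal sketch: self-hosting an open-weight LLM for inference.
# Assumes `transformers`, `torch`, and `accelerate` are installed and that
# access to the (gated) checkpoint has been granted; any open-weight model
# works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Statelessness in practice: each request is independent, with no chain
# state to synchronize between calls.
prompt = "Explain why open data benefits from decentralized infrastructure."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```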
The rise of general-purpose base models such as GPT-4 has made it possible to build new products on top of contextual inference. Centralized companies like OpenAI, however, will not let arbitrary networks tap their trained models for inference. Decentralized node operators, by contrast, can support open-source LLM development by acting as AI endpoints that provide deterministic data to customers.
Decentralized networks lower the barrier to entry by empowering operators to launch their own gateways on the network. By open-sourcing the core gateway and service infrastructure, these decentralized infrastructure protocols handle millions of requests on their permissionless networks, so any entrepreneur or operator can deploy a gateway and enter the emerging market. For example, a team can tap decentralized computing resources to train LLMs on Akash, a permissionless protocol whose customized compute services cost up to 85% less than those of centralized cloud providers.
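As a sketch of what consuming such a decentralized AI endpoint could look like, the snippet below posts a chat request to a hypothetical gateway. The URL, model name, and response shape are illustrative assumptions; many inference gateways expose an OpenAI-compatible HTTP API of roughly this form.

```python
# Hypothetical sketch: calling an open-source LLM through a decentralized
# gateway. The endpoint URL and payload shape are assumptions for
# illustration, modeled on the common OpenAI-compatible API convention.
import requests

GATEWAY_URL = "https://gateway.example-decentralized.net/v1/chat/completions"  # hypothetical

payload = {
    "model": "deepseek-r1",  # an open-weight model served by node operators
    "messages": [
        {"role": "user", "content": "Summarize the benefits of open data."}
    ],
}
resp = requests.post(GATEWAY_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the gateway code itself is open source, any operator could stand up an equivalent endpoint, which is precisely the low barrier to entry described above.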
Currently, AI companies spend about $1 million per day on infrastructure maintenance to run LLM inference services. At that rate, the serviceable addressable market (SAM) for AI inference infrastructure works out to roughly $365 million per year ($1 million × 365 days).
These figures point to enormous growth potential for decentralized infrastructure, and decentralizing AI computing resources will open up more room for innovation across the industry.
Accessible Research Data Sharing
In scientific research, data sharing combined with machine learning and LLMs can accelerate research progress and improve human lives. Yet access to data is constrained by the high-cost academic journal system: journals selectively publish research approved by their review committees, and most papers sit behind expensive subscriptions, out of reach for wide audiences.
With the rise of blockchain-based zero-knowledge (ZK) machine learning, data can now be shared and computed on in a trustless environment while preserving privacy, without revealing sensitive information. Researchers and scientists can therefore share and access research data without exposing restricted personally identifiable information.
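As a rough illustration of that sharing flow, the toy sketch below publishes a hash commitment to a private dataset alongside an aggregate result. The proof step is deliberately stubbed out: a real ZK machine-learning stack would compile the computation into a circuit and emit a succinct proof that a verifier checks against the commitment. All names here are hypothetical.

```python
# Toy sketch of the privacy-preserving sharing flow (proof step stubbed).
# In a real ZK system the "proof" would be a succinct cryptographic proof,
# not a plain dictionary; this only shows the shape of the protocol.
import hashlib
import json

def commit(dataset: list[float]) -> str:
    """Publish a hash commitment to the raw data instead of the data itself."""
    return hashlib.sha256(json.dumps(dataset).encode()).hexdigest()

def prove_mean(dataset: list[float]) -> tuple[float, dict]:
    """Compute an aggregate statistic plus a stand-in proof tying the result
    to the committed data, without revealing individual records."""
    result = sum(dataset) / len(dataset)
    proof = {"claim": "mean", "commitment": commit(dataset)}  # ZK proof stand-in
    return result, proof

# The researcher shares only the commitment, the result, and the proof;
# a verifier checks the proof against the commitment, never seeing raw rows.
private_measurements = [1.2, 3.4, 2.8, 4.1]
mean_value, proof = prove_mean(private_measurements)
print(proof["commitment"], mean_value, proof["claim"])
```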
To sustainably share open research data, researchers need a decentralized infrastructure that rewards them for data access and eliminates intermediaries. An incentivized open data network can ensure that scientific data remains accessible outside of expensive journals and private companies.
Unstoppable DApp Hosting
Centralized data hosting platforms such as Amazon Web Services (AWS), Google Cloud, and Microsoft Azure are popular with application developers. But however accessible they are, centralized platforms carry a single-point-of-failure risk that undermines reliability and leads to rare but real service outages.
Technology history is littered with instances where Infrastructure as a Service (IaaS) platforms have failed to provide uninterrupted service. For example:
• In 2022, MetaMask temporarily denied access to users in certain regions due to Infura’s compliance with U.S. sanctions. Although MetaMask is decentralized, its default connections and endpoints rely on centralized Infura to access Ethereum.
• In November 2020, an Infura outage disrupted services across the Ethereum ecosystem, with several exchanges temporarily pausing ETH withdrawals.
• Solana and Polygon’s centralized remote procedure call (RPC) services were overloaded during peak traffic, causing network congestion.
In a thriving open source ecosystem, it is difficult for a single company to meet the variety of developer needs. Currently, there are thousands of Layer 1 blockchains, Rollup solutions, indexing services, storage protocols, and other middleware protocols on the market, covering different niche use cases.
Most centralized platforms, such as RPC providers, keep rebuilding the same infrastructure. This creates friction, slows growth, and hurts scalability, because protocols spend their effort rebuilding foundations instead of shipping new features.
By contrast, the success of decentralized social applications such as Bluesky, built on the AT Protocol, shows that user demand for decentralized protocols is growing. By forgoing centralized RPCs in favor of open data access, these protocols remind us of the importance of building and adopting decentralized infrastructure.
For example, decentralized finance (DeFi) protocols can obtain on-chain price data from Chainlink without relying on centralized APIs for price information and real-time market data.
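For instance, here is a minimal sketch of reading Chainlink's on-chain ETH/USD feed with the web3.py library rather than polling a centralized price API. The RPC URL is a placeholder, and the feed address is the commonly published ETH/USD aggregator on Ethereum mainnet; both should be verified against Chainlink's documentation before use.

```python
# Minimal sketch: reading an on-chain Chainlink price feed with web3.py
# instead of a centralized price API.
from web3 import Web3

RPC_URL = "https://eth.example-rpc.net"  # placeholder; any Ethereum RPC endpoint
ETH_USD_FEED = "0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419"  # assumed ETH/USD feed; verify

# Minimal ABI for Chainlink's AggregatorV3Interface.
AGGREGATOR_ABI = [
    {"name": "latestRoundData", "type": "function", "stateMutability": "view",
     "inputs": [],
     "outputs": [{"name": "roundId", "type": "uint80"},
                 {"name": "answer", "type": "int256"},
                 {"name": "startedAt", "type": "uint256"},
                 {"name": "updatedAt", "type": "uint256"},
                 {"name": "answeredInRound", "type": "uint80"}]},
    {"name": "decimals", "type": "function", "stateMutability": "view",
     "inputs": [], "outputs": [{"name": "", "type": "uint8"}]},
]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
feed = w3.eth.contract(address=ETH_USD_FEED, abi=AGGREGATOR_ABI)

# The answer is a fixed-point integer; scale it by the feed's decimals.
_, answer, _, updated_at, _ = feed.functions.latestRoundData().call()
decimals = feed.functions.decimals().call()
print(f"ETH/USD: {answer / 10 ** decimals} (updated at {updated_at})")
```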
Currently, the Web3 market serves roughly 100 billion serviceable RPC requests per day, at a cost of $3 to $6 per million requests. At that daily volume, about 36.5 trillion requests per year, the total addressable market (TAM) for Web3 RPC works out to roughly $100 million to $200 million annually. And as new data availability layers grow, RPC requests could exceed 1 trillion per day.
To keep pace with the growth of open data transfer and capture the open-source data market, the move to decentralized infrastructure is imperative.
Open Data Requires Decentralized Infrastructure
In the long term, we will see general-purpose blockchain clients offload storage and networking functions to specialized middleware protocols.
For example, Solana pioneered the push toward decentralized storage, archiving its ledger data on chains like Arweave. That architecture helped Solana and Phantom once again serve as the primary rails for the transaction flow around the Trump memecoin, a significant moment in financial and cultural history.
In the future, we will see more and more data flowing through infrastructure protocols, which will make middleware platforms an important dependency at the protocol layer. As protocols become more modular and extensible, this will create space for open source, decentralized middleware to be integrated at the protocol layer.
In the future, it will no longer be viable for centralized companies to serve as intermediaries for light-client header data. Decentralized infrastructure is trustless, distributed, cost-effective, and censorship-resistant.
Therefore, decentralized infrastructure will become the default choice for application developers and enterprises, promoting a mutually beneficial growth model.