TL;DR

Data challenge: Block time competition on high-performance public chains has entered the sub-second era. High concurrency, volatile traffic, and multi-chain heterogeneity on the consumer side have pushed data complexity sharply upward, forcing data infrastructure to shift toward real-time incremental processing and dynamic scalability. Traditional batch ETL carries minutes-to-hours of latency and cannot serve real-time trading. Emerging solutions such as The Graph, Nansen, and Pangea introduce stream processing, compressing latency to near-real-time tracking.

A paradigm shift in data competition: The previous cycle was about "understanding"; this cycle is about "profiting." Under the Bonding Curve model, a one-minute delay can multiply the cost of entry several times over.

Tool iteration: From manually set slippage → sniper bots → the GMGN integrated terminal. Getting a transaction on-chain is increasingly commoditized, and the competitive frontier is shifting to the data itself: whoever captures signals faster helps users profit.

The dimensional expansion of transaction data: Memes are essentially the financialization of attention; what matters is narrative, attention, and how that attention spreads. The closed loop of off-chain opinion and on-chain data: narrative tracking, summarization, and sentiment quantification have become core to trading. "Underwater data": capital flows, role profiling, and smart money/KOL address labeling reveal the hidden game behind anonymous on-chain addresses. Next-generation trading terminals integrate multi-dimensional on- and off-chain signals at the second level, sharpening entry and risk-avoidance decisions.

AI-driven actionable signals: From information to profits. The competitive goals of the new phase: fast, automated, and capable of delivering excess returns. LLMs and multimodal AI can automatically extract decision signals and pair them with copy trading and stop-loss/take-profit execution. Risks and challenges: hallucinations, short signal lifespans, execution delays, and risk control. Balancing speed and accuracy through reinforcement learning and simulated backtesting is key.

Data dashboard survival decisions: Lightweight data-aggregation/dashboard applications lack a competitive edge, and their niche is shrinking. Downward: deepen the integration of high-performance underlying pipelines and data research. Upward: extend to the application layer, own user scenarios directly, and raise data-access activity. Future landscape: either become Web3's "water, electricity, and gas" infrastructure, or become a user-facing platform, a crypto-native Bloomberg. The moat is shifting toward "actionable signals" and underlying data capabilities. The closed loop of long-tail assets and transaction data presents a unique opportunity for crypto-native entrepreneurs.

The window of opportunity for the next 2–3 years: Upstream infrastructure: Web2-level processing power + Web3-native requirements → a Web3 Databricks/AWS. Downstream execution platforms: AI agents + multi-dimensional data + seamless execution → a crypto Bloomberg Terminal.

Thanks to Hubble AI, Space & Time, OKX DEX, and other projects for their support of this research report!

Introduction: The Triple Resonance of Memes, High-Performance Public Chains, and AI

In the previous cycle, the growth of on-chain trading relied primarily on infrastructure iteration.
Entering the new cycle, as infrastructure matures, super applications such as Pump.fun are becoming new growth engines for the crypto industry. This asset-issuance model, with its unified issuance mechanism and carefully designed liquidity, has created a fair, streamlined trading environment where get-rich-quick stories are commonplace. The replicability of this high-multiplier wealth effect is profoundly changing users' return expectations and trading habits. Users demand not only faster market entry but also the ability to access, parse, and act on multi-dimensional data in the shortest possible time. Existing data infrastructure struggles to meet this density and real-time demand.

This in turn creates demand for a more advanced trading environment: lower friction, faster confirmations, and deeper liquidity. Trading venues are rapidly migrating to high-performance public chains and Layer 2 rollups such as Solana and Base. The volume of transaction data on these chains is more than ten times what Ethereum carried in the previous cycle, posing even more severe performance challenges for existing data providers. With new-generation high-performance public chains like Monad and MegaETH about to launch, demand for on-chain data processing and storage will grow exponentially.

Meanwhile, the rapid maturation of AI is making intelligence broadly accessible. GPT-5's intelligence has reached doctoral levels, and large multimodal models like Gemini can easily interpret candlestick charts... With AI tools, trading signals that were once complex can now be understood and executed by ordinary users. Under this trend, traders are beginning to rely on AI to make trading decisions, and AI trading decisions require multi-dimensional, highly effective data. AI is evolving from an "auxiliary analysis tool" into a "core trading decision-making platform," and its widespread adoption further intensifies the demand for real-time, interpretable, and scalable data processing.

The triple resonance of the meme trading frenzy, the expansion of high-performance public chains, and the commercialization of AI has made the need for a new data infrastructure within the on-chain ecosystem increasingly urgent.

With the rise of high-performance public chains and high-performance rollups, the scale and speed of on-chain data have entered a new era. As high-concurrency, low-latency architectures spread, daily transaction volumes easily exceed 10 million, with raw data volumes measured in hundreds of GB. Solana, for example, has averaged over 1,200 TPS over the past 30 days, with daily transactions exceeding 100 million; on August 17th, it hit a record 107,664 TPS. According to statistics, Solana's ledger data is growing at 80-95 TB per year, roughly 210-260 GB per day.

▲ Chainspect, 30-day average TPS

▲ Chainspect, 30-day transaction volume

Not only has throughput increased; block times on emerging public chains have also reached millisecond levels. BNB Chain's Maxwell upgrade has reduced block times to 0.8s, while Base's Flashblocks technology has cut them to 200ms. In the second half of this year, Solana plans to replace PoH with Alpenglow, reducing block confirmation times to 150ms. The MegaETH mainnet is targeting real-time block times of 10ms.
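As a quick sanity check, the throughput and storage figures above hang together on a back-of-envelope basis (rough arithmetic of our own, not figures from the cited sources):

```python
# Back-of-envelope check of the cited Solana throughput and storage figures.
SECONDS_PER_DAY = 86_400

avg_tps = 1_200                       # cited 30-day average TPS
daily_tx = avg_tps * SECONDS_PER_DAY  # ~103.7 million transactions per day

ledger_growth_tb_per_year = (80, 95)  # cited annual ledger growth range
daily_growth_gb = [tb * 1_000 / 365 for tb in ledger_growth_tb_per_year]

# -> "103,680,000 tx/day; 219-260 GB/day of new ledger data",
#    consistent with the 100M+ daily transactions and ~210-260 GB/day cited above.
print(f"{daily_tx:,} tx/day; "
      f"{daily_growth_gb[0]:.0f}-{daily_growth_gb[1]:.0f} GB/day of new ledger data")
```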
These consensus and technological breakthroughs significantly improve transaction real-time performance, but they also place unprecedented demands on block data synchronization and decoding. Downstream data infrastructure, however, still largely relies on batch ETL pipelines, which inevitably introduce latency. On Dune, for example, contract interaction event data on Solana is typically delayed by about 5 minutes, while protocol-level aggregated data can take up to an hour. A transaction that confirms on-chain within 400ms thus becomes visible in analytics tools only after a delay hundreds of times longer, which is unacceptable for real-time trading applications.

▲ Dune, Blockchain Freshness

To address these supply-side challenges, some platforms have shifted to streaming and real-time architectures. The Graph leverages Substreams and Firehose to bring data latency down to near real time. Nansen, by introducing technologies such as ClickHouse, achieved a roughly tenfold performance improvement in Smart Alerts and real-time dashboards. Pangea, by aggregating the compute, storage, and bandwidth contributed by community nodes, delivers real-time streaming data with sub-100ms latency to business customers such as market makers, quantitative traders, and central limit order books (CLOBs).

▲ Chainspect

Beyond sheer volume, on-chain activity is also highly uneven. Over the past year, Pumpfun's weekly transaction volume has varied nearly 30-fold between trough and peak. In 2024, the meme trading platform GMGN suffered six server outages within four days, forcing it to migrate its underlying database from AWS Aurora to the open-source distributed SQL database TiDB. After the migration, the system's horizontal scalability and compute elasticity improved markedly, business agility rose by roughly 30%, and pressure during trading peaks was significantly relieved.

▲ Dune, Pumpfun Weekly Volume

▲ Odaily, TiDB's Web3 Service Case
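To make the batch-versus-streaming contrast above concrete, here is a minimal sketch of the two ingestion patterns. It is illustrative only; the function names, intervals, and stubbed block source are assumptions, not any specific provider's pipeline or API:

```python
import time
from collections import deque
from typing import Iterator

def block_source() -> Iterator[dict]:
    """Stand-in for a chain node / firehose; yields one decoded block per slot."""
    slot = 0
    while True:
        yield {"slot": slot, "txs": [f"tx-{slot}-{i}" for i in range(3)]}
        slot += 1
        time.sleep(0.4)  # ~400 ms block time

def decode(block: dict) -> dict:
    return {"slot": block["slot"], "tx_count": len(block["txs"])}

def load(rows: list[dict]) -> None:
    print(f"loaded {len(rows)} row(s)")  # stand-in for writing to the query store

def batch_etl(source: Iterator[dict], interval_s: float = 300) -> None:
    """Fixed-interval batch ETL: blocks only become queryable every `interval_s` seconds."""
    buffer: deque[dict] = deque()
    next_flush = time.time() + interval_s
    for block in source:
        buffer.append(block)                   # extract
        if time.time() >= next_flush:
            load([decode(b) for b in buffer])  # transform + load, minutes after confirmation
            buffer.clear()
            next_flush = time.time() + interval_s

def stream_etl(source: Iterator[dict]) -> None:
    """Incremental streaming: each block is decoded and served as soon as it arrives."""
    for block in source:
        load([decode(block)])                  # visible to queries within one block time

# Running stream_etl(block_source()) would keep data fresh block by block;
# batch_etl(block_source()) reproduces the minutes-long staleness described above.
```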
The multi-chain ecosystem further exacerbates this complexity. Differences in log formats, event structures, and transaction fields across public chains require customized parsing logic for every new chain, straining the flexibility and scalability of data infrastructure. Some data providers therefore adopt a "customer-first" strategy: prioritize chains with active trading, and balance flexibility against scalability.

If data processing stays stuck in fixed-interval batch ETL (Extract, Transform, Load) as high-performance blockchains proliferate, it will face latency backlogs, decoding bottlenecks, and query lag, and will fail to meet the demands of real-time, fine-grained, interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming, incremental processing, and real-time computation, complemented by load-balancing mechanisms to absorb the concurrency spikes of the crypto market's periodic trading peaks. This is not only a natural technological progression but also a prerequisite for stable real-time queries, and it will mark a true differentiator among the next generation of on-chain data platforms.

Speed Is Wealth: A Paradigm Shift in On-Chain Data Competition

The core proposition of on-chain data has shifted from "visualization" to "actionability." In the last cycle, Dune was the standard tool for on-chain analysis. It met researchers' and investors' need for "understandable" data, and people pieced together on-chain narratives with SQL charts. GameFi and DeFi players relied on Dune to track capital inflows and outflows, calculate yield-farming returns, and exit before market turning points. NFT players used Dune to analyze trading-volume trends, whale holdings, and distribution patterns to anticipate market activity.

In this cycle, however, meme players are the most active consumer group. They have driven the phenomenal app Pump.fun to roughly $700 million in cumulative revenue, nearly double the total revenue of OpenSea, the leading consumer app of the previous cycle. In the meme market, time sensitivity is magnified to the extreme; speed is no longer a bonus but a core variable determining profitability.

In a primary market priced by the Bonding Curve, speed is cost. Token prices rise exponentially with demand, and even a one-minute delay can significantly raise the cost of entry. According to Multicoin research, the most profitable players often pay 10% slippage just to land in a block three slots ahead of their competitors. The wealth effect and the "get-rich-quick" myth drive players to chase second-level candlestick charts, same-block execution engines, and one-stop decision dashboards, competing on the fastest information gathering and order placement.

▲ Binance
In the manual-trading era of Uniswap, users set slippage and gas themselves, and prices were not even visible on the front end; trading felt more like buying a lottery ticket. In the BananaGun era, automated sniping and slippage handling put retail traders on the same starting line as the "scientists" (on-chain bot operators). In the PepeBoost era, bots pushed pool-opening alerts and front-row holder data in the same message. This has culminated in the current GMGN era: a terminal that integrates candlestick data, multi-dimensional analytics, and trade execution, the "Bloomberg Terminal" of meme trading.

As trading tools keep evolving and execution barriers disappear, the frontier of competition inevitably shifts to the data itself: whoever captures signals faster and more accurately establishes a trading edge in a rapidly changing market and helps users profit.

Dimensionality Is Advantage: The Truth Beyond the Candlestick Chart

The essence of memecoins is the financialization of attention. A strong narrative keeps breaking out to new audiences, attracting attention and driving up price and market capitalization. For meme traders, real-time performance certainly matters, but outsized results depend on answering three crucial questions: what is the token's narrative, who is paying attention, and how can that attention keep growing? The candlestick chart shows only the shadows of these forces; the true drivers lie in multi-dimensional data: off-chain sentiment, on-chain addresses and holdings, and the precise mapping between the two.

On-Chain × Off-Chain: A Closed Loop from Attention to Transactions

Users attract attention off-chain and complete transactions on-chain. Data that closes this loop is becoming the core edge in meme trading.

Narrative tracking and dissemination-chain identification: On social platforms like Twitter, tools such as XHunt analyze a project's KOL follow list to identify the associated accounts and the potential attention chain behind it. 6551 DEX aggregates data from Twitter, official websites, tweet replies, offering history, and KOL followers to generate comprehensive AI-powered reports that evolve in real time with public opinion, helping traders capture narratives accurately.

Quantifying sentiment indicators: InfoFi tools such as Kaito and Cookie.fun aggregate content and analyze opinion on Crypto Twitter, providing quantifiable metrics for mindshare, sentiment, and influence. Cookie.fun, for example, overlays mindshare and sentiment directly onto price charts, turning off-chain sentiment into readable "technical indicators."

▲ Cookie.fun

On-chain and off-chain are equally important. OKX DEX displays Vibes analysis alongside market data, aggregating KOL call times, top-ranked KOLs, narrative summaries, and a composite score to shorten off-chain information retrieval time. Narrative summarization has become a highly popular AI product feature.

Underwater Data: Turning "Visible Ledgers" into "Usable Alpha"

In traditional finance, order flow data is controlled by large brokers, and quantitative firms pay hundreds of millions of dollars a year for access to it to optimize their trading strategies. In crypto, by contrast, trading ledgers are completely public and transparent, making valuable intelligence "open source": an open-pit gold mine waiting to be worked.
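As a minimal illustration of mining this public ledger, the sketch below aggregates a raw DEX swap feed into simple per-token flow signals. The field names and example data are assumptions for illustration, not any specific platform's schema:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trade:
    token: str
    buyer: str
    sol_in: float   # SOL spent buying the token (0 for sells)
    sol_out: float  # SOL received selling the token (0 for buys)

def flow_signals(trades: list[Trade]) -> dict[str, dict]:
    """Aggregate public swaps into per-token net flow and breadth of buying."""
    stats: dict[str, dict] = defaultdict(lambda: {"net_flow": 0.0, "buyers": set()})
    for t in trades:
        s = stats[t.token]
        s["net_flow"] += t.sol_in - t.sol_out
        if t.sol_in > 0:
            s["buyers"].add(t.buyer)
    return {
        token: {"net_flow_sol": s["net_flow"], "unique_buyers": len(s["buyers"])}
        for token, s in stats.items()
    }

# Example window: breadth distinguishes broad demand (CAT) from one wallet churning (DOG).
window = [
    Trade("CAT", "walletA", 5.0, 0.0),
    Trade("CAT", "walletB", 3.0, 0.0),
    Trade("DOG", "walletC", 8.0, 0.0),
    Trade("DOG", "walletC", 0.0, 6.0),
]
print(flow_signals(window))
```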
The value of underwater data lies in extracting invisible intentions from visible transactions. This covers capital flows and role profiling: clues about market makers building or distributing positions, KOL alt-account addresses, concentrated or dispersed holdings, bundled transactions, and unusual capital movements. It also covers address profiling: categorizing addresses as smart money, KOL/VC, developer, troll, or insider, and linking them to off-chain identities so that on-chain and off-chain data connect. These signals are usually hard for ordinary users to detect, yet they can significantly influence short-term price action. By analyzing address labels, position characteristics, and bundled transactions in real time, trading-assistance tools are surfacing these underlying market dynamics, helping traders mitigate risk and seek alpha in sub-second market fluctuations.
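The sketch below shows one hypothetical heuristic for such labeling, flagging "smart money" by entry timing and realized win rate. The thresholds and wallet fields are illustrative assumptions, not how GMGN or any specific platform actually classifies addresses:

```python
from dataclasses import dataclass

@dataclass
class WalletStats:
    address: str
    trades_30d: int          # closed positions in the last 30 days
    win_rate: float          # share of closed positions with positive PnL
    realized_pnl_sol: float  # total realized profit in SOL
    median_entry_rank: int   # how many buyers typically entered the token before this wallet

def label_wallet(w: WalletStats) -> list[str]:
    """Assign illustrative labels from behavior; real systems use far richer features."""
    labels = []
    if w.trades_30d >= 20 and w.win_rate >= 0.6 and w.realized_pnl_sol > 100:
        labels.append("smart_money")
    if w.median_entry_rank <= 10 and w.win_rate >= 0.8:
        labels.append("possible_insider")  # consistently among the first buyers, rarely wrong
    if w.trades_30d >= 200 and w.win_rate < 0.4:
        labels.append("degen")             # high churn, poor results
    return labels or ["unlabeled"]

print(label_wallet(WalletStats("9xQe...", trades_30d=35, win_rate=0.71,
                               realized_pnl_sol=420.0, median_entry_rank=6)))
# -> ['smart_money']
```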
GMGN, for example, layers label analytics — smart money, KOL/VC addresses, developer wallets, insider trading, phishing addresses, and bundled transactions — on top of its real-time on-chain transaction and token contract datasets, maps on-chain addresses to social media accounts, and aligns capital flows, risk signals, and price behavior at the second level, helping users make faster entry and risk-avoidance decisions.

▲ GMGN

AI-Driven Actionable Signals: From Information to Profits

"The next wave of AI will sell benefits, not tools." — Sequoia Capital

This assessment holds true in crypto trading as well. Once data speed and dimensionality meet the bar, the next competitive objective is the decision-making stage: turning complex, multi-dimensional data directly into actionable trading signals. The evaluation criteria can be summarized in three points: speed, automation, and excess returns.

Fast enough: As AI capabilities advance, the strengths of natural language processing and multimodal LLMs will gradually be realized. They can not only integrate and understand massive amounts of data, but also build semantic links across it and automatically extract decisive conclusions. In a high-intensity, low-capacity on-chain trading environment, each signal has a very short shelf life and limited capital capacity, so speed directly determines its potential return.

Automation: Humans cannot watch the market 24 hours a day, but AI can. On the Senpi platform, for example, users can hand agents copy-trading conditional buy orders with stop-loss and take-profit settings. This requires the AI to poll or monitor data in real time in the background and place orders automatically when it detects a qualifying signal (see the sketch after this list).

Return rate: Ultimately, the effectiveness of any trading signal depends on whether it can consistently generate excess returns. AI must not only understand on-chain signals sufficiently well but also incorporate risk control to maximize risk-adjusted returns in highly volatile environments, for example by accounting for uniquely on-chain drags on returns such as slippage and execution latency.
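A minimal sketch of what such an automated loop might check before acting: follow a tracked wallet only if the expected edge survives slippage and execution delay, and attach stop-loss/take-profit levels to the order. The signal and order structures here are hypothetical, not Senpi's or any platform's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CopySignal:
    token: str
    leader_entry_price: float  # price at which the tracked wallet bought
    current_price: float       # price by the time we can execute
    est_slippage: float        # expected slippage of our own order, e.g. 0.03 = 3%

@dataclass
class Order:
    token: str
    size_sol: float
    stop_loss: float
    take_profit: float

def decide(signal: CopySignal, size_sol: float, max_chase: float = 0.15,
           sl_pct: float = 0.20, tp_pct: float = 0.50) -> Optional[Order]:
    """Copy the leader only if latency and slippage haven't already eaten the edge."""
    effective_entry = signal.current_price * (1 + signal.est_slippage)
    chase = effective_entry / signal.leader_entry_price - 1  # premium over the leader's entry
    if chase > max_chase:
        return None  # the signal has decayed; skip rather than chase
    return Order(
        token=signal.token,
        size_sol=size_sol,
        stop_loss=effective_entry * (1 - sl_pct),
        take_profit=effective_entry * (1 + tp_pct),
    )

# Example: entering ~13% above the leader is still within the chase budget, so an order
# with stop-loss and take-profit attached is produced; a staler signal would return None.
sig = CopySignal(token="CAT", leader_entry_price=0.010, current_price=0.011, est_slippage=0.03)
print(decide(sig, size_sol=0.5))
```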
This capability is reshaping the business logic of data platforms: from selling "data access" to selling "return-driven signals." Competition among next-generation tools will no longer hinge on data coverage, but on the actionability of signals, the ability to truly bridge the gap from insight to execution. Some emerging projects are already exploring this direction. Truenorth, an AI-driven discovery engine, incorporates "decision execution rate" into its assessment of information effectiveness; using reinforcement learning, it continuously optimizes its output, suppresses ineffective noise, and helps users build information flows aimed directly at order placement.

While AI holds great potential for generating actionable signals, it faces multiple challenges.

Hallucination: On-chain data is highly heterogeneous and noisy. When parsing natural language queries or multimodal signals, LLMs are prone to hallucination or overfitting, hurting signal yield and accuracy. When multiple tokens share a name, for example, an AI often cannot resolve the contract address behind a ticker mentioned on CT; many AI signal products will read CT chatter about "AI" as referring to the token Sleepless AI.

Signal lifespan: The trading environment changes constantly. Any delay erodes returns, and AI must complete data extraction, inference, and execution in an extremely short window. Even the simplest copy-trading strategy can turn a positive return negative if smart money is followed too slowly.

Risk control: In high-volatility scenarios, if an AI repeatedly fails to land transactions on-chain or suffers excessive slippage, it will not only miss excess returns but can wipe out the entire principal within minutes.

Therefore, finding the balance between speed and accuracy, and reducing error rates through mechanisms such as reinforcement learning, transfer learning, and simulated backtesting, is the key competitive battleground for AI in this field.

Up or Down? The Survival Decision for Data Dashboards

Now that AI can directly generate actionable signals and even assist with order placement, "light middle-layer applications" that rely solely on data aggregation face an existential crisis. Whether they are dashboards stitched together from on-chain data or trading bots that layer execution logic on top of aggregation, neither has a sustainable moat. In the past, these tools thrived on convenience or user habit (users are accustomed to checking Dexscreener for a token's CTO (community takeover) status, for example). But with the same data available in many places, execution engines increasingly commoditized, and AI able to generate decision signals and trigger execution directly from that same data, their competitiveness is eroding fast.

Efficient on-chain execution engines will keep maturing, further lowering the barrier to trading. In this trend, data providers must make a choice: either move down to build faster data acquisition and processing infrastructure, or move up to the application layer and own user scenarios and consumer traffic directly. Models stuck in the middle, doing only data aggregation and lightweight packaging, will see their niche squeezed further.

Moving down the stack means building an infrastructure moat.
While developing trading products, Hubble AI realized that relying on a TG bot alone would not create a lasting advantage, so it moved upstream into data processing, aiming to build a "Crypto Databricks." Having achieved extremely fast data processing on Solana, Hubble AI is evolving from data processing into an integrated data-and-research platform, taking an upstream position in the value chain and providing underlying support for the data needs of the US "finance on-chain" push and on-chain AI agent applications.

Moving up the stack means expanding into application scenarios and reaching end users. Space and Time initially focused on sub-second SQL indexing and oracle pushes, but recently began exploring consumer scenarios with the launch of Dream.Space, a "vibe coding" product on Ethereum that lets users write smart contracts or generate data-analysis dashboards in natural language. This shift not only increases the call frequency of its data services but also builds direct user engagement through the end-user experience.

All of this shows that players caught in the middle, relying solely on selling data interfaces, are losing their niche. The future B2B2C data landscape will be dominated by two types of players: infrastructure companies that control the underlying pipelines and become the on-chain "water, electricity, and gas"; and platforms that sit close to user decision-making scenarios and turn data into application experiences.

Summary

Driven by the triple resonance of the meme craze, the explosion of high-performance public chains, and the commercialization of AI, the on-chain data landscape is undergoing a structural shift. Advances in transaction speed, data dimensionality, and execution signals mean "visible charts" are no longer the core competitive advantage; the real moat is shifting toward "actionable signals that help users monetize" and the underlying data capabilities that support them.

Over the next two to three years, the most attractive entrepreneurial opportunities in crypto data will emerge at the intersection of Web2-grade infrastructure maturity and Web3's native on-chain execution model. Data for major assets like BTC and ETH, being highly standardized and similar to traditional financial futures products, has gradually been folded into the coverage of traditional financial institutions and some Web2 fintech platforms. Data for meme coins and long-tail on-chain assets, by contrast, is highly non-standardized and fragmented: from community narratives and on-chain opinion to cross-chain liquidity, interpreting it requires combining on-chain address profiling, off-chain social signals, and even second-by-second trade execution. It is precisely this difference that opens a unique window for crypto-native entrepreneurs in the processing and trading of long-tail assets and meme data.

We favor projects focused on the following two areas:

Upstream infrastructure — on-chain data companies with streaming data pipelines, ultra-low-latency indexing, and unified cross-chain parsing frameworks that rival the processing power of Web2 giants. These projects have the potential to become the Web3 equivalent of Databricks or AWS. As users gradually migrate on-chain, transaction volumes are expected to grow exponentially, and the B2B2C model offers long-term compounding value.
Downstream execution platforms — applications that integrate multi-dimensional data, AI agents, and seamless trade execution. By transforming fragmented on-chain and off-chain signals into directly executable trades, these products have the potential to become crypto-native Bloomberg Terminals. Their business model no longer relies on data access fees, but monetizes through excess returns and signal delivery.

We believe these two types of players will dominate the next generation of crypto data and build sustainable competitive advantages.