Author: Zhao Ying, Wall Street Insights
The focus of the AI computing power race is quietly shifting from GPUs to a long-neglected role—the CPU.
With the explosive growth of AI agents and reinforcement learning (RL) workloads, the strategic position of CPUs in data centers is undergoing a structural reassessment. In an in-depth interview on April 8th, Dylan Patel, chief analyst at the semiconductor research firm SemiAnalysis, bluntly stated that the paradigm of AI workloads is evolving from simple text generation to complex agents and reinforcement learning, and that CPUs are facing an "extremely severe capacity shortage."
A recent report from market research firm TrendForce corroborates this judgment: the CPU-to-GPU ratio in AI data centers is currently around 1:4 to 1:8, but in the era of agentic AI it is expected to narrow sharply to between 1:1 and 1:2.

This structural shift has already set off a chain reaction on both the supply and demand sides. Intel and AMD raised prices on some of their CPU product lines by the end of the first quarter of 2026. Meanwhile, Nvidia and Arm both announced their entry into the server CPU market in March 2026. A GPU giant and an IP licensor making the same choice in the same month is no coincidence; it is a concentrated release of market signals.

**The Rise of Intelligent Agents: CPUs Become the Bottleneck**

In the early stages of AI development, the CPU's role was marginal. As Dylan Patel described it: "The workload was very light. You send a string, it sends a string back, simple inference, not much demand on the CPU." At the time, GPUs dominated AI computing demand with their massively parallel matrix operations, while CPUs merely played an auxiliary role, compressing and routing memory data to the GPUs.

The new generation of reasoning models, represented by OpenAI's o1, and the emerging AI agent architecture have fundamentally changed this landscape. Unlike static large language models, agentic AI has to interact dynamically with its environment: planning tasks, calling tools, passing data between sub-agents, and evaluating whether a task is complete. All of this "orchestration layer" coordination falls squarely on the CPU, making it a classic CPU-intensive workload.

The academic paper "A CPU-Centric Perspective on Agentic AI," published in November 2025, quantifies this pressure: in agentic AI scenarios, latency from CPU-side tool processing (Python interpretation, web crawling, lexical summarization, database retrieval, and so on) can account for up to 90.6% of total latency, and in high-volume processing scenarios CPU dynamic energy consumption can reach 44% of total system dynamic energy consumption.

Arm's calculations show the scale of the demand gap from a capacity perspective: a traditional AI data center needs roughly 30 million CPU cores per gigawatt (GW), while in the agentic AI era that requirement surges to 120 million cores, a fourfold increase.
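To make the orchestration layer described above concrete, here is a minimal, illustrative Python sketch of an agent loop. It is not any vendor's actual stack, and the tool and function names are hypothetical. The single model call stands in for GPU-bound inference, while planning, tool dispatch, result parsing, and state handling, the kind of work the cited paper attributes to the CPU, all run in the Python interpreter on the CPU.

```python
import json
import time


def call_llm(history: list) -> dict:
    """Stand-in for the GPU-bound model call; returns a canned tool request."""
    time.sleep(0.05)  # placeholder for GPU inference latency
    return {"done": len(history) > 3,
            "action": "search_db",
            "arguments": {"query": history[0]}}


# CPU-bound "tool": in a real agent this would be a web crawler, database
# client, summarizer, or a sandboxed Python interpreter.
def search_db(query: str) -> str:
    return f"rows matching '{query}'"


TOOLS = {"search_db": search_db}


def run_agent(task: str, max_steps: int = 8) -> list:
    """Minimal orchestration loop: everything outside call_llm executes on the CPU."""
    history = [task]
    for _ in range(max_steps):
        decision = call_llm(history)                 # GPU-bound step
        if decision["done"]:
            break
        tool = TOOLS[decision["action"]]             # CPU: routing / planning
        result = tool(**decision["arguments"])       # CPU: tool execution
        history.append({"tool": decision["action"],  # CPU: state management
                        "result": result})
    return history


if __name__ == "__main__":
    print(json.dumps(run_agent("find CPU shipment data"), indent=2))
```

In a loop like this, every model response triggers one or more CPU-side tool invocations, which is the structural reason the orchestration layer, rather than the GPU, can come to dominate end-to-end latency, consistent with the 90.6% figure cited above.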
**Intel Under Pressure, AMD Seizes the Opportunity**

The structural rise in CPU demand has first triggered a reshaping of the traditional x86 market. Intel's Xeon processors once held more than 95% of the data center CPU market. That dominance began to crumble in 2021, when yield problems with the Intel 7 process delayed the launch of Xeon Sapphire Rapids by nearly two years and opened a gap for AMD's EPYC Milan.

In 2026, Intel plans to launch two flagship products: Xeon 6+ (Clearwater Forest), based on the Darkmont architecture with 288 cores/288 threads and a TDP of roughly 450W; and Xeon 7 (Diamond Rapids), based on the Panther Cove-X architecture with up to 256 cores/256 threads and a TDP as high as 650W. Both are built on Intel's most advanced 18A process and introduce Foveros Direct hybrid bonding for the first time. TrendForce cautions, however, that ongoing 18A yield issues may push volume production of both products to 2027.

AMD's pace, by contrast, is steadier. Its 2026 flagship, EPYC Venice, will use TSMC's N2 process and the Zen 6 architecture, with CoWoS-L and SoIC advanced packaging. With simultaneous multithreading (SMT), it will reach 256 cores/512 threads, the highest thread count on the market. **TrendForce predicts that AMD will continue to erode Intel's market share in 2026.**

**Nvidia and Arm's Strong Entry Will Rewrite the Competitive Landscape**

Beyond the traditional x86 giants, non-traditional players are pouring into the server CPU market at an unprecedented pace and fundamentally reshaping the competitive landscape.

In March 2026, Nvidia announced that it would sell the Vera CPU as a standalone product to meet customers' needs for more flexible CPU:GPU configurations. Vera uses Nvidia's proprietary Olympus architecture, is built on TSMC's N3 process with CoWoS-R packaging, offers 88 cores/176 threads, and features a 1.8 TB/s NVLink-C2C interconnect that enables memory sharing with Nvidia GPUs. Initial partners include Alibaba, ByteDance, Cloudflare, CoreWeave, and Oracle. Nvidia also launched a Vera CPU rack that integrates 256 CPUs per rack, for a total of 22,528 cores/45,056 threads and 400 TB of memory.

In the same month, Arm announced its first self-developed CPU product, the Arm AGI CPU, ending its 35-year history as a pure licensor. Built on TSMC's N3 process and the Neoverse V3 architecture, the product offers 136 cores/136 threads, a 300W TDP, and support for DDR5-8800 memory and PCIe Gen6. Initial partners include Meta, OpenAI, Cerebras, Cloudflare, and SK Telecom. Arm simultaneously launched two rack configurations: an air-cooled version integrating 60 AGI CPUs (8,160 cores, about 180 TB of memory) and a liquid-cooled version supporting 336 CPUs (45,696 cores, 1 PB of memory).

Major cloud service providers (CSPs) are also accelerating deployment of self-developed CPUs. AWS will release Graviton5 (192 cores/192 threads), built on TSMC's N3 process, in December 2025 and pair it with its self-developed Trainium 3 AI ASIC to cut AI computing costs; Microsoft will launch Cobalt 200 (N3 process, 132 cores/132 threads) in November 2025; and Google plans to launch the Axion C4A.metal bare-metal version and the next-generation Axion N4A in 2026, aiming at the best cost-performance ratio.

**IC Back-End Design Service Providers Usher in Incremental Opportunities**
The large-scale entry of non-traditional players is creating considerable incremental business for IC back-end design service providers.
TrendForce points out that AWS still insists on completing CPU back-end design in-house, while Google and Microsoft have outsourced their CPU back-end design services to Global Unichip Corp. (GUC). As more CSPs and emerging CPU manufacturers enter the market, this outsourcing demand is expected to continue to expand.
TrendForce predicts that between 2026 and 2028, ASIC design service providers such as Broadcom, Marvell, GUC, Alchip, and MediaTek will successively take on new projects from these customers.