Agentic AI is shifting the AI industry's narrative from cost-driven to profit-driven. Goldman Sachs believes that, with token consumption surging and underlying compute costs falling faster than token prices, the gross margin inflection point for hyperscale cloud vendors and large model providers may arrive within the next 3 to 12 months.
According to a report Goldman Sachs released on May 5, the bank expects consumer and enterprise AI agents to drive global token consumption in 2030 to 24 times its 2026 level, reaching approximately 120 quadrillion tokens per month. If enterprise agents reach peak adoption in 2040, the multiple rises further to 55 times.
Meanwhile, Goldman Sachs' projected price and cost curves show that mainstream large-model token pricing, after falling about 40% a year, has stabilized and in some cases rebounded slightly, while the cost per token on chips such as Nvidia and AMD GPUs, Google TPUs, and Trainium continues to fall 60% to 70% a year. This divergence between the two curves is opening up margins for the industry, and improved margins may in turn put large-scale capital expenditure on AI infrastructure on a more sustainable economic footing.
Token Economics at a Turning Point: Costs Are Falling Faster Than Prices, Opening Up Margins
The core argument of the Goldman Sachs report is that the AI industry is moving from a phase of "uncertain inference economics that could dilute profits" to a new phase in which "incremental tokens are monetized at attractive margins." In the first phase of the AI cycle, investors generally viewed compute and tokens as cost drivers: more usage meant more inference load, more accelerators, more electricity, and higher capital expenditure. Goldman Sachs' estimated price and cost curves, however, suggest that this logic is changing.
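To make the divergence concrete, here is a minimal Python sketch of how gross margin evolves when price and unit cost compound at different annual rates. It assumes purely hypothetical starting values (price and unit cost both $1.00 per million tokens, so the margin starts at zero) and constant rates; the rates are illustrative picks from the ranges cited above (flat pricing against a 65% annual cost decline, the midpoint of 60% to 70%; the earlier regime of 40% annual price declines; and a commoditized case where prices fall faster than costs). The function name `gross_margin_path` is hypothetical, and this is not Goldman Sachs' model.

```python
def gross_margin_path(price0: float, cost0: float,
                      price_change: float, cost_change: float,
                      years: int) -> list[float]:
    """Gross margin (price - cost) / price for each year 0..years,
    with price and unit cost compounding at fixed annual rates."""
    margins = []
    price, cost = price0, cost0
    for _ in range(years + 1):
        margins.append((price - cost) / price)
        price *= 1 + price_change  # e.g. -0.40 means prices fall 40% a year
        cost *= 1 + cost_change    # e.g. -0.65 means unit cost falls 65% a year
    return margins


# Hypothetical scenarios: (annual price change, annual unit-cost change)
SCENARIOS = {
    "flat price, cost -65%/yr": (0.0, -0.65),
    "price -40%/yr, cost -65%/yr": (-0.40, -0.65),
    "commoditized: price -70%/yr, cost -60%/yr": (-0.70, -0.60),
}

if __name__ == "__main__":
    for name, (dp, dc) in SCENARIOS.items():
        path = gross_margin_path(1.0, 1.0, dp, dc, years=3)
        print(f"{name:45s}", " ".join(f"{m:7.1%}" for m in path))
```

Under flat pricing the hypothetical margin climbs from 0% to 65% after one year; when prices fall faster than costs it turns negative and keeps worsening, which is the commoditized-chatbot risk the report flags.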
Mainstream large-model token prices fell sharply but have now stabilized and, in some cases, rebounded. Meanwhile, the all-in cost per token on Nvidia, Google TPU (Broadcom), AMD, and Trainium (Marvell) hardware continues to fall quickly and steadily. If token prices hold above token costs, wider adoption of agentic AI will expand profits rather than merely grow revenue.
Goldman Sachs further points out that agentic AI may create a self-reinforcing economic flywheel: lower per-token compute costs enable richer, more complex agents; richer agents consume more tokens through longer contexts, more loops, more verification, and continuous monitoring; and higher utilization improves the economics of AI infrastructure, which in turn supports providers' continued investment in model quality and distribution. Goldman Sachs sees this flywheel as distinctly different from the prevailing market narrative that "AI usage brings an unsustainable cost burden."
Goldman Sachs also cautioned, however, that not every AI workload is guaranteed a positive margin inflection. For highly commoditized text chatbots, competition may still push token prices down faster than compute costs.
Consumer-Side Agents: From Fragmented Conversations to "Always-On" Assistants, Token Consumption to Rise 12-Fold
Goldman Sachs estimates that by 2030, consumer-side AI agents could increase global token consumption 12-fold, adding approximately 60 quadrillion tokens per month. The report divides consumer agents into two types: "on-demand" agents, such as the browser-based OpenAI Operator and Claude Code, which autonomously plan, execute, and return results once a user starts a task; and "resident" agents, such as email monitoring, calendar management, or digital life assistants, which run continuously in the background.
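The headline multiples can be cross-checked with simple arithmetic. The sketch below assumes, as the report's wording suggests but does not state outright, that the 12x, 24x, and 55x multiples share the same 2026 baseline. Under that assumption, 120 quadrillion tokens per month at 24x implies a baseline of about 5 quadrillion, and the consumer-only 12x case lands near the roughly 60 quadrillion figure. The helper name `implied_volumes` is hypothetical.

```python
QUADRILLION = 10**15


def implied_volumes(volume_2030: float, multiple_2030: float,
                    scenario_multiples: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Back out the implied 2026 baseline from the 2030 headline figure,
    then scale it by each scenario multiple (assumes a common baseline)."""
    baseline = volume_2030 / multiple_2030
    return baseline, {name: baseline * m for name, m in scenario_multiples.items()}


baseline, scenarios = implied_volumes(
    120 * QUADRILLION, 24,
    {
        "consumer agents only (2030)": 12,
        "consumer + enterprise (2030)": 24,
        "enterprise peak adoption (2040)": 55,
    },
)

if __name__ == "__main__":
    print(f"implied 2026 baseline: {baseline / QUADRILLION:.0f} quadrillion tokens/month")
    for name, volume in scenarios.items():
        print(f"{name:34s} {volume / QUADRILLION:5.0f} quadrillion tokens/month")
```

If the multiples were instead measured against different baselines, the implied volumes would shift accordingly; the sketch only shows that the stated figures are mutually consistent under the common-baseline reading.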
Goldman Sachs believes the biggest leap in token consumption will come when agents shift from user-initiated tasks to continuous background operation, monitoring context and acting proactively when needed. Simulated data show that a typical LLM chatbot consumes about 1,000 tokens per session and an embedded Copilot more than 5,000 tokens per day, while a resident agent can consume more than 100,000 tokens per day.
Goldman Sachs projects that daily AI queries will grow from about 5 billion in 2025 to about 23 billion in 2030, with up to 30% flowing to agents in areas such as search, shopping, travel, email, and personal productivity. Meanwhile, traditional search engines' share of query volume is expected to fall from 68% in 2025 to 36% in 2030, while the share of LLM-native applications rises from 12% to 31%.
Enterprise-Side Agents: Workflow Complexity Drives Token Intensity, Consumption May Rise 55-Fold by 2040
Goldman Sachs predicts that enterprise-side AI agents will become the largest token multiplier, driving a 24-fold increase in global token consumption by 2030 and a 55-fold increase at peak adoption in 2040. At that point, enterprise workloads will account for more than 70% of global token usage.
Enterprise agents are more token-intensive than consumer agents because their workflows demand more complex and precise operations throughout the workday: monitoring tasks, retrieving context, reasoning about anomalies, verifying outputs, updating systems, and continuously reporting issues.
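The daily token and cost figures Goldman Sachs reports for its simulated programming, call center, and data entry agents (encoded in the dictionary below) can be normalized to a cost per million tokens, which makes the modality gap easier to see. This is a minimal sketch using only those reported figures; the dictionary and function names are illustrative, and the report's own cost breakdown is not reproduced here.

```python
# Daily token volume and API cost (USD) per simulated agent, as reported by Goldman Sachs.
SIMULATED_AGENTS = {
    "programming": (7_000_000, 13.0),
    "call center (real-time voice)": (2_000_000, 92.0),
    "data entry": (25_000_000, 60.0),
}


def cost_per_million_tokens(tokens_per_day: int, usd_per_day: float) -> float:
    """Blended API cost per one million tokens."""
    return usd_per_day / tokens_per_day * 1_000_000


if __name__ == "__main__":
    for name, (tokens, usd) in SIMULATED_AGENTS.items():
        print(f"{name:30s} ${cost_per_million_tokens(tokens, usd):6.2f} per million tokens")
```

The programming agent works out to roughly $1.86 per million tokens and the data entry agent to about $2.40, while the voice-heavy call center agent comes to about $46, some 19 to 25 times more per token. That is consistent with the report's point that modality mix, not just token volume, drives cost.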
Enterprise agents also often handle heavier multimodal inputs (voice, images, documents, screen activity, application data, logs, and structured system records), which significantly increases token intensity.
Goldman Sachs quantified token consumption across occupations by building simulated agents. Programming agents consume about 7 million tokens per day at an API cost of around $13 per day, far below the cost of human labor, which helps explain why agent adoption is fastest in software development. Call center agents consume about 2 million tokens per day, but because they rely on real-time voice processing, costs can reach $92 per day, making full voice automation economically uncompetitive. Data entry agents consume about 25 million tokens per day at around $60 per day, still below the cost of human labor.
Goldman Sachs notes that enterprise agent adoption will depend on four variables: token volume, API cost, modality mix, and implementation complexity. Text-based workflows with mature tool ecosystems will scale first; voice-based workflows, or those deeply integrated with backend systems, may progress more slowly.
On the adoption curve, Goldman Sachs believes enterprise agentic AI is most likely to follow an S-curve, peaking at roughly 35% to 40% of knowledge workers after about 15 years, faster than the historical median for technology diffusion (29 years).
Capital Expenditure Sustainability: Improved Margins Give Hyperscale Cloud Vendors More Room
A key investment conclusion of the Goldman Sachs report is that improved margins for hyperscale cloud vendors will make today's high level of infrastructure investment more sustainable, easing the market's core concern about returns on AI capital expenditure.
The report notes that operators remain supply-constrained in meeting current and future compute demand. Google and Meta have both raised their fiscal 2026 capital expenditure forecasts, and Amazon's management reiterated its commitment to high capital expenditure after its Q1 earnings report. Goldman Sachs expects investors to look increasingly for evidence of visible returns as the margin inflection point approaches.
Among individual stocks, Goldman Sachs' core rationale for Amazon is the renewed acceleration of AWS revenue growth (28% year over year in Q1) and its $364 billion revenue backlog. Its view on Google rests on 63% year-over-year growth in its cloud business in Q1 and a backlog that nearly doubled sequentially to about $460 billion. Its view on Meta rests on advertising growth that is significantly outpacing the overall digital advertising industry and on the continued contribution of AI compute to user engagement and ad monetization.
In software, Goldman Sachs believes lower token costs make it easier for vendors to embed agents in existing products without significantly hurting gross margins, while supporting pricing based on outcomes, productivity, or units of work rather than simply seat count, thereby expanding the software addressable market. For IT services companies, as agent-driven AI consumption shifts from standalone tools to enterprise-wide, deeply integrated workflow transformation, demand for integration, governance, and managed orchestration will rise significantly. Accenture is seen as a major beneficiary of this trend.