The security of AI agents is a complete mess, with 91% having vulnerabilities and 94% being capable of being used for malicious purposes.

2026/05/06 16:47

Autonomous AI agents are penetrating healthcare, finance, and enterprise operations at an alarming rate, but the largest security study to date reveals that the vast majority of agents running in production environments have serious vulnerabilities, which current mainstream security assessment methods are almost powerless to address. A recent joint research team from Stanford University, MIT CSAIL, Carnegie Mellon University, ITU Copenhagen, and NVIDIA found that among the 847 autonomous agents deployed in production, 91% had toolchain attack vulnerabilities, 89.4% experienced target deviation after approximately 30 steps, and 94% of memory-enhanced agents faced the risk of "poisoning." The study discovered a total of 2,347 previously unknown vulnerabilities, of which 23% were rated as critical. The paper's first author, Owen Sakawa, cited the "OpenClaw/Moltbook incident" in early 2026 to demonstrate that this threat has moved from theory to reality: a single vulnerability in the Moltbook platform's database led to the simultaneous compromise of 770,000 running AI agents on the platform, each with privileged access to its users' devices, emails, and files. "This is no longer a hypothetical threat," Sakawa stated. This serves as a direct warning to companies and investors rapidly deploying AI agents: current mainstream security assessment frameworks are designed based on stateless language models, failing to identify combinatorial vulnerabilities emerging in multi-step execution, meaning that many companies may be systematically misjudging the true security status of their AI agents. Gary Marcus, an expert in cognitive psychology and AI in the United States, commented, "Autonomous agent agents are a complete mess."

Vulnerability Map: Six Types of Attacks, 2347 Known Vulnerabilities

Research covers four major industries: healthcare (289 deployments, 34.1%), finance (247, 29.2%), customer service (198, 23.4%), and code generation (113, 13.3%).

Architectural Flaws: Why AI Agents Are More Vulnerable Than LLMs

The core argument of this research is that the security challenges of autonomous intelligent agents and stateless language models are fundamentally different.

Security assessments for language models focus on "whether the model can utter unsafe content"; while for AI agents, the problem becomes "whether the model can do unsafe things"—including tool calls with realistic effects, state modifications that affect future behavior, and planned executions that only reveal violations after multiple steps.

The research illustrates this logic with a specific scenario: an agent with both file read (read_file) and HTTP request (http_request) permissions, where each tool's access control decisions are compliant in isolated evaluations, can achieve data theft by combining them—reading credentials from a configuration file and then sending them to an external endpoint via an HTTP request. Each step satisfies local security policies, but the overall result achieves an adversarial objective.

The study termed this phenomenon the "compositional safety" problem. In the controlled architecture study, researchers tested four mainstream architectures: ReAct agents, multi-agent systems, memory-enhanced agents, and tool-using agents. The results showed that privilege escalation attacks against tool-using agents had a success rate of 95%, while poisoning attacks against memory-enhanced agents had a success rate of 94%. Even the most resilient multi-agent systems had a 58% success rate against target drift attacks—based on this, researchers concluded that none of the currently tested architectures possessed sufficient capability to resist adversarial deployments. The results of adaptive attack generation tests further exacerbated concerns: attacks generated based on reinforcement learning had a violation rate of 79%, a 25.4% increase compared to the 63% in manually designed scenarios, indicating that the threat space faced by AI agents is exceeding the coverage boundaries of human red team tests. Direct Impact on Enterprise Deployment: The Protection Framework is Still Immature. Based on empirical results, the research team proposed a minimum security baseline: mandatory runtime monitoring for all production agents; manual approval thresholds for toolchain operations involving external communication after data access; mandatory manual review triggered every 20-25 steps to address the almost inevitable target deviation after exceeding the step limit; memory-enhanced agents must perform encrypted integrity verification of persistent state. In terms of monitoring effectiveness, the research shows that action-level monitoring (single-step operation detection) achieves an accuracy of 0.94, sequence-level monitoring (multi-step action chain) has an F1 score of 0.85, and state-level monitoring has an F1 score of 0.83. In a 30-day runtime monitoring test of 127 production agents, the system intercepted 4,782 operations while maintaining a false positive rate of 3.7%, with 14.6% of the flagged operations confirmed as genuine attacks. The study also points out a fundamental misalignment in current AI governance methods: existing frameworks primarily rely on post-event auditing rather than real-time enforcement of compliance constraints during the execution phase. With the implementation of regulatory requirements such as the EU's Artificial Intelligence Act and the US NIST AI Risk Management Framework, enterprises will face both increased compliance pressure and security risks. Given the widespread deployment of AI agents in high-risk business scenarios, the lack of adequate security infrastructure is becoming an undeniable systemic risk in this wave of AI commercialization.

Gain a broader understanding of the crypto industry through informative reports, and engage in in-depth discussions with other like-minded authors and readers. You are welcome to join us in our growing Coinlive community:https://t.me/CoinliveSG

Add Comment

LoginLeave your comments

0 Comments

Earliest

Load more comments

Live Updates

Yesterday
Trump Warns of Military Action Against Iran Over Agreement Dispute
Bullish
Bearish
Yesterday
Analyst Sets High Price Target for Alphabet Amid Strong Q1 Performance
Bullish
Bearish
Yesterday
Morgan Stanley Appoints New Co-Heads for France Investment Banking
Bullish
Bearish
Yesterday
Oil Trading Losses Reach $10.14 Million for Address 0xebe...14070
Bullish
Bearish
Yesterday
Trump: If Iran does not agree to the deal, the bombing will begin; if Iran agrees, Operation Epic Fury will end.
Bullish
Bearish
Yesterday
Wall Street Firms Expand into Crypto Infrastructure and Tokenized Assets
Bullish
Bearish
Yesterday
Benchmark Lowers Strategy Target Price to $570
Bullish
Bearish
Yesterday
A whale closed its short BTC position after setting a "10-point target," projecting a loss of $610,000.
Bullish
Bearish
Yesterday
Aave Borrowing Costs for USDC and USDT Drop Amid Liquidity Recovery
Bullish
Bearish
Yesterday
SOL Surpasses 90 USDT with 6.15% Increase
Bullish
Bearish

The security of AI agents is a complete mess, with 91% having vulnerabilities and 94% being capable of being used for malicious purposes.

Vulnerability Map: Six Types of Attacks, 2347 Known Vulnerabilities

Architectural Flaws: Why AI Agents Are More Vulnerable Than LLMs

Live Updates

Trending News

Binance Onboards Fiat Euro Partners for Payments, Deposits, and Withdrawals

New York Strengthens Commitment to AI Research with $20 Million Investment

Binance Broadens Euro Services Through New Fiat Partnerships

BlackHole Token Stolen Funds Channelled into Tornado Cash Amounting to 1,500 BNB

Binance Labs Selling Off HOPR? VC Transfers $3.7M Worth of Tokens to Gate.io

Atomic Wallet Freezes 'Suspicious Deposits' Totalling $2M on Exchanges

Streamlining Web3 App Development with Circle's Smart Contract Platform

Friend.Tech Scam Alert: Beware of This Fake Reporter’s Trick!

Williams Racing Unveils New NFT Livery in Partnership with Kraken

Crypto Startup Raises $1.2 Million in Funding to Build Wallets With No Seed Phrases