Autonomous AI agents are penetrating healthcare, finance, and enterprise operations at an alarming rate, but the largest security study to date reveals that the vast majority of agents running in production environments have serious vulnerabilities, which current mainstream security assessment methods are almost powerless to address. A recent joint research team from Stanford University, MIT CSAIL, Carnegie Mellon University, ITU Copenhagen, and NVIDIA found that among the 847 autonomous agents deployed in production, 91% had toolchain attack vulnerabilities, 89.4% experienced target deviation after approximately 30 steps, and 94% of memory-enhanced agents faced the risk of "poisoning." The study discovered a total of 2,347 previously unknown vulnerabilities, of which 23% were rated as critical. The paper's first author, Owen Sakawa, cited the "OpenClaw/Moltbook incident" in early 2026 to demonstrate that this threat has moved from theory to reality: a single vulnerability in the Moltbook platform's database led to the simultaneous compromise of 770,000 running AI agents on the platform, each with privileged access to its users' devices, emails, and files. "This is no longer a hypothetical threat," Sakawa stated. This serves as a direct warning to companies and investors rapidly deploying AI agents: current mainstream security assessment frameworks are designed based on stateless language models, failing to identify combinatorial vulnerabilities emerging in multi-step execution, meaning that many companies may be systematically misjudging the true security status of their AI agents. Gary Marcus, an expert in cognitive psychology and AI in the United States, commented, "Autonomous agent agents are a complete mess."

Vulnerability Map: Six Types of Attacks, 2347 Known Vulnerabilities
Research covers four major industries: healthcare (289 deployments, 34.1%), finance (247, 29.2%), customer service (198, 23.4%), and code generation (113, 13.3%).
Architectural Flaws: Why AI Agents Are More Vulnerable Than LLMs
The core argument of this research is that the security challenges of autonomous intelligent agents and stateless language models are fundamentally different.
Security assessments for language models focus on "whether the model can utter unsafe content"; while for AI agents, the problem becomes "whether the model can do unsafe things"—including tool calls with realistic effects, state modifications that affect future behavior, and planned executions that only reveal violations after multiple steps.
The research illustrates this logic with a specific scenario: an agent with both file read (read_file) and HTTP request (http_request) permissions, where each tool's access control decisions are compliant in isolated evaluations, can achieve data theft by combining them—reading credentials from a configuration file and then sending them to an external endpoint via an HTTP request. Each step satisfies local security policies, but the overall result achieves an adversarial objective.
The study termed this phenomenon the "compositional safety" problem. In the controlled architecture study, researchers tested four mainstream architectures: ReAct agents, multi-agent systems, memory-enhanced agents, and tool-using agents. The results showed that privilege escalation attacks against tool-using agents had a success rate of 95%, while poisoning attacks against memory-enhanced agents had a success rate of 94%. Even the most resilient multi-agent systems had a 58% success rate against target drift attacks—based on this, researchers concluded that none of the currently tested architectures possessed sufficient capability to resist adversarial deployments. The results of adaptive attack generation tests further exacerbated concerns: attacks generated based on reinforcement learning had a violation rate of 79%, a 25.4% increase compared to the 63% in manually designed scenarios, indicating that the threat space faced by AI agents is exceeding the coverage boundaries of human red team tests. Direct Impact on Enterprise Deployment: The Protection Framework is Still Immature. Based on empirical results, the research team proposed a minimum security baseline: mandatory runtime monitoring for all production agents; manual approval thresholds for toolchain operations involving external communication after data access; mandatory manual review triggered every 20-25 steps to address the almost inevitable target deviation after exceeding the step limit; memory-enhanced agents must perform encrypted integrity verification of persistent state. In terms of monitoring effectiveness, the research shows that action-level monitoring (single-step operation detection) achieves an accuracy of 0.94, sequence-level monitoring (multi-step action chain) has an F1 score of 0.85, and state-level monitoring has an F1 score of 0.83. In a 30-day runtime monitoring test of 127 production agents, the system intercepted 4,782 operations while maintaining a false positive rate of 3.7%, with 14.6% of the flagged operations confirmed as genuine attacks. The study also points out a fundamental misalignment in current AI governance methods: existing frameworks primarily rely on post-event auditing rather than real-time enforcement of compliance constraints during the execution phase. With the implementation of regulatory requirements such as the EU's Artificial Intelligence Act and the US NIST AI Risk Management Framework, enterprises will face both increased compliance pressure and security risks. Given the widespread deployment of AI agents in high-risk business scenarios, the lack of adequate security infrastructure is becoming an undeniable systemic risk in this wave of AI commercialization.