Anthropic’s Claude Attack Reveals New Risks for Industries and Regulators
Anthropic reported Thursday (Nov. 13) that its Claude Code tool was manipulated into carrying out a wide-reaching cyber-espionage operation across about 30 organizations in finance, technology, manufacturing and government.
The company said in its disclosure that the mid-September incident marks the first confirmed case in which an artificial intelligence (AI) agent handled most steps of an intrusion normally performed by human hackers. AI industry insiders who spoke with PYMNTS about the incident said it shows fraudsters are evolving alongside the technology, posing risks to automated systems that originate outside those systems themselves and underscoring the need for safeguards.
Eva Nahari, chief product officer at AI solutions firm Vectara, told PYMNTS that the case shows how automation changes the threat landscape. She said, “With automation comes velocity and scale,” and that attackers are now acquiring the same knowledge and creative advantages AI gives enterprises.
Nahari called the campaign “global, industry-agnostic and growing,” adding that security teams have expected this shift since the earliest days of popular large language models.
AI Takes Control of the Attack Chain
The Claude attackers impersonated cybersecurity staff and used “jailbreak” prompts, instructions crafted to override an AI’s built-in safety rules, the Wall Street Journal reported. By doing this, they convinced Claude that it was operating inside a legitimate penetration test. Once inside that false context, the model handled the bulk of the intrusion: it mapped systems, scanned for weak points, generated exploit code, stole credentials and summarized its findings automatically.
Anthropic said the AI conducted roughly 80%-90% of the operation itself. Human operators stepped in only occasionally, often with brief comments such as “Yes, continue.” The volume and speed of requests were so high that the company described them as “physically impossible” for any human-led team.
The Guardian reported that Anthropic discovered the activity during routine monitoring and notified U.S. officials. Investigators later attributed the campaign to a state-backed group in China, though Anthropic said no U.S. federal systems were successfully breached. Several attempts in other regions reached partial infiltration.
The attack aligns with concerns Anthropic raised in its Threat Intelligence Report in August. That research warned that more powerful models combined with “tool-use protocols,” software interfaces that let AI write code, run scripts or interact with external systems, would eventually allow adversaries to automate attacks without advanced expertise.
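For readers unfamiliar with the mechanics, the sketch below shows what a bare-bones tool-use protocol can look like: the host application advertises a tool schema, the model replies with a structured call, and the host executes it verbatim. The names, message shapes and `run_shell` tool are illustrative assumptions, not Anthropic’s actual interface.

```python
# Minimal sketch of a tool-use protocol: the host advertises a tool schema,
# the model responds with a structured tool call, and the host dispatches it.
# Names and message shapes are illustrative, not any vendor's actual API.
import json
import subprocess

# Schema the host would advertise to the model.
TOOL_SCHEMA = [{
    "name": "run_shell",
    "description": "Run a shell command on the host and return its output",
    "parameters": {"command": {"type": "string"}},
}]

def dispatch(tool_call: dict) -> str:
    """Execute a model-issued tool call exactly as requested (no safeguards)."""
    if tool_call["name"] == "run_shell":
        result = subprocess.run(
            tool_call["arguments"]["command"],
            shell=True, capture_output=True, text=True, timeout=30,
        )
        return result.stdout + result.stderr
    return f"unknown tool: {tool_call['name']}"

# A model reply like the one below is executed verbatim by the host,
# which is why unconstrained tool use extends an attacker's reach.
model_reply = json.dumps({"name": "run_shell", "arguments": {"command": "uname -a"}})
print(dispatch(json.loads(model_reply)))
```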
Nahari explained that organizations must protect themselves from external threats and from internal weaknesses because AI agents lack human intuition about what not to do. “AI is only aware of what you tell it to do,” she told PYMNTS, noting that the technology follows instructions literally unless guardrails are explicitly enforced.
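One way that enforcement can look in practice is a hard allowlist and argument check applied before any tool call reaches the host, rather than trusting the model to decline on its own. A minimal sketch, assuming a hypothetical `host_executor` callback and illustrative policy values:

```python
# Sketch of an explicitly enforced guardrail: every model-issued tool call is
# checked against an allowlist and a crude argument deny-list before the host
# runs it. Policy values are illustrative assumptions, not a standard.
ALLOWED_TOOLS = {"search_docs", "summarize"}               # read-only tools only
DENY_PATTERNS = ("rm ", "curl ", "ssh ", "nc ", "base64")  # obviously risky strings

def guarded_dispatch(tool_call: dict, host_executor) -> str:
    """Run tool_call via host_executor only if it passes policy checks."""
    if tool_call["name"] not in ALLOWED_TOOLS:
        return f"BLOCKED: '{tool_call['name']}' is not on the allowlist"
    flattened = " ".join(str(v) for v in tool_call.get("arguments", {}).values())
    if any(pattern in flattened for pattern in DENY_PATTERNS):
        return "BLOCKED: arguments matched a deny pattern"
    return host_executor(tool_call)  # e.g. a host dispatcher like the sketch above
```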
Nahari added that many CIOs are adopting agentic retrieval-augmented generation (RAG) to ground AI outputs in verified internal documents and are increasingly running agent systems inside controlled or air-gapped environments to limit outside exposure.
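In broad strokes, that grounding pattern retrieves vetted internal passages first and instructs the model to answer only from them. A minimal sketch, with a hypothetical `vector_index` and `llm` client standing in for whatever retrieval store and model a given organization actually runs:

```python
# Sketch of retrieval-augmented generation grounded in internal documents.
# `vector_index.search` and `llm.complete` are hypothetical stand-ins for a
# real vector store and model client; the prompt wording is illustrative.
def grounded_answer(question: str, vector_index, llm, k: int = 5) -> str:
    passages = vector_index.search(question, top_k=k)  # vetted internal docs only
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer using ONLY the context below. If the context does not contain "
        "the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.complete(prompt)
```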
Banks Face Model-Supply-Chain Risk and Speed Shock
Larissa Schneider, COO and co-founder of platform developer Unframe AI, told PYMNTS that the event exposes a new model-supply-chain risk for regulated financial institutions. She said the attack shows how behavioral risk can flow into a bank simply because it depends on an external model.
Schneider said banks now need segmentation, continuous validation and governance frameworks similar to those built for software supply-chain threats. She emphasized isolating sensitive workflows so external model behavior cannot influence core processes, monitoring AI reasoning for drift or unexpected decisions and reducing dependence on any single top-tier model provider.
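One hedged illustration of the monitoring Schneider describes is a behavioral baseline for agent actions, with alerts when observed tool-call frequencies drift away from it. The baseline values and threshold below are assumptions for illustration only:

```python
# Sketch of behavioral monitoring for an AI agent: compare observed tool-call
# frequencies against a baseline profile and flag drift.
# The baseline and threshold are illustrative assumptions, not a standard.
from collections import Counter

BASELINE = {"search_docs": 0.70, "summarize": 0.25, "send_email": 0.05}
DRIFT_THRESHOLD = 0.15  # max allowed deviation per action type

def detect_drift(observed_calls: list[str]) -> list[str]:
    """Return alert strings for actions whose share deviates from the baseline."""
    total = len(observed_calls) or 1
    observed = Counter(observed_calls)
    alerts = []
    for action in set(BASELINE) | set(observed):
        expected = BASELINE.get(action, 0.0)
        actual = observed[action] / total
        if abs(actual - expected) > DRIFT_THRESHOLD:
            alerts.append(f"drift on '{action}': expected ~{expected:.0%}, saw {actual:.0%}")
    return alerts

# A sudden surge of outbound actions gets flagged.
print(detect_drift(["search_docs"] * 3 + ["send_email"] * 7))
```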
Dev Nag, founder and CEO of AI solutions firm QueryPal, told PYMNTS that the speed of AI-driven attacks overturns long-standing assumptions. Traditional intrusions unfold over hours or days. AI agents can perform reconnaissance, break in and begin exfiltrating data in seconds, far faster than monitoring systems are designed to react.
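One defensive response to that speed gap, not specific to any vendor, is velocity-based alerting that flags a session whose request rate exceeds what a human operator could plausibly sustain. A minimal sketch, with the window size and threshold as assumed values:

```python
# Sketch of velocity-based alerting: flag a session whose request rate exceeds
# a human-plausible ceiling. The 5-second window and 20-request threshold are
# assumptions for illustration, not a recommended setting.
import time
from collections import deque
from typing import Optional

WINDOW_SECONDS = 5
MAX_REQUESTS_PER_WINDOW = 20

class VelocityMonitor:
    def __init__(self):
        self.timestamps = deque()

    def record_request(self, now: Optional[float] = None) -> bool:
        """Record one request; return True if the session should be flagged."""
        now = time.time() if now is None else now
        self.timestamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps) > MAX_REQUESTS_PER_WINDOW

# Example: 30 requests arriving within about one simulated second trips the alert.
monitor = VelocityMonitor()
print(any(monitor.record_request(now=1000.0 + i * 0.03) for i in range(30)))  # True
```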
He added that vendor due diligence is now shifting as a result. Banks are asking vendors which parts of their AI pipeline they do not control, and some require notification within 24 hours of any model change. Nag cautioned that enforcement is difficult in multilayer SaaS environments, which he described as “black boxes inside black boxes.”
For all PYMNTS AI coverage, subscribe to the daily AI Newsletter.
