Claude Code · Anthropic · Claude · anthropic.com
Given this, Claude Code shipped with the simplest possible defense: allow reads, require approval for write, bash
Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.
◌ Single Source
However, as mentioned, approval fatigue showed up within weeks.
Key facts
- On Gray Swan's Agent Red Teaming benchmark, which tests susceptibility to prompt injection, Claude Opus 4.7 holds attack success to roughly 0.1% on single attempts, and around 5–6% after 100 adaptive
- Between mid-2025 and January 2026, they received reports of vulnerabilities in Claude Code through their responsible disclosure program
- In February 2026, during a controlled internal red-team exercise, a researcher successfully phished an employee into launching Claude Code with a malicious prompt
- Across 25 retries of that prompt, Claude completed the exfiltration 24 times
Summary
Twelve months ago, they'd have rejected out of hand the idea of granting Claude access sufficient to take down an internal Anthropic service. Today that level of access is routine, and Anthropic developers are more productive for it. The first is to supervise the agent’s behavior via a human-in-the-loop. Claude Code previously protected against agents taking unintended actions by asking users for permission at each turn. The second approach to capping the blast radius—and the focus of much of this post—is containment. Over the past two years, they've shipped three primary agentic products: claude.ai, Claude Code, and Claude Cowork. User misuse: A user—either maliciously or through carelessness—directs the agent to do something harmful.