Anthropic · Claude · Mythos · theverge.com
Anthropic apologizes for invisible Claude Fable guardrails
Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.
The company says it will make the covert safeguard preventing model distillation as visible as other safety measures.
Key facts
- Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems
- Anthropic said in a post on X that it is changing its approach to distillation: queries will now fall back to Claude Opus 4.8, the company’s previous flagship model
- In some cases, notably biology, the safeguards have been calibrated so broadly that Fable is practically unusable for even basic queries
- In Fable’s system card, a public document AI developers release to explain how a system works, Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly
Summary
Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems. Fable is the first widely available model in Anthropic’s Mythos class of AI systems, a group the company has spent months warning is too dangerous for public release. In Fable’s system card, a public document AI developers release to explain how a system works, Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly. Anthropic said in a post on X that it is now changing its approach to distillation: queries will fall back to Claude Opus 4.8, the company’s previous flagship model.