Anthropic apologizes for invisible Claude Fable guardrails

Thu, Jun 11 · 12:05 PM UTC

Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.

The company says it will make the covert safeguard preventing model distillation as visible as other safety measures.

Key facts

Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems
Anthropic said in a post on X that it is changing its approach to distillation: queries will now fall back to Claude Opus 4.8, its previous flagship model
In some cases, notably biology, the safeguards have been calibrated so broadly that Fable is practically unusable for even basic queries, something Anthropic spokesperson Paruul Maheshwary
In Fable’s system card, a public document AI developers release to explain how a system works, Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly

Summary

Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermine both researchers and rivals using it to develop competing systems. Fable is the first widely available model in Anthropic’s Mythos class of AI systems, a group the company has warned for months is too dangerous for public release. In Fable’s system card, a public document AI developers release to explain how a system works, Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly. Anthropic said in a post on X that it is now changing its approach to distillation: queries will fall back to Claude Opus 4.8, its previous flagship model.

#Anthropic #Claude #Mythos