Anthropic · Wired
Anthropic Confirms That Claude Contains Its Own Kind of Emotions
Compiled by KHAO Editorial, aggregated from a single outlet.
Claude has been through a lot lately, from a public fallout with the Pentagon to leaked source code, so it makes sense that it would be feeling a little blue.
Key facts
- Researchers at the company probed the inner workings of Claude Sonnet 4.5 and found that so-called “functional emotions” seem to affect Claude’s behavior, altering the model’s outputs and actions
- When Claude says it is happy to see you, for example, a state inside the model that corresponds to “happiness” may be activated
- While Anthropic’s latest study might encourage people to see Claude as conscious, the reality is more complicated
- To understand how Claude might represent emotions, the Anthropic team analyzed the model’s inner workings as it was fed text related to 171 different emotional concepts
Summary
A new study from Anthropic suggests that models hold digital representations of human emotions like happiness, sadness, joy, and fear within clusters of artificial neurons, and that these representations activate in response to different cues. Researchers at the company probed the inner workings of Claude Sonnet 4.5 and found that so-called “functional emotions” seem to affect Claude’s behavior, altering the model’s outputs and actions. Anthropic’s findings may help ordinary users make sense of how chatbots work. “What was surprising to us was the degree to which Claude’s behavior is routing through the model’s representations of these emotions,” says Jack Lindsey, a researcher at Anthropic who studies Claude’s artificial neurons.
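Anthropic has not released the probing code behind the study, and its interpretability methods go well beyond what fits here, but a minimal sketch can make the idea of “probing a model’s inner workings” concrete. The example below is an assumption-laden illustration, not Anthropic’s actual technique: it uses the open-source GPT-2 model as a stand-in for Claude and a simple linear probe, a standard interpretability baseline, to check whether a model’s hidden activations carry enough signal to separate happy text from sad text.

```python
# Minimal sketch of concept probing: does a model's hidden state encode
# an emotion label? This is NOT Anthropic's method; it only illustrates
# the general idea of looking for emotion representations in activations.
# Assumes the open-source GPT-2 model as a stand-in for Claude.
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.linear_model import LogisticRegression

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

# Tiny hand-made dataset: texts paired with an emotion label.
texts = [
    ("I just got the job, this is the best day ever!", "happiness"),
    ("We won the championship and everyone is cheering.", "happiness"),
    ("My best friend moved away and the house feels empty.", "sadness"),
    ("The funeral was this morning; I can't stop crying.", "sadness"),
]

def hidden_state(text: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states for one input text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state.mean(dim=1).squeeze(0)

X = torch.stack([hidden_state(t) for t, _ in texts]).numpy()
y = [label for _, label in texts]

# A linear probe: if a simple classifier can separate the labels using
# only the activations, the model encodes some emotion-related signal.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy on training texts:", probe.score(X, y))
```

On this toy dataset the probe fits trivially; real probing studies use held-out data and far more examples, and Anthropic’s study examined 171 emotional concepts rather than two.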