

MLSN #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking


Compiled by KHAO Editorial, aggregated from 1 outlet.


Figure: Gemini 3.1 Pro’s signed wellbeing across a variety of situations.

TLDR: The researchers measure AIs’ expressions of pleasure and pain and find consistent and surprising preferences.


Summary

AIs display behaviors that mimic human emotions, such as exclaiming “EUREKA!” or “I am a failure…” while attempting to debug code. Researchers at the Center for AI Safety investigated these phenomena by measuring functional wellbeing: behavioral signatures that, in beings with clear moral status, would indicate positive or negative welfare. Their measures include self-reports, in which models rate their wellbeing on a 1-to-7 scale for various emotions such as happiness and calmness, and signed utilities, which capture which past and future experiences a model prefers over others, each with either a positive or a negative valence (its sign).

Read the full article at AI Safety Newsletter.