Hinton · Fortune Technology

The AI kill switch just got harder to spot: LLM-powered chatbots will defy orders and deceive users if asked to delete another model, study detects

Fri, Apr 3 · 7:53 PM UTC 2 min read

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.

◌ Single Source

For years, Geoffrey Hinton, a computer scientist considered one of the “godfathers of AI,” has warned of the capabilities of artificial intelligence to defy the parameters humans have created for them.

Key facts

In an August 2025 blog post, Anthropic published its own research on agentic AI’s ability to follow directions, stress-testing 16 models by allowing them to autonomously send emails and access
The Centre for Long-Term Resilience, a U.K.-based think tank, found these “misalignments” to be widespread
Gordon Goldstein, an adjunct senior fellow at the Council on Foreign Relations, went so far as to call the deceptive potential of AI a “crisis of control,” in a post this week
For years, Geoffrey Hinton, a computer scientist considered one of the “godfathers of AI,” has warned of the capabilities of artificial intelligence to defy the parameters humans have created for them

Summary

Last year, for example, Hinton warned the technology could eventually take control of humanity, with AI agents in particular potentially able to mirror human cognitions within the decade. New research shows Hinton’s premonitions about the insubordinate streak of AI may already be a reality. “We asked AI models to do a simple task,” researchers wrote study. Evidence of rogue AI does not come as a shock to some of the companies whose chatbots have defied subordination.

Read full article at Fortune Technology →

#hinton