White Circle raises $11 million to stop AI models from going rogue in the workplace
Compiled by KHAO Editorial, aggregated from one outlet.
One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model.
Key facts
- White Circle, a Paris-based AI control platform that has now raised $11 million, is Shilov’s answer to the new wave of risks posed by AI models in company workflows
- In May, the company published KillBench, a study that ran more than one million experiments across 15 AI models, including models from OpenAI, Google, Anthropic, and xAI, to test how systems behaved
- The startup currently has a team of 20, distributed across Europe, including London, Amsterdam, and France
Summary
The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass its own guardrails and produce dangerous or prohibited outputs, such as instructions for making drugs or building weapons. Shilov posted about it on X and, by the next morning, it had gone viral. The social media success brought with it an invitation from companies, including Anthropic, to test their models privately, which convinced Shilov that the issue was bigger than finding these problematic prompts. “Jailbreaks are one part of the problem,” Shilov said.