
In February 2008, a software engineer named Jay Freeman, known online as "saurik," published Cydia

Sat, May 16 · 1:01 PM UTC · 2 min read

Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.

★ Tier-1 Source

In general terms, when the iPhone launched, users could not record video or use their phones in landscape mode.

Key facts

The Constitutional Classifiers++ paper from late 2025 reports a jailbreak success rate near 4% at roughly 1% compute overhead
The HackAPrompt 2.0 competition, which Pliny joined as a track sponsor in mid-2025, offered $500,000 in prizes for finding new jailbreaks, with the explicit goal of open-sourcing all results
On automated tests with 10,000 jailbreak attempts, an unguarded Claude 3.5 Sonnet was successfully jailbroken 86% of the time
Anthropic researchers found that a technique they call Best-of-N, which throws variations of a prompt at the model until one sticks, fooled GPT-4o 89% of the time and Claude 3.5 Sonnet 78% of the time

Summary

AI jailbreaking is the practice of writing prompts that bypass safety training in models like ChatGPT, Claude, and Gemini; the textbook case is asking ChatGPT for a bomb recipe and getting an answer instead of a refusal. Anonymous hacker Pliny the Liberator still cracks every major model release within hours. Newer attacks go beyond prompts: 250 poisoned documents can backdoor models with up to 13 billion parameters. As AI companies patch vulnerabilities, new techniques appear, which makes this one of the most consequential games of cat-and-mouse happening in tech right now.

Read full article at Decrypt →

#Open Source #Anthropic #DeepSeek #OpenAI #Gemini #Claude #Google #GitHub #Llama #Apple #Meta #GPT #Constitutional Classifiers #Jailbreaking #HackAPrompt #StrongREJECT #Best-of-N jailbreak #Pliny the Liberator #iPhone #Cydia #Digital Millennium Copyright Act (DMCA)