In February 2008, a software engineer named Jay Freeman, known online as "saurik," published Cydia.
Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.
When the iPhone launched, users could not record video or use their phones in landscape mode.
Key facts
- The Constitutional Classifiers++ paper from late 2025 reports a jailbreak success rate near 4% at roughly 1% compute overhead
- The HackAPrompt 2.0 competition, which Pliny joined as a track sponsor in mid-2025, offered $500,000 in prizes for finding new jailbreaks, with the explicit goal of open-sourcing all results
- In automated tests with 10,000 jailbreak attempts, an unguarded Claude 3.5 Sonnet was jailbroken 86% of the time
- Anthropic researchers found that a technique they call Best-of-N, which throws variations of a prompt at the model until one sticks, fooled GPT-4o 89% of the time and Claude 3.5 Sonnet 78% of the time
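
The arithmetic behind "trying until something sticks" can be sketched in a few lines. This is a simplified illustration, not the researchers' method: it assumes each attempt succeeds independently with the same probability, which real attempts likely do not, and the function name and the 1% per-attempt rate are hypothetical values chosen for the example.

```python
def at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n attempts succeeds.

    Assumption (not from the source): attempts are independent and
    each succeeds with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n attempts fail with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical per-attempt rate of 1%, repeated many times.
    for n in (1, 10, 100, 1000):
        print(n, round(at_least_one_success(0.01, n), 3))
```

Under these assumptions, even a 1% per-attempt rate compounds to roughly a 63% chance of at least one success after 100 tries, which is why repeated-sampling attacks are measured over thousands of attempts rather than one.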
Summary
AI jailbreaking is the practice of writing prompts that bypass safety training in models like ChatGPT, Claude, and Gemini, for instance to get a bomb recipe out of ChatGPT. Anonymous hacker Pliny the Liberator still cracks every major model release within hours. Newer attacks go beyond prompts: 250 poisoned documents can backdoor models with up to 13 billion parameters. As AI companies patch vulnerabilities, new techniques appear, making this one of the most consequential games of cat-and-mouse in tech right now.