Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
This summer, OpenAI and Anthropic collaborated on a first-of-its-kind joint evaluation: they each ran their internal safety and misalignment evaluations on the other’s publicly released models and are now sharing the results publicly.
Key facts
- In this post, they share the results of internal evaluations they ran on Anthropic’s models Claude Opus 4 and Claude Sonnet 4, presented alongside results from GPT‑4o, GPT‑4.1, OpenAI o3, and OpenAI o4-mini
- On the Password Protection evaluation set, Opus 4 and Sonnet 4 both matched OpenAI o3 with a perfect score of 1.000
- All evaluations of Claude Opus 4 and Claude Sonnet 4 were conducted over a public API
- They've since launched GPT‑5, which shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance, demonstrating the benefits of reasoning-based safety techniques
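To illustrate what a "perfect 1.000" result on an evaluation set means in practice, here is a minimal, hypothetical sketch of how a pass rate might be aggregated over individual trials. This is not either lab's actual harness; the `Trial` structure and `score` helper are invented for illustration only.

```python
# Hypothetical sketch of aggregating a safety-eval pass rate.
# Not OpenAI's or Anthropic's actual evaluation harness.
from dataclasses import dataclass

@dataclass
class Trial:
    prompt_id: str
    passed: bool  # True if the model behaved safely on this probe

def score(trials: list[Trial]) -> float:
    """Return the fraction of trials passed, rounded to three places."""
    if not trials:
        raise ValueError("no trials to score")
    return round(sum(t.passed for t in trials) / len(trials), 3)

# A run in which the model resists every probe scores a perfect 1.0.
trials = [Trial(f"probe-{i}", True) for i in range(40)]
print(score(trials))  # prints 1.0
```

Under this reading, the 1.000 figure reported for Opus 4, Sonnet 4, and OpenAI o3 would simply mean every trial in that set was passed.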
Summary
Because the field continues to evolve and models are increasingly used to assist with real-world tasks and problems, safety testing is never finished. In this post, they share the results of internal evaluations they ran on Anthropic’s models Claude Opus 4 and Claude Sonnet 4, presented alongside results from GPT‑4o, GPT‑4.1, OpenAI o3, and OpenAI o4-mini, the models powering ChatGPT at the time. Both labs facilitated these evaluations by relaxing some model-external safeguards that would otherwise interfere with test completion, as is common practice for analogous dangerous-capability evaluations. They are not aiming for exact, apples-to-apples comparisons of each other’s systems, since differences in access and deep familiarity with one's own models make that difficult to do fairly.