← Back to KHAO

Anthropic ·

Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests

2 min read

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.

★ Tier-1 Source

Abstract background with soft pastel gradients of pink, peach, and teal blending together in a smooth, blurred pattern.

This summer, OpenAI and Anthropic collaborated on a first-of-its-kind joint evaluation: they each ran their internal safety and misalignment evaluations on the other’s publicly released models and are now sharing the results publicly.

Key facts

Summary

Because the field continues to evolve and models are increasingly used to assist in real world tasks and problems, safety testing is never finished. In this post, they share the results of their internal evaluations they ran on Anthropic’s models Claude Opus 4 and Claude Sonnet 4, and present them alongside results from GPT‑4o, GPT‑4.1, OpenAI o3, and OpenAI o4-mini, which were the models powering ChatGPT at the time. Both labs facilitated these evaluations by relaxing some model-external safeguards that would otherwise interfere with the completion of the tests, as is common practice for analogous dangerous-capability evaluations. They're not aiming for exact, apples-to-apples comparisons with each other’s systems, as differences in access and deep familiarity with their own models make this difficult to do fairly.

Read full article at OpenAI →

#anthropic #alignment #openai #safety