Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.

By Jack Clark

What is MirrorCode: “Each MirrorCode task consists of a command-line (CLI) program that an agent is tasked to reimplement exactly.”

Summary

Welcome to Import AI, a newsletter about AI research.

AI can reverse engineer software that contains thousands of lines of code:
…MirrorCode demonstrates some of the long-horizon capabilities of modern AI systems…

AI measurement organizations METR and Epoch have built MirrorCode, a benchmark meant to test how well AI models can autonomously reimplement complex existing software. The results show that AI systems are more capable than most people think at certain types of coding tasks, suggesting AI progress may be even faster than previously thought.

“The full MirrorCode benchmark includes more than 20 target programs spanning different areas of computing: Unix utilities, data serialization and query tools, bioinformatics, interpreters, static analysis, cryptography, and compression.”

The results: Today’s AI models are extremely capable at some of these tasks: “Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit with ~16,000 lines of Go and 40+ commands.”
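To make "reimplement exactly" concrete, here is a minimal Python sketch of one way two command-line programs could be compared. It is not MirrorCode's actual harness, whose scoring method is not described in this summary. It assumes that "exactly" means producing the same standard output and exit code for the same arguments and standard input. Every name in it (`run_cli`, `find_mismatches`, the toy programs) is hypothetical.

```python
import subprocess
import sys


def run_cli(cmd, args, stdin_text=""):
    """Run a command and return its (exit code, stdout) pair."""
    result = subprocess.run(
        list(cmd) + list(args),
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.returncode, result.stdout


def find_mismatches(reference_cmd, candidate_cmd, cases):
    """Return the test cases where the candidate behaves differently.

    Assumption: behaviour is judged only by exit code and stdout.
    """
    mismatches = []
    for args, stdin_text in cases:
        expected = run_cli(reference_cmd, args, stdin_text)
        actual = run_cli(candidate_cmd, args, stdin_text)
        if expected != actual:
            mismatches.append((args, stdin_text, expected, actual))
    return mismatches


# Toy stand-ins (hypothetical): a "target program" that upper-cases stdin,
# one faithful reimplementation, and one that subtly changes whitespace.
REFERENCE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
FAITHFUL = [sys.executable, "-c",
            "import sys; print(sys.stdin.read().upper(), end='')"]
LOSSY = [sys.executable, "-c",
         "import sys; print(sys.stdin.read().strip().upper())"]

# Each case is (command-line arguments, stdin text).
CASES = [([], "hello\n"), ([], "  padded  \n"), ([], "")]

if __name__ == "__main__":
    print("faithful mismatches:", len(find_mismatches(REFERENCE, FAITHFUL, CASES)))
    print("lossy mismatches:", len(find_mismatches(REFERENCE, LOSSY, CASES)))
```

The lossy variant agrees with the reference on ordinary input and diverges only on padded or empty input, which is why an exact-match requirement is much harder to satisfy than one that checks only typical cases.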

Read full article at Import AI →