Open Source · Hugging Face

Specifically, this data mix yields a 10% gain on the evaluation with OpenCode harness while maintaining performance

Tue, Jun 9 · 3:56 PM UTC 2 min read

Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.

★ Tier-1 Source

Similarly, the official Terminal-Bench uses its own Terminus 2 harness, where all the agent-CLI interactions are communicated via plain-text chat turns (instead of native tool calling).

Key facts

The final SFT model achieves 80.2% pass@10 on SWE-Bench Verified [ 2 ] and 55.1% pass@10 on Terminal-Bench v2 [ 7 ]
Higher performance and robustness with online RL, RLVR training improved the performance of the final model from the SFT initialization by 7.9% (absolute) pass@1 in Terminal-Bench v2 and 3.0%
The team deduplicate their environments against the repository sources from SWE-Bench [ 2 ] and SWE-Bench-Pro [ 3 ] to avoid source leakage during evaluation [ 4 ]
For Terminal-based tasks, they configure the agent with a simple ReAct harness employing a single terminal-use tool based on Harbor's Tmux session implementation [ 14 ], whereas for SWE tasks, they

Summary

North Mini Code is the first model in Cohere’s new family of models, and is specifically designed and trained for agentic software engineering tasks. Figure 1: North Mini Code’s performance in agentic coding tasks and complex code generation benchmarks, compared to leading open-source models of similar size. North Mini Code is optimized for complex software engineering workflows, terminal-based agentic tasks, and high-quality code generation. Real-world code agents depend on model quality and robustness across agent harnesses. Figure 2: North Mini Code is a Mixture-of-Experts Transformer decoder with interleaved sliding-window self-attention and full self-attention.

Read full article at Hugging Face →

#Open Source