Claude · ChatGPT · China · GPT · Decrypt
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude
Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.
★ Tier-1 Source
Most people know Xiaomi as the Chinese phone brand.
Key facts
- The team covered the V2.5 Pro launch in April: the model matches Claude Opus on most coding benchmarks and costs roughly $0.43 per million input tokens and $0.87 per million output tokens
- Claude Opus 4.6 lands around 71 tokens per second, while the lower-end model, Haiku, reaches about 98 tokens per second
- Xiaomi released MiMo-V2.5-Pro-UltraSpeed, a serving mode for its trillion-parameter flagship that exceeds 1,000 tokens per second and peaks near 1,200 in demos
- It hit 969 tokens per second on Meta's Llama 3.1 405B. That is impressive, but the model has 405 billion parameters, less than half the size of MiMo-V2.5-Pro
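To put the throughput figures above side by side, here is a minimal Python sketch. It uses only the tokens-per-second numbers quoted in the key facts; the helper names (`speedup`, `seconds_for`) and the 10,000-token workload are illustrative assumptions, not anything published by Xiaomi or Anthropic.

```python
# Throughput figures quoted in the key facts (tokens per second).
ULTRASPEED_TPS = 1_000       # "over 1,000" sustained
ULTRASPEED_PEAK_TPS = 1_200  # "peaking near 1,200 in demos"
OPUS_TPS = 71                # Claude Opus 4.6
HAIKU_TPS = 98               # Claude Haiku


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Ratio of two generation speeds."""
    return fast_tps / slow_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps` (ignores latency to first token)."""
    return tokens / tps


if __name__ == "__main__":
    print(f"vs Opus (sustained): {speedup(ULTRASPEED_TPS, OPUS_TPS):.1f}x")       # 14.1x
    print(f"vs Opus (peak):      {speedup(ULTRASPEED_PEAK_TPS, OPUS_TPS):.1f}x")  # 16.9x
    print(f"vs Haiku (sustained): {speedup(ULTRASPEED_TPS, HAIKU_TPS):.1f}x")     # 10.2x
    # Hypothetical workload: a 10,000-token response.
    print(f"10k tokens at UltraSpeed: {seconds_for(10_000, ULTRASPEED_TPS):.0f} s")
    print(f"10k tokens at Opus speed: {seconds_for(10_000, OPUS_TPS):.0f} s")
```

On these numbers the headline's "15X" falls between the sustained ratio (about 14x) and the peak ratio (about 17x) against Claude Opus 4.6.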
Summary
Xiaomi released MiMo-V2.5-Pro-UltraSpeed, a serving mode for its trillion-parameter flagship that exceeds 1,000 tokens per second and peaks near 1,200 in demos. Parameters are the internal numerical weights that determine how a model behaves; in general, the more a model has, the more complex the patterns it can recognize. The speed comes from FP4 quantization on the model's expert layers and DFlash speculative decoding, which proposes a full block of tokens in one pass instead of one at a time. A limited API trial runs from June 9 through June 23, priced at 3× standard MiMo rates for roughly 10× the generation speed.
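The block-proposal idea can be illustrated with a toy sketch. The code below is a generic greedy draft-and-verify loop, not Xiaomi's DFlash implementation, whose details the source does not give. The "models" are hypothetical stand-ins: `target` is a trivial counting rule and `draft_block` is a deliberately imperfect guesser. Every function name here is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Token = int


def target(context: List[Token]) -> Token:
    """Stand-in for the large model: the next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


def draft_block(context: List[Token], block_size: int) -> List[Token]:
    """Stand-in for a fast drafter: proposes a whole block, but wrongly emits 0 instead of 7."""
    block, cur = [], context[-1]
    for _ in range(block_size):
        nxt = (cur + 1) % 10
        if nxt == 7:
            nxt = 0  # deliberate mistake so that verification has something to reject
        block.append(nxt)
        cur = nxt
    return block


def greedy_generate(model: Callable[[List[Token]], Token],
                    prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_generate(prompt: List[Token], block_size: int,
                         max_new: int) -> Tuple[List[Token], int]:
    """Draft a block, keep the prefix the target agrees with, and fix the first mismatch."""
    out = list(prompt)
    verify_passes = 0
    while len(out) - len(prompt) < max_new:
        proposal = draft_block(out, block_size)
        # A real system would score every proposed position in ONE batched
        # forward pass of the target; here that is simulated and counted once.
        verify_passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # substitute the target's own token
                break                      # discard the rest of the block
        out.extend(accepted)
    return out[: len(prompt) + max_new], verify_passes


if __name__ == "__main__":
    baseline = greedy_generate(target, [0], 12)
    fast, passes = speculative_generate([0], block_size=4, max_new=12)
    print(fast == baseline)  # True: same output as token-by-token decoding
    print(passes)            # 4 verification passes instead of 12 target calls
```

In this toy the output is identical to plain greedy decoding, and the saving comes entirely from the number of target passes. How much a real deployment gains depends on how often the drafter's block is accepted.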