MiMo-V2.5-Pro-UltraSpeed: a 1T-parameter model at 1000 tokens per second

Mon, Jun 8 · 3:27 PM UTC · 2 min read

Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.


MiMo-V2.5-Pro UltraSpeed real-time generation speed comparison (up to ~1200 tokens/s)

The speed of AI reasoning is no different: it defines the boundaries of intelligence itself.

Key facts

The MiMo-V2.5-Pro-UltraSpeed API launches simultaneously at a limited-time promotional price: 3× the cost of MiMo-V2.5-Pro for approximately 10× the generation speed
Xiaomi has released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, breaking the 1000 tokens/s decode barrier on a 1-trillion-parameter model for the first time
To ensure quality and fairness under resource constraints, the following rules apply: each account may enter the queue up to 10 times per day; each session is capped at 30 minutes; sessions idle

Summary

From the first roaring racer of the combustion age to the sonic boom that shattered the sound barrier, humanity's hunger for speed is written into its DNA. Xiaomi has released MiMo-V2.5-Pro-UltraSpeed in collaboration with TileRT, breaking the 1000 tokens/s decode barrier on a 1-trillion-parameter model for the first time. The MiMo-V2.5-Pro-UltraSpeed API launches simultaneously at a limited-time promotional price: 3× the cost of MiMo-V2.5-Pro for approximately 10× the generation speed. Access is API only; the Token Plan is not supported. Because high-speed inference resources are limited, MiMo-V2.5-Pro-UltraSpeed is available only through an application-based, limited-time window. API platform: platform.xiaomimimo.com/ultraspeed. For standard model access, see the MiMo-V2.5 model series.
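
To make the price and speed trade-off concrete, here is a minimal Python sketch of the arithmetic. It relies on assumptions not stated in the source: the standard tier's decode speed is taken as one tenth of 1000 tokens/s (inferred from the "approximately 10×" figure), prices are expressed relative to MiMo-V2.5-Pro rather than in any currency, and cost is assumed to scale linearly with output tokens. The `Tier` class and both helper functions are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    decode_tokens_per_s: float  # sustained decode speed (assumed constant)
    relative_price: float       # price per output token, relative to MiMo-V2.5-Pro


def decode_seconds(tier: Tier, output_tokens: int) -> float:
    """Time to generate `output_tokens` at the tier's decode speed."""
    return output_tokens / tier.decode_tokens_per_s


def relative_cost(tier: Tier, output_tokens: int) -> float:
    """Cost in arbitrary units, assuming linear per-token pricing."""
    return tier.relative_price * output_tokens


# 1000 tokens/s and the 3x price multiple come from the announcement.
ULTRASPEED = Tier("MiMo-V2.5-Pro-UltraSpeed", 1000.0, 3.0)
# Assumption: standard speed is back-computed from "approximately 10x".
STANDARD = Tier("MiMo-V2.5-Pro", ULTRASPEED.decode_tokens_per_s / 10, 1.0)

if __name__ == "__main__":
    n = 20_000  # hypothetical length of a long agentic response
    for tier in (STANDARD, ULTRASPEED):
        print(
            f"{tier.name}: {decode_seconds(tier, n):.0f} s, "
            f"relative cost {relative_cost(tier, n):.0f}"
        )
```

Under these assumptions a 20,000-token response drops from about 200 seconds to about 20 seconds while costing three times as much. Real latency also depends on prompt processing and queueing, which this sketch ignores.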

Read full article at mimo.xiaomi.com →
