Xiaomi MiMo V2.5 API pricing dropped to a flat $1 per million input tokens and $3 per million output tokens on May 27, 2026, and the team made the new rate permanent. The old long-context tier, where prompts past 256K tokens carried a steep multiplier on the base rate, is gone. One price now, regardless of context length. For most workloads the headline is a single sentence: MiMo V2.5 is one of the three cheapest 1M-context models in production, and it stays that way.
TL;DR
- Xiaomi MiMo V2.5 permanent pricing as of May 27, 2026: $1.00 input, $3.00 output, $0.20 cached per million tokens, with a 1M-token context window.
- The “up to 99% off” claim is real on the long-context tier. The prior schedule scaled hard past 256K input tokens. The new flat rate kills the multiplier.
- Token Plan customers got a 5x to 8x quota increase and a full reset of used credits within their validity window.
- The cut is permanent, not promotional. Xiaomi’s official notice says “permanently renovate the entire model pricing system.”
- Context: Xiaomi is the second Chinese lab to make a permanent frontier-tier cut this week. DeepSeek made V4-Pro permanent at 1/4 of list price three days earlier.
What changed on May 27, 2026
Xiaomi’s official price-update notice lays out three changes. All three took effect at 00:00 Beijing time on May 27, which is 16:00 UTC on May 26.

1. Flat pricing across context windows. The old MiMo V2.5 schedule used tiered rates: a base price for prompts up to 32K input tokens, a multiplier for the 32K to 256K band, and an even steeper rate above 256K. The new schedule has one number per token type. Long-context applications stop paying a long-context tax.
2. Permanent, not promotional. The notice uses the phrase “Permanent Price Reduction” twice and “permanently renovate the entire model pricing system” once. No expiry date. No rollback clause. Treat it as the new list price.
3. Token Plan rewards reset. If you’re on a Token Plan (Xiaomi’s prepaid quota system), your credit balance was increased 5 to 8 times and every credit you’d already consumed within your validity window was refunded. The validity period itself didn’t extend, so existing plans got a budget windfall but not more time.

The headline “up to 99% off” claim applies to the long-context band specifically. The prior price for 256K+ input tokens was high enough that flattening it to $1/M produces a 90%+ reduction. For workloads that lived in the base tier, the cut is smaller but still material.
The new permanent price sheet
Pricing per 1 million tokens, USD, effective immediately and permanent:
| Model | Input | Output | Cached | Context |
|---|---|---|---|---|
| MiMo V2.5 Pro | $1.00 | $3.00 | $0.20 | 1M tokens |
| MiMo V2 Flash | ~$0.10 | ~$0.40 | $0.02 | 256K tokens |
A few details the table doesn’t make obvious:
- The cache rate ($0.20/M for V2.5 Pro) is 5x the input rate. That’s a worse ratio than DeepSeek’s 120:1 input-miss-to-input-hit. Xiaomi’s cache is still useful for repeated system prompts, but the savings are smaller in absolute terms.
- The 1M context window is the part most articles undersell. Most US-hosted frontier models cap at 200K to 400K. MiMo V2.5 Pro takes the full document.
- The notice mentions but doesn’t itemize the V2.5 Omni and TTS variants. Verify those separately on the platform.
For the older V2-Pro pricing as a reference point, see our standing MiMo V2-Pro & Omni pricing guide.
What MiMo V2.5 brings beyond cheaper pricing
The May 27 announcement is a pricing event, but V2.5 itself is also a meaningful upgrade over V2-Pro launched in April. Three changes worth noting:
- Longer practical context. V2.5 Pro keeps the 1M-token theoretical window, but Xiaomi tightened retrieval quality in the 200K to 800K band where most long-context models degrade. Needle-in-haystack accuracy holds above 95% out to 800K tokens.
- Better tool-call format compliance. V2-Pro had known issues with parallel tool calls returning malformed JSON inside streamed responses. V2.5 reduces those failures, though not to zero. Plan on JSON schema validation either way.
- Refreshed training corpus. V2.5 was trained with data through Q1 2026. Citations and knowledge cutoff land roughly three months ahead of V2-Pro.
None of these are headline benchmarks, but they’re the changes that show up in real production deployments. Pair the cheaper pricing with the longer reliable context window and you have an option that didn’t exist for serious long-document work before May 27.
How MiMo V2.5 stacks up against the rest of the field
The interesting comparison isn’t V2.5’s old self. It’s against the other frontier-tier API options shipping in May 2026:
| Model | Input ($/MTok) | Output ($/MTok) | Context |
|---|---|---|---|
| Xiaomi MiMo V2.5 Pro | $1.00 | $3.00 | 1M |
| DeepSeek V4-Pro | $0.435 | $0.87 | 128K |
| GPT-5.5 | $5.00 | $30.00 | 200K |
| Claude Opus 4.7 | $3.00 | $15.00 | 200K |
| Gemini 3.5 Flash | ~$1.50 | ~$9.00 | 1M |
Three takeaways:
- DeepSeek V4-Pro is still cheaper than MiMo V2.5 on a per-token basis. Roughly 2.3x cheaper on input and 3.5x cheaper on output. If raw cost-per-token is your only metric, DeepSeek wins.
- MiMo V2.5 wins on 1M-context workloads. Gemini 3.5 Flash is the only other 1M-context option in the table, and it’s 1.5x more expensive on input and 3x more expensive on output.
- MiMo V2.5 is 5x cheaper than GPT-5.5 on input and 10x cheaper on output, with comparable benchmark performance per Artificial Analysis.
For the DeepSeek side of this comparison, see DeepSeek V4-Pro 75% Price Cut Is Now Permanent. The two articles are companion reads. Both cover this week’s permanent frontier-tier cuts from Chinese labs.
Three workloads, three new bills
Three concrete cases using the new permanent rates:
1. Long-document RAG over enterprise PDFs. 50,000 queries/day, 800K-token context per query, 1K-token answers. Old MiMo V2.5 long-context tier (estimated $50/M effective rate): about $60,000/month. New flat rate: about $1,225/month. Savings: $58,775/month.
2. Code-review agent. 5,000 pull requests/day, 30K-token repo context, 2K-token comment output. Old GPT-5.5 monthly bill: about $5,250. New MiMo V2.5: about $510. Savings: $4,740/month.
3. Customer support chatbot. 200,000 turns/day, 4K-token system prompt, 300-token responses. Old Claude Opus 4.7 monthly bill: about $11,250. New MiMo V2.5: about $805. Savings: $10,445/month.
Workload #1 is where MiMo V2.5 separates from the rest. Long-context jobs were prohibitively expensive on every frontier API before this cut. They aren’t anymore. The same documents that used to ship to summarizers and chunking pipelines can now go to the model whole, with no token-budget gymnastics.
A short note on cache hits
The $0.20/M cached input rate is 5x cheaper than the $1.00 cache-miss rate. That’s a smaller cache discount than DeepSeek’s 120:1 ratio, but it’s still meaningful for any agent that reuses a stable system prompt.
A worked example. Suppose your assistant uses a 6,000-token system prompt and handles 80,000 chat turns per day, with an average user message of 250 input tokens and an average response of 600 output tokens:
- Without cache hits: 80,000 turns × 6,250 input × $1.00 / 1,000,000 = $500 per day on input alone.
- With 60% cache hits on the system-prompt prefix: 80,000 × (250 × $1.00 + 6,000 × (0.6 × $0.20 + 0.4 × $1.00)) / 1,000,000 = about $271 per day. A 46% reduction.
That’s not the 88% DeepSeek caching delivers, but on a workload that runs to $500/day on input, half off is real money. Pin the system prompt, sort retrieved context stably, and don’t inject per-request timestamps into the prefix. The same rules that win cache hits everywhere else apply here too.
When MiMo V2.5 is the right call, when it isn’t
The new pricing makes MiMo V2.5 the default choice for two workload classes and a poor choice for one.
Right call:
- Long-document RAG, code-base agents, repo-wide refactors. Anything that fits naturally into a >200K-token context. The flat pricing plus 1M window is unmatched in the cheap tier.
- High-volume document processing. Pricing is predictable and the cached rate ($0.20/M) lets you batch identical prefixes cheaply. See How prompt caching supercharges LLM performance and reduces costs for the cache mechanics across providers.
Poor choice:
- Latency-critical interactive chat. MiMo V2.5 Pro is not the fastest first-token model. For typeahead, autocomplete, or sub-second chat, DeepSeek V4-Flash or Gemini 3.5 Flash are better latency profiles at similar cost.
Caveats:
- Data residency. Calls route through Xiaomi’s infrastructure in China. Same procurement conversation as DeepSeek.
- Reliability. Xiaomi’s first-party API has a shorter operational history than US-hosted frontier models. For SLA-backed production, route through OpenRouter or another aggregator.
- Function calling parity. OpenAI-compatible at the schema level, with edge cases around streamed tool arguments and parallel tool calls. Test before you ship.
For the V2-Pro launch context that sets up V2.5, see Xiaomi Just Dropped Its Own AI Model, And It’s Free on OpenRouter. For the free-tier on-ramp, Xiaomi MiMo Orbit free 100T token program covers eligibility and signup.
Testing MiMo V2.5 with Apidog
The platform’s OpenAI compatibility is good, not perfect. Verify your integration before you flip production traffic.

Apidog lets you point a Chat Completions request at https://platform.xiaomimimo.com/v1 with your MiMo API key, then:
- Record golden responses from V2.5 Pro and replay them on every prompt change so drift shows up before users do.
- Validate
tool_callsshapes with JSON Schema assertions. Streaming function arguments are where the OpenAI-compatibility cracks tend to show. - Run side-by-side comparisons against your current model (GPT-5.5, Claude, DeepSeek V4-Pro) with the same input batch using Apidog’s test scenarios.
Download Apidog, import the OpenAI Chat Completion schema, change the base URL, and you have a working V2.5 test harness in under ten minutes. Same workflow we recommended in How to use the DeepSeek V4 API.
How the 2026 LLM price war is shaping up
MiMo V2.5 is the second permanent frontier-tier cut from a Chinese lab in a single week. DeepSeek made V4-Pro permanent at 1/4 of list price on May 22. Kimi K2 cut earlier in Q1. OpenAI O3 dropped 80% in February. The pattern is clear:
- Chinese labs are competing on price. These cuts aren’t promo flags. They’re structural.
- US labs are competing on capability and bundling. OpenAI and Anthropic are holding their flagship-tier prices and shipping features (thinking modes, MCP servers, agentic workflows) to justify the premium.
- The benchmark gap is small enough that most workloads should re-test. Public benchmarks put MiMo V2.5 within single-digit percentage points of GPT-5.5 on most coding and reasoning tasks per Artificial Analysis.
For the rest of this picture:
- DeepSeek V4-Pro permanent price cut covers the comparable Chinese-lab move.
- Kimi K2 API pricing walks through the third major Chinese cut of 2026.
- OpenAI O3 pricing drop covers the US response in February.
- Gemini 3.0 API cost maps Google’s tier strategy.
- The full Claude API cost breakdown walks through where Opus, Sonnet, and Haiku fit. MiMo-7B sits in a different niche; see MiMo-7B-RL benchmarks for the small-model side of Xiaomi’s lineup.
Where this leaves your build
The MiMo V2.5 cut isn’t a marketing stunt. It’s a structural repricing of the 1M-context tier, and the cut is permanent. If you’ve been deferring long-document RAG, repo-wide code agents, or any workload that needs >200K-token context on cost grounds, the budget you priced last quarter probably overstates this quarter’s need by an order of magnitude.
Three concrete next steps:
- Pull your top three workloads by token volume and re-cost them at the new flat rate. The ones running long contexts will surprise you.
- Run a 100-sample eval against V2.5 Pro and your current model with identical prompts. Most teams find the quality band is acceptable for 70% to 85% of traffic.
- Wire up an Apidog regression suite so the next price cut, and there will be one, takes hours to evaluate instead of weeks.
The price floor moved again. Build accordingly.



