How Much Does It Cost to Use Xiaomi MiMo V2.5 in 2026?

MiMo V2.5 API now costs $1 input / $3 output / $0.20 cached per million tokens, permanently. Up to 99% off old long-context rates as of May 27, 2026.

Ashley Innocent

Ashley Innocent

27 May 2026

How Much Does It Cost to Use Xiaomi MiMo V2.5 in 2026?

Xiaomi MiMo V2.5 API pricing dropped to a flat $1 per million input tokens and $3 per million output tokens on May 27, 2026, and the team made the new rate permanent. The old long-context tier, where prompts past 256K tokens carried a steep multiplier on the base rate, is gone. One price now, regardless of context length. For most workloads the headline is a single sentence: MiMo V2.5 is one of the three cheapest 1M-context models in production, and it stays that way.

TL;DR

What changed on May 27, 2026

Xiaomi’s official price-update notice lays out three changes. All three took effect at 00:00 Beijing time on May 27, which is 16:00 UTC on May 26.

1. Flat pricing across context windows. The old MiMo V2.5 schedule used tiered rates: a base price for prompts up to 32K input tokens, a multiplier for the 32K to 256K band, and an even steeper rate above 256K. The new schedule has one number per token type. Long-context applications stop paying a long-context tax.

2. Permanent, not promotional. The notice uses the phrase “Permanent Price Reduction” twice and “permanently renovate the entire model pricing system” once. No expiry date. No rollback clause. Treat it as the new list price.

3. Token Plan rewards reset. If you’re on a Token Plan (Xiaomi’s prepaid quota system), your credit balance was increased 5 to 8 times and every credit you’d already consumed within your validity window was refunded. The validity period itself didn’t extend, so existing plans got a budget windfall but not more time.

The headline “up to 99% off” claim applies to the long-context band specifically. The prior price for 256K+ input tokens was high enough that flattening it to $1/M produces a 90%+ reduction. For workloads that lived in the base tier, the cut is smaller but still material.

The new permanent price sheet

Pricing per 1 million tokens, USD, effective immediately and permanent:

Model Input Output Cached Context
MiMo V2.5 Pro $1.00 $3.00 $0.20 1M tokens
MiMo V2 Flash ~$0.10 ~$0.40 $0.02 256K tokens

A few details the table doesn’t make obvious:

For the older V2-Pro pricing as a reference point, see our standing MiMo V2-Pro & Omni pricing guide.

What MiMo V2.5 brings beyond cheaper pricing

The May 27 announcement is a pricing event, but V2.5 itself is also a meaningful upgrade over V2-Pro launched in April. Three changes worth noting:

None of these are headline benchmarks, but they’re the changes that show up in real production deployments. Pair the cheaper pricing with the longer reliable context window and you have an option that didn’t exist for serious long-document work before May 27.

How MiMo V2.5 stacks up against the rest of the field

The interesting comparison isn’t V2.5’s old self. It’s against the other frontier-tier API options shipping in May 2026:

Model Input ($/MTok) Output ($/MTok) Context
Xiaomi MiMo V2.5 Pro $1.00 $3.00 1M
DeepSeek V4-Pro $0.435 $0.87 128K
GPT-5.5 $5.00 $30.00 200K
Claude Opus 4.7 $3.00 $15.00 200K
Gemini 3.5 Flash ~$1.50 ~$9.00 1M

Three takeaways:

For the DeepSeek side of this comparison, see DeepSeek V4-Pro 75% Price Cut Is Now Permanent. The two articles are companion reads. Both cover this week’s permanent frontier-tier cuts from Chinese labs.

Three workloads, three new bills

Three concrete cases using the new permanent rates:

1. Long-document RAG over enterprise PDFs. 50,000 queries/day, 800K-token context per query, 1K-token answers. Old MiMo V2.5 long-context tier (estimated $50/M effective rate): about $60,000/month. New flat rate: about $1,225/month. Savings: $58,775/month.

2. Code-review agent. 5,000 pull requests/day, 30K-token repo context, 2K-token comment output. Old GPT-5.5 monthly bill: about $5,250. New MiMo V2.5: about $510. Savings: $4,740/month.

3. Customer support chatbot. 200,000 turns/day, 4K-token system prompt, 300-token responses. Old Claude Opus 4.7 monthly bill: about $11,250. New MiMo V2.5: about $805. Savings: $10,445/month.

Workload #1 is where MiMo V2.5 separates from the rest. Long-context jobs were prohibitively expensive on every frontier API before this cut. They aren’t anymore. The same documents that used to ship to summarizers and chunking pipelines can now go to the model whole, with no token-budget gymnastics.

A short note on cache hits

The $0.20/M cached input rate is 5x cheaper than the $1.00 cache-miss rate. That’s a smaller cache discount than DeepSeek’s 120:1 ratio, but it’s still meaningful for any agent that reuses a stable system prompt.

A worked example. Suppose your assistant uses a 6,000-token system prompt and handles 80,000 chat turns per day, with an average user message of 250 input tokens and an average response of 600 output tokens:

That’s not the 88% DeepSeek caching delivers, but on a workload that runs to $500/day on input, half off is real money. Pin the system prompt, sort retrieved context stably, and don’t inject per-request timestamps into the prefix. The same rules that win cache hits everywhere else apply here too.

When MiMo V2.5 is the right call, when it isn’t

The new pricing makes MiMo V2.5 the default choice for two workload classes and a poor choice for one.

Right call:

Poor choice:

Caveats:

For the V2-Pro launch context that sets up V2.5, see Xiaomi Just Dropped Its Own AI Model, And It’s Free on OpenRouter. For the free-tier on-ramp, Xiaomi MiMo Orbit free 100T token program covers eligibility and signup.

Testing MiMo V2.5 with Apidog

The platform’s OpenAI compatibility is good, not perfect. Verify your integration before you flip production traffic.

Apidog lets you point a Chat Completions request at https://platform.xiaomimimo.com/v1 with your MiMo API key, then:

Download Apidog, import the OpenAI Chat Completion schema, change the base URL, and you have a working V2.5 test harness in under ten minutes. Same workflow we recommended in How to use the DeepSeek V4 API.

How the 2026 LLM price war is shaping up

MiMo V2.5 is the second permanent frontier-tier cut from a Chinese lab in a single week. DeepSeek made V4-Pro permanent at 1/4 of list price on May 22. Kimi K2 cut earlier in Q1. OpenAI O3 dropped 80% in February. The pattern is clear:

For the rest of this picture:

Where this leaves your build

The MiMo V2.5 cut isn’t a marketing stunt. It’s a structural repricing of the 1M-context tier, and the cut is permanent. If you’ve been deferring long-document RAG, repo-wide code agents, or any workload that needs >200K-token context on cost grounds, the budget you priced last quarter probably overstates this quarter’s need by an order of magnitude.

Three concrete next steps:

The price floor moved again. Build accordingly.

Explore more

The 2026 Chinese LLM Price War: Top 5 Frontier API Costs Compared

The 2026 Chinese LLM Price War: Top 5 Frontier API Costs Compared

DeepSeek $0.87, MiMo $3, Qwen $3.90, Kimi $0.07 cache, GLM $3.20. Full 2026 pricing comparison for the top 5 Chinese LLM APIs, with a buyer's matrix.

27 May 2026

How to use Local LLMs as APIs ?

How to use Local LLMs as APIs ?

Run local LLM APIs with Ollama, vLLM, or llama.cpp behind an OpenAI-compatible endpoint and test the whole flow with Apidog. Code, costs, latency.

26 May 2026

How to Use DeepSeek V4-Pro with Cursor: The Reasoning Proxy Setup Guide (2026)

How to Use DeepSeek V4-Pro with Cursor: The Reasoning Proxy Setup Guide (2026)

DeepSeek V4-Pro is a thinking model. Cursor strips reasoning_content from tool calls and breaks. Set up the open-source proxy in 5 minutes with this guide.

25 May 2026

Practice API Design-first in Apidog

Discover an easier way to build and use APIs