TL;DR
MiniMax M2.5 is a frontier AI model released on February 12, 2026, achieving state-of-the-art performance in coding (80.2% on SWE-Bench Verified), agentic tool use, and office productivity tasks. At just $0.30 per hour for continuous 50 tokens/second throughput, it's priced at one-tenth to one-twentieth of competitors like Claude Opus 4.6 and GPT-5, making it the first frontier model billed as "intelligence too cheap to meter." The model completes complex coding tasks 37% faster than its predecessor, matching Claude Opus 4.6's speed while costing 90% less per task.
Introduction
MiniMax just introduced M2.5, a frontier model that challenges everything we thought we knew about the cost-performance tradeoff in large language models. The official announcement provides full technical details. With an 80.2% score on SWE-Bench Verified, the gold standard for coding capability, M2.5 isn't just competitive with top-tier models like Claude Opus 4.6 and GPT-5. On many metrics, it surpasses them.
But here's what makes this announcement genuinely disruptive: the pricing. At $0.30 per hour to run continuously at 50 tokens per second, or just $1 per hour at 100 tokens per second, MiniMax claims M2.5 delivers "intelligence too cheap to meter." For developers and businesses, the barrier to deploying sophisticated AI agents just collapsed.
What is MiniMax M2.5?
MiniMax M2.5 is the latest flagship model from Chinese AI company MiniMax, representing the third iteration in the company's M2 series released over just three and a half months (M2 in late October 2025, M2.1 in late 2025, and M2.5 in February 2026).

What sets M2.5 apart is its focus on real-world productivity rather than just benchmark performance. Trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments, M2.5 is designed to handle economically valuable tasks that developers and knowledge workers face daily.

The model comes in two variants:
- M2.5: 50 tokens per second throughput, half the cost of Lightning
- M2.5-Lightning: 100 tokens per second, optimized for speed
Both versions support context caching and are functionally identical in capability, differing only in speed and pricing.
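Because the two variants are functionally identical, switching between them should amount to changing a single model identifier in the request. The sketch below builds an OpenAI-style chat-completion body; the endpoint URL and model identifiers are assumptions for illustration, not confirmed values from MiniMax's documentation.

```python
import json

# Assumed endpoint and model ids -- check MiniMax's API docs for the real names.
ASSUMED_ENDPOINT = "https://api.minimax.io/v1/chat/completions"

def build_request(prompt: str, lightning: bool = False) -> dict:
    """Build a chat-completion request body; only the model id differs
    between the standard and Lightning variants."""
    model = "MiniMax-M2.5-Lightning" if lightning else "MiniMax-M2.5"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

print(json.dumps(build_request("Summarize this diff.", lightning=True)))
```

Keeping the variant choice behind a single flag like this makes it easy to A/B the speed/cost tradeoff in production without touching call sites.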
Key Specifications at a Glance
| Specification | Value |
|---|---|
| Release Date | February 12, 2026 |
| SWE-Bench Verified | 80.2% |
| Multi-SWE-Bench | 51.3% |
| BrowseComp | 76.3% |
| Throughput (Standard) | 50 TPS |
| Throughput (Lightning) | 100 TPS |
| Input Pricing | $0.30 per million tokens |
| Output Pricing | $2.40 per million tokens |
Coding Capabilities
If there's one area where MiniMax M2.5 flexes its muscles most dramatically, it's coding. The model achieves 80.2% on SWE-Bench Verified, a benchmark that tests the ability to resolve real-world GitHub issues, a figure that places it firmly in state-of-the-art territory.

But raw benchmark scores don't tell the full story. What makes M2.5 particularly interesting for developers is its architectural thinking capability. During training, the model developed what MiniMax describes as a "spec-writing tendency": before writing any code, M2.5 actively decomposes and plans features, structure, and UI design from the perspective of an experienced software architect.
Multilingual Programming Excellence
M2.5 was trained on over 10 programming languages across more than 200,000 real-world environments:
- Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby
This isn't just about bug-fixing. The model handles the entire development lifecycle:
- 0-to-1: System design and environment setup
- 1-to-10: System development
- 10-to-90: Feature iteration
- 90-to-100: Comprehensive code review and system testing
Cross-Platform Full-Stack Development
Unlike many coding assistants that focus primarily on frontend demos, M2.5 tackles full-stack projects across multiple platforms: Web, Android, iOS, and Windows. It handles server-side APIs, business logic, databases, and complex system architecture, not just webpage components.
Benchmark Performance Against Competition
MiniMax tested M2.5 on different coding agent harnesses to evaluate generalization across out-of-distribution environments:
| Scaffold | M2.5 | Opus 4.6 |
|---|---|---|
| Droid | 79.7% | 78.9% |
| OpenCode | 76.1% | 75.9% |
M2.5 edges out Claude Opus 4.6 on both popular agent scaffolds, suggesting strong generalization capabilities.
When building AI-powered applications with M2.5, you'll need to test the APIs that connect your app to the model. Apidog lets you create test scenarios that validate request/response handling, authentication flows, and error handling-essential for production AI applications.
Agentic Tool Use and Search
Modern AI isn't just about answering questions; it's about taking action. M2.5 demonstrates strong agentic capabilities, particularly in tool calling and autonomous search.
BrowseComp and Wide Search
On benchmarks like BrowseComp and Wide Search, M2.5 achieves industry-leading performance. But more importantly, MiniMax built RISE (Realistic Interactive Search Evaluation) to test real-world professional search tasks-the kind that require deep exploration across information-dense webpages, not just simple search queries.
Efficient Decision-Making
Perhaps the most impressive aspect of M2.5's agentic capabilities is its efficiency. Across multiple agentic tasks including BrowseComp, Wide Search, and RISE, M2.5 achieved better results with approximately 20% fewer reasoning rounds compared to M2.1. This indicates the model doesn't just get the right answer; it finds efficient paths to get there.
This has practical implications: fewer API calls, lower costs, and faster task completion when deploying M2.5 as an autonomous agent.
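To see why fewer reasoning rounds directly means fewer API calls, consider a minimal agent loop. The sketch below uses a stubbed model and a toy tool; a real deployment would replace the stub with calls to the M2.5 API, and all names here are illustrative.

```python
# Minimal agent loop: each reasoning round is one model call, so a model
# that needs fewer rounds costs proportionally less to run.

def run_agent(model, tools, task, max_rounds=20):
    """Ask the model for the next action until it emits a final answer."""
    history = [task]
    for round_no in range(1, max_rounds + 1):
        action = model(history)            # one API call per reasoning round
        if action["type"] == "answer":
            return action["content"], round_no
        result = tools[action["tool"]](action["args"])  # execute the tool call
        history.append(result)
    return None, max_rounds

def stub_model(history):
    """Stand-in for the real model: searches once, then answers."""
    if len(history) == 1:
        return {"type": "tool", "tool": "search", "args": "M2.5 pricing"}
    return {"type": "answer", "content": "$0.30/M input tokens"}

tools = {"search": lambda q: f"results for {q}"}
answer, rounds = run_agent(stub_model, tools, "Find M2.5 input pricing")
print(answer, rounds)  # finishes in 2 rounds: one tool call, one answer
```

Under this structure, a 20% cut in reasoning rounds is a 20% cut in model invocations for the same task, before any per-token savings.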
Office Productivity Features
Coding isn't the only area where M2.5 excels. MiniMax specifically designed the model for real-world office productivity, collaborating with senior professionals in finance, law, and social sciences to train the model on genuinely deliverable outputs.
Word, PowerPoint, and Excel Mastery
M2.5 demonstrates significant capability improvements in high-value workspace scenarios:
- Word: Document creation, formatting, and professional writing
- PowerPoint: Presentation design and slide generation
- Excel: Financial modeling and complex spreadsheet operations
MiniMax built an internal evaluation framework called GDPval-MM that assesses both output quality and the professionalism of the agent's entire workflow trajectory. In head-to-head comparisons against other mainstream models, M2.5 achieved a 59.0% average win rate.
Finance Modeling Specialization
The model was specifically trained on financial modeling problems constructed by industry experts. These involve end-to-end research and analysis tasks performed via Excel tools, scored using expert-designed rubrics. For finance professionals, this could represent a significant productivity leap.
Performance and Speed
Speed matters in real-world deployments. A model that's smarter but slower often provides worse user experience than a slightly less capable but faster alternative.
Token Generation Speed
M2.5 is served natively at 100 tokens per second for the Lightning variant-nearly twice as fast as other frontier models. This native throughput advantage compounds significantly when handling long-running agentic tasks.
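As a back-of-the-envelope illustration of how decode throughput bounds wall-clock time (the token count below is a hypothetical agent-step size, not a published figure):

```python
# Decode time at a fixed, steady generation throughput.
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream `output_tokens` at a constant rate."""
    return output_tokens / tokens_per_second

long_step = 12_000  # hypothetical long agent step
print(generation_seconds(long_step, 100))  # Lightning: 120.0 s
print(generation_seconds(long_step, 50))   # standard:  240.0 s
```

Over an agentic run with dozens of such steps, the difference between 50 and 100 TPS compounds into minutes or hours of saved wall-clock time.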
SWE-Bench Runtime Comparison
| Metric | M2.1 | M2.5 | Opus 4.6 |
|---|---|---|---|
| Avg tokens/task | 3.72M | 3.52M | - |
| Avg runtime | 31.3 min | 22.8 min | 22.9 min |
| Speed improvement | - | 37% faster | - |

M2.5 completes the SWE-Bench Verified evaluation 37% faster than M2.1, matching Claude Opus 4.6's runtime while using only 3.52 million tokens per task (compared to M2.1's 3.72M).
Pricing and Cost Efficiency
This is where M2.5 becomes genuinely disruptive. MiniMax has positioned the model as the first frontier AI where users "do not need to worry about cost."
Pricing Structure
| Model | Throughput | Input Price | Output Price |
|---|---|---|---|
| M2.5 | 50 TPS | $0.30/million tokens | $2.40/million tokens |
| M2.5-Lightning | 100 TPS | $0.60/million tokens | $4.80/million tokens |
Cost Comparisons
At full output throughput:
- $1 per hour at 100 TPS (Lightning)
- $0.30 per hour at 50 TPS (standard)
This translates to approximately one-tenth to one-twentieth the cost of Opus, Gemini 3 Pro, and GPT-5 based on output pricing.
Real-World Cost Example
Running M2.5-Lightning continuously for an hour costs just $1 at full speed. Standard M2.5 at 50 TPS drops that to $0.30. For context, you could run four standard M2.5 instances continuously for an entire year for roughly $10,000.
For businesses deploying AI agents at scale, this pricing fundamentally changes the economics. Tasks that were prohibitively expensive become viable. Experimental projects that would have burned through budget constraints become affordable explorations.
Technical Architecture
Reinforcement Learning at Scale
A key driver of M2.5's capabilities is the scaling of reinforcement learning. MiniMax converted most company tasks and workspaces into training environments-hundreds of thousands of real-world scenarios where the model learns through trial and error.
Forge: Agent-Native RL Framework
MiniMax developed Forge, an in-house agent-native RL framework that introduces an intermediary layer fully decoupling the underlying training-inference engine from the agent. This supports integration of arbitrary agents and enables optimization across different agent scaffolds and tools.

Key optimizations include:
- Asynchronous scheduling strategies balancing throughput against sample off-policyness
- Tree-structured merging strategy for training samples
- Approximately 40x training speedup achieved
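The throughput-versus-off-policyness tradeoff in asynchronous scheduling can be sketched with a toy, single-threaded simulation: rollout workers run ahead of the trainer, and a staleness cap drops samples generated too many policy versions ago. All numbers and the scheduling policy here are illustrative assumptions, not MiniMax's actual Forge settings.

```python
from collections import deque

def simulate(steps, produced_per_step=6, batch_size=4, max_staleness=2):
    """Simulate async rollout production vs. batched training with a
    cap on sample staleness (policy versions elapsed since generation)."""
    buffer = deque()          # FIFO of policy versions that generated samples
    version = 0               # current policy version
    trained = discarded = 0
    for _ in range(steps):
        # Rollout workers keep generating with the current policy...
        buffer.extend([version] * produced_per_step)
        # ...while the trainer consumes one batch, skipping stale samples.
        batch = []
        while buffer and len(batch) < batch_size:
            v = buffer.popleft()
            if version - v <= max_staleness:
                batch.append(v)
                trained += 1
            else:
                discarded += 1
        version += 1          # one optimizer update per step
    return trained, discarded

print(simulate(10))  # (40, 6): most samples are used, a few dropped as stale
```

Raising the staleness cap wastes fewer rollouts (higher throughput) but trains on more off-policy data; lowering it does the reverse. Balancing the two is exactly the tuning knob the Forge description alludes to.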
CISPO Algorithm
For algorithmic stability during large-scale MoE (Mixture of Experts) training, M2.5 continues using the CISPO algorithm that MiniMax proposed in early 2025. To address credit assignment challenges in long contexts, they introduced a process reward mechanism for end-to-end monitoring of generation quality.
Training Environment Scale
By the numbers:
- Hundreds of thousands of real-world training environments
- 10+ programming languages
- 200,000+ code environments
- Tasks spanning web, Android, iOS, and Windows development
MiniMax Agent Integration
M2.5 isn't just an API; it's already powering MiniMax's own products.
Office Skills Integration
MiniMax distilled core information-processing capabilities into standardized Office Skills deeply integrated within MiniMax Agent. In MAX mode, when handling Word formatting, PowerPoint editing, and Excel calculations, the Agent automatically loads corresponding Office Skills based on file type.
Expert Creation
Users can combine Office Skills with domain-specific industry expertise to create reusable Experts for specific task scenarios. For example:
- Industry research: Merge a research framework SOP with Word Skills to automatically fetch data, organize logic, and output formatted reports
- Financial modeling: Combine proprietary modeling standards with Excel Skills to follow specific risk control logic and calculation standards
Adoption Metrics
- Over 10,000 Experts created on MiniMax Agent
- 30% of MiniMax's overall tasks autonomously completed by M2.5
- 80% of newly committed code at MiniMax is generated by M2.5
This isn't theoretical capability; it's production-hardened technology.
How M2.5 Compares to Competitors
vs Claude Opus 4.6
| Metric | M2.5 | Opus 4.6 |
|---|---|---|
| SWE-Bench Verified | 80.2% | ~77% |
| Droid scaffold | 79.7% | 78.9% |
| OpenCode scaffold | 76.1% | 75.9% |
| Runtime on SWE-Bench | 22.8 min | 22.9 min |
| Cost/task | ~$1.50 | ~$15+ |
M2.5 matches or exceeds Opus 4.6 on coding benchmarks at roughly one-tenth the cost per task.
vs GPT-5
- Significantly lower cost (1/10th to 1/20th the price)
- Competitive coding benchmarks
- Native office productivity features
- Faster inference speed (100 TPS vs typical 30-50 TPS)
vs Gemini 3 Pro
- Much lower pricing
- Higher SWE-Bench scores
- Better office productivity integration
- More aggressive RL scaling approach
Conclusion
MiniMax M2.5 represents a genuine paradigm shift in the AI landscape. For the first time, we have a frontier model that combines state-of-the-art capability with pricing that enables unlimited deployment.
The key takeaways:
- Top-tier coding performance (80.2% SWE-Bench, outperforming Opus 4.6 on multiple scaffolds)
- Agentic efficiency (20% fewer reasoning rounds, 37% faster than M2.1)
- Office productivity (59% win rate against competitors on real-world office tasks)
- Unbeatable pricing ($0.30-$1/hour, 1/10th to 1/20th of competitors)
- Production-ready (already powering MiniMax's own products, generating 80% of company code)
The question isn't whether M2.5 is worth trying; it's whether you can afford not to.
Ready to build and test AI-powered APIs? Download Apidog free and create comprehensive test suites for your MiniMax integrations. Import your existing Postman collections with one click and start testing in minutes.
FAQ
What is MiniMax M2.5?
MiniMax M2.5 is a frontier AI model released in February 2026 that achieves state-of-the-art performance in coding, agentic tasks, and office productivity. It's notable for its combination of top-tier benchmarks and extremely low pricing.
How does MiniMax M2.5 compare to Claude Opus 4.6?
M2.5 matches or exceeds Claude Opus 4.6 on most coding benchmarks (80.2% vs ~77% on SWE-Bench Verified) while costing approximately 90% less per task. It matches Opus 4.6's runtime speed (22.8 vs 22.9 minutes on SWE-Bench).
What is the pricing for MiniMax M2.5?
M2.5 costs $0.30 per million input tokens and $2.40 per million output tokens (at 50 TPS). At full throughput, running M2.5 continuously for an hour costs just $0.30-$1.00, depending on the variant.
What programming languages does M2.5 support?
M2.5 was trained on over 10 languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby across more than 200,000 real-world environments.
Is MiniMax M2.5 good for office work?
Yes. M2.5 was specifically trained for office productivity tasks including Word, PowerPoint, and Excel financial modeling. It achieved a 59% win rate against other mainstream models on office tasks in MiniMax's internal evaluations.
Can I use MiniMax M2.5 via API?
Yes. MiniMax provides API access through their platform at minimax.io. The API supports both the standard M2.5 (50 TPS) and M2.5-Lightning (100 TPS) variants.
What makes MiniMax M2.5 special?
M2.5 is the first "frontier model" where the cost is low enough that users don't need to worry about it; the company claims it's "intelligence too cheap to meter." Combined with top-tier coding benchmarks and agentic capabilities, this makes it viable for large-scale agent deployment.
How fast is MiniMax M2.5?
M2.5-Lightning generates at 100 tokens per second, nearly twice as fast as other frontier models. Even the standard M2.5 runs at 50 TPS. On SWE-Bench tasks, it completes evaluations 37% faster than M2.1.



