What is MiniMax M2.5?

Discover MiniMax M2.5, the AI model achieving SOTA on SWE-Bench at 80.2%. Learn about its coding capabilities, agentic features, pricing ($0.30/hour), and how it compares to Claude Opus 4.6.

Ashley Innocent
3 March 2026

TL;DR

MiniMax M2.5 is a frontier AI model released on February 12, 2026, achieving state-of-the-art performance in coding (80.2% on SWE-Bench Verified), agentic tool use, and office productivity tasks. At just $0.30 per hour at 50 tokens/second throughput, it's priced at one-tenth to one-twentieth of competitors like Claude Opus 4.6 and GPT-5, making it the first "intelligence too cheap to meter" frontier model. The model completes complex coding tasks 37% faster than its predecessor, matching Claude Opus 4.6's speed while costing 90% less per task.

Introduction

MiniMax just introduced M2.5, a frontier model that challenges everything we thought we knew about the cost-performance tradeoff in large language models. The official announcement provides full technical details. With an 80.2% score on SWE-Bench Verified (the gold standard for coding capability), M2.5 isn't just competitive with top-tier models like Claude Opus 4.6 and GPT-5. On many metrics, it surpasses them.

But here's what makes this announcement genuinely disruptive: the pricing. At $0.30 per hour to run continuously at 50 tokens per second, or just $1 per hour at 100 tokens per second, MiniMax claims M2.5 delivers "intelligence too cheap to meter." For developers and businesses, the barrier to deploying sophisticated AI agents just collapsed.

💡 When building applications that integrate with AI models like MiniMax M2.5, you'll need to thoroughly test your API integrations. Apidog provides a comprehensive API testing platform that supports HTTP, WebSocket, and GraphQL endpoints, perfect for validating AI-powered applications.

What is MiniMax M2.5?

MiniMax M2.5 is the latest flagship model from Chinese AI company MiniMax, representing the third iteration in the company's M2 series released over just three and a half months (M2 in late October, M2.1 in late 2025, and M2.5 in February 2026).

What sets M2.5 apart is its focus on real-world productivity rather than just benchmark performance. Trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments, M2.5 is designed to handle economically valuable tasks that developers and knowledge workers face daily.

The model comes in two variants:

- M2.5 (Standard): 50 tokens/second throughput
- M2.5-Lightning: 100 tokens/second throughput, at double the per-token price

Both versions support context caching and are functionally identical in capability, differing only in speed and pricing.
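Since the two variants differ only in speed and price, switching between them should come down to the model identifier you send. The sketch below builds an OpenAI-style chat payload; the model names, field names, and schema are assumptions for illustration, not confirmed details of the MiniMax API.

```python
import json

# Hypothetical model identifiers -- the real API's names may differ.
MODELS = {
    "standard": "MiniMax-M2.5",             # 50 TPS tier
    "lightning": "MiniMax-M2.5-Lightning",  # 100 TPS tier
}

def build_chat_payload(variant: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completion payload (assumed schema)."""
    return {
        "model": MODELS[variant],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

payload = build_chat_payload("lightning", "Refactor this function for clarity.")
print(json.dumps(payload, indent=2))
```

Because the variants are functionally identical, a deployment could route latency-sensitive requests to the Lightning tier and batch work to the standard tier with a one-line change.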

Key Specifications at a Glance

| Specification | Value |
|---|---|
| Release Date | February 12, 2026 |
| SWE-Bench Verified | 80.2% |
| Multi-SWE-Bench | 51.3% |
| BrowseComp | 76.3% |
| Throughput (Standard) | 50 TPS |
| Throughput (Lightning) | 100 TPS |
| Input Pricing | $0.30 per million tokens |
| Output Pricing | $2.40 per million tokens |

Coding Capabilities

If there's one area where MiniMax M2.5 flexes its muscles most dramatically, it's coding. The model achieves 80.2% on SWE-Bench Verified (a benchmark that tests the ability to resolve real-world GitHub issues), a figure that places it firmly in state-of-the-art territory.

But raw benchmark scores don't tell the full story. What makes M2.5 particularly interesting for developers is its architectural thinking capability. During training, the model developed what MiniMax describes as a "Spec-writing tendency": before writing any code, M2.5 actively decomposes and plans features, structure, and UI design from the perspective of an experienced software architect.

Multilingual Programming Excellence

M2.5 was trained on over 10 programming languages across more than 200,000 real-world environments: Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby.

This isn't just about bug-fixing; the model handles the entire development lifecycle.

Cross-Platform Full-Stack Development

Unlike many coding assistants that focus primarily on frontend demos, M2.5 tackles full-stack projects across multiple platforms: Web, Android, iOS, and Windows. It handles server-side APIs, business logic, databases, and complex system architecture, not just webpage components.

Benchmark Performance Against Competition

MiniMax tested M2.5 on different coding agent harnesses to evaluate generalization across out-of-distribution environments:

| Scaffold | M2.5 | Opus 4.6 |
|---|---|---|
| Droid | 79.7% | 78.9% |
| OpenCode | 76.1% | 75.9% |

M2.5 edges out Claude Opus 4.6 on both popular agent scaffolds, suggesting strong generalization capabilities.

When building AI-powered applications with M2.5, you'll need to test the APIs that connect your app to the model. Apidog lets you create test scenarios that validate request/response handling, authentication flows, and error handling, essential for production AI applications.

Agentic Capabilities

Modern AI isn't just about answering questions; it's about taking action. M2.5 demonstrates strong agentic capabilities, particularly in tool calling and autonomous search.

On benchmarks like BrowseComp and Wide Search, M2.5 achieves industry-leading performance. But more importantly, MiniMax built RISE (Realistic Interactive Search Evaluation) to test real-world professional search tasks: the kind that require deep exploration across information-dense webpages, not just simple search queries.

Efficient Decision-Making

Perhaps the most impressive aspect of M2.5's agentic capabilities is its efficiency. Across multiple agentic tasks including BrowseComp, Wide Search, and RISE, M2.5 achieved better results with approximately 20% fewer reasoning rounds compared to M2.1. This indicates the model doesn't just get the right answer-it finds efficient paths to get there.

This has practical implications: fewer API calls, lower costs, and faster task completion when deploying M2.5 as an autonomous agent.
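As a rough sketch of why fewer reasoning rounds matter economically: if cost scales with the number of rounds, a 20% round reduction translates directly into a 20% cost saving. The per-round token count below is an illustrative assumption, not a published figure; the output price comes from the pricing table later in the article.

```python
# Illustrative assumption: average output tokens generated per reasoning round.
TOKENS_PER_ROUND = 2_000
OUTPUT_PRICE_PER_M = 2.40  # $ per million output tokens (from the pricing table)

def task_cost(rounds: int) -> float:
    """Output-token cost of one agentic task, in dollars."""
    return rounds * TOKENS_PER_ROUND / 1_000_000 * OUTPUT_PRICE_PER_M

m21_rounds = 50                       # hypothetical M2.1 baseline
m25_rounds = round(m21_rounds * 0.8)  # ~20% fewer rounds, per MiniMax
saving = 1 - task_cost(m25_rounds) / task_cost(m21_rounds)
print(f"M2.5 rounds: {m25_rounds}, cost saving: {saving:.0%}")
```

The same proportional saving applies to wall-clock time and API call counts, which is why round efficiency compounds in long-running agent deployments.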

Office Productivity Features

Coding isn't the only area where M2.5 excels. MiniMax specifically designed the model for real-world office productivity, collaborating with senior professionals in finance, law, and social sciences to train the model on genuinely deliverable outputs.

Word, PowerPoint, and Excel Mastery

M2.5 demonstrates significant capability improvements in high-value workspace scenarios across Word, PowerPoint, and Excel.

MiniMax built an internal evaluation framework called GDPval-MM that assesses both output quality and the professionalism of the agent's entire workflow trajectory. In head-to-head comparisons against other mainstream models, M2.5 achieved a 59.0% average win rate.

Finance Modeling Specialization

The model was specifically trained on financial modeling problems constructed by industry experts. These involve end-to-end research and analysis tasks performed via Excel tools, scored using expert-designed rubrics. For finance professionals, this could represent a significant productivity leap.

Performance and Speed

Speed matters in real-world deployments. A model that's smarter but slower often provides a worse user experience than a slightly less capable but faster alternative.

Token Generation Speed

M2.5 is served natively at 100 tokens per second for the Lightning variant-nearly twice as fast as other frontier models. This native throughput advantage compounds significantly when handling long-running agentic tasks.

SWE-Bench Runtime Comparison

| Metric | M2.1 | M2.5 | Opus 4.6 |
|---|---|---|---|
| Avg tokens/task | 3.72M | 3.52M | N/A |
| Avg runtime | 31.3 min | 22.8 min | 22.9 min |
| Speed vs M2.1 | baseline | 37% faster | N/A |

M2.5 completes the SWE-Bench Verified evaluation 37% faster than M2.1, matching Claude Opus 4.6's runtime while using only 3.52 million tokens per task (compared to M2.1's 3.72M).
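A quick sanity check on these numbers, using only the figures from the table above: the quoted 37% corresponds to the ratio of runtimes, i.e., M2.1 takes about 37% longer per task than M2.5.

```python
m21_runtime, m25_runtime = 31.3, 22.8  # minutes per task, from the table
m21_tokens, m25_tokens = 3.72, 3.52    # millions of tokens per task

slowdown = m21_runtime / m25_runtime - 1    # how much longer M2.1 takes
token_saving = 1 - m25_tokens / m21_tokens  # fraction fewer tokens for M2.5

print(f"M2.1 takes {slowdown:.0%} longer per task than M2.5")
print(f"M2.5 uses {token_saving:.1%} fewer tokens per task")
```

Note that the token reduction (about 5%) is much smaller than the runtime reduction, suggesting most of the speed-up comes from throughput and fewer reasoning rounds rather than shorter outputs.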

Pricing and Cost Efficiency

This is where M2.5 becomes genuinely disruptive. MiniMax has positioned the model as the first frontier AI where users "do not need to worry about cost."

Pricing Structure

| Model | Throughput | Input Price | Output Price |
|---|---|---|---|
| M2.5 | 50 TPS | $0.30/million tokens | $2.40/million tokens |
| M2.5-Lightning | 100 TPS | $0.60/million tokens | $4.80/million tokens |

Cost Comparisons

At full output throughput, this translates to approximately one-tenth to one-twentieth the cost of Opus, Gemini 3 Pro, and GPT-5 based on output pricing.

Real-World Cost Example

Running M2.5-Lightning continuously for an hour costs just $1 at full speed; the standard 50 TPS tier drops that to $0.30. For context, you could run four standard M2.5 instances continuously for an entire year for roughly $10,000.
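The back-of-envelope math behind that yearly figure, using the quoted hourly rates (the result lands slightly above the article's round $10,000):

```python
HOURS_PER_YEAR = 24 * 365

standard_hourly = 0.30   # $/hour at 50 TPS, as quoted
lightning_hourly = 1.00  # $/hour at 100 TPS, as quoted

one_instance_year = standard_hourly * HOURS_PER_YEAR
four_instances_year = 4 * one_instance_year

print(f"One standard instance, one year:   ${one_instance_year:,.0f}")
print(f"Four standard instances, one year: ${four_instances_year:,.0f}")
```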

For businesses deploying AI agents at scale, this pricing fundamentally changes the economics. Tasks that were prohibitively expensive become viable. Experimental projects that would have burned through budget constraints become affordable explorations.

Technical Architecture

Reinforcement Learning at Scale

A key driver of M2.5's capabilities is the scaling of reinforcement learning. MiniMax converted most company tasks and workspaces into training environments-hundreds of thousands of real-world scenarios where the model learns through trial and error.

Forge: Agent-Native RL Framework

MiniMax developed Forge, an in-house agent-native RL framework that introduces an intermediary layer fully decoupling the underlying training-inference engine from the agent. This supports integration of arbitrary agents and enables optimization across different agent scaffolds and tools.

The framework also incorporates a number of training- and inference-side optimizations in support of this decoupled design.

CISPO Algorithm

For algorithmic stability during large-scale MoE (Mixture of Experts) training, M2.5 continues using the CISPO algorithm that MiniMax proposed in early 2025. To address credit assignment challenges in long contexts, they introduced a process reward mechanism for end-to-end monitoring of generation quality.

Training Environment Scale

By the numbers: training spanned more than 200,000 real-world environments across over 10 programming languages.

MiniMax Agent Integration

M2.5 isn't just an API; it's already powering MiniMax's own products.

Office Skills Integration

MiniMax distilled core information-processing capabilities into standardized Office Skills deeply integrated within MiniMax Agent. In MAX mode, when handling Word formatting, PowerPoint editing, and Excel calculations, the Agent automatically loads corresponding Office Skills based on file type.

Expert Creation

Users can combine Office Skills with domain-specific industry expertise to create reusable Experts for specific task scenarios. For example, a finance-focused Expert could pair Excel Office Skills with financial-modeling expertise to produce analyst-ready deliverables.

Adoption Metrics

This isn't theoretical capability; it's production-hardened technology.

How M2.5 Compares to Competitors

vs Claude Opus 4.6

| Metric | M2.5 | Opus 4.6 |
|---|---|---|
| SWE-Bench Verified | 80.2% | ~77% |
| Droid scaffold | 79.7% | 78.9% |
| OpenCode scaffold | 76.1% | 75.9% |
| Runtime on SWE-Bench | 22.8 min | 22.9 min |
| Cost/task | ~$1.50 | ~$15+ |

M2.5 matches or exceeds Opus 4.6 on coding benchmarks while costing approximately one-tenth as much per task.

vs GPT-5

vs Gemini 3 Pro

Conclusion

MiniMax M2.5 represents a genuine paradigm shift in the AI landscape. For the first time, we have a frontier model that combines state-of-the-art capability with pricing that enables unlimited deployment.

The key takeaways: state-of-the-art coding performance (80.2% on SWE-Bench Verified), pricing roughly one-tenth to one-twentieth that of competing frontier models, and agentic efficiency that compounds those savings in long-running deployments.

The question isn't whether M2.5 is worth trying; it's whether you can afford not to.

Ready to build and test AI-powered APIs? Download Apidog free and create comprehensive test suites for your MiniMax integrations. Import your existing Postman collections with one click and start testing in minutes.

FAQ

What is MiniMax M2.5?

MiniMax M2.5 is a frontier AI model released in February 2026 that achieves state-of-the-art performance in coding, agentic tasks, and office productivity. It's notable for its combination of top-tier benchmarks and extremely low pricing.

How does MiniMax M2.5 compare to Claude Opus 4.6?

M2.5 matches or exceeds Claude Opus 4.6 on most coding benchmarks (80.2% vs ~77% on SWE-Bench Verified) while costing approximately 90% less per task. It matches Opus 4.6's runtime speed (22.8 vs 22.9 minutes on SWE-Bench).

What is the pricing for MiniMax M2.5?

M2.5 costs $0.30 per million input tokens and $2.40 per million output tokens (at 50 TPS). At full throughput, running M2.5 continuously for an hour costs just $0.30-$1.00, depending on the variant.

What programming languages does M2.5 support?

M2.5 was trained on over 10 languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby across more than 200,000 real-world environments.

Is MiniMax M2.5 good for office work?

Yes. M2.5 was specifically trained for office productivity tasks including Word, PowerPoint, and Excel financial modeling. It achieved a 59% win rate against other mainstream models on office tasks in MiniMax's internal evaluations.

Can I use MiniMax M2.5 via API?

Yes. MiniMax provides API access through their platform at minimax.io. The API supports both the standard M2.5 (50 TPS) and M2.5-Lightning (100 TPS) variants.

What makes MiniMax M2.5 special?

M2.5 is the first "frontier model" where the cost is low enough that users don't need to worry about it-the company claims it's "intelligence too cheap to meter." Combined with top-tier coding benchmarks and agentic capabilities, this makes it viable for large-scale agent deployment.

How fast is MiniMax M2.5?

M2.5-Lightning generates at 100 tokens per second-nearly twice as fast as other frontier models. Even the standard M2.5 runs at 50 TPS. On SWE-Bench tasks, it completes evaluations 37% faster than M2.1.
