What is MiniMax M2.5?

Discover MiniMax M2.5, the AI model achieving SOTA on SWE-Bench at 80.2%. Learn about its coding capabilities, agentic features, pricing ($0.30/hour), and how it compares to Claude Opus 4.6.

Ashley Innocent
3 March 2026

TL;DR

MiniMax M2.5 is a frontier AI model released on February 12, 2026, achieving state-of-the-art performance in coding (80.2% on SWE-Bench Verified), agentic tool use, and office productivity tasks. At just $0.30 per hour at 50 tokens/second throughput, it's priced at one-tenth to one-twentieth of competitors like Claude Opus 4.6 and GPT-5, making it the first "intelligence too cheap to meter" frontier model. The model completes complex coding tasks 37% faster than its predecessor, matching Claude Opus 4.6's speed while costing 90% less per task.

Introduction

MiniMax just introduced M2.5, a frontier model that challenges everything we thought we knew about the cost-performance tradeoff in large language models. The official announcement provides full technical details. With an 80.2% score on SWE-Bench Verified (the gold standard for coding capability), M2.5 isn't just competitive with top-tier models like Claude Opus 4.6 and GPT-5. On many metrics, it surpasses them.

But here's what makes this announcement genuinely disruptive: the pricing. At $0.30 per hour to run continuously at 50 tokens per second, or just $1 per hour at 100 tokens per second, MiniMax claims M2.5 delivers "intelligence too cheap to meter." For developers and businesses, the barrier to deploying sophisticated AI agents just collapsed.

💡 When building applications that integrate with AI models like MiniMax M2.5, you'll need to thoroughly test your API integrations. Apidog provides a comprehensive API testing platform that supports HTTP, WebSocket, and GraphQL endpoints, perfect for validating AI-powered applications.

What is MiniMax M2.5?

MiniMax M2.5 is the latest flagship model from Chinese AI company MiniMax, representing the third iteration in the company's M2 series released over just three and a half months (M2 in late October, M2.1 in late 2025, and M2.5 in February 2026).

What sets M2.5 apart is its focus on real-world productivity rather than just benchmark performance. Trained extensively with reinforcement learning across hundreds of thousands of complex real-world environments, M2.5 is designed to handle economically valuable tasks that developers and knowledge workers face daily.

The model comes in two variants:

- M2.5 (Standard): 50 tokens/second throughput
- M2.5-Lightning: 100 tokens/second throughput, at double the per-token price

Both versions support context caching and are functionally identical in capability, differing only in speed and pricing.
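Since the two variants differ only in speed and price, switching between them should come down to the model identifier you send. The sketch below builds an OpenAI-style chat payload; the model names, field names, and schema are assumptions for illustration, not confirmed details of the MiniMax API.

```python
import json

# Hypothetical model identifiers -- the real API's names may differ.
MODELS = {
    "standard": "MiniMax-M2.5",             # 50 TPS tier
    "lightning": "MiniMax-M2.5-Lightning",  # 100 TPS tier
}

def build_chat_payload(variant: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completion payload (assumed schema)."""
    return {
        "model": MODELS[variant],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

payload = build_chat_payload("lightning", "Refactor this function for clarity.")
print(json.dumps(payload, indent=2))
```

Because the variants are functionally identical, a deployment could route latency-sensitive requests to the Lightning tier and batch work to the standard tier with a one-line change.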

Key Specifications at a Glance

| Specification | Value |
|---|---|
| Release Date | February 12, 2026 |
| SWE-Bench Verified | 80.2% |
| Multi-SWE-Bench | 51.3% |
| BrowseComp | 76.3% |
| Throughput (Standard) | 50 TPS |
| Throughput (Lightning) | 100 TPS |
| Input Pricing | $0.30 per million tokens |
| Output Pricing | $2.40 per million tokens |

Coding Capabilities

If there's one area where MiniMax M2.5 flexes its muscles most dramatically, it's coding. The model achieves 80.2% on SWE-Bench Verified (a benchmark that tests the ability to resolve real-world GitHub issues), a figure that places it firmly in state-of-the-art territory.

But raw benchmark scores don't tell the full story. What makes M2.5 particularly interesting for developers is its architectural thinking capability. During training, the model developed what MiniMax describes as a "Spec-writing tendency": before writing any code, M2.5 actively decomposes and plans features, structure, and UI design from the perspective of an experienced software architect.

Multilingual Programming Excellence

M2.5 was trained on over 10 programming languages across more than 200,000 real-world environments: Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby.

This isn't just about bug-fixing; the model handles the entire development lifecycle.

Cross-Platform Full-Stack Development

Unlike many coding assistants that focus primarily on frontend demos, M2.5 tackles full-stack projects across multiple platforms: Web, Android, iOS, and Windows. It handles server-side APIs, business logic, databases, and complex system architecture, not just webpage components.

Benchmark Performance Against Competition

MiniMax tested M2.5 on different coding agent harnesses to evaluate generalization across out-of-distribution environments:

| Scaffold | M2.5 | Opus 4.6 |
|---|---|---|
| Droid | 79.7% | 78.9% |
| OpenCode | 76.1% | 75.9% |

M2.5 edges out Claude Opus 4.6 on both popular agent scaffolds, suggesting strong generalization capabilities.

When building AI-powered applications with M2.5, you'll need to test the APIs that connect your app to the model. Apidog lets you create test scenarios that validate request/response handling, authentication flows, and error handling, essential for production AI applications.

Agentic Capabilities

Modern AI isn't just about answering questions; it's about taking action. M2.5 demonstrates strong agentic capabilities, particularly in tool calling and autonomous search.

On benchmarks like BrowseComp and Wide Search, M2.5 achieves industry-leading performance. But more importantly, MiniMax built RISE (Realistic Interactive Search Evaluation) to test real-world professional search tasks: the kind that require deep exploration across information-dense webpages, not just simple search queries.

Efficient Decision-Making

Perhaps the most impressive aspect of M2.5's agentic capabilities is its efficiency. Across multiple agentic tasks including BrowseComp, Wide Search, and RISE, M2.5 achieved better results with approximately 20% fewer reasoning rounds compared to M2.1. This indicates the model doesn't just get the right answer-it finds efficient paths to get there.

This has practical implications: fewer API calls, lower costs, and faster task completion when deploying M2.5 as an autonomous agent.
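As a rough sketch of why fewer reasoning rounds matter economically: if cost scales with the number of rounds, a 20% round reduction translates directly into a 20% cost saving. The per-round token count below is an illustrative assumption, not a published figure; the output price comes from the pricing table later in the article.

```python
# Illustrative assumption: average output tokens generated per reasoning round.
TOKENS_PER_ROUND = 2_000
OUTPUT_PRICE_PER_M = 2.40  # $ per million output tokens (from the pricing table)

def task_cost(rounds: int) -> float:
    """Output-token cost of one agentic task, in dollars."""
    return rounds * TOKENS_PER_ROUND / 1_000_000 * OUTPUT_PRICE_PER_M

m21_rounds = 50                       # hypothetical M2.1 baseline
m25_rounds = round(m21_rounds * 0.8)  # ~20% fewer rounds, per MiniMax
saving = 1 - task_cost(m25_rounds) / task_cost(m21_rounds)
print(f"M2.5 rounds: {m25_rounds}, cost saving: {saving:.0%}")
```

The same proportional saving applies to wall-clock time and API call counts, which is why round efficiency compounds in long-running agent deployments.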

Office Productivity Features

Coding isn't the only area where M2.5 excels. MiniMax specifically designed the model for real-world office productivity, collaborating with senior professionals in finance, law, and social sciences to train the model on genuinely deliverable outputs.

Word, PowerPoint, and Excel Mastery

M2.5 demonstrates significant capability improvements in high-value workspace scenarios across Word, PowerPoint, and Excel.

MiniMax built an internal evaluation framework called GDPval-MM that assesses both output quality and the professionalism of the agent's entire workflow trajectory. In head-to-head comparisons against other mainstream models, M2.5 achieved a 59.0% average win rate.

Finance Modeling Specialization

The model was specifically trained on financial modeling problems constructed by industry experts. These involve end-to-end research and analysis tasks performed via Excel tools, scored using expert-designed rubrics. For finance professionals, this could represent a significant productivity leap.

Performance and Speed

Speed matters in real-world deployments. A model that's smarter but slower often provides a worse user experience than a slightly less capable but faster alternative.

Token Generation Speed

M2.5 is served natively at 100 tokens per second for the Lightning variant-nearly twice as fast as other frontier models. This native throughput advantage compounds significantly when handling long-running agentic tasks.

SWE-Bench Runtime Comparison

| Metric | M2.1 | M2.5 | Opus 4.6 |
|---|---|---|---|
| Avg tokens/task | 3.72M | 3.52M | N/A |
| Avg runtime | 31.3 min | 22.8 min | 22.9 min |
| Speed vs M2.1 | baseline | 37% faster | N/A |

M2.5 completes the SWE-Bench Verified evaluation 37% faster than M2.1, matching Claude Opus 4.6's runtime while using only 3.52 million tokens per task (compared to M2.1's 3.72M).
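A quick sanity check on these numbers, using only the figures from the table above: the quoted 37% corresponds to the ratio of runtimes, i.e., M2.1 takes about 37% longer per task than M2.5.

```python
m21_runtime, m25_runtime = 31.3, 22.8  # minutes per task, from the table
m21_tokens, m25_tokens = 3.72, 3.52    # millions of tokens per task

slowdown = m21_runtime / m25_runtime - 1    # how much longer M2.1 takes
token_saving = 1 - m25_tokens / m21_tokens  # fraction fewer tokens for M2.5

print(f"M2.1 takes {slowdown:.0%} longer per task than M2.5")
print(f"M2.5 uses {token_saving:.1%} fewer tokens per task")
```

Note that the token reduction (about 5%) is much smaller than the runtime reduction, suggesting most of the speed-up comes from throughput and fewer reasoning rounds rather than shorter outputs.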

Pricing and Cost Efficiency

This is where M2.5 becomes genuinely disruptive. MiniMax has positioned the model as the first frontier AI where users "do not need to worry about cost."

Pricing Structure

| Model | Throughput | Input Price | Output Price |
|---|---|---|---|
| M2.5 | 50 TPS | $0.30/million tokens | $2.40/million tokens |
| M2.5-Lightning | 100 TPS | $0.60/million tokens | $4.80/million tokens |

Cost Comparisons

At full output throughput, this translates to approximately one-tenth to one-twentieth the cost of Opus, Gemini 3 Pro, and GPT-5 based on output pricing.

Real-World Cost Example

Running M2.5-Lightning continuously for an hour costs just $1 at full speed; the standard 50 TPS tier drops that to $0.30. For context, you could run four standard M2.5 instances continuously for an entire year for roughly $10,000.
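The back-of-envelope math behind that yearly figure, using the quoted hourly rates (the result lands slightly above the article's round $10,000):

```python
HOURS_PER_YEAR = 24 * 365

standard_hourly = 0.30   # $/hour at 50 TPS, as quoted
lightning_hourly = 1.00  # $/hour at 100 TPS, as quoted

one_instance_year = standard_hourly * HOURS_PER_YEAR
four_instances_year = 4 * one_instance_year

print(f"One standard instance, one year:   ${one_instance_year:,.0f}")
print(f"Four standard instances, one year: ${four_instances_year:,.0f}")
```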

For businesses deploying AI agents at scale, this pricing fundamentally changes the economics. Tasks that were prohibitively expensive become viable. Experimental projects that would have burned through budget constraints become affordable explorations.

Technical Architecture

Reinforcement Learning at Scale

A key driver of M2.5's capabilities is the scaling of reinforcement learning. MiniMax converted most company tasks and workspaces into training environments-hundreds of thousands of real-world scenarios where the model learns through trial and error.

Forge: Agent-Native RL Framework

MiniMax developed Forge, an in-house agent-native RL framework that introduces an intermediary layer fully decoupling the underlying training-inference engine from the agent. This supports integration of arbitrary agents and enables optimization across different agent scaffolds and tools.

The framework also incorporates a number of training- and inference-side optimizations in support of this decoupled design.

CISPO Algorithm

For algorithmic stability during large-scale MoE (Mixture of Experts) training, M2.5 continues using the CISPO algorithm that MiniMax proposed in early 2025. To address credit assignment challenges in long contexts, they introduced a process reward mechanism for end-to-end monitoring of generation quality.

Training Environment Scale

By the numbers: training spanned more than 200,000 real-world environments across over 10 programming languages.

MiniMax Agent Integration

M2.5 isn't just an API; it's already powering MiniMax's own products.

Office Skills Integration

MiniMax distilled core information-processing capabilities into standardized Office Skills deeply integrated within MiniMax Agent. In MAX mode, when handling Word formatting, PowerPoint editing, and Excel calculations, the Agent automatically loads corresponding Office Skills based on file type.

Expert Creation

Users can combine Office Skills with domain-specific industry expertise to create reusable Experts for specific task scenarios. For example, a finance-focused Expert could pair Excel Office Skills with financial-modeling expertise to produce analyst-ready deliverables.

Adoption Metrics

This isn't theoretical capability; it's production-hardened technology.

How M2.5 Compares to Competitors

vs Claude Opus 4.6

| Metric | M2.5 | Opus 4.6 |
|---|---|---|
| SWE-Bench Verified | 80.2% | ~77% |
| Droid scaffold | 79.7% | 78.9% |
| OpenCode scaffold | 76.1% | 75.9% |
| Runtime on SWE-Bench | 22.8 min | 22.9 min |
| Cost/task | ~$1.50 | ~$15+ |

M2.5 matches or exceeds Opus 4.6 on coding benchmarks while costing approximately one-tenth as much per task.

vs GPT-5

vs Gemini 3 Pro

Conclusion

MiniMax M2.5 represents a genuine paradigm shift in the AI landscape. For the first time, we have a frontier model that combines state-of-the-art capability with pricing that enables unlimited deployment.

The key takeaways: state-of-the-art coding performance (80.2% on SWE-Bench Verified), pricing roughly one-tenth to one-twentieth that of competing frontier models, and agentic efficiency that compounds those savings in long-running deployments.

The question isn't whether M2.5 is worth trying; it's whether you can afford not to.

Ready to build and test AI-powered APIs? Download Apidog free and create comprehensive test suites for your MiniMax integrations. Import your existing Postman collections with one click and start testing in minutes.

FAQ

What is MiniMax M2.5?

MiniMax M2.5 is a frontier AI model released in February 2026 that achieves state-of-the-art performance in coding, agentic tasks, and office productivity. It's notable for its combination of top-tier benchmarks and extremely low pricing.

How does MiniMax M2.5 compare to Claude Opus 4.6?

M2.5 matches or exceeds Claude Opus 4.6 on most coding benchmarks (80.2% vs ~77% on SWE-Bench Verified) while costing approximately 90% less per task. It matches Opus 4.6's runtime speed (22.8 vs 22.9 minutes on SWE-Bench).

What is the pricing for MiniMax M2.5?

M2.5 costs $0.30 per million input tokens and $2.40 per million output tokens (at 50 TPS). At full throughput, running M2.5 continuously for an hour costs just $0.30-$1.00, depending on the variant.

What programming languages does M2.5 support?

M2.5 was trained on over 10 languages including Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, and Ruby across more than 200,000 real-world environments.

Is MiniMax M2.5 good for office work?

Yes. M2.5 was specifically trained for office productivity tasks including Word, PowerPoint, and Excel financial modeling. It achieved a 59% win rate against other mainstream models on office tasks in MiniMax's internal evaluations.

Can I use MiniMax M2.5 via API?

Yes. MiniMax provides API access through their platform at minimax.io. The API supports both the standard M2.5 (50 TPS) and M2.5-Lightning (100 TPS) variants.

What makes MiniMax M2.5 special?

M2.5 is the first "frontier model" where the cost is low enough that users don't need to worry about it-the company claims it's "intelligence too cheap to meter." Combined with top-tier coding benchmarks and agentic capabilities, this makes it viable for large-scale agent deployment.

How fast is MiniMax M2.5?

M2.5-Lightning generates at 100 tokens per second-nearly twice as fast as other frontier models. Even the standard M2.5 runs at 50 TPS. On SWE-Bench tasks, it completes evaluations 37% faster than M2.1.
