What Is MiniMax M3? The First Open-Weight Frontier Coding Model

Apidog for Enterprise

On-Premises Deploy

SSO & RBAC

SOC 2 Compliant

MiniMax M3 is an open-weight AI model that MiniMax released on June 1, 2026. It’s the first open-weight model to combine three things in one system: frontier-level coding, a context window of up to 1,000,000 tokens, and native multimodality that handles image and video input and can even operate a desktop computer.

That combination is the headline. Plenty of models do one or two of these well. M3 is the first you can run on your own weights that aims to do all three at once. MiniMax has also promised to publish the open weights and a full technical report within roughly 10 days of launch, so the model you read about today becomes something you can host yourself shortly after. If you’ve followed the open-weight race through releases like Qwen 3.7, M3 is the next big entry, and the launch details come straight from the MiniMax M3 announcement.

This article walks through what M3 is, the benchmarks MiniMax reported, how its architecture keeps long-context costs down, what you can build with it, and how to get access.

💡

If you’re planning to wire M3 into an application, you’ll want a way to inspect its API responses and tool calls; tools like Apidog make that step straightforward, and we’ll come back to it.

button

What makes M3 different

Most frontier models force a trade-off. You can have strong coding, or a huge context window, or multimodal input, but rarely all three in a single open model. M3’s pitch is that you no longer have to choose.

Here’s the three-way unification in plain terms:

Frontier coding. M3 targets the same tier as the strongest closed models on coding and agentic-software benchmarks, not just open ones.
1M-token context. You can feed it up to a million tokens at once. That’s a large codebase, a long document set, or a full chat history without aggressive truncation.
Native multimodality. It accepts images and video as input, and it can drive a desktop computer directly. MiniMax demonstrated it opening a local ERP client and batch-entering invoices on its own.

The open-weight angle is what ties this together. When weights are public, you can self-host for data-sensitive work, fine-tune on your own domain, and avoid per-call vendor lock-in. Combining that freedom with frontier coding and a million-token window is the part that hasn’t existed in one package before. For a sense of how the broader field is moving in this direction, the Chinese LLM price war of 2026 covers the competitive pressure pushing models like this into the open.

The numbers that matter

MiniMax published a set of benchmark results at launch. These are vendor-reported figures, so treat them as MiniMax’s own measurements rather than independent third-party scores. With that caveat, here’s how M3 lines up.

The result worth circling is SWE-Bench Pro at 59.0%. SWE-Bench Pro is a hard, contamination-resistant suite of real software-engineering tasks; you can read more about the methodology at the SWE-Bench project site. MiniMax reports that M3 clears both GPT-5.5 and Gemini 3.1 Pro on it and lands close to Claude Opus 4.7. For an open-weight model, that’s a strong claim.

M3 isn’t ahead everywhere. On PostTrainBench it scores 0.37, slightly behind Opus 4.7 (0.42) and GPT-5.5 (0.39). One honest gap on the scoreboard reads as more credible than a clean sweep.

One detail MiniMax hasn’t disclosed yet: parameter counts and active-parameter figures. Those numbers are expected with the technical report, so for now you can’t compute exact cost-per-parameter comparisons. If you want a head-to-head breakdown against the closed frontier, see MiniMax M3 vs Opus 4.7 vs GPT-5.5.

MSA architecture in plain English

M3’s efficiency comes from MSA, short for MiniMax Sparse Attention. Standard attention compares every token against every other token, so cost grows fast as your context gets longer. That’s what makes million-token windows expensive on conventional architectures.

Sparse attention changes the math. Instead of attending to everything, each token attends to a selected subset of the sequence. MiniMax reports that this cuts per-token compute to roughly 1/20 of its previous-generation model. The practical payoff shows up in two phases of inference:

Prefill (reading your prompt) is more than 9x faster.
Decode (generating the response) is more than 15x faster.

Why does that matter to you? Long-context work is usually slow and pricey, which pushes teams toward chunking and retrieval workarounds. When per-token cost drops by an order of magnitude, feeding a whole repository or a stack of long documents straight into the model becomes practical instead of a budget problem. The speedups also mean lower latency on agentic loops, where the model reads, acts, and reads again many times over.

What you can actually build

M3 is built for long-horizon agentic work, the kind where the model runs for a long stretch and produces something concrete. MiniMax shipped a few demonstrations that show the range:

24-hour CUDA kernel optimization. M3 worked on a kernel autonomously and reached a 9.4x speedup.
Autonomous paper reproduction. It reproduced a research paper across 18 commits and generated 23 experimental figures, managing the multi-step process on its own.
Computer use. It can operate a desktop application directly, like opening a local ERP client and batch-entering invoices.

The product wrapper for this is MiniMax Code, which adds Agent Team has: multi-stage, concurrent, and dynamically adjustable workflows. One pattern worth calling out is the “Producer plus Verifier” adversarial harness loop, where one agent generates work and another checks it before it’s accepted. That checker-in-the-loop design tends to cut the silent failures that plague single-pass agents.

If you’re building agents on top of M3, the hard part is rarely the model; it’s the plumbing between the model and your tools. Tool-call schemas drift, arguments come back malformed, and a single bad response can stall a whole workflow. This is where API testing earns its keep. You can capture M3’s tool-call responses and validate their structure in Apidog, so you catch a broken function call before it reaches production. For the design side of that work, agentic workflow tool wiring: patterns and pitfalls covers the common traps.

How to access M3

Right now MiniMax has two paths: subscription token plans and the API.

The subscription plans bundle a monthly token allowance

For programmatic access, the API uses an OpenAI-style chat-completions interface. The base URL is https://api.minimax.io/v1, you call POST /chat/completions, and the model id is MiniMax-M3. Authentication is a bearer token in the header:

POST https://api.minimax.io/v1/chat/completions
Authorization: Bearer $API_KEY
Content-Type: application/json

You can call it over raw HTTP, through the Anthropic SDK (MiniMax’s recommended route), or through the OpenAI SDK. The official MiniMax API reference has the full schema.

Two pricing details to know. API calls are billed at a standard rate when your input is 512K tokens or fewer, and at a higher long-context rate above 512K, so very large prompts cost more per call. There are also two service tiers: standard (the default) and priority. MiniMax hasn’t published an exact per-token price, so confirm current rates in the docs before you budget.

For a step-by-step setup with working requests, see how to use the MiniMax M3 API. If you’d rather try it without paying, how to use MiniMax M3 for free covers the no-cost options. Once you have a key, Download Apidog to send your first request and inspect the response shape before you write any application code.

How it stacks up against other open-weight models

M3 lands in a crowded field of open-weight models, many of them from Chinese labs pushing hard on price and capability. The current contenders include DeepSeek V4-pro, Qwen 3.7, Kimi k2.6, and GLM-5.1. Each has its own strengths across coding, reasoning, and multilingual work.

M3’s differentiator isn’t any single score; it’s the bundle. Few open-weight peers pair frontier coding with a true 1M-token window and native computer use in the same model. The closest comparisons tend to win on one axis while M3 spreads its bet across all three. That said, the technical report and open weights aren’t out yet, so independent benchmarks will be the real test. If you’re already running another open model, the Qwen 3.7 overview is a useful reference point for what M3 is competing against.

FAQ

Is MiniMax M3 open source? It’s open-weight. MiniMax has promised to publish the model weights and a technical report within roughly 10 days of the June 1, 2026 launch. As of writing, those weights aren’t out yet, so you can’t download and self-host today. Once MiniMax open-sources the weights, you’ll be able to run M3 on your own infrastructure.

What’s the context window? Up to 1,000,000 tokens. The MSA architecture is what makes a window that large affordable, since it cuts per-token compute to roughly 1/20 of the previous-generation model.

Is MiniMax M3 free? Not directly. MiniMax sells subscription token plans starting at $20/mo (Plus) and API access billed by tokens. There’s no published free tier from MiniMax itself, though how to use MiniMax M3 for free walks through the available no-cost routes.

How does M3 compare to Claude Opus 4.7? On MiniMax’s reported benchmarks, M3 approaches Opus 4.7 on SWE-Bench Pro (59.0%) and beats it on SVG-Bench, while trailing it on PostTrainBench (0.37 vs 0.42). These are vendor figures, so wait for independent testing before treating any single number as settled.

When do the weights drop? MiniMax committed to releasing both the open weights and the technical report within about 10 days of the June 1, 2026 launch. The technical report should also fill in the parameter counts, which MiniMax hasn’t disclosed yet.

Can M3 handle images and video? Yes. M3 is natively multimodal and accepts both image and video input. It also goes a step further with computer use, operating desktop applications directly rather than just describing what’s on screen.

The short version

MiniMax M3 is the first open-weight model to put frontier coding, a 1M-token context window, and native multimodality in one place. The MSA architecture keeps long-context costs down, the reported SWE-Bench Pro score puts it near the closed frontier, and the open weights are due within days of launch. The honest gaps, undisclosed parameter counts and a few benchmarks where it trails, are worth tracking as independent results come in. If you’re ready to build on it, grab an API key, test your first calls and tool responses in Apidog, and start small before you scale.

In this article

What makes M3 different The numbers that matter MSA architecture in plain English What you can actually build How to access M3 How it stacks up against other open-weight models FAQ The short version

Apidog: A Real Design-first API Development Platform

API Design

API Documentation

API Debugging

Automated Testing

API Mocking

More

Get Started for Free

Enterprise

On-Premises or SaaS or EU-hosted

SSO, RBAC & audit logs

SOC 2, GDPR, ISO 27001

Explore Apidog Enterprise

Explore more

What is Gemini 3.5 Flash-Lite?

Gemini 3.5 Flash-Lite is Google's cheapest, fastest Gemini tier: $0.30 input, ~350 tokens/sec. Get the specs, pricing, benchmarks, and how to test it.

22 July 2026

Gemini 3.6 Flash pricing: what it actually costs in 2026

Gemini 3.6 Flash pricing explained: $1.50/1M input, $7.50/1M output (thinking tokens included), caching costs, the free tier, and a worked monthly cost example.

22 July 2026

What is Gemini 3.6 Flash?

Gemini 3.6 Flash is Google's new workhorse model, GA July 21 2026. Cheaper and more token-efficient than 3.5 Flash. Specs, benchmarks, pricing, and access.

22 July 2026