How to Run Minimax M1 via API: A Complete Guide

Rebecca Kovács

19 June 2025

MiniMax M1, developed by the Shanghai-based AI startup MiniMax, is a groundbreaking open-weight, large-scale hybrid-attention reasoning model. With a 1-million-token context window, efficient reinforcement learning (RL) training, and competitive benchmark performance, it is well suited to complex tasks such as long-context reasoning, software engineering, and agentic tool use. This guide reviews MiniMax M1's benchmarks and provides a step-by-step tutorial on running it via the OpenRouter API.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

MiniMax M1 Benchmarks: A Performance Overview

MiniMax M1 stands out for its unique architecture and cost-effective training. It ships in two variants, M1-40k and M1-80k, named for their "thinking budgets" (maximum output lengths of roughly 40,000 and 80,000 tokens), and it excels across multiple benchmarks. Below, we dive into its key performance metrics.

MiniMax M1-40k delivers above-average quality with an MMLU score of 0.808 and an Intelligence Index of 61. It outperforms many open-weight models in complex reasoning tasks. The M1-80k variant further enhances performance, leveraging extended computational resources. MiniMax M1 shines in benchmarks like FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, surpassing competitors in tool-use scenarios and software engineering, making it ideal for debugging codebases or analyzing lengthy documents.

Minimax M1 Pricing

Source: Artificialanalysis.AI

MiniMax M1-40k is cost-competitive at $0.82 per 1 million tokens (3:1 input-to-output ratio). Input tokens cost $0.40 per million, and output tokens cost $2.10 per million, cheaper than the industry average. MiniMax M1-80k is slightly pricier due to its extended thinking budget. Volume discounts are available for enterprise users, enhancing affordability for large-scale deployments.
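The quoted $0.82 blended figure follows directly from the per-token rates. A quick sketch of the arithmetic, assuming the 3:1 ratio simply weights the two listed prices:

```python
# Blended price per 1M tokens at a 3:1 input-to-output traffic mix,
# using the listed M1-40k rates: $0.40/M input, $2.10/M output.
INPUT_PRICE = 0.40   # USD per 1M input tokens
OUTPUT_PRICE = 2.10  # USD per 1M output tokens

def blended_price(input_ratio: float = 3, output_ratio: float = 1) -> float:
    """Weighted-average price per 1M tokens for a given traffic mix."""
    total = input_ratio + output_ratio
    return (input_ratio * INPUT_PRICE + output_ratio * OUTPUT_PRICE) / total

print(f"${blended_price():.2f} per 1M tokens")  # → $0.82 per 1M tokens
```

Adjusting the ratio to your own workload (e.g. summarization is input-heavy, generation is output-heavy) gives a more realistic per-request estimate.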

Minimax M1 Architecture and Training

MiniMax M1’s hybrid-attention design blends Lightning Attention (linear cost) with periodic Softmax Attention (quadratic but expressive) and a sparse MoE routing system, activating ~10% of its 456 billion parameters. Its RL training, powered by the CISPO algorithm, enhances efficiency by clipping importance sampling weights. MiniMax M1 was trained on 512 H800 GPUs in three weeks, a remarkable feat.
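The sparse-activation arithmetic is worth making concrete. A quick sketch, taking the ~10% routed share as an approximation (MiniMax's published figures put the active count at roughly 46B):

```python
# Approximate active-parameter count for the sparse MoE routing:
# ~10% of the 456B total parameters are activated per token.
TOTAL_PARAMS = 456e9
ACTIVE_FRACTION = 0.10  # approximate share routed per token

active = TOTAL_PARAMS * ACTIVE_FRACTION
print(f"~{active / 1e9:.0f}B parameters active per token")  # ~46B
```

This is why inference costs scale far below what the headline 456B figure suggests: each token only pays for the experts it is routed to.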

MiniMax M1 excels in long-context reasoning, cost-effectiveness, and agentic tasks, though its output speed lags. Its open-source Apache 2.0 license enables fine-tuning or on-premises deployment for sensitive workloads. Next, we explore running MiniMax M1 via the OpenRouter API.

Running MiniMax M1 via OpenRouter API

OpenRouter offers a unified, OpenAI-compatible API to access MiniMax M1, simplifying integration. Below is a step-by-step guide to running MiniMax M1 using OpenRouter.


Step 1: Set Up an OpenRouter Account

  1. Visit OpenRouter’s website and sign up using email or OAuth providers like Google.
  2. Generate an API key in the “API Keys” section of your dashboard and store it securely.
  3. Add funds to your account via credit card to cover API usage costs. Check for promotions, as OpenRouter occasionally discounts MiniMax M1 usage.

Step 2: Understand MiniMax M1 on OpenRouter

MiniMax M1 on OpenRouter is optimized for:

  1. Long-context reasoning over inputs of up to 1 million tokens
  2. Software engineering tasks such as debugging and code generation
  3. Agentic workflows involving tool use

It typically defaults to the M1-40k variant, with pricing at ~$0.40 per million input tokens and $2.10 per million output tokens.

Step 3: Make MiniMax M1 API Requests

OpenRouter’s API works with OpenAI’s SDK. Here’s how to send requests:

Prerequisites

Before running the script below, make sure you have:

  1. Python 3.8 or later installed
  2. The OpenAI Python SDK (pip install openai)
  3. Your OpenRouter API key from Step 1

Sample Code

Below is a Python script to query MiniMax M1:

```python
from openai import OpenAI

# Initialize the client with OpenRouter's endpoint and your API key
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_openrouter_api_key_here"
)

# Define the prompt and parameters
prompt = "Summarize the key features of MiniMax M1 in 100 words."
model = "minimax/minimax-m1"  # Specify MiniMax M1
max_tokens = 200
temperature = 1.0  # For creative responses
top_p = 0.95       # For coherence

# Make the API call
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p
)

# Extract and print the response
output = response.choices[0].message.content
print("Response:", output)
```

Explanation

The script points the standard OpenAI client at OpenRouter's base URL and authenticates with your API key. The model identifier minimax/minimax-m1 routes the request to MiniMax M1; max_tokens caps the response length, temperature controls randomness (1.0 favors creative output), and top_p nucleus sampling keeps responses coherent.

Step 4: Handle MiniMax M1 Responses

The API returns a JSON object with MiniMax M1’s output in choices[0].message.content. Ensure inputs don’t exceed 1 million tokens. If truncated, increase max_tokens or paginate output.
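A minimal sketch of truncation detection, assuming the standard OpenAI-style finish_reason field ("length" when the max_tokens cap was hit). The mock object stands in for a real API response so the logic can be shown without a network call:

```python
from types import SimpleNamespace

def needs_continuation(response) -> bool:
    """True when the model stopped because it hit the max_tokens cap."""
    return response.choices[0].finish_reason == "length"

def extract_text(response) -> str:
    """Pull the assistant message out of an OpenAI-style response."""
    return response.choices[0].message.content

# Stand-in response object (replace with a real API response in practice)
mock = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="Partial answer..."))])

if needs_continuation(mock):
    print("Truncated: raise max_tokens or request a continuation.")
```

In production you would check the real response the same way, then either retry with a larger max_tokens or send a follow-up message asking the model to continue.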

Step 5: Optimize MiniMax M1 for Specific Tasks

Tune sampling parameters to the task: lower temperature (roughly 0.1–0.3) for code generation and factual answers, higher (0.8–1.0) for creative writing. For long-document analysis, keep prompts focused and raise max_tokens so summaries are not cut short of the requested length.
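One convenient way to manage per-task tuning is a small preset table. The values below are common starting points for illustration, not MiniMax-published recommendations:

```python
# Illustrative sampling presets per task type (reasonable starting
# points, not official MiniMax recommendations).
PRESETS = {
    "code":     {"temperature": 0.2, "top_p": 0.9,  "max_tokens": 1024},
    "creative": {"temperature": 1.0, "top_p": 0.95, "max_tokens": 800},
    "summary":  {"temperature": 0.3, "top_p": 0.9,  "max_tokens": 400},
}

def params_for(task: str) -> dict:
    """Look up sampling parameters for a task, defaulting to 'summary'."""
    return PRESETS.get(task, PRESETS["summary"])

# Pass the chosen preset straight into the API call, e.g.:
# client.chat.completions.create(model=model, messages=msgs, **params_for("code"))
print(params_for("code"))
```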

Step 6: Monitor MiniMax M1 Usage and Costs

Track usage and costs in OpenRouter’s dashboard. Optimize prompts to minimize token counts, reducing input and output expenses.
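To sanity-check the dashboard numbers, you can estimate a request's cost from the token counts in the usage field of each response (standard in OpenAI-compatible APIs), using the listed M1-40k rates:

```python
# Per-token rates derived from the listed M1-40k prices.
INPUT_PRICE = 0.40 / 1_000_000   # USD per input token ($0.40 per 1M)
OUTPUT_PRICE = 2.10 / 1_000_000  # USD per output token ($2.10 per 1M)

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at M1-40k rates."""
    return prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE

# e.g. a 1M-token prompt with a 100k-token reply
print(f"${request_cost(1_000_000, 100_000):.2f}")  # → $0.61
```

In practice you would feed in response.usage.prompt_tokens and response.usage.completion_tokens from each call and accumulate the totals.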

Step 7: Explore Advanced MiniMax M1 Integrations

Because OpenRouter's API is OpenAI-compatible, MiniMax M1 slots into existing tooling: you can stream responses, chain requests in agentic pipelines that play to its tool-use strengths, or pair it with retrieval over long documents. For sensitive workloads, the Apache 2.0 license also permits fine-tuning and on-premises deployment, as noted earlier.

Troubleshooting MiniMax M1

Common issues include authentication failures (verify your API key is valid and sent correctly), rate-limit errors (back off and retry), insufficient account credit (top up your OpenRouter balance), and truncated replies (raise max_tokens). If requests fail on very long inputs, confirm the total context stays under the 1-million-token limit.

Conclusion

MiniMax M1 is a powerful, cost-effective AI model with unmatched long-context capabilities and strong reasoning performance. Its open-source nature and efficient training make it accessible for diverse applications. Using OpenRouter’s API, developers can integrate MiniMax M1 into projects like document summarization or code generation. Follow the steps above to get started and explore advanced deployment options for production. MiniMax M1 unlocks scalable, reasoning-driven AI for developers and enterprises alike.

