MiniMax M1, developed by a Shanghai-based AI startup, is a groundbreaking open-weight, large-scale hybrid-attention reasoning model. With a 1 million token context window, efficient reinforcement learning (RL) training, and competitive performance, it’s ideal for complex tasks like long-context reasoning, software engineering, and agentic tool use. This 1500-word guide explores MiniMax M1’s benchmarks and provides a step-by-step tutorial on running it via the OpenRouter API.
MiniMax M1 Benchmarks: A Performance Overview
MiniMax M1 stands out due to its unique architecture and cost-effective training. Available in two variants—M1-40k and M1-80k, based on their “thinking budgets” or output lengths—it excels in multiple benchmarks. Below, we dive into its key performance metrics.

MiniMax M1-40k delivers above-average quality with an MMLU score of 0.808 and an Intelligence Index of 61. It outperforms many open-weight models in complex reasoning tasks. The M1-80k variant further enhances performance, leveraging extended computational resources. MiniMax M1 shines in benchmarks like FullStackBench, SWE-bench, MATH, GPQA, and TAU-Bench, surpassing competitors in tool-use scenarios and software engineering, making it ideal for debugging codebases or analyzing lengthy documents.
MiniMax M1 Pricing

MiniMax M1-40k is cost-competitive at a blended $0.82 per 1 million tokens, assuming a 3:1 input-to-output ratio. Input tokens cost $0.40 per million and output tokens $2.10 per million, so the blended rate works out to (3 × $0.40 + 1 × $2.10) / 4 ≈ $0.82, below the industry average. MiniMax M1-80k is slightly pricier due to its extended thinking budget. Volume discounts are available for enterprise users, enhancing affordability for large-scale deployments.
- Speed: MiniMax M1-40k’s output speed is 41.1 tokens per second, slower than average, reflecting its focus on long-context and complex reasoning tasks.
- Latency: With a time-to-first-token (TTFT) of 1.35 seconds, MiniMax M1 offers quick initial responses, outperforming the average.
- Context Window: MiniMax M1’s 1 million token input context and up to 80,000 token output dwarf most models, enabling processing of vast datasets like novels or code repositories.
- Efficiency: MiniMax M1’s hybrid Mixture-of-Experts (MoE) architecture and Lightning Attention mechanism use 25% of the FLOPs required by competitors at 100,000-token generation length. Its $534,700 training cost is significantly lower than peers, making it cost-effective.
MiniMax M1 Architecture and Training

MiniMax M1’s hybrid-attention design blends Lightning Attention (linear cost) with periodic Softmax Attention (quadratic but expressive) and a sparse MoE routing system, activating ~10% of its 456 billion parameters. Its RL training, powered by the CISPO algorithm, enhances efficiency by clipping importance sampling weights. MiniMax M1 was trained on 512 H800 GPUs in three weeks, a remarkable feat.
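To make that clipping idea concrete, here is a minimal, illustrative sketch of a CISPO-style loss in PyTorch. This is not MiniMax's released implementation; the tensor names (logp_new, logp_old, advantages) and the clipping bounds are assumptions chosen for illustration.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Illustrative CISPO-style objective: clip the importance-sampling
    weight itself (and detach it), so gradients still flow through the
    log-probabilities of every token, including those whose raw ratio
    falls outside the trust region."""
    ratio = torch.exp(logp_new - logp_old)  # importance-sampling weight
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high).detach()
    # Policy-gradient term weighted by the clipped, detached IS weight
    return -(clipped * advantages * logp_new).mean()
```

The key difference from PPO-style clipping is that the importance-sampling weight is clipped and detached rather than the update itself, so every token's log-probability still receives a gradient.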
MiniMax M1 excels in long-context reasoning, cost-effectiveness, and agentic tasks, though its output speed lags. Its open-source Apache 2.0 license enables fine-tuning or on-premises deployment for sensitive workloads. Next, we explore running MiniMax M1 via the OpenRouter API.
Running MiniMax M1 via OpenRouter API

OpenRouter offers a unified, OpenAI-compatible API to access MiniMax M1, simplifying integration. Below is a step-by-step guide to running MiniMax M1 using OpenRouter.
Step 1: Set Up an OpenRouter Account
- Visit OpenRouter’s website and sign up using email or OAuth providers like Google.
- Generate an API key in the “API Keys” section of your dashboard and store it securely.
- Add funds to your account via credit card to cover API usage costs. Check for promotions, as MiniMax M1 occasionally offers discounts.
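Before writing application code, it is worth confirming the key actually works. The sketch below (using the OpenAI SDK installed in Step 3) is one way to do that; the environment variable name and the one-token "ping" request are our own conventions, not an OpenRouter requirement.

```python
import os
from openai import OpenAI

# Read the key from an environment variable instead of hard-coding it.
# Set it first, e.g. `export OPENROUTER_API_KEY=sk-or-...` in your shell.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# A one-token request is a cheap way to confirm the key and billing work.
ping = client.chat.completions.create(
    model="minimax/minimax-m1",
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,
)
print("Key OK, model reachable:", ping.model)
```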
Step 2: Understand MiniMax M1 on OpenRouter
MiniMax M1 on OpenRouter is optimized for:
- Long-context document summarization
- Software engineering (e.g., code debugging, generation)
- Mathematical reasoning
- Agentic tool use (e.g., function calling)
It typically defaults to the M1-40k variant, with pricing at ~$0.40 per million input tokens and $2.10 per million output tokens.
Step 3: Make MiniMax M1 API Requests
OpenRouter’s API works with OpenAI’s SDK. Here’s how to send requests:
Prerequisites
- Install the OpenAI Python SDK: pip install openai
- Use Python 3.7+.
Sample Code
Below is a Python script to query MiniMax M1:
```python
from openai import OpenAI

# Initialize the client with OpenRouter's endpoint and your API key
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_openrouter_api_key_here"
)

# Define the prompt and parameters
prompt = "Summarize the key features of MiniMax M1 in 100 words."
model = "minimax/minimax-m1"  # Specify MiniMax M1
max_tokens = 200
temperature = 1.0  # For creative responses
top_p = 0.95  # For coherence

# Make the API call
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p
)

# Extract and print the response
output = response.choices[0].message.content
print("Response:", output)
```
Explanation
- API Endpoint: Use https://openrouter.ai/api/v1.
- API Key: Replace your_openrouter_api_key_here with your key.
- Model: Select minimax/minimax-m1 for MiniMax M1.
- Prompt: The system prompt guides MiniMax M1’s behavior. For coding, use specific system prompts (e.g., "You are a web development engineer.").
- Parameters: Set temperature=1.0 and top_p=0.95 for balanced responses. Adjust max_tokens as needed.
Step 4: Handle MiniMax M1 Responses
The API returns a JSON object with MiniMax M1’s output in choices[0].message.content. Ensure inputs don’t exceed 1 million tokens. If truncated, increase max_tokens or paginate output.
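One practical check: the OpenAI SDK exposes finish_reason on each choice, so you can detect when MiniMax M1 stopped because it hit the max_tokens cap. A minimal sketch, reusing the response object from Step 3:

```python
choice = response.choices[0]
if choice.finish_reason == "length":
    # The reply was cut off by max_tokens; raise the cap and retry,
    # or continue the conversation to get the remainder.
    print("Warning: output truncated, consider increasing max_tokens.")
print(choice.message.content)
```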
Step 5: Optimize MiniMax M1 for Specific Tasks
- Long-Context Tasks: Include full text in the user message and set high max_tokens (e.g., 80,000 for M1-80k).
- Coding: Use prompts like "You are a powerful code editing assistant" with clear instructions. MiniMax M1 supports function calling for agentic tasks; a sketch follows this list.
- Math Reasoning: Structure inputs clearly (e.g., “Solve: 2x + 3 = 7”) and lower temperature (e.g., 0.7) for precision.
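To illustrate the agentic side, here is a hedged function-calling sketch using the standard OpenAI-style tools parameter, which OpenRouter passes through to models that support it. The get_weather tool, its schema, and the city argument are invented for this example.

```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="minimax/minimax-m1",
    messages=[{"role": "user", "content": "What's the weather in Shanghai?"}],
    tools=tools
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    args = json.loads(tool_calls[0].function.arguments)
    print("Model requested get_weather with:", args)
```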
Step 6: Monitor MiniMax M1 Usage and Costs
Track usage and costs in OpenRouter’s dashboard. Optimize prompts to minimize token counts, reducing input and output expenses.
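Each completion also reports token counts in the response's usage field, so you can log an estimated cost per request. A small sketch, using the per-million-token prices quoted earlier (verify current rates on OpenRouter before relying on them):

```python
usage = response.usage
cost = usage.prompt_tokens / 1e6 * 0.40 + usage.completion_tokens / 1e6 * 2.10
print(f"Input: {usage.prompt_tokens} tokens, output: {usage.completion_tokens} tokens")
print(f"Estimated cost: ${cost:.6f}")
```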
Step 7: Explore Advanced MiniMax M1 Integrations
- vLLM Deployment: Use vLLM for high-performance production serving of MiniMax M1 (a minimal sketch follows this list).
- Transformers: Deploy MiniMax M1 with Hugging Face’s Transformers library.
- CometAPI: MiniMax M1’s API will soon be available on CometAPI for unified access.
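As a starting point for self-hosting, here is a minimal vLLM sketch. The Hugging Face model ID (MiniMaxAI/MiniMax-M1-40k), the GPU count, and the sampling settings are assumptions, and a model of this scale requires a multi-GPU node, so treat this as the shape of the code rather than a turnkey recipe.

```python
from vllm import LLM, SamplingParams

# Model ID and parallelism are assumptions; adjust to your hardware.
llm = LLM(
    model="MiniMaxAI/MiniMax-M1-40k",
    trust_remote_code=True,   # custom architecture requires remote code
    tensor_parallel_size=8,   # shard across 8 GPUs (example value)
)

params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=200)
outputs = llm.generate(["Summarize the key features of MiniMax M1 in 100 words."], params)
print(outputs[0].outputs[0].text)
```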
Troubleshooting MiniMax M1
- Rate Limits: Upgrade your OpenRouter plan if limits are reached, or retry with exponential backoff (see the sketch after this list).
- Errors: Verify API key and model name. Check OpenRouter’s logs.
- Performance: Reduce input tokens or use M1-40k for faster responses.
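For transient failures such as rate limiting, a simple retry-with-backoff wrapper around the Step 3 call is often enough. A minimal sketch using the SDK's RateLimitError exception:

```python
import time
from openai import RateLimitError

def complete_with_retry(client, max_retries=3, **kwargs):
    """Retry the request with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited; retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Still rate limited after retries.")
```

Call it with the same keyword arguments you would pass to client.chat.completions.create (model, messages, and so on).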
Conclusion
MiniMax M1 is a powerful, cost-effective AI model with unmatched long-context capabilities and strong reasoning performance. Its open-source nature and efficient training make it accessible for diverse applications. Using OpenRouter’s API, developers can integrate MiniMax M1 into projects like document summarization or code generation. Follow the steps above to get started and explore advanced deployment options for production. MiniMax M1 unlocks scalable, reasoning-driven AI for developers and enterprises alike.