Apidog

MiMo-7B-RL: the Reasoning LLM from Xiaomi

Stefania Boiko

Updated on May 1, 2025

Xiaomi's LLM-Core Team presents MiMo-7B-RL, challenging the idea that top-tier reasoning in AI requires massive models. This 7-billion-parameter model, engineered specifically for mathematical and coding tasks, rivals the performance of much larger models and of specialized reasoning systems such as OpenAI's o1-mini. The result comes from optimizing the entire model lifecycle, from pre-training through reinforcement learning, and suggests that strong reasoning can be unlocked in smaller, more efficient architectures.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

What is MiMo-7B

The development of MiMo-7B hinges on the belief that a model's fundamental reasoning capability is established during pre-training. While later fine-tuning stages are essential, the initial foundation is critical. Xiaomi identified that many smaller models struggle with complex reasoning because their base training lacks sufficient exposure to logical patterns.

To counter this, MiMo's pre-training was meticulously designed to maximize "reasoning pattern density." This involved sophisticated data processing: enhancing text extraction to capture complex structures in technical documents and code, applying multi-dimensional filters to concentrate reasoning examples, and generating vast synthetic datasets embodying logical steps and problem-solving. A three-stage data mixture strategy was employed during pre-training, utilizing approximately 25 trillion tokens to build the MiMo-7B-Base model.

Furthermore, Xiaomi incorporated Multiple-Token Prediction (MTP) as an auxiliary training objective. This technique, where the model predicts several tokens ahead, potentially enhances understanding of complex dependencies and can accelerate inference through speculative decoding.
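To make the auxiliary objective concrete, here is a minimal pure-Python sketch of how an MTP term can be added to the usual next-token loss. It assumes a single extra prediction head that guesses the token two positions ahead and a hypothetical weight `mtp_weight=0.3`; neither detail is taken from Xiaomi's configuration, and real training computes these losses over tensors rather than Python lists.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(
    tokens: List[int],
    main_logits: List[Sequence[float]],
    mtp_logits: List[Sequence[float]],
    mtp_weight: float = 0.3,  # hypothetical auxiliary-loss weight
) -> float:
    """Next-token loss plus a weighted loss for predicting the token two steps ahead.

    main_logits[i]: main head's logits at position i (target: tokens[i + 1]).
    mtp_logits[i]: hypothetical MTP head's logits at position i (target: tokens[i + 2]).
    Requires at least 3 tokens so both losses have targets.
    """
    n = len(tokens)
    main = [cross_entropy(main_logits[i], tokens[i + 1]) for i in range(n - 1)]
    extra = [cross_entropy(mtp_logits[i], tokens[i + 2]) for i in range(n - 2)]
    return sum(main) / len(main) + mtp_weight * sum(extra) / len(extra)


# Toy usage: 4 tokens, vocabulary of 4, uninformative (all-zero) logits.
tokens = [0, 1, 2, 3]
uniform = [0.0] * 4
print(mtp_loss(tokens, [uniform] * 3, [uniform] * 2))
```

With uniform logits every prediction costs log 4, so the total is 1.3 × log 4; the extra term simply adds a second, farther-ahead target for the same hidden states to learn from.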

Advanced Reinforcement Learning

Building upon the fine-tuned MiMo-7B-SFT model, the Reinforcement Learning (RL) phase specifically targets math and code proficiency. A high-quality dataset of 130,000 carefully curated math and code problems, all verifiable through rule-based checks (like unit tests or numerical validation), formed the basis for training.
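As an illustration of a rule-based check, the sketch below scores a math response 1.0 if its final `\boxed{...}` answer equals the reference numerically and 0.0 otherwise. The `\boxed{}` answer convention and the function names are assumptions for this example; the article does not describe Xiaomi's actual verifiers.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the content of the last \boxed{...} in `text` (no nested braces), or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def math_reward(response: str, reference: str) -> float:
    """Binary, rule-based accuracy reward (hypothetical answer format)."""
    answer = extract_boxed(response)
    if answer is None:
        return 0.0  # no final answer found: no credit
    try:
        # Exact rational comparison, so "1/2", "0.5" and "2/4" all match.
        return 1.0 if Fraction(answer) == Fraction(reference.strip()) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers fall back to exact string matching.
        return 1.0 if answer == reference.strip() else 0.0
```

Because the reward depends only on a checkable final answer, the policy cannot raise it by producing text that merely looks convincing.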

To ensure genuine capability improvement and avoid "reward hacking," only objective, rule-based accuracy rewards were used. A novel "test difficulty driven code reward" system was introduced to tackle the sparse reward problem inherent in complex code generation. Instead of an all-or-nothing reward, this system grants partial credit for passing easier test cases within a problem, providing a denser reward signal for the model to learn from.
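A minimal sketch of difficulty-aware partial credit for code, assuming each test case is labeled with a difficulty tier. The tier names and weights are hypothetical (this sketch makes harder tests worth more), and the all-or-nothing function is included only for comparison.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class CaseResult:
    difficulty: str  # hypothetical tier label: "easy", "medium" or "hard"
    passed: bool


# Hypothetical tier weights: easy tests still earn credit, hard tests earn more.
TIER_WEIGHTS: Dict[str, float] = {"easy": 1.0, "medium": 2.0, "hard": 3.0}


def all_or_nothing_reward(results: Sequence[CaseResult]) -> float:
    """Sparse baseline: 1.0 only if every test passes."""
    return 1.0 if results and all(r.passed for r in results) else 0.0


def difficulty_weighted_code_reward(
    results: Sequence[CaseResult], weights: Dict[str, float] = TIER_WEIGHTS
) -> float:
    """Dense reward: weighted fraction of tests passed, in [0, 1]."""
    total = sum(weights[r.difficulty] for r in results)
    if total == 0:
        return 0.0
    earned = sum(weights[r.difficulty] for r in results if r.passed)
    return earned / total


results = [
    CaseResult("easy", True),
    CaseResult("easy", True),
    CaseResult("medium", False),
    CaseResult("hard", False),
]
print(all_or_nothing_reward(results), difficulty_weighted_code_reward(results))
```

For this solution, which passes only the two easy tests, the all-or-nothing reward is 0.0 while the weighted reward is 2/7 (about 0.29), so the policy still receives a learning signal.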

Efficiency was also key. As the model improved, a data re-sampling strategy down-weighted easier problems, focusing training on more challenging examples. Xiaomi also developed a "Seamless Rollout Engine," an optimized RL infrastructure that integrates continuous generation, asynchronous reward calculation, and early termination to minimize GPU idle time, yielding significant training (2.29x) and validation (1.96x) speedups.
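A minimal sketch of such re-sampling, assuming the trainer tracks a recent pass rate per problem: problems the policy solves on every rollout move to an "easy" pool that is sampled only occasionally. The pool rule and the `easy_prob=0.1` default are hypothetical illustrations, not Xiaomi's published values.

```python
import random
from typing import Dict, List, Optional


def sample_batch(
    problems: List[str],
    pass_rates: Dict[str, float],
    batch_size: int,
    easy_prob: float = 0.1,  # hypothetical chance of drawing from the easy pool
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw a batch (with replacement) that down-weights already-solved problems.

    pass_rates[p] is the fraction of recent rollouts on problem p that were correct;
    problems with a pass rate of 1.0 go to the easy pool, the rest to the main pool.
    """
    if not problems:
        return []
    if rng is None:
        rng = random.Random()
    easy = [p for p in problems if pass_rates.get(p, 0.0) >= 1.0]
    hard = [p for p in problems if pass_rates.get(p, 0.0) < 1.0]
    batch = []
    for _ in range(batch_size):
        # Use the easy pool rarely, or always if nothing harder is left.
        use_easy = bool(easy) and (not hard or rng.random() < easy_prob)
        batch.append(rng.choice(easy if use_easy else hard))
    return batch
```

Keeping a small share of easy problems, rather than dropping them entirely, is one way to avoid the policy forgetting skills it has already mastered.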

MiMo-7B-RL Family: A Quick Look

Xiaomi has released several models showcasing the development stages:

Model Description
MiMo-7B-Base Base model with strong inherent reasoning potential
MiMo-7B-RL-Zero RL applied directly to the base model
MiMo-7B-SFT Supervised Fine-Tuned model from the base
MiMo-7B-RL RL applied to the SFT model, top reasoning performance

MiMo-7B-RL Benchmarks

Xiaomi's evaluation results, obtained with a sampling temperature of 0.6, highlight MiMo-7B-RL's strengths against leading models.

Comparative Performance:

Benchmark GPT-4o-0513 Claude-3.5-Sonnet-1022 OpenAI o1-mini MiMo-7B-RL
Mathematics
MATH-500(Pass@1) 74.6 78.3 90.0 95.8
AIME 2024(Pass@1) 9.3 16.0 63.6 68.2
AIME 2025(Pass@1) 11.6 7.4 50.7 55.4
Code
LiveCodeBench v5(Pass@1) 32.9 38.9 53.8 57.8
LiveCodeBench v6(Pass@1) 30.9 37.2 46.8 49.3

(Selected math/code benchmarks shown)

MiMo-7B-RL demonstrates exceptional performance in mathematics and coding. On every benchmark in the table above, including MATH-500, AIME, and the recent LiveCodeBench versions, it outscores general-purpose models such as GPT-4o and Claude 3.5 Sonnet as well as the reasoning-focused o1-mini. Its general reasoning scores (not shown here) are strong for its size but naturally trail the largest frontier models, reflecting its specialized training focus.
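Pass@1 scores like those above are typically estimated by sampling completions for each problem and counting how many are correct. The sketch below uses the standard unbiased pass@k estimator introduced with OpenAI's Codex evaluation; whether Xiaomi computed its figures exactly this way is an assumption, and the helper `benchmark_pass_at_1` is hypothetical.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples, drawn without
    replacement from n generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(samples_per_problem: List[List[bool]]) -> float:
    """Average per-problem pass@1, reported as a percentage."""
    scores = [pass_at_k(len(s), sum(s), 1) for s in samples_per_problem]
    return 100.0 * sum(scores) / len(scores)


# Two problems, four samples each: 1/4 correct and 4/4 correct.
print(benchmark_pass_at_1([[True, False, False, False], [True, True, True, True]]))
```

At k = 1 the estimator reduces to c / n, so averaging over several samples per problem mainly reduces the variance introduced by sampling at temperature 0.6.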

Performance Within the MiMo Series:

Benchmark MiMo-7B-Base MiMo-7B-RL-Zero MiMo-7B-SFT MiMo-7B-RL
Mathematics
MATH-500(Pass@1) 37.4 93.6 93.0 95.8
AIME 2024(Pass@1) 32.9 56.4 58.7 68.2
Code
LiveCodeBench v5(Pass@1) 32.9 49.1 52.3 57.8

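This internal comparison illustrates the contribution of each training stage. The base model already shows notable reasoning ability (32.9 on AIME 2024), and both SFT and RL applied directly to the base (RL-Zero) raise its scores sharply, for example from 37.4 to 93.0 and 93.6 respectively on MATH-500. The final RL phase on top of SFT reaches the highest score on every benchmark shown. RL-Zero is effective on its own, but the SFT intermediate step appears beneficial for reaching peak performance, with the largest margins on AIME 2024 (68.2 versus 56.4) and LiveCodeBench v5 (57.8 versus 49.1).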

Running MiMo-7B-RL

The models are readily available on the Hugging Face Hub.

Model Access:

Find MiMo-7B-RL and other models in the series at the XiaomiMiMo organization page on Hugging Face. The model size is approximately 7.83 billion parameters (BF16 precision, Safetensors format).
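As a quick sanity check on hardware requirements, the weight footprint follows from parameter count times bytes per parameter (2 bytes in BF16). The sketch below covers weights only; KV cache, activations, and framework overhead come on top, so treat the result as a lower bound. The helper name `weight_memory` is hypothetical.

```python
from typing import Tuple

# Bytes needed to store one parameter in each common precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory(num_params: float, dtype: str = "bf16") -> Tuple[float, float]:
    """Return (GB, GiB) needed just to hold the model weights."""
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9, total_bytes / 2**30


gb, gib = weight_memory(7.83e9, "bf16")
print(f"BF16 weights: {gb:.2f} GB ({gib:.2f} GiB)")
```

For 7.83 billion BF16 parameters this gives roughly 15.7 GB (about 14.6 GiB), so a 16 GB GPU is likely to be tight once the KV cache is added.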

Running Inference with vLLM (Recommended)

Xiaomi recommends using their fork of vLLM (based on v0.7.3) for inference, as it supports the Multi-Token Prediction feature for potentially faster generation.
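The speed-up comes from speculative decoding: a cheap drafter proposes tokens and the full model verifies them, keeping only the ones it agrees with. The sketch below shows the greedy version of this draft-then-verify loop with toy stand-in models; treating MiMo's MTP layer as the drafter is our reading of the `num_speculative_tokens=1` setting, and all function names are hypothetical. Sampling at temperature 0.6 uses a probabilistic acceptance rule instead of the exact-match check shown here.

```python
from typing import Callable, List

# A "model" here is any function mapping a context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy_step(
    prefix: List[int], draft_next: NextToken, target_next: NextToken, k: int = 1
) -> List[int]:
    """One draft-then-verify step; returns the tokens to append."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)
    # 2. The target model checks each drafted position (a real engine scores
    #    all of them in a single forward pass; this toy calls it per position).
    accepted, ctx = [], list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # 3. All drafts accepted: the same verification pass yields one bonus token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             max_new_tokens: int, k: int = 1) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_greedy_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new_tokens]


def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10  # toy "full model": count upward


print(generate([0], target, target, 5))
```

Because every emitted token is one the full model would have chosen, the output matches plain greedy decoding; the gain comes from accepting several tokens per expensive verification pass when the drafter guesses well.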

  • Using the Xiaomi vLLM Fork (with MTP):
# Ensure Xiaomi's vLLM fork is installed
from vllm import LLM, SamplingParams

# Source: https://huggingface.co/XiaomiMiMo/MiMo-7B-RL Model Card
model_path = "/path/to/XiaomiMiMo/MiMo-7B-RL" # Replace with your download path

llm = LLM(
    model=model_path,
    trust_remote_code=True,  # Required for MiMo's custom code
    num_speculative_tokens=1, # Enables MTP speculative decoding
    disable_log_stats=False
)
# Recommended sampling temperature for benchmark replication
sampling_params = SamplingParams(temperature=0.6)

# Example conversation structure (empty system prompt recommended)
conversation = [
    {
        "role": "system",
        "content": "" # Use an empty system prompt
    },
    {
        "role": "user",
        "content": "Write a python function to compute the nth Fibonacci number.",
    },
]

# Generate the response
outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

# Process and print output
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}")
    print("-" * 20)
    print(f"Generated text: {generated_text!r}")

print("=" * 80)
  • Using Standard vLLM (without MTP):
    If you are not using MTP, or you are running a standard vLLM build, first register the MiMo architecture by importing the register_mimo_in_vllm.py script provided by Xiaomi.
# Source: https://huggingface.co/XiaomiMiMo/MiMo-7B-RL Model Card
# Ensure register_mimo_in_vllm.py is accessible
import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/XiaomiMiMo/MiMo-7B-RL" # Replace with your download path
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # Do not set num_speculative_tokens if not using MTP
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

# Conversation setup and generation are the same as in the MTP example...
conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Write a python function to compute the nth Fibonacci number."},
]
outputs = llm.chat(conversation, sampling_params=sampling_params, use_tqdm=False)
# Processing output is the same...
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}\n{'-'*20}\nGenerated text: {generated_text!r}")

Using HuggingFace Transformers

Inference with the standard Hugging Face transformers library is also possible. Remember that trust_remote_code=True is required, because MiMo ships its own model code.

# Source: https://huggingface.co/XiaomiMiMo/MiMo-7B-RL Model Card
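from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/XiaomiMiMo/MiMo-7B-RL" # Replace with your download path (or the Hub ID "XiaomiMiMo/MiMo-7B-RL")

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True, # Essential for loading MiMo's custom architecture
    torch_dtype="auto",     # Keep the checkpoint's BF16 weights instead of upcasting to FP32
    device_map="auto"       # Place weights on available GPUs (requires the accelerate package)
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Build the prompt with the model's chat template and the recommended empty system prompt
conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Write a python function to compute the nth Fibonacci number."},
]
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,  # Append the assistant turn header so the model starts answering
    return_tensors="pt",
    return_dict=True             # Return input_ids and attention_mask together
).to(model.device)

# Generate the output sequence
output_sequences = model.generate(
    **inputs,
    max_new_tokens=256,      # Reasoning traces can be long; raise this if the output is cut off
    temperature=0.6,         # Recommended temperature
    do_sample=True           # Temperature only takes effect when sampling is enabled
)

# Decode only the newly generated tokens (skip the echoed prompt)
prompt_length = inputs["input_ids"].shape[-1]
generated_text = tokenizer.decode(output_sequences[0][prompt_length:], skip_special_tokens=True)
print(generated_text)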

Usage Recommendations

For best results, especially when trying to replicate benchmark scores, use the recommended setup: Xiaomi's vLLM fork (based on v0.7.3), an empty system prompt, and a sampling temperature of 0.6 (the setting used for the reported evaluations).

Final Thoughts: Efficient Reasoning Realized by Xiaomi?

Xiaomi's MiMo-7B-RL demonstrates that exceptional reasoning performance in specialized domains like mathematics and coding is achievable without resorting to enormous model sizes. Through careful pre-training focused on reasoning patterns and innovative reinforcement learning techniques, they've created an efficient model that competes effectively with much larger counterparts. The open release of the MiMo series provides valuable tools and insights, pushing forward the development of powerful, accessible AI reasoning capabilities.
