XBai o4: A New Chinese AI Model Surpassing OpenAI-o3-mini in Complex Reasoning

Discover XBai o4, a new Chinese AI model surpassing OpenAI-o3-mini in complex reasoning.

Ashley Innocent

4 August 2025

MetaStone AI’s XBai o4, released on August 1, 2025, is a fourth-generation open-source language model that, according to its developers, outperforms OpenAI-o3-mini on complex reasoning tasks. The Chinese-developed model combines a new training approach with cheaper inference-time scoring of reasoning paths. Its code and weights are available on GitHub and Hugging Face, so the claims can be checked independently.

💡
For developers integrating its APIs, Apidog simplifies testing and deployment with a free, user-friendly platform—perfect for exploring XBai o4’s capabilities.

The Rise of XBai o4: A Technical Overview

XBai o4, developed by MetaStone AI, is an open-source model: unlike proprietary models, its codebase and weights are publicly available on GitHub and Hugging Face. The model is trained with an approach called the “reflective generative form,” which integrates Long-CoT Reinforcement Learning and Process Reward Learning into a single framework. This lets one model both reason deeply and select high-quality reasoning trajectories, which sets it apart from its predecessors and from competitors like OpenAI-o3-mini.



XBai o4 also improves inference efficiency by sharing the backbone network between its Process Reward Models (PRMs) and policy models. According to MetaStone AI, this reduces the inference cost of PRMs by 99%, resulting in faster responses and higher-quality outputs. The model’s parameters are saved in two files, as detailed in the Hugging Face repository: model.safetensors for the policy model checkpoint and a separate file for the SPRM head.
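To see where a figure like 99% could come from, here is a minimal arithmetic sketch. It assumes a standalone PRM would run its own full backbone pass plus a small scoring head, while the shared design adds only the head on top of a pass the policy model already performs. The relative costs used here are hypothetical and are not measurements from MetaStone AI.

```python
def prm_cost_reduction(backbone_cost: float, head_cost: float) -> float:
    """Fraction of PRM inference cost saved when the backbone pass is shared."""
    separate = backbone_cost + head_cost  # standalone PRM runs its own backbone
    shared = head_cost                    # shared design only adds the scoring head
    return 1.0 - shared / separate


# Hypothetical relative costs: 1 unit of head compute per 99 units of backbone compute.
reduction = prm_cost_reduction(backbone_cost=99.0, head_cost=1.0)
print(f"{reduction:.0%}")
```

Under these assumed numbers, the saving grows as the head becomes smaller relative to the backbone, which is why a lightweight scoring head matters for this design.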

Understanding the Reflective Generative Form

The cornerstone of XBai o4’s success lies in its reflective generative form. This training paradigm combines two advanced techniques:

  1. Long-CoT Reinforcement Learning: This method extends Chain-of-Thought (CoT) prompting by incorporating reinforcement learning to refine the model’s reasoning process over extended contexts. As a result, XBai o4 can tackle complex, multi-step problems with greater accuracy.
  2. Process Reward Learning: This approach rewards the model for selecting high-quality reasoning trajectories during training. Consequently, XBai o4 learns to prioritize optimal reasoning paths, enhancing its performance in tasks requiring nuanced decision-making.

By integrating these methods, XBai o4 achieves a balance between deep reasoning and computational efficiency. Furthermore, the shared backbone network minimizes redundancy, allowing the model to process inputs faster without sacrificing quality. This is a notable difference from OpenAI-o3-mini, which is efficient but proprietary, so its weights and training details are not open to inspection.
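Trajectory selection can be pictured as scoring several candidate reasoning paths and keeping the best one. The sketch below is illustrative only: the per-step scores are made up, and aggregating them with a geometric mean is an assumption chosen for the example, not a confirmed detail of XBai o4.

```python
from math import exp, log
from typing import Dict, List, Sequence


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Geometric mean of per-step scores, each assumed to lie in (0, 1]."""
    return exp(sum(log(s) for s in step_scores) / len(step_scores))


def select_best(trajectories: List[Dict]) -> Dict:
    """Return the candidate whose reasoning steps score highest overall."""
    return max(trajectories, key=lambda t: trajectory_score(t["step_scores"]))


# Hypothetical candidates: one weak step drags the second trajectory down.
candidates = [
    {"answer": "42", "step_scores": [0.9, 0.8, 0.95]},
    {"answer": "41", "step_scores": [0.99, 0.2, 0.9]},
    {"answer": "42", "step_scores": [0.7, 0.7, 0.7]},
]
best = select_best(candidates)
print(best["answer"])
```

A geometric mean penalizes a single bad step more than an arithmetic mean would, which fits the idea of rewarding the whole reasoning process instead of only the final answer.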

Comparing XBai o4 to OpenAI-o3-mini

OpenAI-o3-mini, a compact version of OpenAI’s broader o3 series, is designed for efficiency in medium-complexity tasks. However, XBai o4 claims to “completely surpass” OpenAI-o3-mini in Medium mode, as stated in MetaStone AI’s GitHub announcement.

The supporting evidence comes from MetaStone AI’s own test pipeline for mathematical benchmarks such as AIME24, which is published in the GitHub repository. The pipeline uses the scripts score_model_queue.py and policy_model_queue.py to run the score and policy models, and it relies on the XFORMERS attention backend for optimized attention.

Technical Implementation of XBai o4

To deploy XBai o4, developers should follow the setup instructions in the GitHub repository, which cover three stages: environment setup, training and evaluation, and API integration.

This setup highlights XBai o4’s flexibility for both research and production environments. Additionally, the model’s compatibility with tools like Apidog simplifies API testing, allowing developers to validate endpoints efficiently.

Benchmark Performance and Evaluation

MetaStone AI’s release notes emphasize XBai o4’s superior performance on mathematical benchmarks like AIME24. The test pipeline, detailed in the GitHub repository, uses a combination of policy and score model APIs to evaluate the model’s reasoning capabilities. For instance, the inference.py script processes input files like aime24.jsonl and generates results with 16 samples, leveraging multiple API endpoints for speed.
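The idea of spreading 16 samples per problem across several API endpoints can be sketched as a simple round-robin plan. This is not the logic of inference.py itself; the function name, the endpoint URLs, and the problem IDs below are hypothetical placeholders.

```python
from itertools import cycle
from typing import Dict, List, Sequence, Tuple


def assign_samples(
    problem_ids: Sequence[str],
    endpoints: Sequence[str],
    n_samples: int = 16,
) -> Dict[str, List[Tuple[str, int]]]:
    """Assign (problem, sample index) jobs to endpoints in round-robin order."""
    rotation = cycle(endpoints)
    plan: Dict[str, List[Tuple[str, int]]] = {url: [] for url in endpoints}
    for pid in problem_ids:
        for k in range(n_samples):
            plan[next(rotation)].append((pid, k))
    return plan


# Hypothetical endpoints and problem IDs, for illustration only.
plan = assign_samples(
    ["aime24-01", "aime24-02"],
    ["http://node-a:8000", "http://node-b:8000"],
)
for url, jobs in plan.items():
    print(url, len(jobs))
```

Round-robin keeps the load even when every request costs about the same; a real pipeline might instead use a work queue so that slow endpoints do not hold up the run.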

Moreover, the pipeline uses the XFORMERS attention backend, which optimizes memory usage and computation speed. It is selected through the VLLM_ATTENTION_BACKEND=XFORMERS environment setting on GPU-enabled systems.
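As a rough sketch of how that setting might be applied from Python, the helper below prepares a launch command and an environment with VLLM_ATTENTION_BACKEND set. The script name and the --model_path and --port flags are the ones mentioned in this article; the helper name, the model path, the port number, and the script's location in the repository are assumptions, so check the repository before running anything.

```python
import os
from typing import Dict, List, Tuple


def build_launch(script: str, model_path: str, port: int) -> Tuple[List[str], Dict[str, str]]:
    """Prepare a command and environment for starting a model server."""
    env = dict(os.environ)                      # copy, so the parent process is unchanged
    env["VLLM_ATTENTION_BACKEND"] = "XFORMERS"  # select the xFormers attention backend
    cmd = ["python", script, "--model_path", model_path, "--port", str(port)]
    return cmd, env


# Hypothetical path and port; the script location in the repo is not verified here.
cmd, env = build_launch("policy_model_queue.py", "path/to/model", 8000)
print(" ".join(cmd))
# To actually start the server: subprocess.Popen(cmd, env=env)
```

Building the command separately from launching it makes it easy to log or inspect the exact invocation before committing GPU time.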

In contrast, OpenAI-o3-mini, while effective for general tasks, does not provide the same level of transparency in its evaluation process. XBai o4’s open-source nature allows researchers to scrutinize and replicate its benchmarks, fostering trust in its performance claims.

Community Reception and Skepticism

The AI community has responded with a mix of excitement and skepticism to XBai o4’s release. A Reddit post on r/accelerate, for instance, highlights the model’s potential but raises concerns about benchmark overtuning, referencing past issues with models like Llama-4. Some users question the credibility of MetaStone AI, a relatively new player compared to established organizations like Qwen. Nevertheless, the open-source availability of XBai o4’s weights and code encourages independent verification, which could dispel doubts over time.

For example, a user on Threads reported testing XBai o4 on an M4 Max with the mlx-lm backend, noting that it passed the “1+1 vibe test” for reasoning tasks. However, challenges like rendering complex visualizations (e.g., inverse kinematics) suggest areas for improvement.

Integration with Apidog for API Testing

For developers integrating XBai o4 into their workflows, tools like Apidog are invaluable. Apidog simplifies the process of testing and managing APIs, such as those used in XBai o4’s evaluation pipeline. By providing a user-friendly interface for sending requests to endpoints like http://ip:port/score, Apidog ensures that developers can validate model performance without complex manual configurations. Furthermore, its free download makes it accessible to researchers and hobbyists alike, aligning with XBai o4’s open-source ethos.


To illustrate, consider a scenario where a developer uses Apidog to test XBai o4’s policy model API. By configuring the endpoint URL and parameters (e.g., --model_path and --port), Apidog can send test requests and analyze responses, streamlining the debugging process. This integration is particularly useful for scaling evaluations across multiple nodes, as recommended in the GitHub setup instructions.
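The same kind of request can also be prepared in a script, which is handy for checking what a tool like Apidog should send. The sketch below only builds the request and does not send it. The use of POST, the JSON body, and the field names "question" and "response" are assumptions for illustration; the real schema of the /score endpoint should be taken from the repository.

```python
import json
from urllib.request import Request


def build_score_request(ip: str, port: int, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request for the score endpoint."""
    url = f"http://{ip}:{port}/score"
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, port, and payload fields.
req = build_score_request(
    "127.0.0.1", 8001, {"question": "What is 1+1?", "response": "1+1=2"}
)
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req)
```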

Future Implications for Open-Source AI

XBai o4’s release underscores the growing importance of open-source AI in democratizing access to advanced technology. Unlike proprietary models like OpenAI-o3-mini, XBai o4 empowers developers to customize and extend the model for specific use cases. For instance, its reflective generative form could be adapted for domains like scientific research, financial modeling, or automated code generation.

Additionally, the model’s efficiency improvements pave the way for deploying large language models on resource-constrained environments. By reducing inference costs, XBai o4 makes it feasible to run sophisticated AI on consumer-grade hardware, broadening its potential applications.

However, challenges remain. The AI community’s skepticism highlights the need for rigorous, transparent benchmarking to validate performance claims. Moreover, while XBai o4 excels in reasoning, its visualization capabilities (e.g., inverse kinematics) require further refinement, as noted in community feedback.

Conclusion: XBai o4’s Place in the AI Ecosystem

In summary, XBai o4 represents a significant advancement in open-source AI, offering superior reasoning capabilities and efficiency compared to OpenAI-o3-mini. Its reflective generative form, combining Long-CoT Reinforcement Learning and Process Reward Learning, sets a new standard for complex problem-solving. Furthermore, its open-source availability on GitHub and Hugging Face fosters collaboration and innovation, making it a valuable resource for developers and researchers.

For those looking to explore XBai o4’s capabilities, tools like Apidog provide an efficient way to test and integrate its APIs, ensuring seamless deployment in real-world applications. As the AI landscape continues to evolve, XBai o4 stands as a testament to the power of open-source innovation, challenging proprietary models and pushing the boundaries of what AI can achieve.
