Baidu released ERNIE 5.1 on May 9, 2026, and the headline number is hard to ignore: a Mixture-of-Experts model with roughly one-third of ERNIE 5.0’s total parameters that landed at 4th place globally on the Arena Search leaderboard and 1st among Chinese models with a score of 1,223.
It is the first version of the ERNIE family where Baidu is openly competing on agentic tool use, long-form creative writing, and reasoning against Gemini 3.1 Pro and DeepSeek-V4-Pro, no longer only on Chinese-language tasks. If you build with Apidog and you have been waiting for a Chinese frontier model that you can slot into an agent stack without a 70B-parameter footprint, this release is worth a careful look.
This guide unpacks what ERNIE 5.1 is, what changed under the hood, how the benchmarks compare to DeepSeek-V4-Pro and Gemini 3.1 Pro, and where the model fits if you already use DeepSeek V4 or Kimi K2.6 in production.
TL;DR: ERNIE 5.1 in one paragraph
ERNIE 5.1 is a text-only MoE model trained at roughly 6% of the pre-training cost of comparable frontier models. Total parameters are about one-third of ERNIE 5.0, and active parameters per forward pass are about one-half. It scores 1,223 on the Arena Search leaderboard (4th global, 1st in China), beats DeepSeek-V4-Pro on the τ³-bench and SpreadsheetBench-Verified agentic benchmarks, and hits 99.6 on AIME26 with tool use. Access is live through the ERNIE chat UI, Baidu AI Studio’s ERNIE 5.1 Playground, and the Qianfan API.

Why this release matters
Three things stand out, and none of them is “Baidu shipped another model.”
1. The cost-to-quality ratio. A pre-training run at ~6% of the cost of comparable models is a number that resets pricing expectations across the industry. If Baidu can serve this through Qianfan at a fraction of what frontier closed models charge, downstream API pricing will follow.
2. The MoE design is elastic on three axes. Most MoE models route across width (which experts fire) and sometimes depth (layer skipping). Baidu claims ERNIE 5.1 routes across depth, width, and sparsity at once, which is how they shrunk the model without losing the agentic tool-use scores. This is closer to the design philosophy in DeepSeek-V3.x than to a vanilla GShard-style MoE.
3. Agentic capability is the headline, not a footnote. ERNIE 5.0 was positioned as a knowledge-and-creative-writing model. ERNIE 5.1 explicitly markets “agentic capabilities on par with the world’s top models” and ships with a Baidu AI Studio playground tuned for tool-calling demos. That is a strategic shift.

The benchmarks, side by side
Here is what Baidu published, mapped against the closest public comparison points.
| Benchmark | ERNIE 5.1 | What it tests | Closest competitor |
|---|---|---|---|
| Arena Search leaderboard | 1,223 (4th global, 1st CN) | Human-rated search-aware QA | Gemini 3.1 Pro, GPT-5.x |
| τ³-bench | Beats DeepSeek-V4-Pro | Agentic tool-use, multi-turn | DeepSeek-V4-Pro |
| SpreadsheetBench-Verified | Beats DeepSeek-V4-Pro | Real-world spreadsheet tasks | DeepSeek-V4-Pro |
| AIME26 (with tools) | 99.6 | Competition math with code interpreter | GPT-5.x, Gemini 3.1 Pro |
| GPQA | “Approaches leading closed-source” | Graduate-level science QA | Claude Sonnet 4.6 |
| MMLU-Pro | “Approaches leading closed-source” | Broad knowledge | All frontier models |
A few honest caveats. Arena scores depend on the prompt mix and the voter pool, and Chinese-leaning prompts likely help here. The AIME26-with-tools score is also tool-augmented; a pure-reasoning AIME number was not disclosed. Creative writing is described as “approaching Gemini 3.1 Pro” rather than matching it.
That said, the τ³-bench and SpreadsheetBench results are the ones to pay attention to. Both are agentic, both are externally maintained, and both have historically been hard to game.
What we know about the architecture
Baidu disclosed less than DeepSeek does for its V3-series papers, but here is what the release post and adjacent posts confirm:
- Total parameters: about one-third of ERNIE 5.0
- Active parameters per token: about one-half of ERNIE 5.0
- Routing: elastic on depth, width, and sparsity (a tri-axis MoE)
- Pre-training cost: ~6% of “comparable models”
- Modality: text only at launch (no vision, no audio)
- Languages: Chinese and English versions available
Context length, exact parameter counts, and the training token budget were not disclosed. If you have built with Chinese MoE models like GLM 5.1 before, expect a similar developer surface area.

What you cannot do with ERNIE 5.1 (yet)
Worth calling out so you do not design around it and get burned later.
- No image input. ERNIE 5.1 is text-only. For multimodal Baidu workflows you still need ERNIE-VL or an external vision model.
- No audio input or output. No native speech, no real-time voice.
- No published context window. Until Baidu confirms the figure, treat long-document use cases with care.
- No HuggingFace weights. This is a hosted-only model. If on-prem matters, you are looking at DeepSeek V4 locally or a local LLM instead.
How ERNIE 5.1 compares to the Chinese frontier
If you already pick between DeepSeek, Kimi, GLM, and Qwen, here is the quick mental model.
Pick ERNIE 5.1 when you need strong agentic tool-use plus search-augmented answers in Chinese or English, and you want the cheapest pricing curve on the Chinese cloud side.
Pick DeepSeek V4 when you need open weights, on-prem deployment, or the strongest pure-reasoning score on hard math without tools.
Pick Kimi K2.6 when you need long context windows for document-heavy workflows.
Pick GLM 5.1 when you need a balanced generalist and you already have Z.ai or Zhipu in your stack.
This is not a strict ranking; it is about which trade-off matches your workload. Run your own evals on a 50-prompt slice before committing.
Where to try ERNIE 5.1 today
Three paths, in order of friction:
- ernie.baidu.com: the consumer chat UI. Free, no API key, China-region. Best for kicking the tires on creative writing and reasoning.
- Baidu AI Studio ERNIE 5.1 Playground: a hosted playground with tool-calling demos pre-wired. Good for agentic experiments before you commit to API work.
- Qianfan API: the developer endpoint. OpenAI-compatible request shape, Bearer-token auth. Full hands-on walkthrough is in our companion post How to use the ERNIE 5.1 API.
If you are evaluating across multiple Chinese model providers in parallel, Apidog is the cleanest way to manage the keys, save request bodies per provider, and diff responses side by side without writing throwaway scripts.
Pricing and rollout
Baidu announced that ERNIE 5.1 will roll out across 10+ creative production platforms in the weeks following launch. Public per-token pricing on Qianfan was not in the release post; based on the ~6% pre-training cost claim and Baidu’s historical Qianfan rate sheet, expect input pricing in the same band as ERNIE 4.5 Turbo or lower. Always check the live Qianfan console before quoting numbers internally.
How developers should think about ERNIE 5.1
Three concrete recommendations if you are deciding whether to wire it into your stack.
1. Run it against your own agentic eval, not the public benchmark. τ³-bench is a good signal but it is not your workload. Build a 20–50 case eval that mirrors your real tool-use patterns, then compare ERNIE 5.1 against your current model. Test LLMs as APIs walks through one way to do this with Apidog.
2. Treat the model as a Chinese-cloud bet. Qianfan is hosted in China. If your data residency rules say “no PRC infrastructure,” this is a non-starter regardless of benchmarks.
3. Watch the pricing announcement. The ~6% pre-training cost claim is the most interesting number in the release. If Baidu passes that through to the API, the entire Chinese-model price floor moves down, which forces DeepSeek, Zhipu, and Moonshot to respond.
Frequently asked questions
Is ERNIE 5.1 open-source? No. ERNIE 5.1 is a hosted-only model accessible through Baidu’s chat UI, Baidu AI Studio, and the Qianfan API. There are no public weights on HuggingFace at the time of writing.
Does ERNIE 5.1 support image or vision input? No. ERNIE 5.1 is text-only at launch. Baidu’s ERNIE-VL family handles vision tasks. If you need a single multimodal Chinese model, look at Qwen 3.5 Omni instead.
What is the context length? Baidu did not publish a specific context-window number in the release post. Until they confirm it, design long-document workflows defensively and chunk inputs.
Can I use ERNIE 5.1 from outside China? The chat UI and Qianfan API are accessible from most regions, but latency and account verification differ. Some enterprise features still require a mainland phone number or business license. The companion guide How to use the ERNIE 5.1 API covers the access flow in detail.
Is ERNIE 5.1 better than DeepSeek-V4-Pro? On τ³-bench and SpreadsheetBench-Verified, Baidu says yes. On open-weight access, no. On pure-reasoning math benchmarks without tool use, the public numbers do not give a clear answer. The honest position: they target slightly different deployment models.
Ready to start building? Download Apidog and import the Qianfan OpenAPI spec to test ERNIE 5.1 alongside your current model in one workspace.



