How to Use the ERNIE 5.1 API?

ERNIE 5.1 shipped on May 9, 2026, and within a week the Qianfan API was live for it. If you want to call the model from your own code, route tool calls through it, or wire it into an agent loop with Apidog, this guide walks the full path: account, key, request body, streaming, tool use, error handling.

We’ll keep it practical. By the end you’ll have working curl, Python, and Node snippets, plus a request collection you can drop into Apidog.

If you have not read the ERNIE 5.1 launch breakdown yet, skim it first; it covers benchmarks and trade-offs versus DeepSeek V4 and Kimi K2.6. This post is the implementation companion.

Step 1: Get a Qianfan API key

ERNIE 5.1 is served through Baidu Intelligent Cloud’s Qianfan platform. There is no separate “ERNIE API”; everything routes through Qianfan.

1. Go to cloud.baidu.com and create or sign in to a Baidu Intelligent Cloud account. International developers can use email signup; some enterprise features still need a mainland phone number.
2. Open the Qianfan console at console.bce.baidu.com/qianfan.
3. Under API Key Management (API Key 管理), click Create API Key. Pick the workspace and grant access to the chat-completions service.
4. Copy the key. It looks like bce-v3/ALTAK-xxxx/xxxx. Store it in an env var, not in source.

export QIANFAN_API_KEY="bce-v3/ALTAK-xxxx/xxxx"
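If you load the key from Python, it helps to fail fast when the variable is missing. Below is a minimal sketch. `load_qianfan_key` is a hypothetical helper name. The `bce-v3/` prefix check is only a heuristic based on the key format shown above, so the function warns instead of rejecting the key.

```python
import os
import warnings
from typing import Mapping, Optional


def load_qianfan_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the Qianfan key, failing fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get("QIANFAN_API_KEY", "").strip()
    if not key:
        raise RuntimeError("QIANFAN_API_KEY is not set; export it first.")
    # Heuristic only: v2 keys from the console look like bce-v3/ALTAK-.../...
    if not key.startswith("bce-v3/"):
        warnings.warn("QIANFAN_API_KEY does not start with 'bce-v3/'; is it a v1 credential?")
    return key
```

Call `load_qianfan_key()` once at startup, so a missing key fails before the first request does.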

Two things to know up front. First, the new v2 endpoint uses a single Bearer token; the older v1 OAuth access_token flow is being deprecated and you should not build new code on it. Second, ERNIE 5.1 is a paid model from day one. Top up a small balance (¥10 is enough to test) before your first request.

Step 2: Hit the OpenAI-compatible endpoint with curl

Qianfan exposes an OpenAI-compatible chat-completions endpoint, so anything in your stack that already speaks OpenAI’s format will work with a base-URL swap and a model-ID change.

Base URL: https://qianfan.baidubce.com/v2
Model ID: ernie-5.1 (also: ernie-5.1-preview for early-access features)

Minimum viable request:

curl https://qianfan.baidubce.com/v2/chat/completions \
  -H "Authorization: Bearer $QIANFAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ernie-5.1",
    "messages": [
      {"role": "system", "content": "You are a senior API designer."},
      {"role": "user", "content": "Sketch a REST schema for a GitHub-style PR review API. Be concise."}
    ],
    "temperature": 0.3
  }'
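You can send the same request with only the Python standard library, which helps where you cannot install the SDK. This sketch builds the request without sending it. `build_chat_request` is a hypothetical helper, and it assumes that any extra keyword arguments (such as `temperature`) map directly onto body fields.

```python
import json
import urllib.request

QIANFAN_BASE_URL = "https://qianfan.baidubce.com/v2"


def build_chat_request(api_key, messages, *, model="ernie-5.1", **params):
    """Build (but do not send) the same POST the curl command makes."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{QIANFAN_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # v2 uses a single Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it:
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       completion = json.load(resp)
```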

You get back a standard OpenAI-shaped response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1746780000,
  "model": "ernie-5.1",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 318,
    "total_tokens": 360
  }
}
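To pull the useful fields out of that raw JSON (for example, from a stdlib request or a log file), here is a small sketch. It assumes the OpenAI-shaped fields shown above, and `summarize_completion` is a hypothetical name.

```python
import json


def summarize_completion(body):
    """Extract text, finish reason, and token count from a raw response body."""
    payload = json.loads(body)
    choice = payload["choices"][0]
    usage = payload.get("usage") or {}
    return {
        "text": choice["message"].get("content") or "",
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens", 0),
    }
```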

If you see 401 Unauthorized, your key is wrong or expired. If you see 403, the key is valid but the model is not enabled on this workspace; go back to the console and add ERNIE 5.1 to the workspace’s allowed models.

Step 3: Call ERNIE 5.1 from Python

Because the endpoint is OpenAI-compatible, the official openai Python SDK works as-is. Point it at Qianfan.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],
    base_url="https://qianfan.baidubce.com/v2",
)

response = client.chat.completions.create(
    model="ernie-5.1",
    messages=[
        {"role": "system", "content": "You explain APIs in plain English."},
        {"role": "user", "content": "Why would I use server-sent events over WebSockets for a chat UI?"},
    ],
    temperature=0.4,
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")

If you already have wrappers around the OpenAI SDK in your codebase, swapping ERNIE 5.1 in for A/B testing only means changing the API key, base URL, and model ID. The same trick works for DeepSeek’s API and most other Chinese model providers.
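One way to turn that swap into a config choice instead of a code change is a small provider table. This is a sketch. The ERNIE values come from this guide. The DeepSeek base URL and model ID are the ones DeepSeek documents for its OpenAI-compatible API. `PROVIDERS` and `provider_settings` are hypothetical names, so check each provider's docs before relying on them.

```python
import os

# Illustrative provider table; extend it with other OpenAI-compatible APIs.
PROVIDERS = {
    "ernie": {
        "base_url": "https://qianfan.baidubce.com/v2",
        "model": "ernie-5.1",
        "key_env": "QIANFAN_API_KEY",
    },
    "deepseek": {
        "base_url": "https://api.deepseek.com",
        "model": "deepseek-chat",
        "key_env": "DEEPSEEK_API_KEY",
    },
}


def provider_settings(name, env=None):
    """Return (client_kwargs, model) for one OpenAI-compatible provider."""
    env = os.environ if env is None else env
    cfg = PROVIDERS[name]
    client_kwargs = {"api_key": env[cfg["key_env"]], "base_url": cfg["base_url"]}
    return client_kwargs, cfg["model"]


# Usage (requires the openai package):
#   kwargs, model = provider_settings("ernie")
#   client = OpenAI(**kwargs)
#   client.chat.completions.create(model=model, messages=...)
```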

Step 4: Stream tokens for chat-style UIs

For any user-facing chat, you want streaming. Set stream: true and consume server-sent events.

stream = client.chat.completions.create(
    model="ernie-5.1",
    messages=[{"role": "user", "content": "Write a haiku about API versioning."}],
    stream=True,
)

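for chunk in stream:
    # Some chunks (e.g. a trailing usage chunk) can arrive with no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)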

Curl equivalent for debugging:

curl https://qianfan.baidubce.com/v2/chat/completions \
  -H "Authorization: Bearer $QIANFAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ernie-5.1",
    "stream": true,
    "messages": [{"role": "user", "content": "Stream a 3-sentence joke."}]
  }' \
  --no-buffer

The stream format is identical to OpenAI’s: data: {...} lines terminated by data: [DONE].
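If you read the raw stream yourself (from curl output or urllib) instead of going through the SDK, the parsing is short. This minimal sketch assumes one JSON chunk per `data:` line and the OpenAI delta shape. `iter_sse_deltas` is a hypothetical helper.

```python
import json


def iter_sse_deltas(lines):
    """Yield content deltas from raw `data: {...}` lines until `data: [DONE]`."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        choices = json.loads(data).get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content
```

With urllib, iterating over the response (`for line in resp:`) yields bytes lines, and the function decodes those itself.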

Step 5: Use ERNIE 5.1 with tools (the agentic part)

This is where ERNIE 5.1 earns its launch headline. In the launch results it scored above DeepSeek-V4-Pro on τ³-bench and SpreadsheetBench-Verified. That suggests its tool calling holds up in production-style workflows, not only in demos.

Same schema as OpenAI function calling:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Singapore"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="ernie-5.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    print(f"Model wants to call: {call.function.name}({call.function.arguments})")

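After your code runs the actual tool, append the assistant message that contains tool_calls. Then append one tool-role message per call, each carrying the matching tool_call_id, and call the API again. The loop terminates when finish_reason == "stop" and tool_calls is empty.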
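Here is a sketch of the append step, working on plain message dicts. `append_tool_results` and the `tools_by_name` registry are hypothetical. The sketch assumes your tool functions accept the JSON arguments as keyword arguments and return something `json.dumps` can serialize.

```python
import json


def append_tool_results(messages, assistant_message, tools_by_name):
    """Extend `messages` with one assistant turn and its tool results.

    `assistant_message` is the raw dict from choices[0]["message"];
    `tools_by_name` maps function names to plain Python callables.
    """
    messages.append(assistant_message)  # must come before the tool messages
    for call in assistant_message.get("tool_calls") or []:
        fn = tools_by_name[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the model's call
            "content": json.dumps(result),
        })
    return messages
```

With the SDK, `response.choices[0].message.model_dump(exclude_none=True)` should give you the assistant dict. Pass the updated `messages` and the same `tools` to the next `create` call.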

One gotcha: ERNIE 5.1 occasionally returns tool arguments as a stringified JSON inside a code fence rather than as a clean JSON string. Parse defensively with json.loads() wrapped in try/except, and if it fails, strip ```json fences before retrying.
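Here is a sketch of that defensive parse. `parse_tool_arguments` is a hypothetical helper. It assumes a Markdown fence is the only wrapping, and it also unwraps one extra layer of string encoding in case the arguments arrive double-encoded.

```python
import json
import re

# A Markdown code fence (three backticks, optional "json" tag) around the payload.
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*$", re.DOTALL)


def parse_tool_arguments(raw):
    """Parse tool-call arguments, tolerating a code fence around the JSON."""
    if isinstance(raw, dict):
        return raw  # already parsed
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        match = _FENCE.match(raw)
        if not match:
            raise
        parsed = json.loads(match.group(1))
    # A double-encoded payload decodes to a str first; unwrap it once more.
    if isinstance(parsed, str):
        return parse_tool_arguments(parsed)
    return parsed
```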

Step 6: Call ERNIE 5.1 from Node.js

Drop-in for any Node project using openai v5+. Run it as an ES module (for example, a .mjs file) so the top-level await works:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.QIANFAN_API_KEY,
  baseURL: "https://qianfan.baidubce.com/v2",
});

const completion = await client.chat.completions.create({
  model: "ernie-5.1",
  messages: [
    { role: "user", content: "Return a JSON object with 3 API design tips." },
  ],
  response_format: { type: "json_object" },
});

console.log(completion.choices[0].message.content);

response_format: { type: "json_object" } works and is reliable. Strict JSON schemas (json_schema) are still being rolled out on Qianfan; verify the response shape in code rather than trusting the constraint.
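Here is a sketch of checking the shape yourself. `load_json_object` is hypothetical. The key names you require (for example `tips`) are whatever you asked for in the prompt, because `json_object` mode is meant to guarantee valid JSON, not particular keys.

```python
import json


def load_json_object(content, required_keys=()):
    """Parse a json_object response and check the keys your code depends on."""
    data = json.loads(content)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return data
```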

Step 7: Test and compare with Apidog

If you are deciding between ERNIE 5.1, DeepSeek V4, and Kimi K2.6, do not do it from the terminal. Use Apidog to build a single workspace with one folder per provider, identical request bodies, and saved environments per API key.

The 60-second setup:

1. Open Apidog and create a new project called “LLM bake-off.”
2. Add an environment with QIANFAN_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY as variables.
3. Create three requests pointing at each provider’s base URL with model set to ernie-5.1, deepseek-chat, and kimi-k2-6 respectively.
4. Pin the same messages array on all three. Use Apidog’s “Run” feature to fire them in parallel and diff outputs.

The free tier handles this comfortably. Apidog saves the request history per environment, so you can come back next week and re-run the exact same eval against a new model version. Beats babysitting curl in a tmux pane.

For more on multi-provider testing, see Test local LLMs as APIs and our GLM 5.1 API guide.

Pricing, rate limits, and quotas

Public Qianfan pricing for ERNIE 5.1 was not in the release post; check the live console rate card before quoting numbers internally. Three practical tips while you wait:

Default rate limits are workspace-scoped. New accounts start with a low QPS cap. Raise it from the console once you finish testing.
Token usage shows up in the response. The usage field gives prompt_tokens, completion_tokens, and total_tokens per call. Log these per request; do not trust the dashboard alone for cost accounting.
Caching is not automatic. Unlike Anthropic, Qianfan does not currently expose a prompt-caching primitive for ERNIE 5.1. If you have a 2,000-token system prompt, you pay for it every call. Architect around that.
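For per-request logging, a small ledger is enough. In this sketch, `UsageLedger` is hypothetical. The prices are placeholders for you to fill in from the Qianfan rate card, and they default to zero because public pricing was not in the release post.

```python
from dataclasses import dataclass


@dataclass
class UsageLedger:
    """Accumulates token usage across calls; prices are per 1M tokens."""
    input_price_per_m: float = 0.0   # placeholder: fill in from the rate card
    output_price_per_m: float = 0.0  # placeholder: fill in from the rate card
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def record(self, usage):
        """`usage` is one response's usage field as a dict."""
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.calls += 1

    @property
    def cost(self):
        return (self.prompt_tokens * self.input_price_per_m
                + self.completion_tokens * self.output_price_per_m) / 1_000_000
```

The ledger also makes the caching point concrete: a 2,000-token system prompt sent on 10,000 calls adds 20 million billed prompt tokens. With the SDK, `response.usage.model_dump()` produces the dict that `record` expects.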

Error handling that will save you

The errors you will hit in practice, in rough order of frequency:

Status	Meaning	Fix
401	Bearer token wrong or expired	Regenerate from console
403	Model not enabled on this workspace	Add ERNIE 5.1 in console
429	Rate limit hit	Backoff + retry with jitter
400 (`invalid messages`)	Wrong message-role ordering	Ensure user/assistant alternation
500/502	Qianfan-side blip	Retry once; if it persists, check status page
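For the 400 row, a quick pre-flight check can catch ordering mistakes before they reach the API. This sketch assumes the simple rule from the table: an optional leading system message, then alternating user/assistant turns. It also assumes the last turn must be the user's. `check_message_order` is hypothetical, and tool-calling transcripts need extra rules it does not cover.

```python
def check_message_order(messages):
    """Return a list of role-ordering problems (an empty list means OK)."""
    turns = list(messages)
    if turns and turns[0]["role"] == "system":
        turns = turns[1:]  # one leading system message is allowed
    if not turns:
        return ["no user message"]
    problems = []
    expected = "user"
    for i, msg in enumerate(turns):
        if msg["role"] != expected:
            problems.append(f"turn {i}: expected {expected}, got {msg['role']}")
        # Re-sync on the actual role so one slip is reported only once.
        expected = "assistant" if msg["role"] == "user" else "user"
    if turns[-1]["role"] != "user":
        problems.append("last turn should be from the user")
    return problems
```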

Retry 429 and 5xx responses with exponential backoff, capped at 3 attempts. Errors like 400, 401, and 403 will not fix themselves, so fail fast on those. For production, log request_id from the response headers; Baidu support needs it to debug your case.
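You can keep the retry decision and the delay as two small pure functions. In this sketch, `is_retryable` and `backoff_delay` are hypothetical names. The delay uses the "full jitter" variant of exponential backoff, a random wait between zero and the capped exponential. That is a different but equally common choice from the wrapper in the next section, which adds up to 1s of jitter on top of the exponential.

```python
import random


def is_retryable(status):
    """429 and 5xx are transient; 400/401/403 need a code or config fix."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt, *, base=1.0, cap=30.0, rng=random.random):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))
```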

A minimal production-shaped wrapper

If you want to drop ERNIE 5.1 into a real app today, here is the smallest wrapper that is not embarrassing:

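import os
import random
import time

from openai import APIConnectionError, APIStatusError, OpenAI, RateLimitError

client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],
    base_url="https://qianfan.baidubce.com/v2",
    max_retries=0,  # the SDK retries on its own by default; let chat() own retries
)

def chat(messages, *, model="ernie-5.1", temperature=0.3, max_retries=3):
    for attempt in range(max_retries):
        last_attempt = attempt == max_retries - 1
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=temperature,
            )
        except RateLimitError:
            if last_attempt:
                raise
            # 429: exponential backoff plus up to 1s of random jitter
            time.sleep((2 ** attempt) + random.random())
        except APIStatusError as e:
            # 5xx is usually a transient Qianfan-side blip; 4xx will not fix itself.
            if e.status_code >= 500 and not last_attempt:
                time.sleep(1 + attempt)
                continue
            raise
        except APIConnectionError:
            # Network errors and timeouts carry no HTTP status code.
            if last_attempt:
                raise
            time.sleep(1 + attempt)
    raise RuntimeError("ERNIE 5.1 retries exhausted")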

That handles the 80% case. For tool-loops and streaming, build on top.

Frequently asked questions

Is the ERNIE 5.1 API free? No. Qianfan is pay-as-you-go. There is no permanent free tier; new accounts sometimes get trial credits. For free experimentation use the ernie.baidu.com chat UI or look at free LLM options.

Can I run ERNIE 5.1 locally? No. There are no public weights. If on-prem is a hard requirement, look at how to run DeepSeek V4 locally or the best local LLMs in 2026 instead.

Does the OpenAI SDK work without changes? Yes, with base_url set to https://qianfan.baidubce.com/v2 and api_key set to your Qianfan key. The model field takes Qianfan model IDs, not OpenAI ones. Function calling, streaming, and response_format: json_object all work. Strict json_schema validation is still rolling out.

How does ERNIE 5.1 handle Chinese vs English prompts? Both are first-class. The Arena Search score of 1,223 came from a mixed-language voter pool. For technical English tasks (code, API design), it is competitive with the closed frontier; for Chinese creative writing it is best-in-class among Chinese models.

What is the max output length? Not officially published. In practice, single-turn responses cap around 8K tokens before the model wraps up. For long-form generation, chunk and continue.
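The "continue" half can be a short loop. This sketch assumes you set `max_tokens`, so a cut-off reply comes back with `finish_reason == "length"` as in the OpenAI format. `generate_long` and the `complete` callback are hypothetical, and the continuation prompt is only an example.

```python
def generate_long(complete, messages, *, max_rounds=4,
                  continue_prompt="Continue exactly where you stopped."):
    """Stitch a long answer together across several calls.

    `complete(messages)` is any function returning (text, finish_reason),
    e.g. a thin wrapper around client.chat.completions.create.
    """
    history = list(messages)
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = complete(history)
        parts.append(text)
        if finish_reason != "length":
            break  # the model finished on its own
        # Feed the partial answer back and ask the model to keep going.
        history.append({"role": "assistant", "content": text})
        history.append({"role": "user", "content": continue_prompt})
    return "".join(parts)
```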

Building an agent on ERNIE 5.1? Download Apidog and use the OpenAI-compatible request collection to mock, test, and document the Qianfan endpoint alongside the rest of your services.
