Alibaba’s Qwen team shipped Qwen3.7-Max-Preview in mid-May 2026, and developers immediately started asking the same question: how do I call it from my own code? The model is a flagship reasoning system with a 1M-token context window and explicit chain-of-thought traces, a strong fit for agent backends, long-document analysis, and code generation. But “preview” is doing a lot of work in that name. Access is gated, the API surface is still settling, and the details you need to write working code are scattered across release notes and platform docs.
TL;DR
Qwen3.7-Max-Preview is Alibaba’s flagship reasoning model, released in preview on May 14, 2026, with a 1M-token context window. During preview the most reliable way to use it is Qwen Chat (chat.qwen.ai); production API access runs through Alibaba Cloud Model Studio (DashScope) using an OpenAI-compatible endpoint, where you set a base URL, pass your key as a Bearer token, and call /chat/completions. Because the 3.7 tier is preview-only, confirm the exact model ID and endpoint in the official docs before you ship, and use Apidog to test and mock the endpoint while availability stabilizes.
How to access Qwen 3.7 right now
Qwen ships its models across a few surfaces, and they don’t all light up at once. As of late May 2026, here is the honest state of access.
Qwen Chat (chat.qwen.ai). The fastest way to try Qwen3.7-Max-Preview. Sign in with a free Qwen account, pick qwen3.7-max-preview in the model selector, and turn on Thinking Mode to see the reasoning trace. There are usage rate limits during preview, but it costs nothing and needs no setup. It’s a browser product, not an API, so it’s for evaluation rather than integration.
Alibaba Cloud Model Studio (DashScope). This is where Qwen models become a real API. Model Studio exposes Qwen through an OpenAI-compatible endpoint, so any code that already talks to the OpenAI SDK can call Qwen with a base-URL and key swap. Older tiers like qwen3.6-max-preview and the qwen-max family are already available here. The 3.7 preview tier may not yet have a public API entry when you read this; Qwen has historically opened API access a few weeks after the chat preview.

The OpenAI-compatible pattern. Every recent Qwen model on Model Studio follows the same shape. You point the standard OpenAI client at a DashScope base URL, authenticate with a Bearer token, and call the chat completions route. That pattern is stable across versions, so the code below keeps working as the 3.7 model ID lands; you mostly change one string.
Because the model identifier and endpoint can shift during a preview, treat the official Qwen documentation and the Model Studio model list as the source of truth. For a zero-cost route while you wait for API access, our guide on how to use Qwen 3.7 for free covers the preview channels in detail.
Access methods at a glance
| Method | API access | Cost | Best for |
|---|---|---|---|
| Qwen Chat (chat.qwen.ai) | No | Free, rate-limited | Quick evaluation, prompt testing |
| Alibaba Cloud Model Studio (DashScope) | Yes, OpenAI-compatible | Pay per token | Production integration |
| Qwen on Hugging Face | Weights, when released | Free (self-host) | Open-weight models, not the Max preview |
| Third-party gateways | Varies | Varies | Multi-model routing |
One distinction worth noting: the open-weight Qwen models reach Hugging Face, but the Max-Preview tier is proprietary, so don’t expect downloadable weights for qwen3.7-max-preview.
Getting a Qwen 3.7 API key
API access goes through an Alibaba Cloud account. The steps are short.
- Create an Alibaba Cloud account and open the Model Studio console (modelstudio.console.alibabacloud.com).
- Activate Model Studio for your account and region. Keys are region-scoped, so a key for the Singapore endpoint won’t authenticate against Beijing.
- Open the API keys section of the console and generate a key. It looks like sk- followed by a string of characters.
- Copy the key once and store it like a password.
Pick your region deliberately, because it sets your base URL:
| Region | Base URL |
|---|---|
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| US (Virginia) | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |
| Beijing (China) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
Never hardcode the key in source you commit. Put it in an environment variable instead:
# macOS / Linux
export DASHSCOPE_API_KEY="sk-your-key-here"
# Windows PowerShell (setx persists the variable; open a new terminal for it to take effect)
setx DASHSCOPE_API_KEY "sk-your-key-here"
Your code reads DASHSCOPE_API_KEY at runtime. This keeps the secret out of your repo and lets you rotate keys without touching code. The same habit applies whatever model you call; you’ll see the same pattern in our guide to the Gemini 3.5 API.
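A minimal sketch of how that lookup might be paired with the region table above. The `DASHSCOPE_REGION` variable and the `load_dashscope_config` helper are hypothetical names introduced for illustration; only `DASHSCOPE_API_KEY` and the three base URLs come from this article.

```python
import os

# Base URLs copied from the region table above.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def load_dashscope_config(env=None):
    """Return (api_key, base_url), failing early if either is unusable.

    DASHSCOPE_REGION is a hypothetical variable name, not an official one.
    """
    env = os.environ if env is None else env
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    region = env.get("DASHSCOPE_REGION", "singapore").strip().lower()
    if region not in BASE_URLS:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(BASE_URLS)}"
        )
    return api_key, BASE_URLS[region]
```

The returned pair can be handed to the client constructor shown in the next section. Failing at startup on a missing key or a mistyped region tends to be easier to debug than a 401 on the first request.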
Your first request: Python, curl, and JavaScript
Qwen’s Model Studio endpoint is OpenAI-compatible, so you have two options: the official OpenAI SDK pointed at the DashScope base URL, or a raw HTTP call. Both are below.
One note before the code. The model ID qwen3.7-max-preview is the identifier Qwen Chat uses for the preview model. The exact string the API expects can differ during a preview window, and an older tier like qwen3.6-max-preview may be live when you try this. Confirm the current model ID in the Model Studio model list, then drop it into the model field. The request shape does not change.
Python with the OpenAI SDK
Install the SDK with pip install openai, then send a request:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    # Use the base URL for your account's region
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
response = client.chat.completions.create(
    # Confirm the live model ID in the Model Studio model list
    model="qwen3.7-max-preview",
    messages=[
        {"role": "system", "content": "You are a precise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
That’s a complete request. The messages array follows the standard role pattern: a system message sets behavior, then user turns. The response carries the generated text in choices[0].message.content.
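To carry a conversation across turns, the usual pattern with chat-completions APIs is to append each assistant reply to the list and resend the whole history. A minimal sketch, assuming that pattern applies here as it does for other OpenAI-compatible endpoints; `Conversation` is a hypothetical helper, not part of any SDK.

```python
class Conversation:
    """Keeps the running message history for a chat-completions call."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation("You are a precise coding assistant.")
chat.add_user("Write a Python function that reverses a linked list.")
# After each real call, store the reply so the next turn has context:
#   chat.add_assistant(response.choices[0].message.content)
chat.add_assistant("def reverse(head): ...")  # placeholder reply for the demo
chat.add_user("Now make it recursive.")
roles = [m["role"] for m in chat.messages]
```

Pass `chat.messages` as the `messages` argument on each call. With a 1M-token window the history can grow for a long time, but every resent token is still billed as input.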
curl
For a quick check from the terminal, or to confirm a key works before writing app code:
curl 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3.7-max-preview",
    "messages": [
      {"role": "user", "content": "Explain idempotency in REST APIs in two sentences."}
    ]
  }'
If the key and model ID are valid, you get a JSON response with the completion. If not, the error body tells you what to fix; more on errors below.
JavaScript / Node.js
The same OpenAI SDK works in Node. Install it with npm install openai:
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
const response = await client.chat.completions.create({
  model: "qwen3.7-max-preview",
  messages: [
    { role: "user", content: "List three trade-offs of GraphQL versus REST." },
  ],
});
console.log(response.choices[0].message.content);
Three languages, one request shape; that’s the upside of an OpenAI-compatible API.
Streaming responses
For anything user-facing, you don’t want to wait for the full completion before showing output. Streaming sends tokens as they’re generated. Set stream to true and iterate over the chunks.
stream = client.chat.completions.create(
    model="qwen3.7-max-preview",
    messages=[
        {"role": "user", "content": "Summarize the CAP theorem."},
    ],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
In Node, the streamed response is an async iterable:
const stream = await client.chat.completions.create({
  model: "qwen3.7-max-preview",
  messages: [{ role: "user", content: "Summarize the CAP theorem." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
Streaming matters more with a reasoning model than a plain chat model. Qwen 3.7 can spend real time on its chain of thought before the final answer, so without streaming the user stares at a blank screen. With streaming you can show the thinking trace, a typing indicator, or the answer as it forms.
The reasoning and thinking parameter
Qwen3.7-Max-Preview is a reasoning model. It can produce an explicit chain of thought inside <think> blocks before it commits to a final answer. That trace raises its scores on math and hard multi-step problems, and it helps with debugging: you can see where the model’s logic went sideways.
On recent Qwen models served through DashScope, thinking behavior is controlled with an enable_thinking flag. Confirm the exact mechanism and parameter name for the 3.7 preview tier against the current API reference, since reasoning controls have changed between Qwen versions. Conceptually, the request looks like this:
response = client.chat.completions.create(
    model="qwen3.7-max-preview",
    messages=[
        {"role": "user", "content": "A train leaves at 2pm averaging 60mph. "
                                    "A second leaves at 3pm at 75mph on the same route. "
                                    "When does the second catch the first?"},
    ],
    # Reasoning controls vary by Qwen version; confirm the current
    # parameter in the Model Studio API reference before relying on it.
    extra_body={"enable_thinking": True},
)
print(response.choices[0].message.content)
A few practical notes:
- Thinking costs tokens and time. The reasoning trace is generated text. It counts toward output and adds latency. For simple lookups or formatting, leave thinking off.
- Turn it on for hard problems. Multi-step math, code with tricky edge cases, planning, and analysis are where the chain of thought earns its cost.
- Decide whether to show the trace. Some apps surface the <think> content so users see the model’s work; others strip it and show only the final answer. Both are valid.
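If the trace arrives inline as `<think>...</think>` text, separating it from the answer is a small string operation. A minimal sketch under that assumption; some OpenAI-compatible servers return reasoning in a separate response field instead, in which case no parsing is needed. `split_thinking` is a hypothetical helper, and the sample string is made up for the demo.

```python
import re

# Non-greedy match so several <think> blocks are handled independently.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer) from text that may contain <think> blocks."""
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer


raw = (
    "<think>Gap is 60 miles; closing speed is 15 mph.</think>\n"
    "The second train catches up at 7pm."
)
reasoning, answer = split_thinking(raw)
```

Showing `answer` by default and keeping `reasoning` behind a toggle or in logs gives you both options from the note above without a second request.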
If you’re weighing reasoning quality and cost against other frontier models, our comparison of Qwen 3.7 vs GPT-5.5 vs Opus 4.7 puts the trade-offs side by side. Reasoning models can burn tokens fast in agent loops; if that’s your situation, the techniques in our piece on how to reduce agent token costs apply directly.
Error handling and rate limits
A request can fail for predictable reasons. Handle them so your app degrades gracefully.
| HTTP status | Meaning | What to do |
|---|---|---|
| 400 | Bad request: malformed JSON, invalid parameter | Fix the request body; check the model ID and field names |
| 401 | Invalid or missing API key | Verify the key and that it matches the endpoint region |
| 403 | No access to the model | The preview tier may be gated; confirm your account is enabled |
| 404 | Model not found | The model ID is wrong or not available in your region |
| 429 | Rate limit or quota exceeded | Back off and retry; check QPS limits and account balance |
| 500 / 503 | Server-side error | Retry with exponential backoff |
Preview models throw 403 and 404 more often than stable ones, because access is gated and identifiers move. If you get one of those, the issue is usually access or the model string, not your code.
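One way to soften that during a preview is to try the newest model ID first and fall back to an older tier when the API answers 403 or 404. A sketch, assuming the raised exception exposes a `status_code` attribute (the OpenAI Python SDK's `APIStatusError` does); `create_with_fallback` and the candidate order are illustrative, not an official recommendation. The fake client exists only so the example runs offline.

```python
def create_with_fallback(create, model_ids, **kwargs):
    """Call `create(model=..., **kwargs)` for each ID until one is accessible.

    `create` is any callable with the chat-completions signature, such as
    client.chat.completions.create. Only 403/404 trigger a fallback;
    every other error propagates unchanged.
    """
    last_error = None
    for model_id in model_ids:
        try:
            return model_id, create(model=model_id, **kwargs)
        except Exception as exc:
            if getattr(exc, "status_code", None) not in (403, 404):
                raise
            last_error = exc
    raise RuntimeError(f"No accessible model among {list(model_ids)}") from last_error


class FakeStatusError(Exception):
    """Test double that mimics an SDK error carrying an HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def fake_create(model, messages):
    # Pretend only the older tier is live for this account.
    if model != "qwen3.6-max-preview":
        raise FakeStatusError(404)
    return {"model": model, "echo": messages[-1]["content"]}


used, result = create_with_fallback(
    fake_create,
    ["qwen3.7-max-preview", "qwen3.6-max-preview"],
    messages=[{"role": "user", "content": "ping"}],
)
```

Log which model actually served the request: a silent fallback changes quality and cost, and you will want to know when the preferred ID starts working.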
Rate limits on Model Studio are set per account as queries per second or per minute, and the exact numbers depend on your account tier and the model; check the console rather than assuming a fixed value. The pattern is the same regardless: catch 429, wait, and retry with increasing delays.
import os
import time
from openai import OpenAI, RateLimitError, APIStatusError
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
def ask_qwen(prompt, max_retries=4):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="qwen3.7-max-preview",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            # 429: worth retrying after a pause
            reason = "Rate limited"
        except APIStatusError as e:
            # 400/401/403/404 are not worth retrying; surface them
            if e.status_code < 500:
                print(f"API error {e.status_code}: {e.message}")
                raise
            # 5xx: server-side, so retry with the same backoff
            reason = f"Server error {e.status_code}"
        wait = 2 ** attempt  # 1s, 2s, 4s, 8s
        print(f"{reason}. Retrying in {wait}s...")
        time.sleep(wait)
    raise RuntimeError("Failed after retries")
Exponential backoff on 429 and 5xx, fail fast on every other 4xx. That split keeps you from hammering the API on errors a retry won’t fix.
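When many workers hit a limit at once, fixed delays can make them all retry in lockstep. Adding random jitter is a common refinement. It is not something the retry loop above does, and the base and cap values here are arbitrary choices for illustration. A sketch of "full jitter" delays, with `backoff_delays` as a hypothetical helper:

```python
import random


def backoff_delays(max_retries=4, base=1.0, cap=30.0, rng=random):
    """Yield one randomized delay per retry.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so the upper bound still grows exponentially but callers spread out.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)


delays = list(backoff_delays(max_retries=6, base=1.0, cap=10.0))
```

To use it, iterate over `backoff_delays()` in place of `range(max_retries)` and sleep for each yielded value after a retryable failure.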
Testing and mocking the Qwen API with Apidog
This is where a preview API gets painful, and where good tooling pays off. When access is gated, the model ID is shifting, and rate limits are tight, you don’t want to test by running your whole app and reading logs. You want to send a request, see exactly what comes back, and keep it around to run again. Apidog is built for that loop.

Mock the endpoint while you build. This is the big one for a gated preview. Apidog’s mock server returns realistic responses from the API schema, with no key and no rate limit. So your frontend or agent can develop against a stand-in Qwen endpoint that always responds instantly, even when real preview access is throttled, down, or not yet open for your account. When the live API is ready, flip the base URL from the mock to DashScope and your code is unchanged. For more on schema-first workflows, see our spec-first mode walkthrough.
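In code, that flip can be a single environment variable. A minimal sketch: `QWEN_BASE_URL` is a hypothetical variable name, and the localhost address is a placeholder for whatever URL your mock server exposes.

```python
import os

# Live endpoint (Singapore region) from the base URL table earlier.
DASHSCOPE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def resolve_base_url(env=None):
    """Use QWEN_BASE_URL (hypothetical) when set, otherwise the live endpoint."""
    env = os.environ if env is None else env
    override = env.get("QWEN_BASE_URL", "").strip()
    return override.rstrip("/") if override else DASHSCOPE_URL


# Placeholder mock address; substitute the URL your mock server shows.
mock_url = resolve_base_url({"QWEN_BASE_URL": "http://localhost:8080/v1/"})
live_url = resolve_base_url({})
```

Pass the result as `base_url` when constructing the client. Development machines set the variable to point at the mock; production leaves it unset and reaches DashScope.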
The pattern generalizes to any model API. The same testing-and-mocking loop in Apidog works whether you’re calling Qwen, Gemini, or the ERNIE 5.1 API; a preview model makes the mocking step more valuable, because the real endpoint is the least dependable part of your stack.
Conclusion
Calling Qwen 3.7 is straightforward once you know the path. The friction is preview gating, not the API.
Stop guessing what Qwen returns and start seeing it. Download Apidog to design the Qwen endpoint, send real test requests, save reusable scenarios, and mock the API while you build. It’s free to start, and it turns an unstable preview into something you can develop against with confidence.



