Qwen 3.7 Plus is Alibaba’s multimodal agent model: text, image, and video in, a 1M-token context, and a budget price. Because it ships only as an API, the practical questions are immediate. How do I get a key, how do I send an image, and what does it cost? This guide answers all three.
We’ll cover access and getting a key; your first request in Python, curl, and JavaScript; the multimodal payload format; the full pricing breakdown with worked cost examples; and rate limits. Along the way you’ll use Apidog to fire test requests, inspect raw responses, and mock the endpoint so you can keep building your app. If you want the capabilities and benchmarks first, start with our Qwen 3.7 Plus overview; for the text-only flagship, see the base Qwen 3.7 API guide.
TL;DR
Qwen 3.7 Plus runs through Alibaba Cloud Model Studio on an OpenAI-compatible endpoint. You set a region base URL, pass your key as a Bearer token, and call /chat/completions with the qwen3.7-plus model ID. Multimodal requests add image or video parts to the message content. Pricing is $0.40 per million input tokens and $1.60 per million output tokens, with cached input at $0.08 per million. On input, that is roughly six times cheaper than Qwen 3.7 Max. There’s no perpetual free tier, though new accounts get a one-time free quota. Vision tokens are billed at the same per-token rate and share the context budget, so images and video drive your bill. Confirm the exact model ID in the Model Studio docs before you ship.
How to access Qwen 3.7 Plus
Unlike the text flagship, which spent its early days behind a chat-only preview, Plus is a commercial API from day one. Two surfaces matter.

- Qwen Chat (chat.qwen.ai). The fastest way to try Plus with an image. Sign in, pick the Plus model, drop in a screenshot, and see how it grounds. It’s for evaluation, not integration.
- Alibaba Cloud Model Studio (DashScope). This is the real API. Model Studio exposes Plus through an OpenAI-compatible endpoint, so any code that already talks to the OpenAI SDK can call it with a base-URL and key swap.
One hard limit to plan around: Plus is proprietary. There are no open weights to download, so you can’t self-host or run it air-gapped. If that’s a requirement, our Qwen 3.7 Plus overview covers the trade-off in detail.
| Method | API access | Cost | Best for |
|---|---|---|---|
| Qwen Chat (chat.qwen.ai) | No | Free, rate-limited | Quick evaluation with images |
| Model Studio (DashScope) | Yes, OpenAI-compatible | Pay per token | Production integration |
| Self-hosting | No | n/a | Not available; weights are closed |
Getting a Qwen 3.7 Plus API key
Access goes through an Alibaba Cloud account.
- Create an Alibaba Cloud account and open the Model Studio console (modelstudio.console.alibabacloud.com).
- Activate Model Studio for your account and region. Keys are region-scoped, so a Singapore key won’t authenticate against Beijing.
- Open the API keys section and generate a key. It looks like sk- followed by a string.
- Copy it once and store it like a password.
Your region sets your base URL:
| Region | Base URL |
|---|---|
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| US (Virginia) | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |
| Beijing (China) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
Keep the key out of source control. Use an environment variable:
# macOS / Linux
export DASHSCOPE_API_KEY="sk-your-key-here"
# Windows PowerShell
setx DASHSCOPE_API_KEY "sk-your-key-here"
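Because the key and the base URL have to come from the same region, it can help to resolve both in one place and fail early when either is missing. The following is a minimal sketch, not an official helper: the region labels (`singapore`, `us-virginia`, `beijing`) and the function name `client_settings` are made up for illustration, while the URLs are copied from the table above.

```python
import os

# Base URLs copied from the region table above.
# The dictionary keys are hypothetical labels, not official region codes.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def client_settings(region, env=None):
    """Return api_key and base_url keyword arguments for an OpenAI-style client."""
    env = os.environ if env is None else env
    if region not in BASE_URLS:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(BASE_URLS)}")
    key = env.get("DASHSCOPE_API_KEY", "")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {"api_key": key, "base_url": BASE_URLS[region]}
```

With the OpenAI SDK installed, the result can be unpacked straight into the client constructor, for example `OpenAI(**client_settings("singapore"))`.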
Your first request: Python, curl, and JavaScript
The endpoint is OpenAI-compatible, so you can use the official OpenAI SDK pointed at the DashScope base URL, or a raw HTTP call. The model ID is qwen3.7-plus, but confirm the current string in the Model Studio model list before shipping, since identifiers can shift.
Python with the OpenAI SDK
Install with pip install openai, then:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
resp = client.chat.completions.create(
    model="qwen3.7-plus",
    messages=[{"role": "user", "content": "Summarize the Qwen 3.7 Plus pricing model in two sentences."}],
)
print(resp.choices[0].message.content)
curl
curl "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-plus",
    "messages": [{"role": "user", "content": "Hello from the Qwen 3.7 Plus API."}]
  }'
JavaScript
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
const resp = await client.chat.completions.create({
  model: "qwen3.7-plus",
  messages: [{ role: "user", content: "Hello from the Qwen 3.7 Plus API." }],
});
console.log(resp.choices[0].message.content);
Sending images and video
The reason to use Plus over Max is multimodal input. You pass visual content as extra parts in the message content array, the same shape the OpenAI vision API uses.
resp = client.chat.completions.create(
    model="qwen3.7-plus",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which button submits this form? Give pixel coordinates."},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
You can pass an image as a public URL or a base64 data URI. Video follows the same pattern with a video part. This is what powers the GUI-grounding behavior: hand Plus a screenshot, and it returns structured actions like click at (x=487, y=232). The exact part names for video can differ by region, so check the OpenAI-compatibility docs for the current schema.
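For the base64 route, the image bytes go into a data URI that takes the place of the public URL in the `image_url` part. Here is a minimal sketch of that encoding. The helper names `image_part` and `user_message` are illustrative, and the 5 MiB `max_bytes` default is a placeholder assumption rather than a documented Model Studio limit, so check the docs for the real ceiling.

```python
import base64


def image_part(image_bytes, mime_type="image/png", max_bytes=5 * 1024 * 1024):
    """Build an image_url content part that carries the image as a base64 data URI."""
    # max_bytes is a hypothetical guard, not an official size limit.
    if len(image_bytes) > max_bytes:
        raise ValueError(f"image is {len(image_bytes)} bytes; limit is {max_bytes}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def user_message(prompt, image_bytes, mime_type="image/png"):
    """Combine a text part and an image part into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part(image_bytes, mime_type),
        ],
    }
```

The returned dictionary has the same shape as the message in the request above, so it can be passed as the only element of `messages`. Reading the file with `open(path, "rb").read()` supplies the bytes.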
Pricing
Plus is priced as a budget multimodal tier. Here’s how it compares to the text flagship.
| Model | Input / 1M | Output / 1M | Cached input / 1M |
|---|---|---|---|
| Qwen 3.7 Plus | $0.40 | $1.60 | $0.08 |
| Qwen 3.7 Max | $2.50 | $7.50 | $0.25 |
That’s roughly six times cheaper than Max on input. There’s no perpetual free tier, but new Model Studio accounts get a one-time free token quota (usually in the Singapore region) to evaluate the model before billing switches to pay-as-you-go. Note that the old Qwen OAuth free path was retired on April 15, 2026, so don’t build on it. The official numbers live on the Model Studio pricing page and the free-quota guide; for zero-cost ways to try the wider family, see our Qwen 3.7 for free guide.
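The "six times" figure applies to input tokens only; the gap is narrower on the other two columns. A small sketch that derives the ratios from the table, using only the prices listed above (the function name `price_ratios` is illustrative):

```python
# Prices in USD per 1M tokens, copied from the comparison table above.
PLUS = {"input": 0.40, "output": 1.60, "cached_input": 0.08}
MAX = {"input": 2.50, "output": 7.50, "cached_input": 0.25}


def price_ratios(cheaper, pricier):
    """Return how many times more the pricier model costs, per price column."""
    return {column: pricier[column] / cheaper[column] for column in cheaper}


for column, ratio in price_ratios(PLUS, MAX).items():
    print(f"{column}: Max costs about {ratio:.1f}x Plus")
```

Working the division by hand gives 6.25 for input, about 4.69 for output, and 3.125 for cached input, so output-heavy or cache-heavy workloads see a smaller saving than the headline number suggests.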
What requests actually cost
Text is cheap. Vision is where the bill grows, because images and video are converted to tokens that share the same per-token rate and the same 1M context budget. A high-resolution screenshot can run to a few thousand tokens, and video frames stack up fast.
| Request | Input tokens | Output tokens | Approx cost |
|---|---|---|---|
| Text-only prompt | 10,000 | 2,000 | ~$0.007 |
| One 1080p screenshot + prompt | ~1,500 | 300 | ~$0.001 |
| 30s video sampled at 2 fps | ~77,000 | 500 | ~$0.032 |
The frame-token figures are approximate and depend on resolution and sampling rate, but the lesson holds: text and single-screenshot calls cost a fraction of a cent, while the 30-second video call above costs about 30 times more than the screenshot call and more than four times the 10,000-token text prompt. Downscale screenshots and sample video sparingly. For wider cost strategy, see our notes on reducing agent token costs and the 2026 Chinese LLM price war that put Plus at this price in the first place.
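The table rows can be reproduced with a few lines of arithmetic. The sketch below uses the Plus prices quoted in this guide and makes two assumptions: cached tokens are treated as a subset of the input tokens billed at the cached rate, and the per-frame token count is a number you supply yourself (about 1,283 is what the table's 77,000 tokens over 60 frames implies). The function names are illustrative.

```python
# Qwen 3.7 Plus prices in USD per 1M tokens, as quoted in this guide.
PRICE_PER_MILLION = {"input": 0.40, "output": 1.60, "cached_input": 0.08}


def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens billed at the cached rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_PER_MILLION["input"]
        + cached_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


def video_input_tokens(seconds, fps, tokens_per_frame):
    """Rough token count for a video: sampled frames times tokens per frame."""
    return round(seconds * fps * tokens_per_frame)


if __name__ == "__main__":
    print(request_cost(10_000, 2_000))   # text-only row
    print(request_cost(1_500, 300))      # screenshot row
    print(request_cost(video_input_tokens(30, 2, 1283), 500))  # video row
```

These work out to roughly $0.0072, $0.0011, and $0.0316, matching the rounded figures in the table. Halving the sampling rate to 1 fps halves the frame tokens, which is where most of the video cost sits.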
Rate limits and errors
Model Studio enforces per-account rate limits in requests per minute and tokens per minute, and the ceilings depend on your account tier and region rather than a single published number. Check the quota page in the console for your current limits, and request an increase there if you hit them.
Handle the common failures:
- 401 Unauthorized: wrong key, or a key from the wrong region for your base URL.
- 429 Too Many Requests: you’ve hit the rate limit. Back off and retry with exponential delay.
- 400 Bad Request: usually a malformed multimodal payload, an oversized image, or a context overflow once vision tokens are counted.
Wrap calls in a retry with backoff on 429 and 5xx, and validate image size before sending.
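One way to sketch that retry wrapper is below. It assumes the raised exception carries an integer `status_code` attribute, as the OpenAI Python SDK's status errors do; the names `call_with_backoff` and `is_retryable`, the attempt count, and the delays are illustrative defaults rather than recommended values.

```python
import random
import time


def is_retryable(exc):
    """Retry on 429 and any 5xx; everything else (401, 400, ...) is raised as-is."""
    status = getattr(exc, "status_code", None)
    return isinstance(status, int) and (status == 429 or 500 <= status < 600)


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry retryable failures with exponential delay plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Delay doubles each attempt: 1x, 2x, 4x, ... of base_delay.
            # Jitter keeps parallel clients from retrying in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A call then looks like `call_with_backoff(lambda: client.chat.completions.create(...))`. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.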
Test and mock the API with Apidog
Multimodal requests are easy to get wrong. You’re base64-encoding images, nesting content arrays, and reading back structured action plans, often inside a tool-calling loop. Eyeballing that in a terminal gets old fast.

Apidog gives you a real workspace for it. Send Qwen 3.7 Plus requests with image and video parts, inspect the raw JSON response, store your Model Studio key per environment so you never paste it into code, and mock the endpoint so your frontend builds while you tune prompts. When Plus is chaining tool calls across a GUI-and-CLI agent run, Apidog’s AI agent debugger shows the full sequence so you can find where a run broke.
Download Apidog to test, debug, and mock the Qwen 3.7 Plus API before it reaches production.
FAQ
Is there a free tier for the Qwen 3.7 Plus API? No perpetual free tier. New Alibaba Cloud Model Studio accounts get a one-time free token quota to evaluate, usually in the Singapore region, then billing moves to pay-as-you-go.
What’s the model ID? qwen3.7-plus on Model Studio. Because identifiers can change, confirm the current string in the Model Studio model list before you ship.
How is image and video cost calculated? Visual content is converted to tokens billed at the standard input rate. A 1080p screenshot comes to roughly 1,500 tokens in the worked example above and can run higher at larger resolutions, and video adds tokens per sampled frame, so large media payloads dominate the bill.
How is the API different from Qwen 3.7 Max? Same OpenAI-compatible shape and base URLs. Plus accepts image and video parts in the message content and costs about six times less on input; Max is text-only and keeps a small edge on pure-text benchmarks.
Can I self-host Qwen 3.7 Plus? No. The weights are closed, so it runs only through Alibaba Cloud Model Studio.
Which base URL should I use? The one matching the region where you created your key: Singapore, US (Virginia), or Beijing. A key won’t authenticate against a different region’s endpoint.
The bottom line
Calling Qwen 3.7 Plus is a base-URL-and-key swap on the OpenAI SDK, plus image or video parts when you need vision. The pricing is genuinely cheap for text and scales with your visual payload, so the discipline is in how many pixels you send, not the API itself. Get a key, send your first multimodal request, and test the whole flow in Apidog before you wire it into production.



