Qwen 3.7 Plus: Alibaba's multimodal agent model, benchmarks and pricing

Qwen 3.7 Plus is Alibaba's multimodal sibling of Qwen3.7-Max: text, image and video input, 1M context, GUI-agent grounding, and a budget price of $0.40/$1.60 per 1M tokens. Benchmarks, access, and the proprietary catch.

Ashley Innocent

3 June 2026

Alibaba shipped Qwen 3.7 Plus just a few days after Qwen3.7-Max. The short version: Plus is Max with eyes. It keeps the same 1M-token context and agentic backbone, adds image and video input, and lands at roughly a sixth of Max’s input price. If you’ve been following the family, our guide to what Qwen 3.7 is covers the text flagship; this post is about what the new Plus variant adds.

One thing to flag up front, because it changes who should care: Qwen 3.7 Plus is API-only and proprietary. There are no open weights, which breaks from Qwen’s open-source habit. We’ll get to what that means below. Since Plus ships only as an API, you’ll spend your time calling and debugging it; that’s where Apidog comes in, covered at the end.

The short answer

Qwen 3.7 Plus is the multimodal, budget-priced sibling of Qwen3.7-Max. Hand it a screenshot, a design mockup, or a video, and it reasons over them as a first-class input. It’s built for agents that drive graphical interfaces: it can look at an app screenshot and return exact pixel coordinates to click.

On pure text, Max still edges it slightly. On anything with a visual signal, Plus is the one you want, and it costs a fraction of Max either way. The only real downside is the closed weights.

What’s new versus Qwen 3.7 Max

Three changes matter.

It sees. Max is text-only. Plus accepts text, images, and video. That unlocks screenshot perception, document and PDF reading, and video understanding from a single model.

It grounds GUIs. Plus is positioned as a multimodal interactive agent that handles browser automation, GUI navigation, and hybrid GUI-plus-CLI workflows. It produces structured action plans like “click at (x=487, y=232),” which is what makes computer-use agents actually work.
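
To make the grounding output concrete, here is a minimal sketch of how an agent harness might turn such an action string into coordinates. The exact response format Plus returns is not specified in this post; the pattern below assumes the "click at (x=487, y=232)" shape quoted above, and parse_click is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed action format, based on the example string quoted in the text.
# The real model output may differ; adjust the pattern to match it.
CLICK_PATTERN = re.compile(r"click at \(x=(\d+),\s*y=(\d+)\)", re.IGNORECASE)


def parse_click(action: str) -> Optional[Tuple[int, int]]:
    """Return (x, y) if the text contains a click action, else None."""
    match = CLICK_PATTERN.search(action)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_click("click at (x=487, y=232)"))  # (487, 232)
```

Returning None for anything that does not match lets the calling loop reject or retry a malformed plan instead of clicking at a guessed position.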

It’s cheap. Plus runs at a budget tier well below Max.

Feature | Qwen 3.7 Plus | Qwen 3.7 Max
Input modalities | Text, image, video | Text only
Context window | 1M tokens (shared with vision) | 1M tokens
Input / output price per 1M tokens | $0.40 / $1.60 | $2.50 / $7.50
Cached input price per 1M tokens | $0.08 | $0.25
GUI grounding (ScreenSpot Pro) | 79.0 | None
Terminal-Bench | 70.3 | 69.7
Autonomous run ceiling | 35 hours | 35 hours

Benchmarks

The launch numbers, backed up by early hands-on reviews, tell a consistent story: Plus matches or slightly trails Max on text, then pulls ahead the moment vision enters the picture.

The pattern is clear. Pick Plus when the task carries a visual signal: a screenshot, a mockup, a chart. For a head-to-head on the text side, our Qwen 3.7 vs GPT-5.5 vs Opus 4.7 comparison covers where the family lands against the Western flagships. As always, benchmark numbers come from the vendor and early reviewers, so treat them as direction rather than gospel.

Pricing: the budget multimodal tier

Here’s where Plus gets interesting. At $0.40 input and $1.60 output per million tokens, it’s roughly six times cheaper than Max on input and nearly five times cheaper on output. Cached input drops to $0.08. You get vision and a 1M context for less than most text-only models charge.
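
The ratios above follow directly from the list prices. Here is a small sketch of the arithmetic, using the prices from the comparison table. The workload sizes are hypothetical, and treating cached tokens as a subset of input tokens is an assumption about billing that you should check against your invoice.

```python
# Prices in USD per 1M tokens, taken from the comparison table in this post.
PRICES = {
    "plus": {"input": 0.40, "cached": 0.08, "output": 1.60},
    "max": {"input": 2.50, "cached": 0.25, "output": 7.50},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens billed at the cached rate
    (an assumption about how billing splits the input).
    """
    p = PRICES[model]
    fresh = input_tokens - cached_tokens
    total = (fresh * p["input"] + cached_tokens * p["cached"]
             + output_tokens * p["output"])
    return total / 1_000_000


# Hypothetical workload: 200k input tokens (half cached), 10k output tokens.
for name in PRICES:
    print(name, round(estimate_cost(name, 200_000, 10_000, 100_000), 4))
```

Running the same workload through both price sheets is a quick way to see how much of the saving comes from input versus output tokens for your own traffic mix.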

One caveat worth building into your cost model: images and video share that 1M-token budget. A high-resolution screenshot can burn thousands of tokens, and video frames add up fast, so your effective text headroom shrinks as the visual payload grows. Budget for it. For the wider context on why Chinese labs keep undercutting on price, see our breakdown of the 2026 Chinese LLM price war.
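
One way to build that caveat into a cost model is to subtract an estimated visual payload from the shared window. This is a rough sketch: the per-image and video token counts are caller-supplied estimates, because this post does not give exact figures, and text_headroom is a hypothetical helper name.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, shared by text, image, and video input


def text_headroom(image_count: int, tokens_per_image: int,
                  video_tokens: int = 0) -> int:
    """Tokens left for text after the visual payload is counted.

    tokens_per_image and video_tokens are estimates supplied by the
    caller; measure them from real responses where you can.
    """
    visual = image_count * tokens_per_image + video_tokens
    return max(CONTEXT_WINDOW - visual, 0)


# Hypothetical session: 50 screenshots at an assumed 4,000 tokens each.
print(text_headroom(image_count=50, tokens_per_image=4_000))  # 800000
```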

The catch: proprietary and API-only

Qwen built its enterprise traction on open weights. Much of the earlier Qwen line shipped under Apache 2.0 or open-use licenses, so teams could download, fine-tune, and run models inside air-gapped data centers. Qwen 3.7 Plus does not do that.

Plus is delivered strictly as a managed commercial API through Alibaba Cloud Model Studio. You can’t download the weights, you can’t self-host, and you can’t run it offline. For regulated or air-gapped environments, that’s a hard stop. An open-weight Plus variant has been floated for Q3 2026, but it isn’t confirmed, and the proprietary tier may stay closed. If open weights are a requirement, this model isn’t your pick today; rivals like Step 3.7 Flash ship under Apache 2.0 and undercut it on price.

How to access Qwen 3.7 Plus

Two paths: call it through the Alibaba Cloud Model Studio API, or try it in the browser at chat.qwen.ai.

A minimal multimodal call uses the standard OpenAI message format, with an image part added alongside the text:

from openai import OpenAI

# The standard OpenAI client, pointed at Model Studio's compatible-mode endpoint.
client = OpenAI(
    api_key="YOUR_MODEL_STUDIO_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# One user turn with two content parts: the question and the screenshot.
resp = client.chat.completions.create(
    model="qwen3.7-plus",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which button submits this form? Give pixel coordinates."},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
# The answer comes back as ordinary assistant text.
print(resp.choices[0].message.content)

Check the Model Studio docs for the exact model identifier and the regional base URL, since those differ between the international and China endpoints.
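
The call above passes a public image URL. For a local screenshot, a common pattern with OpenAI-style APIs is to send the image inline as a base64 data URL. The sketch below only builds the message. Whether the Model Studio endpoint accepts data URLs is an assumption to verify in its docs, and both helper names are hypothetical.

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def build_user_message(prompt: str, image_bytes: bytes) -> dict:
    """Combine a text prompt and one inline image into a single user turn."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part_from_bytes(image_bytes),
        ],
    }


# Placeholder bytes stand in for a real file read, for example
# open("screenshot.png", "rb").read().
message = build_user_message("Which button submits this form?", b"abc")
print(message["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```

The resulting dict can be passed as an element of the messages list in the call shown earlier. Keep in mind that inline images still count against the shared token budget.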

Who should use it

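Reach for Qwen 3.7 Plus when your work involves computer-use agents that drive browsers or GUIs, screenshot- or mockup-driven coding, document and PDF reading, or video understanding, and when price matters more to you than open weights.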

Stick with Max if you’re optimizing purely for SWE-Bench Pro text scores or need the fastest text-only latency, where it runs a bit quicker on cold paths. For most mixed workloads, the cheaper multimodal option is the sensible default. If you’re weighing Plus against open-weight and other budget models, our MiniMax M3 vs DeepSeek V4 vs Qwen 3.7 comparison is a useful map.

Testing Qwen 3.7 Plus with Apidog

Because Plus is API-only, you live in the API. Multimodal requests are fiddly: you’re encoding images, attaching video, and reading back structured action plans, often inside a tool-calling loop that runs for minutes or hours. You need to see exactly what each request sends and what comes back.

Apidog is built for that. Send Qwen 3.7 Plus requests with image and video payloads, inspect the raw responses, manage your Model Studio keys across environments, and mock the endpoint so your app keeps building while you tune prompts. For the agentic side, where Plus chains tool calls across a GUI-and-CLI workflow, Apidog’s AI agent debugger shows the full call sequence so you can find where a run went wrong.

Download Apidog to test, debug, and mock the Qwen 3.7 Plus API before it reaches production.

FAQ

Is Qwen 3.7 Plus open source? No. It’s proprietary and available only as a managed API through Alibaba Cloud Model Studio. You can’t download or self-host the weights. An open-weight variant has been suggested for Q3 2026 but isn’t confirmed.

Qwen 3.7 Plus or Max, which should I use? Use Plus if you need vision (screenshots, PDFs, video) or want the lower price, which covers most workloads. Use Max if you’re tuning for pure-text SWE-Bench Pro scores or need the fastest text-only latency.

How much does Qwen 3.7 Plus cost? $0.40 per million input tokens, $1.60 per million output tokens, and $0.08 per million cached input tokens. That’s roughly six times cheaper than Qwen3.7-Max on input and nearly five times cheaper on output.

Does Qwen 3.7 Plus handle video? Yes. It accepts text, images, and video as input. Remember that visual tokens share the 1M-token context budget, so large media payloads reduce your text headroom.

What’s the context window? 1M tokens, inherited from the Max backbone, shared across text, image, and video tokens.

How do I access Qwen 3.7 Plus? Through the Alibaba Cloud Model Studio API, or try it in the browser at chat.qwen.ai.

The bottom line

Qwen 3.7 Plus takes Alibaba’s agentic flagship, bolts on vision, and cuts the price to a budget tier. For builders shipping computer-use agents, screenshot-driven coding, or video understanding, it’s one of the cheapest frontier-tier multimodal options available. The trade you accept is closed weights and a hard dependency on Alibaba’s cloud.

If that trade works for you, the next step is the API itself. Test it, debug the multimodal calls, and mock the responses in Apidog so what you ship holds up under real traffic.
