What is Gemini 3.5 Flash? Google's New Fast Frontier Model Explained

Apidog for Enterprise

On-Premises Deploy

SSO & RBAC

SOC 2 Compliant

Google shipped Gemini 3.5 Flash on May 19, 2026. It’s the fast, low-cost variant of the new 3.5 family, and it’s the only model in that family you can use today. Gemini 3.5 Pro is announced for June 2026, but Flash is what landed first, and it’s the one that matters for most production workloads right now.

Flash is the model Google built for the workloads that actually run in 2026: long agent loops, terminal automation, multi-file coding, multimodal document analysis, and streaming chat. It runs roughly 4× faster than other frontier models on output tokens and costs less than half what they cost per task.

This guide walks through what Gemini 3.5 Flash is, what’s actually new, the benchmark numbers, how to access it, and how it fits next to the rest of your stack including Apidog for testing AI endpoints.

Quick facts about Gemini 3.5 Flash

Release date: May 19, 2026
Variant: Gemini 3.5 Flash (Pro arrives June 2026)
Context window: 1M tokens input, 64K output
Modalities: text, images, code, graphics generation
Headline benchmarks: 76.2% Terminal-Bench 2.1, 84.2% CharXiv Reasoning, 83.6% MCP Atlas, 1656 Elo on GDPval-AA
Speed: ~4× faster output tokens/second than other frontier models
Cost: less than half the cost of comparable frontier models for agentic tasks
API name: gemini-3.5-flash
Access: Gemini app, AI Mode in Search, Google Antigravity, Gemini API, AI Studio, Android Studio, Gemini Enterprise

For the full pricing breakdown including free-tier limits and real cost scenarios, see our Gemini 3.5 Flash pricing guide.

What’s new with 3.5 Flash vs 3 and 3.1

Gemini 3.5 Flash builds on the Gemini 3 Flash and Gemini 3.1 Pro lines with five concrete upgrades:

Agentic execution gets sharper. Flash handles longer task chains without losing the thread. Tool calls land in the right order. Subagent dispatch works as a first-class capability, not a workaround.
Coding output is denser. Multi-file refactors, long-horizon refactoring jobs, and CLI-driven workflows are where Flash clearly improves over the 3.x line.
Graphics generation got real. Interactive web UI, rich SVG, and inline diagrams come out of the model directly. You no longer route through a separate image model for in-line graphics.
Output speed jumps. Google claims roughly 4× the tokens/second of other frontier models. That changes how you build streaming UX.
Safety guardrails widened. Stronger cyber and CBRN safeguards, plus interpretability tools that explain why the model refused or rerouted a request.

The pattern is consistent. Google is optimizing Flash for production agent workloads, not just chat. That’s the same direction OpenAI and Anthropic took with GPT-5.5 and Claude Opus 4.7.

Gemini 3.5 Flash benchmarks

Flash punches well above its tier. The numbers from Google’s published table:

Benchmark	What it tests	Gemini 3.5 Flash
Terminal-Bench 2.1	Long-horizon CLI workflows	76.2%
MCP Atlas	Multi-tool coordination	83.6%
CharXiv Reasoning	Chart and diagram interpretation	84.2%
GDPval-AA	General agentic value	1656 Elo
MRCR v2 (1M context)	Long-context retrieval	Top of Google’s table

Where Flash visibly leads: chart reasoning, agentic multi-tool work, long-context retrieval.

Where it doesn’t dominate: pure SWE-Bench Verified is still a tight race between Opus 4.7 and GPT-5.5. If your only metric is single-shot bug fixes, those flagships still nudge ahead. If you care about long agent runs at low cost, Flash pulls ahead.

For a deeper three-way breakdown, see Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7.

The Gemini 3.5 model family

Gemini 3.5 Flash (available now)

Flash is the workhorse variant. It’s available immediately through AI Studio, the Gemini API, the Gemini app, AI Mode in Search, Antigravity, Android Studio, and Gemini Enterprise.

Reported pricing on launch day sits around $1.50 per 1M input tokens and $9.00 per 1M output tokens. That’s noticeably above last year’s 3.1 Flash-Lite but still far cheaper than Pro-tier competitors. See the full pricing guide for batch mode, cached input, and Vertex rates.

Where Flash shines:

High-throughput agent loops
Vision-heavy chart and document understanding
Embedded use inside Apidog test scripts where latency matters
Streaming chat UIs where output speed is visible to users
1M-token document analysis without chunking

Gemini 3.5 Pro (rolling out June 2026)

Pro is announced but not yet shipping. Google is positioning it as the agentic flagship: the variant you run when the task budget includes multi-hour autonomous work, deep research, or the absolute top of the leaderboard. Expect Pro pricing to land closer to GPT-5.5 and Opus 4.7 list rates.

Until Pro ships, Flash carries the load. The good news: Flash is already credible on agentic benchmarks, so you don’t have to wait to start building.

What about Nano?

Google didn’t ship a 3.5 Nano variant. On-device inference still rides on the 3.1 Flash-Lite line. Expect a 3.5 Nano announcement closer to the next Pixel cycle.

Where you can use Gemini 3.5 Flash

Six surfaces shipped on launch day:

Gemini app: global rollout, both free and paid tiers
AI Mode in Google Search: answers and follow-ups
Google Antigravity: Google’s agent platform for end-user automation
Gemini API: the developer entry point via AI Studio
Android Studio: IDE-level coding assistance for Android developers
Gemini Enterprise + Agent Platform: managed agent runtime for org-wide use

The newest surface is Gemini Spark, a personal agent that runs 24/7 on your account. Spark uses Flash under the hood and connects to your Gmail, Calendar, and Drive context.

Information agents inside Search are also new, small autonomous helpers that pull together updates on topics you follow without you re-querying.

How to start using Gemini 3.5 Flash

You have four real paths. Each maps to a different use case.

1. Gemini app (the chat path)

Open gemini.google.com, pick “3.5 Flash” from the model selector, and you’re done. The app surface covers most casual workloads: research, writing, coding sketches, image analysis.

2. Google AI Studio (the free dev path)

Head to ai.google.dev, sign in, and you get an API key with a free daily quota. Flash is on the free tier at roughly 1,500 requests per day on launch.

If you’ve used the Google Gemini API before, the pattern is identical. Set GEMINI_API_KEY, point the SDK at gemini-3.5-flash, send your request. See our free Gemini API key guide for the step-by-step, or our Flash-specific free guide for all five free paths.

3. Gemini API in production

Production workloads route through the same endpoint with a billed account. Flash’s per-token pricing follows the standard input/output model and lands well below flagship competitors. See How to Use the Gemini 3.5 Flash API for full code samples in Python, Node, and curl, plus streaming, tool use, and multimodal patterns.

When you wire it into your stack, test the endpoint properly. Apidog handles the full request/response cycle for the Flash REST and streaming endpoints in a single workspace, useful when you need to verify tool calls or multimodal payloads end-to-end.

4. Gemini Enterprise (the managed path)

For organizations, the Gemini Enterprise Agent Platform packages Flash with audit logs, data residency, and the Agent Platform’s runtime. This is the path most large teams will pick once they’ve prototyped on the developer API.

What Gemini 3.5 Flash is actually good at

After a day of public testing, the patterns are clear:

Long agent loops at low cost. Multi-step web research with tool calls runs further before drifting. The MCP Atlas score of 83.6% is the practical evidence. Flash picks the right tool more often, recovers from tool errors better, and doesn’t loop on the same step.

Chart and document reasoning. CharXiv at 84.2% means real reports and PDFs become tractable. If you’ve been hand-rolling chart-extraction pipelines, Flash collapses them into single calls.

Interactive UI generation. Ask for a dashboard, get working HTML + interactive widgets in one pass. The graphics quality jump over 3.1 Flash-Lite is the most visible upgrade.

Cost-sensitive production workloads. “Less than half the cost of other frontier models” is Google’s framing for agentic tasks. Even allowing for marketing math, Flash’s per-task cost for a long agent run is materially lower than Opus 4.7 or GPT-5.5. The numbers are in our pricing breakdown.

What Flash is still not great at

No model is a silver bullet. Three honest weak points on day one:

Pure SWE-Bench Verified: Opus 4.7’s 87.6% still leads on isolated bug-fix benchmarks. If your only KPI is single-issue resolution, the gap to Flash is real.
Voice: Gemini’s voice stack is separate. Compare with Grok Voice vs GPT-Realtime for that workload.
Tool ecosystem maturity: OpenAI and Anthropic both have a head start in third-party adapters. Google is catching up fast with Antigravity, but the ecosystem is younger.

How to test Gemini 3.5 Flash properly

Two things matter when you bring a new model into a production stack: response shape stability and tool-call correctness.

Build a small evaluation harness:

Pin a set of representative prompts
Run them against gemini-3.5-flash and your current model
Score on latency, token cost, and downstream task success
Watch for tool-call schema drift between minor versions

For step 1 and 3, Apidog gives you a recorded test suite for the Flash API endpoints, including streaming. You can replay the same prompts across model versions and diff the outputs. Download Apidog if you want to set this up locally.

Migration tips from Gemini 3.1 to 3.5 Flash

If you’re already on 3.1, the migration is a one-line model string change in most SDKs. A few details worth flagging:

Token budgets are stable. 1M input / 64K output stays the same.
Tool schemas are stable. Existing function definitions carry over without changes.
Output speed roughly 4× faster. Your streaming UI may need to throttle if it can’t render that fast.
Pricing is different. Re-baseline cost projections using the Flash pricing guide before shifting heavy traffic.
Safety responses are stricter. Expect different refusal patterns; rerun your red-team eval.

For deeper migration notes, the Google Gemini 3 API guide covers the SDK pattern in detail.

FAQ

When is Gemini 3.5 Pro available? Google announced “rolling out next month” on May 19, 2026. Expect general availability in June 2026 across AI Studio, Gemini API, and Gemini Enterprise. Until then, Flash is the only 3.5 variant you can call.

Is Gemini 3.5 Flash free to use? Yes, with daily quotas. The Gemini app’s standard tier and AI Studio with an API key both give you Flash access without payment. See our Flash free guide and Get Free Unlimited Gemini API for the five free paths.

Does Gemini 3.5 Flash support function calling? Yes. Tool calling and subagent dispatch are first-class. The MCP Atlas score of 83.6% is the headline evidence.

How does Flash compare to Opus 4.7 and GPT-5.5? Flash leads on cost, output speed, and chart reasoning. Opus 4.7 still edges ahead on SWE-Bench Pro and long-form writing. GPT-5.5 wins on token efficiency. See the three-way comparison for the workload-by-workload breakdown.

Can I run Gemini 3.5 Flash locally? No. There’s no open-weights release. For local inference, look at the best local LLMs of 2026 instead.

Does Gemini 3.5 Flash work with Cursor? Yes, through the standard Gemini API. The pattern is the same as Gemini 3.0 Pro with Cursor.

What’s the API model name for Flash? gemini-3.5-flash. Use this string in the SDK or REST endpoint.

What this means for your stack

If you’re running an AI feature in production today, here’s the short version:

Already on 3.1 Flash? Test 3.5 Flash side by side this week. The output-speed jump alone is worth the swap on streaming UIs.
Already on Opus 4.7 or GPT-5.5? Run a cost-and-quality eval against Flash. For agent-heavy workloads, the cost gap may justify routing some traffic to Flash.
Building a new agent loop? Start on Flash. It’s the cheapest path with credible agentic performance.
Heavy multimodal workload? Move now. CharXiv Reasoning at 84.2% is meaningful.

Whatever path you take, treat the model as one component in a pipeline that needs end-to-end testing. Apidog covers the testing side for the Gemini API specifically; the rest of the loop, prompt design, tool wiring, eval scripting, is on you.

In this article

Quick facts about Gemini 3.5 Flash What’s new with 3.5 Flash vs 3 and 3.1 Gemini 3.5 Flash benchmarks The Gemini 3.5 model family Gemini 3.5 Flash (available now)Gemini 3.5 Pro (rolling out June 2026)What about Nano?Where you can use Gemini 3.5 Flash How to start using Gemini 3.5 Flash 1. Gemini app (the chat path)2. Google AI Studio (the free dev path)3. Gemini API in production 4. Gemini Enterprise (the managed path)What Gemini 3.5 Flash is actually good at What Flash is still not great at How to test Gemini 3.5 Flash properly Migration tips from Gemini 3.1 to 3.5 Flash FAQ What this means for your stack

Apidog: A Real Design-first API Development Platform

API Design

API Documentation

API Debugging

Automated Testing

API Mocking

More

Get Started for Free

Enterprise

On-Premises or SaaS or EU-hosted

SSO, RBAC & audit logs

SOC 2, GDPR, ISO 27001

Explore Apidog Enterprise

Explore more

What is httpbin? Endpoints, How to Use It, and Alternatives

What is httpbin? A simple HTTP request and response service for testing clients. Learn its key endpoints, how to use it with curl, self-host it with Docker, and the best httpbin alternatives.

3 July 2026

What is Jamstack? The decoupled architecture where your API is the product

What is Jamstack? A clear guide to the JavaScript, APIs, and Markup architecture: pre-rendering, decoupling, build-time vs runtime data, and where it fits.

3 July 2026

Webhook vs API: What's the Real Difference?

Webhook vs API, explained: a regular API waits for you to ask (pull), a webhook pushes data the moment an event fires. Why it's not either-or, and when to use each.

3 July 2026