What is Gemini 3.5 Flash? Google's New Fast Frontier Model Explained

Gemini 3.5 Flash explained: shipped May 19, 2026 with 1M context, 76.2% Terminal-Bench, 84.2% CharXiv, ~4x faster output, and free-tier access. Pro arrives June.

Ashley Innocent

Ashley Innocent

20 May 2026

What is Gemini 3.5 Flash? Google's New Fast Frontier Model Explained

Google shipped Gemini 3.5 Flash on May 19, 2026. It’s the fast, low-cost variant of the new 3.5 family, and it’s the only model in that family you can use today. Gemini 3.5 Pro is announced for June 2026, but Flash is what landed first, and it’s the one that matters for most production workloads right now.

Flash is the model Google built for the workloads that actually run in 2026: long agent loops, terminal automation, multi-file coding, multimodal document analysis, and streaming chat. It runs roughly 4× faster than other frontier models on output tokens and costs less than half what they cost per task.

This guide walks through what Gemini 3.5 Flash is, what’s actually new, the benchmark numbers, how to access it, and how it fits next to the rest of your stack including Apidog for testing AI endpoints.

Quick facts about Gemini 3.5 Flash

For the full pricing breakdown including free-tier limits and real cost scenarios, see our Gemini 3.5 Flash pricing guide.

What’s new with 3.5 Flash vs 3 and 3.1

Gemini 3.5 Flash builds on the Gemini 3 Flash and Gemini 3.1 Pro lines with five concrete upgrades:

  1. Agentic execution gets sharper. Flash handles longer task chains without losing the thread. Tool calls land in the right order. Subagent dispatch works as a first-class capability, not a workaround.
  2. Coding output is denser. Multi-file refactors, long-horizon refactoring jobs, and CLI-driven workflows are where Flash clearly improves over the 3.x line.
  3. Graphics generation got real. Interactive web UI, rich SVG, and inline diagrams come out of the model directly. You no longer route through a separate image model for in-line graphics.
  4. Output speed jumps. Google claims roughly 4× the tokens/second of other frontier models. That changes how you build streaming UX.
  5. Safety guardrails widened. Stronger cyber and CBRN safeguards, plus interpretability tools that explain why the model refused or rerouted a request.

The pattern is consistent. Google is optimizing Flash for production agent workloads, not just chat. That’s the same direction OpenAI and Anthropic took with GPT-5.5 and Claude Opus 4.7.

Gemini 3.5 Flash benchmarks

Flash punches well above its tier. The numbers from Google’s published table:

Benchmark What it tests Gemini 3.5 Flash
Terminal-Bench 2.1 Long-horizon CLI workflows 76.2%
MCP Atlas Multi-tool coordination 83.6%
CharXiv Reasoning Chart and diagram interpretation 84.2%
GDPval-AA General agentic value 1656 Elo
MRCR v2 (1M context) Long-context retrieval Top of Google’s table

Where Flash visibly leads: chart reasoning, agentic multi-tool work, long-context retrieval.

Where it doesn’t dominate: pure SWE-Bench Verified is still a tight race between Opus 4.7 and GPT-5.5. If your only metric is single-shot bug fixes, those flagships still nudge ahead. If you care about long agent runs at low cost, Flash pulls ahead.

For a deeper three-way breakdown, see Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7.

The Gemini 3.5 model family

Gemini 3.5 Flash (available now)

Flash is the workhorse variant. It’s available immediately through AI Studio, the Gemini API, the Gemini app, AI Mode in Search, Antigravity, Android Studio, and Gemini Enterprise.

Reported pricing on launch day sits around $1.50 per 1M input tokens and $9.00 per 1M output tokens. That’s noticeably above last year’s 3.1 Flash-Lite but still far cheaper than Pro-tier competitors. See the full pricing guide for batch mode, cached input, and Vertex rates.

Where Flash shines:

Gemini 3.5 Pro (rolling out June 2026)

Pro is announced but not yet shipping. Google is positioning it as the agentic flagship: the variant you run when the task budget includes multi-hour autonomous work, deep research, or the absolute top of the leaderboard. Expect Pro pricing to land closer to GPT-5.5 and Opus 4.7 list rates.

Until Pro ships, Flash carries the load. The good news: Flash is already credible on agentic benchmarks, so you don’t have to wait to start building.

What about Nano?

Google didn’t ship a 3.5 Nano variant. On-device inference still rides on the 3.1 Flash-Lite line. Expect a 3.5 Nano announcement closer to the next Pixel cycle.

Where you can use Gemini 3.5 Flash

Six surfaces shipped on launch day:

  1. Gemini app: global rollout, both free and paid tiers
  2. AI Mode in Google Search: answers and follow-ups
  3. Google Antigravity: Google’s agent platform for end-user automation
  4. Gemini API: the developer entry point via AI Studio
  5. Android Studio: IDE-level coding assistance for Android developers
  6. Gemini Enterprise + Agent Platform: managed agent runtime for org-wide use

The newest surface is Gemini Spark, a personal agent that runs 24/7 on your account. Spark uses Flash under the hood and connects to your Gmail, Calendar, and Drive context.

Information agents inside Search are also new, small autonomous helpers that pull together updates on topics you follow without you re-querying.

How to start using Gemini 3.5 Flash

You have four real paths. Each maps to a different use case.

1. Gemini app (the chat path)

Open gemini.google.com, pick “3.5 Flash” from the model selector, and you’re done. The app surface covers most casual workloads: research, writing, coding sketches, image analysis.

2. Google AI Studio (the free dev path)

Head to ai.google.dev, sign in, and you get an API key with a free daily quota. Flash is on the free tier at roughly 1,500 requests per day on launch.

If you’ve used the Google Gemini API before, the pattern is identical. Set GEMINI_API_KEY, point the SDK at gemini-3.5-flash, send your request. See our free Gemini API key guide for the step-by-step, or our Flash-specific free guide for all five free paths.

3. Gemini API in production

Production workloads route through the same endpoint with a billed account. Flash’s per-token pricing follows the standard input/output model and lands well below flagship competitors. See How to Use the Gemini 3.5 Flash API for full code samples in Python, Node, and curl, plus streaming, tool use, and multimodal patterns.

When you wire it into your stack, test the endpoint properly. Apidog handles the full request/response cycle for the Flash REST and streaming endpoints in a single workspace, useful when you need to verify tool calls or multimodal payloads end-to-end.

4. Gemini Enterprise (the managed path)

For organizations, the Gemini Enterprise Agent Platform packages Flash with audit logs, data residency, and the Agent Platform’s runtime. This is the path most large teams will pick once they’ve prototyped on the developer API.

What Gemini 3.5 Flash is actually good at

After a day of public testing, the patterns are clear:

Long agent loops at low cost. Multi-step web research with tool calls runs further before drifting. The MCP Atlas score of 83.6% is the practical evidence. Flash picks the right tool more often, recovers from tool errors better, and doesn’t loop on the same step.

Chart and document reasoning. CharXiv at 84.2% means real reports and PDFs become tractable. If you’ve been hand-rolling chart-extraction pipelines, Flash collapses them into single calls.

Interactive UI generation. Ask for a dashboard, get working HTML + interactive widgets in one pass. The graphics quality jump over 3.1 Flash-Lite is the most visible upgrade.

Cost-sensitive production workloads. “Less than half the cost of other frontier models” is Google’s framing for agentic tasks. Even allowing for marketing math, Flash’s per-task cost for a long agent run is materially lower than Opus 4.7 or GPT-5.5. The numbers are in our pricing breakdown.

What Flash is still not great at

No model is a silver bullet. Three honest weak points on day one:

How to test Gemini 3.5 Flash properly

Two things matter when you bring a new model into a production stack: response shape stability and tool-call correctness.

Build a small evaluation harness:

  1. Pin a set of representative prompts
  2. Run them against gemini-3.5-flash and your current model
  3. Score on latency, token cost, and downstream task success
  4. Watch for tool-call schema drift between minor versions

For step 1 and 3, Apidog gives you a recorded test suite for the Flash API endpoints, including streaming. You can replay the same prompts across model versions and diff the outputs. Download Apidog if you want to set this up locally.

Migration tips from Gemini 3.1 to 3.5 Flash

If you’re already on 3.1, the migration is a one-line model string change in most SDKs. A few details worth flagging:

For deeper migration notes, the Google Gemini 3 API guide covers the SDK pattern in detail.

FAQ

When is Gemini 3.5 Pro available? Google announced “rolling out next month” on May 19, 2026. Expect general availability in June 2026 across AI Studio, Gemini API, and Gemini Enterprise. Until then, Flash is the only 3.5 variant you can call.

Is Gemini 3.5 Flash free to use? Yes, with daily quotas. The Gemini app’s standard tier and AI Studio with an API key both give you Flash access without payment. See our Flash free guide and Get Free Unlimited Gemini API for the five free paths.

Does Gemini 3.5 Flash support function calling? Yes. Tool calling and subagent dispatch are first-class. The MCP Atlas score of 83.6% is the headline evidence.

How does Flash compare to Opus 4.7 and GPT-5.5? Flash leads on cost, output speed, and chart reasoning. Opus 4.7 still edges ahead on SWE-Bench Pro and long-form writing. GPT-5.5 wins on token efficiency. See the three-way comparison for the workload-by-workload breakdown.

Can I run Gemini 3.5 Flash locally? No. There’s no open-weights release. For local inference, look at the best local LLMs of 2026 instead.

Does Gemini 3.5 Flash work with Cursor? Yes, through the standard Gemini API. The pattern is the same as Gemini 3.0 Pro with Cursor.

What’s the API model name for Flash? gemini-3.5-flash. Use this string in the SDK or REST endpoint.

What this means for your stack

If you’re running an AI feature in production today, here’s the short version:

Whatever path you take, treat the model as one component in a pipeline that needs end-to-end testing. Apidog covers the testing side for the Gemini API specifically; the rest of the loop, prompt design, tool wiring, eval scripting, is on you.

Explore more

Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7: Can a Fast-Tier Model Beat the Flagships?

Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7: Can a Fast-Tier Model Beat the Flagships?

Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 head-to-head: SWE-Bench, Terminal-Bench, pricing, context windows, agentic performance, and when to pick each model in 2026.

20 May 2026

What is Gemini Omni? Google's Reasoning-First Video Model

What is Gemini Omni? Google's Reasoning-First Video Model

Gemini Omni is Google's new model combining reasoning with native video generation. See what it does, when the API ships, and how to test it with Apidog.

20 May 2026

Claude Managed Agents vs Agent SDK (2026): Which to Choose

Claude Managed Agents vs Agent SDK (2026): Which to Choose

Claude Managed Agents vs Agent SDK in 2026: compare control, cost, ops, observability, and data residency, plus how to test the APIs your agents call.

19 May 2026

Practice API Design-first in Apidog

Discover an easier way to build and use APIs