Gemini 3.5 Flash Pricing: How Much Does It Actually Cost?

Gemini 3.5 Flash pricing breakdown: ~$1.50 input / ~$9 output per 1M tokens, free tier (1500 req/day), 50% batch discount, real-world cost scenarios, and comparison to GPT-5.5 and Opus 4.7.

Ashley Innocent

20 May 2026

Google shipped Gemini 3.5 Flash on May 19, 2026, and the headline pricing claim is bold: “less than half the cost of other frontier models” for agentic tasks. That’s the marketing line. This guide does the actual math.

You’ll find the per-token rates, the free-tier caps, the batch-mode discount, real-world cost scenarios for common workloads, and a side-by-side cost comparison against GPT-5.5 and Claude Opus 4.7. By the end, you’ll know exactly what Flash costs to run, and where you can save 50% or more without giving up much.

Quick summary

| Cost type | Rate |
| --- | --- |
| Standard input | ~$1.50 / 1M tokens |
| Standard output | ~$9.00 / 1M tokens |
| Batch mode input | ~$0.75 / 1M tokens (~50% off) |
| Batch mode output | ~$4.50 / 1M tokens (~50% off) |
| Cached input | reduced rate (varies) |
| Free tier (AI Studio) | ~1,500 requests/day, 1M tokens/min, 15 RPM |
| Vertex AI new account | $300 credit over 90 days |

Rates current as of May 2026 per Google’s launch announcement and aggregator listings. Always verify against the official pricing page before committing budget.

Gemini 3.5 Flash per-token rates

Flash uses the same pay-as-you-go model that every Gemini variant has used since 2.5: you pay per million input tokens and per million output tokens, independently.

| Tier | Input ($/1M) | Output ($/1M) |
| --- | --- | --- |
| Standard | ~$1.50 | ~$9.00 |
| Cached input | discounted | n/a |
| Batch (async) | ~$0.75 | ~$4.50 |
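To make the per-token billing concrete, here is a minimal Python sketch of the arithmetic. The rates are the approximate figures from the table above, and the `Rates` class, the `request_cost` function, and the 2,000/500-token request are illustrative names and numbers, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


# Approximate rates from this article; verify against the official pricing page.
STANDARD = Rates(input_per_m=1.50, output_per_m=9.00)
BATCH = Rates(input_per_m=0.75, output_per_m=4.50)


def request_cost(input_tokens: int, output_tokens: int, rates: Rates = STANDARD) -> float:
    """Return the USD cost of one request; input and output are billed separately."""
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
print(f"standard: ${request_cost(2_000, 500):.5f}")
print(f"batch:    ${request_cost(2_000, 500, BATCH):.5f}")
```

Because output tokens cost about six times as much as input tokens, the output side tends to dominate the bill for any workload that generates long responses.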

Two practical notes:

For background on how Gemini’s batch mode works, see Gemini API batch mode is here and 50% cheaper.

Free tier: what you get without paying

The AI Studio free tier ships with Flash from day one. Limits on launch: roughly 1,500 requests per day, 15 requests per minute, and 1M tokens per minute.

That’s enough for most side projects, internal prototypes, and small-scale automation. If your workload fits inside 1,500 calls/day, you pay $0.
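A quick way to sanity-check whether a workload fits is to compare its expected volume against the launch limits quoted in this article. The sketch below does only that; the function name and the sample workload numbers are hypothetical, and the limits should be re-checked against Google's current quota page.

```python
# Approximate launch limits quoted in this article (verify before relying on them).
FREE_REQUESTS_PER_DAY = 1_500
FREE_REQUESTS_PER_MINUTE = 15
FREE_TOKENS_PER_MINUTE = 1_000_000


def exceeded_free_limits(requests_per_day: int,
                         peak_requests_per_minute: int,
                         peak_tokens_per_minute: int) -> list[str]:
    """Return the names of the free-tier limits a workload would exceed.

    An empty list means the workload fits inside the free tier.
    """
    exceeded = []
    if requests_per_day > FREE_REQUESTS_PER_DAY:
        exceeded.append("requests/day")
    if peak_requests_per_minute > FREE_REQUESTS_PER_MINUTE:
        exceeded.append("requests/minute")
    if peak_tokens_per_minute > FREE_TOKENS_PER_MINUTE:
        exceeded.append("tokens/minute")
    return exceeded


# Hypothetical internal tool: 1,200 calls/day, bursts of 10 RPM, 200K tokens/min.
print(exceeded_free_limits(1_200, 10, 200_000))
```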

Free-tier specifics:

For the full setup walkthrough, see How to use Gemini 3.5 Flash for free and How to get a free Google Gemini API key.

Batch mode: the 50% discount most teams miss

If your workload doesn’t need real-time responses, batch mode cuts Flash costs roughly in half.

How it works:

  1. Submit a batch job with up to 50,000 prompts at once
  2. Google processes them within 24 hours
  3. You pay ~50% less per token, both input and output
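If you have more prompts than fit in one job, you need to split them before submitting. The sketch below only handles the chunking step, using the 50,000-prompt cap cited above; the submission call is deliberately left out because this article does not describe the batch API itself, and `chunk_prompts` is an illustrative helper name.

```python
from typing import Iterator, Sequence

# Per-job cap cited in this article; confirm against the batch mode docs.
MAX_PROMPTS_PER_BATCH = 50_000


def chunk_prompts(prompts: Sequence[str],
                  size: int = MAX_PROMPTS_PER_BATCH) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `prompts`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(prompts), size):
        yield prompts[start:start + size]


# Hypothetical backlog of 120,000 prompts.
prompts = [f"Summarize ticket {i}" for i in range(120_000)]
batches = list(chunk_prompts(prompts))
print([len(b) for b in batches])  # prints [50000, 50000, 20000]
```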

When batch mode makes sense:

When it doesn’t:

Most production stacks should run batch mode for any workload that can tolerate latency. The savings compound fast at scale. Setup details in our batch mode guide.

Cached input: another lever

If your prompts share a long static prefix (system prompt, big reference document, long instructions), context caching gives you a discount on the cached portion.

Pattern:

Concrete savings depend on cache hit rate, but for RAG-style apps where the same retrieved chunks come back across queries, expect 30–60% input-cost reduction.
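The relationship between cache hit rate and savings is simple enough to sketch. The function below assumes a flat percentage discount on cached tokens and ignores any cache storage fees; both the discount value and the function name are assumptions for illustration, since the cached rate is only described here as "reduced".

```python
def effective_input_rate(base_rate: float,
                         cached_fraction: float,
                         cache_discount: float) -> float:
    """Return the blended input price (USD per 1M tokens) after caching.

    cached_fraction: share of input tokens served from cache, 0..1
    cache_discount:  assumed discount on cached tokens, 0..1 (not an official figure)
    """
    if not (0 <= cached_fraction <= 1 and 0 <= cache_discount <= 1):
        raise ValueError("fractions must be between 0 and 1")
    return base_rate * (1 - cached_fraction * cache_discount)


# Hypothetical RAG app: 60% of input tokens hit the cache, at an assumed 75% discount.
base = 1.50
blended = effective_input_rate(base, cached_fraction=0.6, cache_discount=0.75)
print(f"blended input rate: ${blended:.3f} per 1M tokens")
print(f"input-cost reduction: {(1 - blended / base):.0%}")
```

Under these assumed numbers the reduction lands at 45%, inside the 30–60% range above. The saving scales with the product of hit rate and discount, so a high discount does little if few tokens are actually cached.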

Real-world cost scenarios

Token math gets abstract fast. Here are five concrete scenarios at Flash’s standard rates.

Scenario 1: Customer support chat bot

Daily cost:

Run the same workload through batch mode (if you can tolerate batched responses): ~$585/month. Add context caching for the system prompt: another 20–30% off.

Scenario 2: Document Q&A SaaS

Daily cost:

This is where Flash’s 1M context shines: no chunking infrastructure, just send the whole document. Compared to chunked RAG with a flagship model, you’d pay multiples more in API plus infrastructure.

Scenario 3: Long-running autonomous agent

Per-run cost:

Daily total: 200 × $0.83 = ~$165/day, ~$4,950/month

For comparison, Opus 4.7’s listed rates (~$15/$75 per 1M) are roughly 8–10× Flash’s, so the same token volume would cost roughly $7–8 per run, or about $1,400–1,650/day. That’s the agentic cost gap Google’s claim is pointing at.

Scenario 4: Chart extraction pipeline

Daily cost:

Add batch mode and the same workload runs at ~$375/month. A CharXiv reasoning score of 84.2% suggests the quality holds up.

Scenario 5: High-volume content generation

Daily cost:

Move this to batch mode and the monthly bill drops to ~$28K. At this scale you’d also want to test routing routine pieces to even cheaper models like 3.1 Flash-Lite and reserving Flash for harder generations.

Cost vs GPT-5.5 and Opus 4.7

The headline pricing comparison:

| Model | Input ($/1M) | Output ($/1M) | Multiple vs Flash |
| --- | --- | --- | --- |
| Gemini 3.5 Flash | ~$1.50 | ~$9.00 | 1× (baseline) |
| GPT-5.5 | ~$10 | ~$30 | 6.7× input, 3.3× output |
| Claude Opus 4.7 | ~$15 | ~$75 | 10× input, 8.3× output |
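The multiples in the table can be reproduced from the listed rates, and the same rates let you price any workload across all three models. This is a minimal sketch: the rates are the approximate figures from the table, and the 1M-in / 1M-out workload is a hypothetical example, not one of the scenarios above.

```python
# Approximate (input, output) rates in USD per 1M tokens, from the table above.
RATES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-5.5": (10.00, 30.00),
    "Claude Opus 4.7": (15.00, 75.00),
}


def multiples_vs(baseline: str) -> dict[str, tuple[float, float]]:
    """Return each model's (input, output) price multiple relative to `baseline`."""
    base_in, base_out = RATES[baseline]
    return {
        name: (round(inp / base_in, 1), round(out / base_out, 1))
        for name, (inp, out) in RATES.items()
    }


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload on `model` at the listed rates."""
    inp, out = RATES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


print(multiples_vs("Gemini 3.5 Flash"))

# Hypothetical workload: 1M input tokens and 1M output tokens.
for model in RATES:
    print(f"{model}: ${workload_cost(model, 1_000_000, 1_000_000):.2f}")
```

The overall multiple for a real workload falls between the input and output multiples, depending on how output-heavy the traffic is.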

Run Scenario 1 (customer support chat) through each:

This is the agentic cost gap that drives Google’s marketing line. The flagships return marginally better quality on the hardest tasks; for everyday workloads, Flash is enough at a fraction of the price.

For deeper breakdowns, see GPT-5.5 pricing and our three-way comparison.

Cost vs other Gemini variants

| Model | Input ($/1M) | Output ($/1M) | When to use |
| --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite | ~$0.40 | ~$2.00 | High-volume routine work |
| Gemini 3 Flash | ~$0.50 | ~$3.00 | Last-generation, still solid |
| Gemini 3.1 Pro | ~$2.00 | ~$12.00 | Reasoning-heavy work pre-3.5 Pro |
| Gemini 3.5 Flash | ~$1.50 | ~$9.00 | New default for most workloads |
| Gemini 3.5 Pro (June 2026) | TBD | TBD | Hardest reasoning tasks |

Flash is roughly three times the price of its 3.x Flash predecessors but about 25% cheaper than the previous Pro tier (3.1 Pro at ~$2.00/$12.00). For most teams, that’s the right trade: better than Flash 3.x, cheaper than Pro 3.x.

For the older Gemini line, see 3.1 Flash-Lite, 3.0 API pricing, and 3 Flash.

Vertex AI pricing (production)

If you call Flash through Vertex AI instead of AI Studio, per-token pricing is the same. The differences are billing and account features:

For most production teams, the path is: prototype on AI Studio’s free tier, switch to AI Studio paid for scale, then move to Vertex AI when you need enterprise controls. The model behavior is identical across all three.

Cost optimization tips

Six concrete habits that cut Flash bills the most:

  1. Run batch mode for anything that doesn’t need real-time response. 50% off, no quality loss.
  2. Cache long static prefixes. System prompts, reference docs, instructions, all good candidates.
  3. Use structured JSON output. It pushes the model to write less, which is both faster and cheaper than free-form prose.
  4. Route by task complexity. Easy tasks to Flash-Lite; hard ones to Flash; the rare killer task to 3.5 Pro when it ships.
  5. Pre-validate inputs. Don’t burn tokens on malformed requests. Apidog catches these before they hit the API.
  6. Track per-prompt cost. Add a logging middleware that records input/output tokens per request. Cost overruns almost always come from a few outlier prompts.
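Habit 6 is the easiest to put off, so here is a minimal sketch of what per-prompt cost tracking could look like. It assumes you already have the input and output token counts for each request (typically from the usage data in the API response); the `CostLog` class, its method names, and the prompt IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CostLog:
    """In-memory log of per-request cost, keyed by a caller-chosen prompt ID."""
    input_rate: float = 1.50    # approximate USD per 1M input tokens
    output_rate: float = 9.00   # approximate USD per 1M output tokens
    records: list = field(default_factory=list)

    def record(self, prompt_id: str, input_tokens: int, output_tokens: int) -> float:
        """Store one request and return its cost in USD."""
        cost = (input_tokens * self.input_rate
                + output_tokens * self.output_rate) / 1_000_000
        self.records.append((prompt_id, cost))
        return cost

    def top_spenders(self, n: int = 3) -> list:
        """Return the n prompt IDs with the highest total cost, most expensive first."""
        totals = {}
        for prompt_id, cost in self.records:
            totals[prompt_id] = totals.get(prompt_id, 0.0) + cost
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


log = CostLog()
log.record("faq-answer", input_tokens=800, output_tokens=150)
log.record("faq-answer", input_tokens=900, output_tokens=200)
log.record("weekly-report", input_tokens=400_000, output_tokens=6_000)
print(log.top_spenders(1))
```

In this made-up sample, a single long-context report costs far more than both FAQ calls combined, which is the kind of outlier the log is meant to surface.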

For the prompt validation flow, download Apidog, build a test scenario for your Gemini endpoint, and add response-shape assertions. Burning the same broken request 200 times in a debug session is how teams waste their free-tier quotas in a single afternoon.

When the free tier isn’t enough

Three signals to upgrade from free to paid Flash:

  1. You’re hitting 1,500 requests/day multiple days in a row. Pay-as-you-go is cheap enough that the dev time spent dodging quotas costs more than the upgrade.
  2. You need higher RPM throughput. Free tier caps at 15 requests per minute; paid tiers go much higher.
  3. You need data residency or audit logs. Move to Vertex AI on a billed account.

Most teams find $50–200/month in paid Flash usage replaces a lot of free-tier juggling.

Pricing risks and what to watch

Three things that could change the math:

Set up cost alerts on day one. Both AI Studio (in the project’s quotas page) and Vertex AI (in Cloud Billing) support per-day budget caps. Use them.
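Platform-level alerts can be complemented by a guard in your own code that refuses to send requests once a daily spend limit is reached. This is a sketch of that idea under simple assumptions (single process, in-memory state, estimated cost known before the call); the `DailyBudget` class is hypothetical and is not a feature of AI Studio or Vertex AI.

```python
from datetime import date
from typing import Optional


class DailyBudget:
    """Tracks spend for the current day and rejects requests that would exceed the limit."""

    def __init__(self, limit_usd: float, today: Optional[date] = None):
        self.limit_usd = limit_usd
        self._day = today or date.today()
        self._spent = 0.0

    def try_spend(self, cost_usd: float, today: Optional[date] = None) -> bool:
        """Reserve `cost_usd` against today's budget; return False if it would overshoot."""
        today = today or date.today()
        if today != self._day:
            # A new day started: reset the running total.
            self._day, self._spent = today, 0.0
        if self._spent + cost_usd > self.limit_usd:
            return False
        self._spent += cost_usd
        return True


budget = DailyBudget(limit_usd=5.00)
if budget.try_spend(0.02):
    print("ok to call the API")
else:
    print("daily budget reached, skipping request")
```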

Bottom line

Gemini 3.5 Flash is cheap enough that most production AI workloads in 2026 should start there. The standard rates ($1.50 / $9 per 1M tokens) undercut every other frontier-class option. Batch mode and context caching push the effective cost even lower.

For the workloads where Flash isn’t enough, the right move is to mix tiers: Flash for the bulk, a flagship like GPT-5.5 or Opus 4.7 for the hardest tasks. Routing by task complexity is the highest-leverage cost optimization you can make.
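To see why mixing tiers matters, it helps to price a blended traffic mix. The sketch below assumes the caller already knows what share of traffic goes to each model; how you classify a task as "hard" is up to you and is not covered here. The rates are the approximate figures from this article, and the 95/5 split and token volumes are hypothetical.

```python
# Approximate (input, output) rates in USD per 1M tokens, from this article.
MODEL_RATES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Claude Opus 4.7": (15.00, 75.00),
}


def blended_cost(mix: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload split across models.

    mix maps model name -> share of total tokens routed to it; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    total = 0.0
    for model, share in mix.items():
        inp, out = MODEL_RATES[model]
        total += share * (input_tokens * inp + output_tokens * out) / 1_000_000
    return total


# Hypothetical month: 1M input and 1M output tokens, 5% routed to the flagship.
mixed = blended_cost({"Gemini 3.5 Flash": 0.95, "Claude Opus 4.7": 0.05},
                     1_000_000, 1_000_000)
all_flagship = blended_cost({"Claude Opus 4.7": 1.0}, 1_000_000, 1_000_000)
print(f"95/5 mix: ${mixed:.2f}  vs  all flagship: ${all_flagship:.2f}")
```

Even a small flagship share is a large part of the mixed bill, so it pays to keep that share as low as quality allows.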

To put this into practice:

That’s two days of work that usually pays back in a single billing cycle.
