Google Gemini API Batch Mode is Here and 50% Cheaper

Ashley Innocent

7 July 2025

Google's Gemini API now features Batch Mode, a transformative update designed for large-scale, asynchronous tasks that comes with a 50% reduction in cost. 🚀

So, let's take a close look at the new Google Gemini API Batch Mode!


Pricing of Gemini API Batch Mode

A primary benefit of the Gemini API Batch Mode is a significant reduction in cost. All jobs submitted through this endpoint are priced at 50% less than the standard rate for the equivalent model used in a synchronous (real-time) call.

This 50% discount applies directly to the per-token pricing structure. Whether you are using gemini-2.5-pro, gemini-2.5-flash, or any other supported model, the cost for both input and output tokens is halved when processed via a batch job. This pricing model makes it financially viable to perform large-scale tasks, such as analyzing terabytes of text data or generating content for an entire product catalog, that might be cost-prohibitive using the standard API. The cost is still calculated based on the number of tokens in your input and the generated output, but the rate per token is what's discounted.
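To make the arithmetic concrete, here is a minimal sketch of the savings calculation. The per-token rates below are illustrative placeholders, not Google's published prices; substitute the current rates for your model.

# Illustrative cost comparison for a batch of 10,000 requests.
# NOTE: these per-token rates are hypothetical placeholders; check
# Google's Gemini pricing page for the real rates of your model.
INPUT_RATE_PER_M_TOKENS = 0.30    # $ per 1M input tokens (synchronous)
OUTPUT_RATE_PER_M_TOKENS = 2.50   # $ per 1M output tokens (synchronous)

num_requests = 10_000
input_tokens = num_requests * 500    # assume ~500 input tokens per request
output_tokens = num_requests * 200   # assume ~200 output tokens per request

sync_cost = (input_tokens * INPUT_RATE_PER_M_TOKENS
             + output_tokens * OUTPUT_RATE_PER_M_TOKENS) / 1_000_000
batch_cost = sync_cost * 0.5  # Batch Mode halves the per-token rate

print(f"Synchronous: ${sync_cost:.2f} vs. Batch Mode: ${batch_cost:.2f}")
# Synchronous: $6.50 vs. Batch Mode: $3.25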

How to Use the Gemini API Batch Mode: A Step-by-Step Guide

The workflow for the Gemini API Batch Mode is designed to be straightforward, involving file preparation, job creation, and result retrieval. The following sections provide a practical guide using the Google GenAI Python SDK.

Step 1: Preparing Your Input File for Gemini API Batch Mode

The Gemini API Batch Mode processes requests from a JSON Lines (JSONL) file. Each line in the file must be a valid JSON object representing a single, self-contained request. The file can be up to 2GB.

Each JSON object in the file must contain two fields:

- key: a user-defined string that uniquely identifies the request, used later to match each response back to its prompt.
- request: the request body itself, structured like a standard GenerateContentRequest with a contents array.

Example batch_requests.jsonl:

{"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
{"key": "request_2", "request": {"contents": [{"parts": [{"text": "Summarize the key benefits of context caching in LLMs."}]}]}}
{"key": "request_3", "request": {"contents": [{"parts": [{"text": "Write a python function to reverse a string."}]}]}}

Step 2: The Programming Workflow for Gemini API Batch Mode

The Python SDK simplifies the process of interacting with the batching endpoint into a few key function calls.

Upload the Input File: First, you must upload your JSONL file to Google's file service. This returns a file object that you will reference when creating the job.

from google import genai
from google.genai import types

# The client reads your API key from the GEMINI_API_KEY environment variable
client = genai.Client()

uploaded_batch_requests = client.files.upload(
    file="batch_requests.jsonl",
    config=types.UploadFileConfig(mime_type="jsonl"),
)

Create the Batch Job: With the file uploaded, you can now create the batch job. This call requires specifying the model you wish to use and providing the uploaded file as the source of requests.

batch_job = client.batches.create(
    model="gemini-2.5-flash",  # or "gemini-2.5-pro", etc.
    src=uploaded_batch_requests.name,
    config={
        'display_name': "MyFirstBatchJob-1",
    },
)
print(f"Created batch job: {batch_job.name}")
print(f"Initial state: {batch_job.state.name}")

This function returns immediately, providing the job's name and its initial state, which is typically JOB_STATE_PENDING.

Step 3: Managing and Monitoring Jobs in Gemini API Batch Mode

Since batch jobs are asynchronous, you need to monitor their status. You can retrieve the current state of a job at any time using its name. Google targets a 24-hour turnaround window, and most jobs complete well before that.

The possible job states are:

- JOB_STATE_PENDING: the job has been created and is waiting to be processed.
- JOB_STATE_RUNNING: the job is currently being processed.
- JOB_STATE_SUCCEEDED: the job finished and the results are ready for download.
- JOB_STATE_FAILED: the job could not be completed; check the job's error field.
- JOB_STATE_CANCELLED: the job was cancelled before completion.

Example of checking job status:

# Check the status after some time has passed
retrieved_job = client.batches.get(name=batch_job.name)
print(f"Current job state: {retrieved_job.state.name}")

Step 4: Processing Results from Gemini API Batch Mode

Once the job state is JOB_STATE_SUCCEEDED, the results are available for download as a JSONL file. Each line in the output file corresponds to a request from the input file.

The output JSON object contains the key from the original request and a response object containing the model's output.

Download the results file:

if retrieved_job.state.name == 'JOB_STATE_SUCCEEDED':
    # The job's destination points at the output file in the Files API
    result_file_name = retrieved_job.dest.file_name
    result_file_content_bytes = client.files.download(file=result_file_name)

    # Decode and process the results
    file_content = result_file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
elif retrieved_job.state.name == 'JOB_STATE_FAILED':
    print(f"Job failed with error: {retrieved_job.error}")

Example Output File Line:

{"key": "request_1", "response": {"candidates": [{"content": {"parts": [{"text": "Artificial intelligence enables machines to learn and reason."}]}}]}}

You can parse this file and use the key to match each response back to its original prompt, as in the sketch below.
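A minimal parsing sketch, assuming file_content holds the downloaded JSONL text from Step 4:

import json

results = {}
for line in file_content.splitlines():
    record = json.loads(line)
    if "response" in record:
        # Extract the text from the first candidate's first part
        text = record["response"]["candidates"][0]["content"]["parts"][0]["text"]
        results[record["key"]] = text
    else:
        # Failed requests carry an error object instead of a response
        results[record["key"]] = record.get("error")

print(results["request_1"])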

Advanced Functionality in Gemini API Batch Mode

The Gemini API Batch Mode also supports more advanced features for optimizing large-scale workflows.

Context Caching with Gemini API Batch Mode

For tasks that involve a large, shared piece of context (e.g., a long document that you want to ask multiple questions about), you can use Context Caching. This feature allows you to cache the shared context, so it is not re-processed with every single request in the batch. This can lead to further significant cost savings and faster processing times by reducing the total number of tokens processed.
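The exact wiring depends on your SDK version, but conceptually you create the cache once and then reference it from every request line. A hedged sketch, assuming each batch request line can carry a cached_content reference just as a synchronous generateContent call can:

from google.genai import types

# Create a cache over the shared context once (e.g., a long document);
# long_document_text is a placeholder for your own content.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=[long_document_text],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Assumption: each JSONL request line references the cache by name,
# mirroring the cached_content field of the synchronous API.
request_line = {
    "key": "request_1",
    "request": {
        "contents": [{"parts": [{"text": "Summarize section 2 of the document."}]}],
        "cached_content": cache.name,
    },
}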

Using Built-in Tools with Gemini API Batch Mode

Batch jobs support tool use, including the built-in Google Search functionality. This allows you to perform large-scale tasks that require the model to access and process real-time information from the web. For example, a batch job could be configured to analyze thousands of URLs and summarize their content.
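As a hedged illustration, a request line that grounds the model with the built-in Google Search tool might look like the following, assuming the batch request body accepts the same tools field as a synchronous generateContent call:

{"key": "request_1", "request": {"contents": [{"parts": [{"text": "Summarize this week's most significant AI announcements."}]}], "tools": [{"google_search": {}}]}}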

Google's announcement highlights several organizations that are already using this functionality in production.

Conclusion: The Technical Value of Gemini API Batch Mode

The Gemini API Batch Mode provides a technically robust and financially advantageous solution for large-scale, asynchronous AI processing. By offering a 50% cost reduction, a simplified file-based workflow, and support for advanced features like context caching and tool use, it removes the engineering and financial barriers associated with high-throughput AI tasks. It is an essential tool for developers and organizations looking to leverage the full power of Gemini models on massive datasets.

