Google Gemini API Batch Mode is Here and 50% Cheaper

Ashley Innocent

7 July 2025

Google's Gemini API now features Batch Mode, a transformative update designed for large-scale, asynchronous tasks that comes with a 50% reduction in cost. 🚀

So, let's take a close look at the new Google Gemini API Batch Mode!

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

Pricing of Gemini API Batch Mode

A primary benefit of the Gemini API Batch Mode is a significant reduction in cost. All jobs submitted through this endpoint are priced at 50% less than the standard rate for the equivalent model used in a synchronous (real-time) call.

This 50% discount applies directly to the per-token pricing structure. Whether you are using gemini-2.5-pro, gemini-2.5-flash, or any other supported model, the cost for both input and output tokens is halved when processed via a batch job. This pricing model makes it financially viable to perform large-scale tasks, such as analyzing terabytes of text data or generating content for an entire product catalog, that might be cost-prohibitive using the standard API. The cost is still calculated based on the number of tokens in your input and the generated output, but the rate per token is what's discounted.
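To make the arithmetic concrete, here is a quick sketch using illustrative per-token rates (these numbers are assumptions for demonstration; the real rates depend on the model, so check Google's pricing page):

# Illustrative rates only, not official pricing: assume a standard rate of
# $0.30 per 1M input tokens and $2.50 per 1M output tokens for some model.
standard_input_rate = 0.30 / 1_000_000   # $ per input token
standard_output_rate = 2.50 / 1_000_000  # $ per output token

input_tokens, output_tokens = 50_000_000, 10_000_000  # a large batch workload

standard_cost = input_tokens * standard_input_rate + output_tokens * standard_output_rate
batch_cost = standard_cost * 0.5  # Batch Mode halves both input and output rates

print(f"Standard: ${standard_cost:.2f} | Batch: ${batch_cost:.2f}")
# Standard: $40.00 | Batch: $20.00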

How to Use the Gemini API Batch Mode: A Step-by-Step Guide

The workflow for the Gemini API Batch Mode is designed to be straightforward, involving file preparation, job creation, and result retrieval. The following sections provide a practical guide using the Google GenAI Python SDK.

Step 1: Preparing Your Input File for Gemini API Batch Mode

The Gemini API Batch Mode processes requests from a JSON Lines (JSONL) file. Each line in the file must be a valid JSON object representing a single, self-contained request. The file can be up to 2GB.

Each JSON object in the file must contain two fields:

key: a user-defined string identifier, used to match each response in the output file back to its original request.
request: the request body, structured like a standard generateContent request, including the contents to send to the model.

Example batch_requests.jsonl:

{"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
{"key": "request_2", "request": {"contents": [{"parts": [{"text": "Summarize the key benefits of context caching in LLMs."}]}]}}
{"key": "request_3", "request": {"contents": [{"parts": [{"text": "Write a python function to reverse a string."}]}]}}

Step 2: The Programming Workflow for Gemini API Batch Mode

The Python SDK simplifies the process of interacting with the batching endpoint into a few key function calls.

Upload the Input File: First, you must upload your JSONL file to Google's file service. This returns a file object that you will reference when creating the job.

from google import genai

# The client reads your API key from the GEMINI_API_KEY environment variable
client = genai.Client()

uploaded_batch_requests = client.files.upload(
    file="batch_requests.jsonl",
    config={"display_name": "batch-requests", "mime_type": "jsonl"},
)

Create the Batch Job: With the file uploaded, you can now create the batch job. This call requires specifying the model you wish to use and providing the uploaded file as the source of requests.

batch_job = client.batches.create(
    model="gemini-2.5-flash",  # Or "gemini-2.5-pro", etc.
    src=uploaded_batch_requests.name,
    config={
        'display_name': "MyFirstBatchJob-1",
    },
)
print(f"Created batch job: {batch_job.name}")
print(f"Initial state: {batch_job.state.name}")

This function returns immediately, providing the job's name and its initial state, which is typically JOB_STATE_PENDING.

Step 3: Managing and Monitoring Jobs in Gemini API Batch Mode

Since batch jobs are asynchronous, you need to monitor their status. You can retrieve the current state of a job at any time using its name. Jobs target a 24-hour turnaround and often finish much sooner.

The possible job states are:

JOB_STATE_PENDING: the job has been created and is waiting to be processed.
JOB_STATE_RUNNING: the job is currently being processed.
JOB_STATE_SUCCEEDED: the job completed successfully and results are ready for download.
JOB_STATE_FAILED: the job could not be completed; inspect the job's error field.
JOB_STATE_CANCELLED: the job was cancelled before completion.

Example of checking job status:

# Check the status after some time has passed
retrieved_job = client.batches.get(name=batch_job.name)
print(f"Current job state: {retrieved_job.state.name}")

Step 4: Processing Results from Gemini API Batch Mode

Once the job state is JOB_STATE_SUCCEEDED, the results are available for download as a JSONL file. Each line in the output file corresponds to a request from the input file.

The output JSON object contains the key from the original request and a response object containing the model's output.

  1. Download the Results File:
if retrieved_job.state.name == 'JOB_STATE_SUCCEEDED':
    # The job's destination references the output file on the file service
    result_file_name = retrieved_job.dest.file_name
    result_file_content_bytes = client.files.download(file=result_file_name)

    # Decode and process the results
    file_content = result_file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
elif retrieved_job.state.name == 'JOB_STATE_FAILED':
    print(f"Job failed with error: {retrieved_job.error}")

Example Output File Line:

{"key": "request_1", "response": {"candidates": [{"content": {"parts": [{"text": "Artificial intelligence enables machines to learn and reason."}]}}]}}

You can parse this file, using the key to match each response to its original prompt.
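For instance, a small sketch that builds a key-to-text mapping from the file_content string downloaded in the previous step:

import json

responses = {}
for line in file_content.splitlines():
    record = json.loads(line)
    # Extract the text of the first candidate's first part
    text = record["response"]["candidates"][0]["content"]["parts"][0]["text"]
    responses[record["key"]] = text

print(responses["request_1"])  # -> the model's answer to the first prompt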

Advanced Functionality in Gemini API Batch Mode

The Gemini API Batch Mode also supports more advanced features for optimizing large-scale workflows.

Context Caching with Gemini API Batch Mode

For tasks that involve a large, shared piece of context (e.g., a long document that you want to ask multiple questions about), you can use Context Caching. This feature allows you to cache the shared context, so it is not re-processed with every single request in the batch. This can lead to further significant cost savings and faster processing times by reducing the total number of tokens processed.
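As a rough sketch of the idea, you could create a cache with the SDK's caches API and then reference it from each batch request. The cache-reference field in the JSONL schema is an assumption here, so verify it against the current documentation:

# Assumes long_document_text holds the large shared context (hypothetical variable)
cache = client.caches.create(
    model="gemini-2.5-flash",
    config={
        "display_name": "shared-document-cache",
        "contents": [{"parts": [{"text": long_document_text}]}],
    },
)

# Each batch request then points at the cache instead of re-sending the document.
# Illustrative JSONL line; the exact cache-reference field name may differ:
# {"key": "q1", "request": {"contents": [{"parts": [{"text": "What is the main finding?"}]}], "cached_content": "<cache.name>"}}
print(f"Created cache: {cache.name}")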

Using Built-in Tools with Gemini API Batch Mode

Batch jobs support tool use, including the built-in Google Search functionality. This allows you to perform large-scale tasks that require the model to access and process real-time information from the web. For example, a batch job could be configured to analyze thousands of URLs and summarize their content.
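As a hedged sketch, an individual batch request could enable the built-in Google Search tool via a tools entry in its request body (the snake_case field spelling below mirrors the Python SDK; confirm it against the current JSONL schema):

{"key": "news_1", "request": {"contents": [{"parts": [{"text": "Summarize this week's most significant AI announcements."}]}], "tools": [{"google_search": {}}]}}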

In its launch announcement, Google highlighted several organizations that are already using this functionality at scale.

Conclusion: The Technical Value of Gemini API Batch Mode

The Gemini API Batch Mode provides a technically robust and financially advantageous solution for large-scale, asynchronous AI processing. By offering a 50% cost reduction, a simplified file-based workflow, and support for advanced features like context caching and tool use, it removes the engineering and financial barriers associated with high-throughput AI tasks. It is an essential tool for developers and organizations looking to leverage the full power of Gemini models on massive datasets.

