Google Gemini API Batch Mode is Here and 50% Cheaper

Ashley Innocent

7 July 2025

Google's Gemini API now features Batch Mode, a transformative update designed for large-scale, asynchronous tasks that comes with a 50% reduction in cost. 🚀

So, let's take a close look at the new Google Gemini API Batch Mode!


Pricing of Gemini API Batch Mode

A primary benefit of the Gemini API Batch Mode is a significant reduction in cost. All jobs submitted through this endpoint are priced at 50% less than the standard rate for the equivalent model used in a synchronous (real-time) call.

This 50% discount applies directly to the per-token pricing structure. Whether you are using gemini-2.5-pro, gemini-2.5-flash, or any other supported model, the cost for both input and output tokens is halved when processed via a batch job. This pricing model makes it financially viable to perform large-scale tasks, such as analyzing terabytes of text data or generating content for an entire product catalog, that might be cost-prohibitive using the standard API. The cost is still calculated based on the number of tokens in your input and the generated output, but the rate per token is what's discounted.
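To make the arithmetic concrete, here is a minimal sketch of the savings calculation. The per-token rates below are illustrative placeholders, not Google's published prices; substitute the current rates for your model.

# Illustrative cost comparison for a batch of 10,000 requests.
# NOTE: these per-token rates are hypothetical placeholders; check
# Google's Gemini pricing page for the real rates of your model.
INPUT_RATE_PER_M_TOKENS = 0.30    # $ per 1M input tokens (synchronous)
OUTPUT_RATE_PER_M_TOKENS = 2.50   # $ per 1M output tokens (synchronous)

num_requests = 10_000
input_tokens = num_requests * 500    # assume ~500 input tokens per request
output_tokens = num_requests * 200   # assume ~200 output tokens per request

sync_cost = (input_tokens * INPUT_RATE_PER_M_TOKENS
             + output_tokens * OUTPUT_RATE_PER_M_TOKENS) / 1_000_000
batch_cost = sync_cost * 0.5  # Batch Mode halves the per-token rate

print(f"Synchronous: ${sync_cost:.2f} vs. Batch Mode: ${batch_cost:.2f}")
# Synchronous: $6.50 vs. Batch Mode: $3.25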

How to Use the Gemini API Batch Mode: A Step-by-Step Guide

The workflow for the Gemini API Batch Mode is designed to be straightforward, involving file preparation, job creation, and result retrieval. The following sections provide a practical guide using the Google GenAI Python SDK.

Step 1: Preparing Your Input File for Gemini API Batch Mode

The Gemini API Batch Mode processes requests from a JSON Lines (JSONL) file. Each line in the file must be a valid JSON object representing a single, self-contained request. The file can be up to 2GB.

Each JSON object in the file must contain two fields:

- key: a user-defined string that uniquely identifies the request, used later to match each response back to its prompt.
- request: the request body itself, structured like a standard GenerateContentRequest with a contents array.

Example batch_requests.jsonl:

{"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
{"key": "request_2", "request": {"contents": [{"parts": [{"text": "Summarize the key benefits of context caching in LLMs."}]}]}}
{"key": "request_3", "request": {"contents": [{"parts": [{"text": "Write a python function to reverse a string."}]}]}}

Step 2: The Programming Workflow for Gemini API Batch Mode

The Python SDK simplifies the process of interacting with the batching endpoint into a few key function calls.

Upload the Input File: First, you must upload your JSONL file to Google's file service. This returns a file object that you will reference when creating the job.

from google import genai
from google.genai import types

# The client reads your API key from the GEMINI_API_KEY environment variable
client = genai.Client()

uploaded_batch_requests = client.files.upload(
    file="batch_requests.jsonl",
    config=types.UploadFileConfig(mime_type="jsonl"),
)

Create the Batch Job: With the file uploaded, you can now create the batch job. This call requires specifying the model you wish to use and providing the uploaded file as the source of requests.

batch_job = client.batches.create(
    model="gemini-2.5-flash",  # or "gemini-2.5-pro", etc.
    src=uploaded_batch_requests.name,
    config={
        'display_name': "MyFirstBatchJob-1",
    },
)
print(f"Created batch job: {batch_job.name}")
print(f"Initial state: {batch_job.state.name}")

This function returns immediately, providing the job's name and its initial state, which is typically JOB_STATE_PENDING.

Step 3: Managing and Monitoring Jobs in Gemini API Batch Mode

Since batch jobs are asynchronous, you need to monitor their status. You can retrieve the current state of a job at any time using its name. Google targets a 24-hour turnaround window, and most jobs complete well before that.

The possible job states are:

- JOB_STATE_PENDING: the job has been created and is waiting to be processed.
- JOB_STATE_RUNNING: the job is currently being processed.
- JOB_STATE_SUCCEEDED: the job finished and the results are ready for download.
- JOB_STATE_FAILED: the job could not be completed; check the job's error field.
- JOB_STATE_CANCELLED: the job was cancelled before completion.

Example of checking job status:

# Check the status after some time has passed
retrieved_job = client.batches.get(name=batch_job.name)
print(f"Current job state: {retrieved_job.state.name}")

Step 4: Processing Results from Gemini API Batch Mode

Once the job state is JOB_STATE_SUCCEEDED, the results are available for download as a JSONL file. Each line in the output file corresponds to a request from the input file.

The output JSON object contains the key from the original request and a response object containing the model's output.

Download the results file:

if retrieved_job.state.name == 'JOB_STATE_SUCCEEDED':
    # The job's destination points at the output file in the Files API
    result_file_name = retrieved_job.dest.file_name
    result_file_content_bytes = client.files.download(file=result_file_name)

    # Decode and process the results
    file_content = result_file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
elif retrieved_job.state.name == 'JOB_STATE_FAILED':
    print(f"Job failed with error: {retrieved_job.error}")

Example Output File Line:

{"key": "request_1", "response": {"candidates": [{"content": {"parts": [{"text": "Artificial intelligence enables machines to learn and reason."}]}}]}}

You can parse this file and use the key to match each response back to its original prompt, as in the sketch below.
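A minimal parsing sketch, assuming file_content holds the downloaded JSONL text from Step 4:

import json

results = {}
for line in file_content.splitlines():
    record = json.loads(line)
    if "response" in record:
        # Extract the text from the first candidate's first part
        text = record["response"]["candidates"][0]["content"]["parts"][0]["text"]
        results[record["key"]] = text
    else:
        # Failed requests carry an error object instead of a response
        results[record["key"]] = record.get("error")

print(results["request_1"])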

Advanced Functionality in Gemini API Batch Mode

The Gemini API Batch Mode also supports more advanced features for optimizing large-scale workflows.

Context Caching with Gemini API Batch Mode

For tasks that involve a large, shared piece of context (e.g., a long document that you want to ask multiple questions about), you can use Context Caching. This feature allows you to cache the shared context, so it is not re-processed with every single request in the batch. This can lead to further significant cost savings and faster processing times by reducing the total number of tokens processed.
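The exact wiring depends on your SDK version, but conceptually you create the cache once and then reference it from every request line. A hedged sketch, assuming each batch request line can carry a cached_content reference just as a synchronous generateContent call can:

from google.genai import types

# Create a cache over the shared context once (e.g., a long document);
# long_document_text is a placeholder for your own content.
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=[long_document_text],
        ttl="3600s",  # keep the cache alive for one hour
    ),
)

# Assumption: each JSONL request line references the cache by name,
# mirroring the cached_content field of the synchronous API.
request_line = {
    "key": "request_1",
    "request": {
        "contents": [{"parts": [{"text": "Summarize section 2 of the document."}]}],
        "cached_content": cache.name,
    },
}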

Using Built-in Tools with Gemini API Batch Mode

Batch jobs support tool use, including the built-in Google Search functionality. This allows you to perform large-scale tasks that require the model to access and process real-time information from the web. For example, a batch job could be configured to analyze thousands of URLs and summarize their content.
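As a hedged illustration, a request line that grounds the model with the built-in Google Search tool might look like the following, assuming the batch request body accepts the same tools field as a synchronous generateContent call:

{"key": "request_1", "request": {"contents": [{"parts": [{"text": "Summarize this week's most significant AI announcements."}]}], "tools": [{"google_search": {}}]}}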

Google's announcement highlights several organizations that are already using this functionality in production.

Conclusion: The Technical Value of Gemini API Batch Mode

The Gemini API Batch Mode provides a technically robust and financially advantageous solution for large-scale, asynchronous AI processing. By offering a 50% cost reduction, a simplified file-based workflow, and support for advanced features like context caching and tool use, it removes the engineering and financial barriers associated with high-throughput AI tasks. It is an essential tool for developers and organizations looking to leverage the full power of Gemini models on massive datasets.

