How to Use the Browser Use Cloud API

This tutorial walks you through the Browser Use Cloud API: getting an API key, understanding the pricing, creating your first agent, watching it through a live preview, controlling the task lifecycle, and generating a typed client. Whether you want to automate data extraction, test your web applications, or build monitoring tools, the examples here will help you get started.

What is Browser Use Cloud?

Browser Use Cloud is a powerful platform that allows you to create and manage intelligent browser automation agents programmatically. Think of it as having a fleet of virtual assistants that can browse the web, interact with websites, and perform complex tasks on your behalf.

At the core of the platform is the concept of a "task." A task is a set of instructions you give an agent in natural language. For example, you could give an agent a task like, "Go to news.ycombinator.com, find the top 5 articles, and save their titles and URLs to a file." The agent then uses a large language model (LLM) to interpret and execute these instructions in a real browser environment.

One of the most exciting features of Browser Use Cloud is the real-time feedback loop. Every task you create comes with a live_url. This URL provides a live, interactive preview of what the agent is doing. You can watch the agent browse in real-time and even take control if needed. This makes debugging and monitoring incredibly intuitive.

Getting Your API Key

Before you can start creating agents, you'll need an API key. The API key authenticates your requests and links them to your account.

Note: To get your API key, you'll need an active subscription to Browser Use Cloud. You can manage your subscription and get your API key from the billing page: cloud.browser-use.com/billing.

Once you have your API key, be sure to keep it secure. Treat it like a password, and never expose it in client-side code or commit it to version control. It's best to store it in a secure environment variable.

export BROWSER_USE_API_KEY="your_api_key_here"
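In Python, a small helper can read the key from that environment variable and fail fast if it is missing, rather than sending an unauthenticated request. This is a minimal sketch: `auth_headers` is a hypothetical helper name, and the header format simply mirrors the `Authorization: Bearer` header used in the curl examples in this guide.

```python
import os


def auth_headers(env_var: str = "BROWSER_USE_API_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in an environment variable.

    Hypothetical helper: raises instead of returning headers without a key.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Keeping the key lookup in one function means no other part of your code needs to handle the raw key, which makes it harder to log or commit it by accident.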

Understanding the Pricing Model

The Browser Use Cloud API has a simple, pay-as-you-go pricing model. This ensures that you only pay for what you use, making it cost-effective for both small and large-scale projects. The pricing is composed of two main parts:

Task Initialization Cost: A flat fee of $0.01 is charged for every task you start. This covers the cost of spinning up the browser environment for your agent.
Task Step Cost: This is the cost for each action or "step" the agent takes. The cost per step depends on the LLM you choose to power your agent.

LLM Step Pricing

Different LLMs have different capabilities and price points. You can choose the model that best suits your needs for performance and cost. Here's a breakdown of the cost per step for each available model:

Model	Cost per Step
GPT-4o	$0.03
GPT-4.1	$0.03
Claude 3.7 Sonnet (2025-02-19)	$0.03
GPT-4o mini	$0.01
GPT-4.1 mini	$0.01
Gemini 2.0 Flash	$0.01
Gemini 2.0 Flash Lite	$0.01
Llama 4 Maverick	$0.01

Cost Calculation Example

Let's imagine you want to automate a task that involves logging into a website, navigating to a specific page, and extracting some data. You estimate this will take about 15 steps. If you choose to use the powerful GPT-4o model, the total cost would be calculated as follows:

Task Initialization: $0.01
Task Steps: 15 steps × $0.03/step = $0.45
Total Cost: $0.01 + $0.45 = $0.46

This transparent pricing allows you to predict and control your costs effectively.
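The same arithmetic can be scripted for budgeting. A minimal sketch, assuming the per-step prices in the table above: `estimate_cost` is a hypothetical helper, and the dictionary keys are the display names from the table, not API model identifiers. It uses `Decimal` so that cent amounts add up exactly instead of accumulating floating-point error.

```python
from decimal import Decimal

# Per-step prices (USD) copied from the table above; keys are display names,
# not API model IDs. Prices may change, so check the billing page.
STEP_PRICE = {
    "GPT-4o": Decimal("0.03"),
    "GPT-4.1": Decimal("0.03"),
    "Claude 3.7 Sonnet": Decimal("0.03"),
    "GPT-4o mini": Decimal("0.01"),
    "GPT-4.1 mini": Decimal("0.01"),
    "Gemini 2.0 Flash": Decimal("0.01"),
    "Gemini 2.0 Flash Lite": Decimal("0.01"),
    "Llama 4 Maverick": Decimal("0.01"),
}
TASK_INIT_FEE = Decimal("0.01")  # flat fee charged once per task


def estimate_cost(model: str, steps: int) -> Decimal:
    """Estimated task cost: initialization fee + steps * per-step price."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return TASK_INIT_FEE + steps * STEP_PRICE[model]


print(estimate_cost("GPT-4o", 15))       # 0.46
print(estimate_cost("GPT-4o mini", 15))  # 0.16
```

Since the number of steps an agent takes is only known after the task runs, treat the result as an estimate and budget some headroom for extra steps.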

Creating Your First Agent: A "Hello, World!" Example

Now for the exciting part! Let's create your first browser automation agent. We'll start with a very simple task: going to Google and searching for "Browser Use".

We'll use curl to make a POST request to the /api/v1/run-task endpoint. This is the primary endpoint for creating new tasks.

curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to google.com and search for Browser Use"
  }'

Let's break down this command:

curl -X POST ...: We're sending an HTTP POST request to the specified URL.
-H "Authorization: Bearer $BROWSER_USE_API_KEY": This is the authentication header. It includes your API key, read from the environment variable we set earlier.
-H "Content-Type: application/json": This header tells the API that the request body is JSON.
-d '{ "task": "..." }': This is the body of our request. The task field contains the natural-language instructions for our agent.
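If you'd rather script this than use curl, the same request can be built with Python's standard library. A minimal sketch: `build_run_task_request` and `run_task` are hypothetical helper names, while the endpoint, headers, and body are copied from the curl command above.

```python
import json
import os
import urllib.request

API_BASE = "https://api.browser-use.com/api/v1"


def build_run_task_request(task: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/run-task",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_task(task: str) -> dict:
    """Send the request and decode the JSON response body (hypothetical helper)."""
    req = build_run_task_request(task, os.environ["BROWSER_USE_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_task("Go to google.com and search for Browser Use")` sends the request and returns the parsed response. Separating request construction from sending makes the request easy to inspect or test without touching the network.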

Understanding the API Response

When you send this request, the API will respond with a JSON object containing information about the newly created task. Here's an example of what that response might look like:

{
  "task_id": "ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7",
  "status": "running",
  "live_url": "https://previews.browser-use.com/ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7"
}

task_id: This is a unique identifier for your task. You'll use this ID to manage the task later (e.g., to pause, resume, or stop it).
status: This indicates the current state of the task. It will be running initially.
live_url: This is the URL for the live preview. Copy and paste this URL into your browser to see your agent in action!
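To use these fields in code, you can parse the body into a small typed object. This sketch assumes the response shape shown above (three top-level string fields); `TaskInfo` and `parse_task_response` are hypothetical names. Any additional fields in a real response are ignored, and a missing field raises `KeyError`, so a change in the response shape fails loudly instead of silently.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskInfo:
    task_id: str
    status: str
    live_url: str


def parse_task_response(raw: str) -> TaskInfo:
    """Pick out the three fields described above; ignore anything else."""
    data = json.loads(raw)
    return TaskInfo(
        task_id=data["task_id"],
        status=data["status"],
        live_url=data["live_url"],
    )


example = """{
  "task_id": "ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7",
  "status": "running",
  "live_url": "https://previews.browser-use.com/ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7"
}"""
info = parse_task_response(example)
print(info.status)  # running
```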

Interactive Live Previews

The live_url is one of the most powerful features of the Browser Use Cloud. It's not just a read-only video stream; it's a fully interactive session.

You can embed the live_url directly into your own applications using an iframe. This allows you to build custom dashboards and monitoring tools that include a real-time view of your agents.

Here's a simple HTML snippet to embed the live preview:

<!DOCTYPE html>
<html>
<head>
  <title>Agent Live Preview</title>
  <style>
    body, html { margin: 0; padding: 0; height: 100%; overflow: hidden; }
    iframe { width: 100%; height: 100%; border: none; }
  </style>
</head>
<body>
  <iframe src="YOUR_LIVE_URL_HERE"></iframe>
</body>
</html>

Replace YOUR_LIVE_URL_HERE with the live_url from the API response. When you open this HTML file in a browser, you'll see the agent's screen. You can click, type, and scroll just as if you were browsing on your own computer. This is incredibly useful for:

Debugging: If an agent gets stuck, you can immediately see why and what's on its screen.
Manual Intervention: If a task requires a step that's difficult to automate (like solving a complex CAPTCHA), you can take over, complete the step manually, and then let the agent resume its work.
Demonstrations: It's a great way to show stakeholders what your automation is doing.
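Rather than pasting the URL by hand, you can generate the preview page from the API response. A minimal sketch using Python's standard library: `preview_page` is a hypothetical helper, and the https-only check is an extra safeguard added here, not something the API requires. `html.escape` ensures characters such as `&` or `"` in the URL cannot break out of the `src` attribute.

```python
import html

# Same page as the HTML snippet above; CSS braces are doubled for str.format.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <title>Agent Live Preview</title>
  <style>
    body, html {{ margin: 0; padding: 0; height: 100%; overflow: hidden; }}
    iframe {{ width: 100%; height: 100%; border: none; }}
  </style>
</head>
<body>
  <iframe src="{src}"></iframe>
</body>
</html>
"""


def preview_page(live_url: str) -> str:
    """Fill the template, escaping the URL for use inside an HTML attribute."""
    # Extra safeguard added in this sketch: only accept https URLs.
    if not live_url.startswith("https://"):
        raise ValueError("expected an https live_url")
    return PAGE_TEMPLATE.format(src=html.escape(live_url, quote=True))
```

You could then save the result with `pathlib.Path("preview.html").write_text(preview_page(url))` and open the file in a browser.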

Managing the Task Lifecycle

Once a task is running, you have full control over its lifecycle. You can pause, resume, and stop tasks using the API. You'll need the task_id for all management operations.

Pausing and Resuming a Task

There are many reasons you might want to pause a task. Maybe you need to inspect the web page manually, or perhaps you want to wait for an external event to occur before continuing.

To pause a task, send a POST request to the /api/v1/pause-task endpoint:

curl -X POST https://api.browser-use.com/api/v1/pause-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

The agent will finish its current step and then enter a paused state.

To resume the task, send a POST request to the /api/v1/resume-task endpoint:

curl -X POST https://api.browser-use.com/api/v1/resume-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

The agent will pick up right where it left off.
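Because pause and resume differ only in the endpoint name, a script can share one request builder. A minimal sketch with Python's standard library: `build_task_action_request` and `TASK_ACTIONS` are hypothetical names, and the endpoints and the `task_id` body field are copied from the curl commands above.

```python
import json
import urllib.request

API_BASE = "https://api.browser-use.com/api/v1"
# Endpoint names taken from the curl examples above.
TASK_ACTIONS = {"pause": "pause-task", "resume": "resume-task"}


def build_task_action_request(action: str, task_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a pause or resume request for an existing task."""
    if action not in TASK_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(TASK_ACTIONS)}")
    return urllib.request.Request(
        f"{API_BASE}/{TASK_ACTIONS[action]}",
        data=json.dumps({"task_id": task_id}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Send the result with `urllib.request.urlopen(req, timeout=30)`. Rejecting unknown actions locally avoids a round trip to an endpoint that does not exist.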

Stopping a Task

If you want to terminate a task permanently, you can use the /api/v1/stop-task endpoint. This is useful if the task is complete, has gone wrong, or is no longer needed.

curl -X POST https://api.browser-use.com/api/v1/stop-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

Note: Once a task is stopped, it cannot be resumed. The browser environment is destroyed, and all associated resources are cleaned up.
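The lifecycle rules in this section can be captured as a small state machine, which is handy for validating a command before you send it. A sketch under stated assumptions: `TaskState`, `TRANSITIONS`, and `next_state` are hypothetical names; it models only the running, paused, and stopped states described here (the real API may report other statuses, for example when a task finishes on its own); and allowing a paused task to be stopped directly is an assumption not confirmed above.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


# Transitions described in this section; STOPPED is terminal.
TRANSITIONS = {
    (TaskState.RUNNING, "pause"): TaskState.PAUSED,
    (TaskState.PAUSED, "resume"): TaskState.RUNNING,
    (TaskState.RUNNING, "stop"): TaskState.STOPPED,
    # Assumption: a paused task can be stopped without resuming it first.
    (TaskState.PAUSED, "stop"): TaskState.STOPPED,
}


def next_state(state: TaskState, action: str) -> TaskState:
    """Return the state after `action`, or raise if the action isn't allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a task that is {state.value}") from None
```

For example, `next_state(TaskState.STOPPED, "resume")` raises, matching the note that a stopped task cannot be resumed.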

Advanced Task Creation

The "Hello, World!" example was just the beginning. The run-task endpoint supports more than just a simple task string. You can customize your agent's behavior by providing additional parameters.

Choosing an LLM

As we saw in the pricing section, you can choose from several different LLMs to power your agent. You can specify the model in the run-task request using the model parameter.

For example, to use the Claude 3.7 Sonnet model, you would make the following request:

curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to reddit.com/r/programming and find the top post of the day.",
    "model": "claude-3.7-sonnet-20250219"
  }'

If you don't specify a model, the API will use a default model, which is typically a cost-effective and performant option like GPT-4o mini.
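In a script, it is convenient to leave the field out entirely when you want the default, rather than sending an explicit `null`. A minimal sketch: `run_task_payload` is a hypothetical helper, and the `model` key and identifier string are taken from the example above, so check the API's OpenAPI spec for the exact list of accepted model names.

```python
import json
from typing import Optional


def run_task_payload(task: str, model: Optional[str] = None) -> str:
    """Serialize a run-task body, including "model" only when one is chosen."""
    body = {"task": task}
    if model is not None:
        # Assumption: parameter name and value format follow the curl example
        # above; confirm accepted model identifiers in the OpenAPI spec.
        body["model"] = model
    return json.dumps(body)
```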

Building Your Own Client

While curl is great for simple tests, you'll likely want to integrate the Browser Use Cloud API into your applications using a proper client library. The best way to do this is to use our OpenAPI specification to generate a type-safe client.

The OpenAPI spec is a standardized way to describe REST APIs. You can find the Browser Use spec here: https://api.browser-use.com/openapi.json.
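Before generating a client, it can help to list the operations the spec defines, since client generators typically derive function names from each operation's `operationId`. A minimal sketch using only Python's standard library: `list_operations` and `fetch_spec` are hypothetical helpers, the spec is fetched over HTTPS, and the code assumes the standard OpenAPI layout (a top-level `paths` object keyed by URL path, then by HTTP method).

```python
import json
import urllib.request

SPEC_URL = "https://api.browser-use.com/openapi.json"
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, operationId) for every operation in an OpenAPI spec."""
    ops = []
    # Assumes the standard layout: paths -> path -> HTTP method -> operation.
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                ops.append((method.upper(), path, op.get("operationId", "")))
    return sorted(ops, key=lambda o: (o[1], o[0]))


def fetch_spec(url: str = SPEC_URL) -> dict:
    """Download and decode the spec (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Running `for method, path, op_id in list_operations(fetch_spec()): print(method, path, op_id)` prints one line per endpoint.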

Python Client Generation

For Python developers, we recommend openapi-python-client. It generates a typed client with both synchronous and asynchronous functions for every endpoint, with full type hints.

First, install the generator tool:

# We recommend using pipx to keep your global environment clean
pipx install openapi-python-client --include-deps

Now, generate the client:

openapi-python-client generate --url https://api.browser-use.com/openapi.json

This creates a new directory containing your Python client package; its name depends on the title in the spec. Change into that directory and install the package with pip:

pip install .

Now you can use the client in your Python code:

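import asyncio
import os

# Replace browser_use_api with the package name the generator actually created.
from browser_use_api import AuthenticatedClient
from browser_use_api.models import RunTaskRequest
# Module and function names depend on the spec's tags and operationIds;
# check the generated package's api/ directory for the exact import path.
from browser_use_api.api.default import run_task_api_v1_run_task_post

async def main():
    # AuthenticatedClient adds "Authorization: Bearer <token>" to every request.
    # Generated paths already start with /api/v1, so base_url is the host only.
    client = AuthenticatedClient(
        base_url="https://api.browser-use.com",
        token=os.environ["BROWSER_USE_API_KEY"],
    )
    request = RunTaskRequest(task="Go to ycombinator.com and list the top 3 companies.")

    # Older generator versions name the body argument json_body instead of body.
    response = await run_task_api_v1_run_task_post.asyncio_detailed(client=client, body=request)

    if response.status_code == 200 and response.parsed is not None:
        print(f"Task created with ID: {response.parsed.task_id}")
        print(f"Live URL: {response.parsed.live_url}")
    else:
        print(f"Request failed with status {response.status_code}: {response.content!r}")

if __name__ == "__main__":
    asyncio.run(main())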

TypeScript/JavaScript Client Generation

For frontend or Node.js projects, openapi-typescript is a good tool for generating TypeScript type definitions from the OpenAPI spec.

First, install the generator as a dev dependency:

npm install -D openapi-typescript

Then, run the generator:

npx openapi-typescript https://api.browser-use.com/openapi.json -o src/browser-use-api.ts

This will create a single file, src/browser-use-api.ts, containing all the type definitions for the API. You can then use these types with your preferred HTTP client, like fetch or axios, to make type-safe requests.

Here's an example using fetch in a TypeScript project:

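// `paths` is a type-only export, so import it with `import type`.
import type { paths } from './src/browser-use-api';

const API_URL = "https://api.browser-use.com/api/v1";

// openapi-typescript keys `paths` by the full path in the spec (assumed to be /api/v1/run-task).
type RunTaskRequest = paths["/api/v1/run-task"]["post"]["requestBody"]["content"]["application/json"];
type RunTaskResponse = paths["/api/v1/run-task"]["post"]["responses"]["200"]["content"]["application/json"];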

async function createTask(task: string, apiKey: string): Promise<RunTaskResponse> {
  const body: RunTaskRequest = { task };

  const response = await fetch(`${API_URL}/run-task`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    throw new Error(`API request failed with status ${response.status}`);
  }

  return response.json() as Promise<RunTaskResponse>;
}

async function run() {
  const apiKey = process.env.BROWSER_USE_API_KEY;
  if (!apiKey) {
    throw new Error("API key not found in environment variables.");
  }

  try {
    const result = await createTask("Find the current weather in New York City.", apiKey);
    console.log("Task created:", result);
  } catch (error) {
    console.error("Failed to create task:", error);
  }
}

run();
