How to Use Browser Use Cloud API

A comprehensive tutorial on using the Browser Use Cloud API to create and manage browser automation agents.

Mark Ponomarev

10 June 2025

This tutorial will walk you through everything you need to know to harness the power of AI-driven browser automation. Whether you're looking to automate data extraction, test your web applications, or create sophisticated monitoring tools, this guide will provide you with the knowledge and examples to get started.

What is Browser Use Cloud?

Browser Use Cloud is a powerful platform that allows you to create and manage intelligent browser automation agents programmatically. Think of it as having a fleet of virtual assistants that can browse the web, interact with websites, and perform complex tasks on your behalf.

At the core of the platform is the concept of a "task." A task is a set of instructions you provide to an agent in natural language. For example, you could give an agent a task like, "Go to hacker-news.com, find the top 5 articles, and save their titles and URLs to a file." The agent will then use a large language model (LLM) to understand and execute these instructions in a real browser environment.

One of the most useful features of Browser Use Cloud is its real-time feedback loop. Every task you create comes with a live_url, which provides a live, interactive preview of what the agent is doing. You can watch the agent browse in real time and even take control if needed, which makes debugging and monitoring much easier.

Getting Your API Key

Before you can start creating agents, you'll need an API key. The API key authenticates your requests and links them to your account.

Note: To get your API key, you'll need an active subscription to Browser Use Cloud. You can manage your subscription and get your API key from the billing page: cloud.browser-use.com/billing.

Once you have your API key, be sure to keep it secure. Treat it like a password, and never expose it in client-side code or commit it to version control. It's best to store it in a secure environment variable.

export BROWSER_USE_API_KEY="your_api_key_here"
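As a minimal sketch of how an application might pick that variable up, the Python helper below reads BROWSER_USE_API_KEY from the environment and builds the two headers used by the requests in this guide. The name auth_headers is hypothetical and not part of any official SDK.

```python
import os


def auth_headers(env=None):
    """Build request headers from the BROWSER_USE_API_KEY environment variable.

    `auth_headers` is a hypothetical helper, not part of an official SDK.
    Pass a dict as `env` to override os.environ (handy in tests).
    """
    env = os.environ if env is None else env
    api_key = env.get("BROWSER_USE_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an
        # unauthenticated request.
        raise RuntimeError("BROWSER_USE_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key at call time, rather than hard-coding it, keeps the secret out of source control.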

Understanding the Pricing Model

The Browser Use Cloud API has a simple, pay-as-you-go pricing model. This ensures that you only pay for what you use, making it cost-effective for both small and large-scale projects. The pricing is composed of two main parts:

  1. Task Initialization Cost: A flat fee of $0.01 is charged for every task you start. This covers the cost of spinning up the browser environment for your agent.
  2. Task Step Cost: This is the cost for each action or "step" the agent takes. The cost per step depends on the LLM you choose to power your agent.

LLM Step Pricing

Different LLMs have different capabilities and price points. You can choose the model that best suits your needs for performance and cost. Here's a breakdown of the cost per step for each available model:

Model                            Cost per Step
GPT-4o                           $0.03
GPT-4.1                          $0.03
Claude 3.7 Sonnet (2025-02-19)   $0.03
GPT-4o mini                      $0.01
GPT-4.1 mini                     $0.01
Gemini 2.0 Flash                 $0.01
Gemini 2.0 Flash Lite            $0.01
Llama 4 Maverick                 $0.01

Cost Calculation Example

Let's imagine you want to automate a task that involves logging into a website, navigating to a specific page, and extracting some data. You estimate this will take about 15 steps. If you choose the GPT-4o model, the total cost would be calculated as follows:

  Task initialization: $0.01
  Steps: 15 steps × $0.03 per step = $0.45
  Total: $0.01 + $0.45 = $0.46

This transparent pricing allows you to predict and control your costs effectively.
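The pricing rules in this section can be sketched as a small estimator. The prices are copied from the table above; the dictionary keys are the display names from that table, not API model identifiers, and estimate_task_cost is a hypothetical helper. Decimal is used so that cent amounts add up exactly.

```python
from decimal import Decimal

# Flat fee charged when a task starts (from the pricing section).
TASK_INIT_COST = Decimal("0.01")

# Per-step prices from the table; keys are display names, not API model IDs.
STEP_COST = {
    "GPT-4o": Decimal("0.03"),
    "GPT-4.1": Decimal("0.03"),
    "Claude 3.7 Sonnet": Decimal("0.03"),
    "GPT-4o mini": Decimal("0.01"),
    "GPT-4.1 mini": Decimal("0.01"),
    "Gemini 2.0 Flash": Decimal("0.01"),
    "Gemini 2.0 Flash Lite": Decimal("0.01"),
    "Llama 4 Maverick": Decimal("0.01"),
}


def estimate_task_cost(model, steps):
    """Return the estimated cost in dollars: init fee + steps * per-step price."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return TASK_INIT_COST + STEP_COST[model] * steps


print(estimate_task_cost("GPT-4o", 15))       # 0.46
print(estimate_task_cost("GPT-4o mini", 15))  # 0.16
```

The step count is only an estimate before a task runs, so the result is a budget figure rather than an exact bill.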

Creating Your First Agent: A "Hello, World!" Example

Now for the exciting part! Let's create your first browser automation agent. We'll start with a very simple task: going to Google and searching for "Browser Use".

We'll use curl to make a POST request to the /api/v1/run-task endpoint. This is the primary endpoint for creating new tasks.

curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to google.com and search for Browser Use"
  }'

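Let's break down this command: -X POST sets the HTTP method, the Authorization header passes your API key as a Bearer token, the Content-Type header declares a JSON body, and -d supplies that body, whose task field holds the natural-language instructions for the agent.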
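One way to see how the pieces fit together is to rebuild the same request in Python with only the standard library. In this sketch the method, the two headers, and the JSON body each map to one argument. The function names are hypothetical, and the endpoint and field names are taken from the curl command rather than from an official client.

```python
import json
import urllib.request

API_BASE = "https://api.browser-use.com/api/v1"


def build_run_task_request(task, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/run-task",
        data=body,                      # same JSON as curl's -d argument
        method="POST",                  # same as curl's -X POST
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_task(task, api_key, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_run_task_request(task, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the request easy to inspect or unit-test without touching the network.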

Understanding the API Response

When you send this request, the API will respond with a JSON object containing information about the newly created task. Here's an example of what that response might look like:

{
  "task_id": "ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7",
  "status": "running",
  "live_url": "https://previews.browser-use.com/ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7"
}
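A response of this shape can be turned into a typed object before the rest of your code uses it. The sketch below assumes the three fields shown in the example response. TaskInfo and parse_task_response are hypothetical names, and a real response may carry additional fields, which this parser ignores.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """The three fields shown in the example response."""
    task_id: str
    status: str
    live_url: str


def parse_task_response(raw):
    """Parse a JSON response body (str or bytes) into a TaskInfo."""
    data = json.loads(raw)
    required = ("task_id", "status", "live_url")
    missing = [key for key in required if key not in data]
    if missing:
        # Surface an unexpected response shape early and explicitly.
        raise ValueError(f"response is missing fields: {', '.join(missing)}")
    return TaskInfo(
        task_id=data["task_id"],
        status=data["status"],
        live_url=data["live_url"],
    )
```

Store the task_id from the parsed result; the management endpoints later in this guide all require it.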

Interactive Live Previews

The live_url is one of the most powerful features of Browser Use Cloud. It is not just a read-only video stream; it is a fully interactive session.

You can embed the live_url directly into your own applications using an iframe. This allows you to build custom dashboards and monitoring tools that include a real-time view of your agents.

Here's a simple HTML snippet to embed the live preview:

<!DOCTYPE html>
<html>
<head>
  <title>Agent Live Preview</title>
  <style>
    body, html { margin: 0; padding: 0; height: 100%; overflow: hidden; }
    iframe { width: 100%; height: 100%; border: none; }
  </style>
</head>
<body>
  <iframe src="YOUR_LIVE_URL_HERE"></iframe>
</body>
</html>
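For dashboards that render this page on the server, the URL can be filled in programmatically instead of by hand. The sketch below reuses the same markup and escapes the URL before placing it in the src attribute. render_preview_page is a hypothetical helper, and the https-only check is an assumption based on the example response.

```python
import html

# Same markup as the snippet above; braces in the CSS are doubled so that
# str.format leaves them alone.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <title>Agent Live Preview</title>
  <style>
    body, html {{ margin: 0; padding: 0; height: 100%; overflow: hidden; }}
    iframe {{ width: 100%; height: 100%; border: none; }}
  </style>
</head>
<body>
  <iframe src="{src}"></iframe>
</body>
</html>
"""


def render_preview_page(live_url):
    """Return an HTML page embedding live_url in a full-window iframe."""
    if not live_url.startswith("https://"):
        # Reject unexpected schemes such as javascript: before embedding.
        raise ValueError("expected an https live_url")
    # Escape quotes and angle brackets so the URL cannot break out of the attribute.
    return PAGE_TEMPLATE.format(src=html.escape(live_url, quote=True))
```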

Replace YOUR_LIVE_URL_HERE with the live_url from the API response. When you open this HTML file in a browser, you'll see the agent's screen, and you can click, type, and scroll just as if you were browsing on your own computer. This is especially useful for debugging a task, monitoring its progress, and taking control of the session when needed.

Managing the Task Lifecycle

Once a task is running, you have full control over its lifecycle. You can pause, resume, and stop tasks using the API. You'll need the task_id for all management operations.

Pausing and Resuming a Task

There are many reasons you might want to pause a task. Maybe you need to inspect the web page manually, or perhaps you want to wait for an external event to occur before continuing.

To pause a task, send a POST request to the /api/v1/pause-task endpoint:

curl -X POST https://api.browser-use.com/api/v1/pause-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

The agent will finish its current step and then enter a paused state.

To resume the task, send a POST request to the /api/v1/resume-task endpoint:

curl -X POST https://api.browser-use.com/api/v1/resume-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

The agent will pick up right where it left off.

Stopping a Task

If you want to terminate a task permanently, you can use the /api/v1/stop-task endpoint. This is useful if the task is complete, has gone wrong, or is no longer needed.

curl -X POST https://api.browser-use.com/api/v1/stop-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'

Note: Once a task is stopped, it cannot be resumed. The browser environment is destroyed, and all associated resources are cleaned up.
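Because the three lifecycle calls differ only in their endpoint, they can share one helper. The sketch below builds the request for pause, resume, or stop using the endpoints and task_id body shown in the curl commands. build_lifecycle_request and send are hypothetical names, not part of an official SDK.

```python
import json
import urllib.request

API_BASE = "https://api.browser-use.com/api/v1"

# Action name -> endpoint, as used in the curl examples in this section.
LIFECYCLE_ENDPOINTS = {
    "pause": "pause-task",
    "resume": "resume-task",
    "stop": "stop-task",
}


def build_lifecycle_request(action, task_id, api_key):
    """Build (but do not send) a pause, resume, or stop request for a task."""
    if action not in LIFECYCLE_ENDPOINTS:
        raise ValueError(f"unknown action: {action!r}")
    return urllib.request.Request(
        f"{API_BASE}/{LIFECYCLE_ENDPOINTS[action]}",
        data=json.dumps({"task_id": task_id}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=30):
    """Send a prepared request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, send(build_lifecycle_request("pause", task_id, api_key)) would pause a running task. Since stopping is irreversible, callers may want to confirm before issuing the "stop" action.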

Advanced Task Creation

The "Hello, World!" example was just the beginning. The run-task endpoint supports more than just a simple task string. You can customize your agent's behavior by providing additional parameters.

Choosing an LLM

As we saw in the pricing section, you can choose from several different LLMs to power your agent. You can specify the model in the run-task request using the model parameter.

For example, to use the Claude 3.7 Sonnet model, you would make the following request:

curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to reddit.com/r/programming and find the top post of the day.",
    "model": "claude-3.7-sonnet-20250219"
  }'

If you don't specify a model, the API will use a default model, which is typically a cost-effective and performant option like GPT-4o mini.
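In client code, the optional model choice can be handled by leaving the field out when no model is given, so the API falls back to its default. The sketch below assumes the task and model field names used in the request above. build_task_payload is a hypothetical helper.

```python
import json


def build_task_payload(task, model=None):
    """Return the JSON body for run-task, including `model` only when given."""
    if not task.strip():
        raise ValueError("task must not be empty")
    payload = {"task": task}
    if model is not None:
        # Omitting the key entirely lets the API apply its default model.
        payload["model"] = model
    return json.dumps(payload)
```

For instance, build_task_payload("Go to reddit.com/r/programming and find the top post of the day.", model="claude-3.7-sonnet-20250219") reproduces the body of the curl request above.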

Building Your Own Client

While curl is great for simple tests, you'll likely want to integrate the Browser Use Cloud API into your applications using a proper client library. The best way to do this is to use the platform's OpenAPI specification to generate a type-safe client.

The OpenAPI spec is a standardized way to describe REST APIs. You can find the spec here: http://api.browser-use.com/openapi.json.

Python Client Generation

For Python developers, we recommend openapi-python-client. It generates a modern, async-first client with full type hints.

First, install the generator tool:

# We recommend using pipx to keep your global environment clean
pipx install openapi-python-client --include-deps

Now, generate the client:

openapi-python-client generate --url http://api.browser-use.com/openapi.json

This will create a new directory containing your Python client package. You can install it using pip:

pip install .

Now you can use the client in your Python code:

import asyncio
import os

# The package, module, and function names below depend on what the generator
# emits for the spec; adjust the imports and calls to match your generated client.
from browser_use_api import Client
from browser_use_api.models import RunTaskRequest

# Read the key from the environment variable set earlier in this guide.
API_KEY = os.environ["BROWSER_USE_API_KEY"]

async def main():
    client = Client(base_url="https://api.browser-use.com/api/v1")
    request = RunTaskRequest(task="Go to ycombinator.com and list the top 3 companies.")

    response = await client.run_task.api_v1_run_task_post(
        client=client,
        json_body=request,
        headers={"Authorization": f"Bearer {API_KEY}"}
    )

    if response:
        print(f"Task created with ID: {response.task_id}")
        print(f"Live URL: {response.live_url}")

if __name__ == "__main__":
    asyncio.run(main())

TypeScript/JavaScript Client Generation

For frontend or Node.js projects, openapi-typescript is an excellent tool for generating TypeScript type definitions from the OpenAPI spec.

First, install the generator as a dev dependency:

npm install -D openapi-typescript

Then, run the generator:

npx openapi-typescript http://api.browser-use.com/openapi.json -o src/browser-use-api.ts

This will create a single file, src/browser-use-api.ts, containing all the type definitions for the API. You can then use these types with your preferred HTTP client, like fetch or axios, to make type-safe requests.

Here's an example using fetch in a TypeScript project:

// Type-only import: the generated file contains only type definitions.
import type { paths } from './src/browser-use-api';

const API_URL = "https://api.browser-use.com/api/v1";

type RunTaskRequest = paths["/run-task"]["post"]["requestBody"]["content"]["application/json"];
type RunTaskResponse = paths["/run-task"]["post"]["responses"]["200"]["content"]["application/json"];

async function createTask(task: string, apiKey: string): Promise<RunTaskResponse> {
  const body: RunTaskRequest = { task };

  const response = await fetch(`${API_URL}/run-task`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });

  if (!response.ok) {
    throw new Error(`API request failed with status ${response.status}`);
  }

  return response.json() as Promise<RunTaskResponse>;
}

async function run() {
  const apiKey = process.env.BROWSER_USE_API_KEY;
  if (!apiKey) {
    throw new Error("API key not found in environment variables.");
  }

  try {
    const result = await createTask("Find the current weather in New York City.", apiKey);
    console.log("Task created:", result);
  } catch (error) {
    console.error("Failed to create task:", error);
  }
}

run();
