This tutorial will walk you through everything you need to know to harness the power of AI-driven browser automation. Whether you're looking to automate data extraction, test your web applications, or create sophisticated monitoring tools, this guide will provide you with the knowledge and examples to get started.
Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?
Apidog delivers all your demands, and replaces Postman at a much more affordable price!
What is Browser Use Cloud?
Browser Use Cloud is a powerful platform that allows you to create and manage intelligent browser automation agents programmatically. Think of it as having a fleet of virtual assistants that can browse the web, interact with websites, and perform complex tasks on your behalf.
At the core of the platform is the concept of a "task." A task is a set of instructions you provide to an agent in natural language. For example, you could give an agent a task like, "Go to news.ycombinator.com, find the top 5 articles, and save their titles and URLs to a file." The agent then uses a large language model (LLM) to understand and execute these instructions in a real browser environment.
One of the most exciting features of Browser Use Cloud is the real-time feedback loop. Every task you create comes with a live_url. This URL provides a live, interactive preview of what the agent is doing. You can watch the agent browse in real time and even take control if needed. This makes debugging and monitoring incredibly intuitive.
Getting Your API Key
Before you can start creating agents, you'll need an API key. The API key authenticates your requests and links them to your account.
<Note> To get your API key, you'll need an active subscription to Browser Use Cloud. You can manage your subscription and get your API key from the billing page: cloud.browser-use.com/billing. </Note>
Once you have your API key, be sure to keep it secure. Treat it like a password, and never expose it in client-side code or commit it to version control. It's best to store it in a secure environment variable.
export BROWSER_USE_API_KEY="your_api_key_here"
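If you call the API from Python, a small helper can read the key from the environment and fail loudly when it is missing. That keeps the key out of your source files. This is a minimal sketch, and get_api_key is our own illustrative name, not part of any Browser Use library.

```python
import os


def get_api_key(var_name: str = "BROWSER_USE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError instead of continuing with an empty key, so a
    misconfigured environment is caught before any request is sent.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```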
Understanding the Pricing Model
The Browser Use Cloud API has a simple, pay-as-you-go pricing model. This ensures that you only pay for what you use, making it cost-effective for both small and large-scale projects. The pricing is composed of two main parts:
- Task Initialization Cost: A flat fee of $0.01 is charged for every task you start. This covers the cost of spinning up the browser environment for your agent.
- Task Step Cost: This is the cost for each action or "step" the agent takes. The cost per step depends on the LLM you choose to power your agent.
LLM Step Pricing
Different LLMs have different capabilities and price points. You can choose the model that best suits your needs for performance and cost. Here's a breakdown of the cost per step for each available model:
Model | Cost per Step |
---|---|
GPT-4o | $0.03 |
GPT-4.1 | $0.03 |
Claude 3.7 Sonnet (2025-02-19) | $0.03 |
GPT-4o mini | $0.01 |
GPT-4.1 mini | $0.01 |
Gemini 2.0 Flash | $0.01 |
Gemini 2.0 Flash Lite | $0.01 |
Llama 4 Maverick | $0.01 |
Cost Calculation Example
Let's imagine you want to automate a task that involves logging into a website, navigating to a specific page, and extracting some data. You estimate this will take about 15 steps. If you choose the powerful GPT-4o model, the total cost is calculated as follows:
- Task Initialization: $0.01
- Task Steps: 15 steps × $0.03/step = $0.45
- Total Cost: $0.01 + $0.45 = $0.46
This transparent pricing allows you to predict and control your costs effectively.
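A short script makes it easy to compare models before you run a task. The sketch below hard-codes the per-step prices from the table above and uses Decimal so the dollar amounts are not distorted by floating-point rounding. The names TASK_INIT_COST, STEP_COST, and estimate_cost are illustrative, and the prices will need updating if the published pricing changes.

```python
from decimal import Decimal

# Flat fee charged once per task, from the pricing section above.
TASK_INIT_COST = Decimal("0.01")

# Per-step prices copied from the LLM step pricing table, keyed by display name.
STEP_COST = {
    "GPT-4o": Decimal("0.03"),
    "GPT-4.1": Decimal("0.03"),
    "Claude 3.7 Sonnet (2025-02-19)": Decimal("0.03"),
    "GPT-4o mini": Decimal("0.01"),
    "GPT-4.1 mini": Decimal("0.01"),
    "Gemini 2.0 Flash": Decimal("0.01"),
    "Gemini 2.0 Flash Lite": Decimal("0.01"),
    "Llama 4 Maverick": Decimal("0.01"),
}


def estimate_cost(model: str, steps: int) -> Decimal:
    """Initialization fee plus steps times the model's per-step price."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return TASK_INIT_COST + steps * STEP_COST[model]


print(estimate_cost("GPT-4o", 15))  # 0.46
print(estimate_cost("GPT-4o mini", 15))  # 0.16
```

An unknown model name raises KeyError. That is deliberate: the function refuses to guess a price that is not in the table.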
Creating Your First Agent: A "Hello, World!" Example
Now for the exciting part! Let's create your first browser automation agent. We'll start with a very simple task: going to Google and searching for "Browser Use".
We'll use curl to make a POST request to the /api/v1/run-task endpoint. This is the primary endpoint for creating new tasks.
curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to google.com and search for Browser Use"
  }'
Let's break down this command:
- curl -X POST ...: We're sending an HTTP POST request to the specified URL.
- -H "Authorization: Bearer $BROWSER_USE_API_KEY": This is the authentication header. It includes your API key, read from the environment variable we set earlier.
- -H "Content-Type: application/json": This header tells the API that we're sending data in JSON format.
- -d '{ "task": "..." }': This is the body of our request. The task field contains the natural language instructions for our agent.
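You can make the same call from Python with only the standard library's urllib. This minimal sketch mirrors the curl command above, using the same URL, headers, and JSON body. The helper names are our own, and the sketch assumes the response body is JSON.

```python
import json
import urllib.request

RUN_TASK_URL = "https://api.browser-use.com/api/v1/run-task"


def build_run_task_request(task: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        RUN_TASK_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_task(task: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_run_task_request(task, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request is kept separate from sending it, so you can inspect the headers and body without network access. Calling run_task with your key actually creates the task.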
Understanding the API Response
When you send this request, the API will respond with a JSON object containing information about the newly created task. Here's an example of what that response might look like:
{
  "task_id": "ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7",
  "status": "running",
  "live_url": "https://previews.browser-use.com/ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7"
}
- task_id: A unique identifier for your task. You'll use this ID to manage the task later (for example, to pause, resume, or stop it).
- status: The current state of the task. It will be running initially.
- live_url: The URL for the live preview. Copy and paste it into your browser to see your agent in action!
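Client code benefits from turning this JSON into a typed object right away. A missing field then fails at the point of parsing rather than deep inside your application. This minimal sketch uses a dataclass. It assumes only the three fields shown in the example response and ignores anything else the API may return. TaskInfo and parse_task_response are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    task_id: str
    status: str
    live_url: str


def parse_task_response(raw: str) -> TaskInfo:
    """Extract the documented fields; raises KeyError if one is missing."""
    data = json.loads(raw)
    return TaskInfo(
        task_id=data["task_id"],
        status=data["status"],
        live_url=data["live_url"],
    )


example = """{
  "task_id": "ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7",
  "status": "running",
  "live_url": "https://previews.browser-use.com/ts_2a9b4e7c-1d0f-4g8h-9i1j-k2l3m4n5o6p7"
}"""
info = parse_task_response(example)
print(info.status)  # running
```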
Interactive Live Previews
The live_url is one of the most powerful features of Browser Use Cloud. It's not just a read-only video stream; it's a fully interactive session.
You can embed the live_url directly into your own applications using an iframe. This allows you to build custom dashboards and monitoring tools that include a real-time view of your agents.
Here's a simple HTML snippet to embed the live preview:
<!DOCTYPE html>
<html>
<head>
<title>Agent Live Preview</title>
<style>
body, html { margin: 0; padding: 0; height: 100%; overflow: hidden; }
iframe { width: 100%; height: 100%; border: none; }
</style>
</head>
<body>
<iframe src="YOUR_LIVE_URL_HERE"></iframe>
</body>
</html>
Replace YOUR_LIVE_URL_HERE with the live_url from the API response. When you open this HTML file in a browser, you'll see the agent's screen. You can click, type, and scroll just as if you were browsing on your own computer. This is incredibly useful for:
- Debugging: If an agent gets stuck, you can immediately see why and what's on its screen.
- Manual Intervention: If a task requires a step that's difficult to automate (like solving a complex CAPTCHA), you can take over, complete the step manually, and then let the agent resume its work.
- Demonstrations: It's a great way to show stakeholders what your automation is doing.
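If you generate the preview page in code instead of editing it by hand, escape the URL before inserting it into the iframe's src attribute. This minimal sketch reproduces the HTML snippet above with Python's html.escape. The template name, the function name, and the https-only check are our own choices. The check assumes live URLs are always https, as in the example response, and it rejects values such as javascript: URLs.

```python
import html

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<title>Agent Live Preview</title>
<style>
body, html {{ margin: 0; padding: 0; height: 100%; overflow: hidden; }}
iframe {{ width: 100%; height: 100%; border: none; }}
</style>
</head>
<body>
<iframe src="{src}"></iframe>
</body>
</html>
"""


def render_preview_page(live_url: str) -> str:
    """Return the embed page with live_url escaped into the iframe src."""
    if not live_url.startswith("https://"):
        raise ValueError("expected an https:// live_url")
    return PAGE_TEMPLATE.format(src=html.escape(live_url, quote=True))
```

Write the returned string to a .html file, or serve it from your dashboard, to get the same page as the hand-edited version.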
Managing the Task Lifecycle
Once a task is running, you have full control over its lifecycle: you can pause, resume, and stop it through the API. Every management operation requires the task_id.
Pausing and Resuming a Task
There are many reasons you might want to pause a task. Maybe you need to inspect the web page manually, or perhaps you want to wait for an external event to occur before continuing.
To pause a task, send a POST request to the /api/v1/pause-task endpoint:
curl -X POST https://api.browser-use.com/api/v1/pause-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'
The agent will finish its current step and then enter a paused state.
To resume the task, send a POST request to the /api/v1/resume-task endpoint:
curl -X POST https://api.browser-use.com/api/v1/resume-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'
The agent will pick up right where it left off.
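The pause, resume, and stop calls in this guide differ only in their endpoint path, so a client can share one builder for all three. This sketch uses only the standard library and the three paths shown here. The CONTROL_ACTIONS mapping and the helper names are our own. The response body of these endpoints is not documented in this guide, so the sketch returns just the HTTP status code.

```python
import json
import urllib.request

BASE_URL = "https://api.browser-use.com/api/v1"

# Task-control endpoints used in this guide; each takes {"task_id": ...}.
CONTROL_ACTIONS = {
    "pause": "/pause-task",
    "resume": "/resume-task",
    "stop": "/stop-task",
}


def build_control_request(action: str, task_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for pause, resume, or stop."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(CONTROL_ACTIONS)}")
    return urllib.request.Request(
        BASE_URL + CONTROL_ACTIONS[action],
        data=json.dumps({"task_id": task_id}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def control_task(action: str, task_id: str, api_key: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_control_request(action, task_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```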
Stopping a Task
If you want to terminate a task permanently, use the /api/v1/stop-task endpoint. This is useful if the task is complete, has gone wrong, or is no longer needed.
curl -X POST https://api.browser-use.com/api/v1/stop-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "YOUR_TASK_ID_HERE"
  }'
<Note> Once a task is stopped, it cannot be resumed. The browser environment is destroyed, and all associated resources are cleaned up. </Note>
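An application that tracks tasks locally, for example to disable buttons in a dashboard, can mirror the lifecycle rules above. A running task can be paused, a paused task can be resumed, and nothing leaves the stopped state. We also assume that both running and paused tasks can be stopped. This is a hypothetical client-side guard, not part of the API. The status strings running and paused come from this guide, while stopped is our own label.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"  # assumed label for a terminated task


# Allowed (current state, action) -> next state, following the lifecycle above.
TRANSITIONS = {
    (TaskState.RUNNING, "pause"): TaskState.PAUSED,
    (TaskState.PAUSED, "resume"): TaskState.RUNNING,
    (TaskState.RUNNING, "stop"): TaskState.STOPPED,
    (TaskState.PAUSED, "stop"): TaskState.STOPPED,
}


def next_state(current: TaskState, action: str) -> TaskState:
    """Return the state after applying action, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a task that is {current.value}") from None


state = TaskState.RUNNING
for action in ("pause", "resume", "stop"):
    state = next_state(state, action)
print(state.value)  # stopped
```

Checking transitions locally lets the UI refuse a resume after a stop before any request is sent. The API remains the source of truth for a task's actual status.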
Advanced Task Creation
The "Hello, World!" example was just the beginning. The run-task endpoint accepts more than just a simple task string: you can customize your agent's behavior by providing additional parameters.
Choosing an LLM
As we saw in the pricing section, you can choose from several different LLMs to power your agent. You specify the model in the run-task request using the model parameter.
For example, to use the Claude 3.7 Sonnet model, you would make the following request:
curl -X POST https://api.browser-use.com/api/v1/run-task \
  -H "Authorization: Bearer $BROWSER_USE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Go to reddit.com/r/programming and find the top post of the day.",
    "model": "claude-3.7-sonnet-20250219"
  }'
If you don't specify a model, the API uses a default model, which is typically a cost-effective and performant option like GPT-4o mini.
Building Your Own Client
While curl is great for simple tests, you'll likely want to integrate the Browser Use Cloud API into your applications using a proper client library. The best way to do this is to use our OpenAPI specification to generate a type-safe client.
The OpenAPI spec is a standardized way to describe REST APIs. You can find our spec here: http://api.browser-use.com/openapi.json.
Python Client Generation
For Python developers, we recommend openapi-python-client. It generates a modern client with full type hints, with both synchronous and asynchronous versions of each endpoint call.
First, install the generator tool:
# We recommend using pipx to keep your global environment clean
pipx install openapi-python-client --include-deps
Now, generate the client:
openapi-python-client generate --url http://api.browser-use.com/openapi.json
This creates a new directory containing your Python client package. From inside that directory, install it with pip:
pip install .
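Now you can use the client in your Python code. The generator derives the package, module, and function names from the spec's title, tags, and operation IDs, so treat the imports below as placeholders and check the README it generates alongside the code:
import asyncio
import os

# Placeholder names: replace with the package and modules in your generated client.
from browser_use_api import AuthenticatedClient
from browser_use_api.api.default import run_task
from browser_use_api.models import RunTaskRequest


async def main():
    # AuthenticatedClient sends "Authorization: Bearer <token>" with every request.
    client = AuthenticatedClient(
        base_url="https://api.browser-use.com/api/v1",
        token=os.environ["BROWSER_USE_API_KEY"],
    )
    request = RunTaskRequest(task="Go to ycombinator.com and list the top 3 companies.")
    # Each generated endpoint module exposes sync() and asyncio() functions;
    # older generator versions name the payload argument json_body instead of body.
    response = await run_task.asyncio(client=client, body=request)
    if response:
        print(f"Task created with ID: {response.task_id}")
        print(f"Live URL: {response.live_url}")


if __name__ == "__main__":
    asyncio.run(main())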
TypeScript/JavaScript Client Generation
For frontend or Node.js projects, openapi-typescript is an excellent tool for generating TypeScript type definitions from the OpenAPI spec.
First, install the generator as a dev dependency:
npm install -D openapi-typescript
Then, run the generator:
npx openapi-typescript http://api.browser-use.com/openapi.json -o src/browser-use-api.ts
This creates a single file, src/browser-use-api.ts, containing all the type definitions for the API. You can then use these types with your preferred HTTP client, like fetch or axios, to make type-safe requests.
Here's an example using fetch in a TypeScript project:
import type { paths } from './src/browser-use-api';

const API_URL = "https://api.browser-use.com/api/v1";

// The path key ("/run-task") must match a key in the generated paths type;
// check src/browser-use-api.ts in case the spec lists it as "/api/v1/run-task".
type RunTaskRequest = paths["/run-task"]["post"]["requestBody"]["content"]["application/json"];
type RunTaskResponse = paths["/run-task"]["post"]["responses"]["200"]["content"]["application/json"];

async function createTask(task: string, apiKey: string): Promise<RunTaskResponse> {
  const body: RunTaskRequest = { task };
  const response = await fetch(`${API_URL}/run-task`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`API request failed with status ${response.status}`);
  }
  return (await response.json()) as RunTaskResponse;
}

async function run() {
  const apiKey = process.env.BROWSER_USE_API_KEY;
  if (!apiKey) {
    throw new Error("API key not found in environment variables.");
  }
  const result = await createTask("Find the current weather in New York City.", apiKey);
  console.log("Task created:", result);
}

// Report both a missing key and failed requests instead of leaving the promise unhandled.
run().catch((error) => {
  console.error("Failed to create task:", error);
  process.exitCode = 1;
});