How to Build AI-Powered Browser Automation with Python, Ollama & DeepSeek

Modern browser automation is evolving rapidly. Hand-written Selenium scripts are brittle and tend to break when a page changes. With open-source tools like Browser Use, combined with local LLM hosts such as Ollama and reasoning models like DeepSeek, developers can now build AI agents that browse the web, interact with forms, extract data, and automate tasks, all driven by natural language instructions.

In this guide, you'll learn how to set up this powerful stack, understand the role of each component, and write a Python-based AI agent that can control your browser programmatically. Whether you're an API developer, backend engineer, or QA specialist, this approach unlocks new possibilities for robust, private, and scalable browser automation.

Why Choose Browser Use, Ollama, and DeepSeek for AI Browser Automation?

Browser Use: A Python package for orchestrating browser actions (navigate, click, extract).
Ollama: A local LLM server, enabling private, high-performance model inference on your hardware.
DeepSeek: An advanced reasoning engine (e.g., deepseek/seed or deepseek-r1) that translates high-level instructions into actionable browser steps.

Together, these tools empower you to build AI agents that can:

Automate web navigation and data extraction
Fill out forms and interact with dynamic pages
Execute multi-step tasks based on natural language prompts

Prerequisites: Setting Up Your Development Environment

Before you dive in, make sure your system meets the following requirements:

Python 3.11+ (python --version)
Ollama (download from ollama.com)
Node.js (node --version; the Python Playwright package bundles its own driver, so this is mainly needed if you also use Node-based tooling)
Git (for cloning repositories)
Hardware: At least 4 CPU cores, 16GB RAM, and 12GB free storage (for DeepSeek). A GPU is optional but recommended for large models.

Tip: Install any missing components to avoid setup issues later.
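
If you'd rather check the list above in one go, a small script can do it. This is a minimal sketch that assumes the command-line tools are on your PATH under their usual names (ollama, node, git); the function names are illustrative and not part of any of the tools.

```python
import shutil
import sys

# Tools named in the prerequisites list; the minimum Python version comes from the same list.
REQUIRED_TOOLS = ("ollama", "node", "git")
MIN_PYTHON = (3, 11)


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


def free_disk_gb(path="."):
    """Free space on the drive holding `path`, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


for name, ok in check_prerequisites().items():
    print(f"{name:8} {'ok' if ok else 'MISSING'}")
print(f"free disk: {free_disk_gb():.1f} GB (about 12 GB is suggested for DeepSeek)")
```

Anything reported as MISSING is worth installing before you continue.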

Step-by-Step Setup: Building Your AI Browser Automation Project

1. Organize Your Project

Create a dedicated folder for your work:

mkdir browser-use-agent
cd browser-use-agent

2. Clone the Browser Use Repository

git clone https://github.com/browser-use/browser-use.git
cd browser-use

3. Create and Activate a Python Virtual Environment

This keeps dependencies isolated:

python -m venv venv
# Activate:
# Mac/Linux:
source venv/bin/activate
# Windows:
venv\Scripts\activate

You'll see (venv) in your terminal, confirming activation.

4. Open Your Project in VS Code

VS Code offers excellent Python integration:

code .

Don’t have VS Code? Download it or use your favorite editor.

Installing Ollama and DeepSeek Locally

1. Install Ollama

Download and install from ollama.com. After installing, confirm it works:

ollama --version

2. Download the DeepSeek Model

For high-quality reasoning, use the DeepSeek “seed” model:

ollama pull deepseek/seed

Note: The model is ~12GB. If storage or GPU memory is limited, try a smaller model such as qwen2.5:7b (roughly 4 to 5GB).

Verify installation:

ollama list

Look for the model you pulled (deepseek/seed in this guide, or your chosen alternative) in the output.
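
The same check can be scripted. The sketch below assumes the Ollama server is already listening on its default address and reads the model list from its /api/tags endpoint; the helper names are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address of a local Ollama server


def installed_models(tags_payload):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def has_model(tags_payload, wanted):
    """True if `wanted` matches an installed name, with or without a ':tag' suffix."""
    for name in installed_models(tags_payload):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Call the local Ollama server and return the parsed /api/tags response."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

For example, has_model(fetch_tags(), "deepseek/seed") should be true once the pull has finished, and it raises an error if the server is not reachable.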

Installing Browser Use and Required Dependencies

1. Install Browser Use and Development Tools

In your virtual environment, run:

pip install ".[dev]"

2. Add LangChain and Ollama Integration

pip install langchain langchain-ollama

These packages connect your agent with the local LLM.

3. Install Playwright for Browser Automation

playwright install

If you encounter issues, ensure Python 3.11+ is active. On Linux, you may also need to install the system libraries the browsers depend on:

playwright install-deps

Configuring the Stack: Connect Browser Use to Ollama & DeepSeek

Start the Ollama server in a separate terminal:

ollama serve

This launches the LLM server at http://localhost:11434. Keep this running while you work.
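
Before running an agent, it can help to confirm the server is actually answering. Here is a minimal sketch using only the standard library; it assumes the default address above, and ollama_is_up is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=3.0):
    """Return True if an HTTP GET to the server root succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, DNS failures, and HTTP errors land here.
        return False
```

Calling ollama_is_up() at the top of your script lets you fail early with a clear message instead of a long traceback from the agent.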

Example: Build an AI Agent to Check Boston Weather on Google

Let's create a Python script that instructs your AI agent to use Google and fetch Boston's weather.

Create test.py in your project folder and add:

import asyncio

from browser_use import Agent
from langchain_ollama import ChatOllama


async def run_search():
    # Task: use Google to find the weather in Boston, Massachusetts
    agent = Agent(
        task="Use Google to find the weather in Boston, Massachusetts",
        llm=ChatOllama(
            model="deepseek/seed",  # must match a name shown by `ollama list`
            num_ctx=32000,  # context window requested from Ollama
        ),
        max_actions_per_step=3,
        tool_call_in_content=False,
    )
    # agent.run returns the agent's run history rather than a plain string
    result = await agent.run(max_steps=15)
    return result


async def main():
    result = await run_search()
    print("\n\n", result)


if __name__ == "__main__":
    asyncio.run(main())

Ensure VS Code is using your virtual environment’s Python interpreter:
- Press Ctrl+P (or Cmd+P on Mac)
- Type > and then Python: Select Interpreter
- Choose the venv interpreter from your project

Run the script:

python test.py

The agent will launch a browser, search Google for Boston’s weather, and output the result.
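
If you expect to switch models or step limits often, you might read them from environment variables instead of editing the script each time. This is only a sketch: the variable names AGENT_MODEL, AGENT_NUM_CTX, and AGENT_MAX_STEPS are invented for this example, and the defaults mirror the values used in test.py.

```python
import os


def load_settings(env=None):
    """Read agent settings from environment variables, falling back to the script's values."""
    env = os.environ if env is None else env
    return {
        "model": env.get("AGENT_MODEL", "deepseek/seed"),
        "num_ctx": int(env.get("AGENT_NUM_CTX", "32000")),
        "max_steps": int(env.get("AGENT_MAX_STEPS", "15")),
    }
```

In test.py you could then pass settings["model"] and settings["num_ctx"] to ChatOllama, and settings["max_steps"] to agent.run.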

If you see an error, confirm that Ollama is running (ollama serve) and port 11434 is open. For troubleshooting, check logs in ~/.ollama/logs.

Integrating Apidog: Reliable API Testing for Browser AI Agents

When your browser AI agent interacts with web APIs, for example when scraping endpoints or automating API-driven workflows, reliable API contract validation becomes essential.

How Apidog helps:

Runs automated API tests to confirm endpoints work as expected
Generates and manages API test cases for your backend
Validates API contracts across staging and production

Apidog integrates smoothly into browser automation pipelines, letting you verify that APIs your agent relies on are robust and consistent.
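
To make "contract validation" concrete, here is a tiny hand-rolled illustration in Python. It is not Apidog's API or how Apidog works internally; the contract format, the function name, and the weather fields are all assumptions made up for this sketch.

```python
def contract_violations(payload, contract):
    """Compare a decoded JSON object against {field: expected_type}; return problem strings."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


# Hypothetical contract for a weather endpoint the agent might call.
WEATHER_CONTRACT = {"city": str, "temperature_c": float, "conditions": str}
```

An empty list means the response matches the contract; a dedicated tool adds test management, environments, and reporting on top of this basic idea.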

Start using Apidog for free to strengthen your browser AI workflows.

Tips for Effective Prompt Engineering

Get more accurate automation by crafting clear, specific prompts:

Be Specific:
"Go to kayak.com, search flights from Zurich to Beijing, 25.12.2025–02.02.2026, sort by price"
is better than
"Find flights."
Break Down Complex Tasks:
e.g., "Visit LinkedIn, search for ML jobs, save links to a file, apply to top 3."
Iterate and Refine:
Adjust your prompts if results aren't as expected. Trying a prompt in a chat interface such as Open WebUI first can help.
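
One way to keep prompts specific is to assemble them from structured parts, so no detail gets dropped. The helper below is a hypothetical sketch, not part of Browser Use; it simply rebuilds the Kayak example from the first tip.

```python
def build_task(site, action, details=(), output=None):
    """Assemble a specific, ordered task string from structured parts."""
    parts = [f"Go to {site}", action]
    parts.extend(details)
    if output:
        parts.append(output)
    # Join with commas so the agent receives one ordered instruction.
    return ", ".join(part.strip() for part in parts if part.strip())


task = build_task(
    site="kayak.com",
    action="search flights from Zurich to Beijing",
    details=["25.12.2025–02.02.2026"],
    output="sort by price",
)
print(task)
```

The resulting string can be passed directly as the task argument of your agent.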

Debugging and Troubleshooting

Check Ollama Logs:
Located at ~/.ollama/logs, useful for diagnosing model errors.
Monitor Playwright Output:
Playwright logs all actions and errors in your terminal.
Performance:
If DeepSeek models run slowly, consider lighter models or distributed compute setups.
Change Tasks Easily:
Update the task string in your script to automate different workflows (e.g., scraping GitHub stars, automating login flows).
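
If you switch tasks often, you could select them from the command line instead of editing the script. This is a sketch under assumptions: the task names and the GitHub task text are invented for illustration, and pick_task is not a Browser Use function.

```python
import argparse

# Hypothetical task catalogue; the first entry is the example used in this guide.
TASKS = {
    "weather": "Use Google to find the weather in Boston, Massachusetts",
    "github-stars": "Go to github.com/browser-use/browser-use and report the star count",
}


def pick_task(argv=None):
    """Return the task string selected by name, or free text passed with --task."""
    parser = argparse.ArgumentParser(description="Choose a task for the agent")
    parser.add_argument("name", nargs="?", default="weather", choices=sorted(TASKS))
    parser.add_argument("--task", help="free-form task text; overrides the named task")
    args = parser.parse_args(argv)
    return args.task or TASKS[args.name]
```

With this in place, the script could call pick_task() and pass the result as the agent's task, so that python test.py github-stars runs the second entry.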

Frequently Asked Questions

Q1. What is Browser Use?
A Python package for AI-driven browser automation using Playwright.

Q2. Do I need a GPU?
Not required for smaller models like deepseek/seed, but a GPU speeds up larger models.

Q3. Can I use models besides DeepSeek?
Yes, any reasoning-capable model supported by Ollama can work.

Q4. Is my data processed locally?
Yes. Running Ollama keeps data and inference on your machine unless configured otherwise.

Q5. Can I automate logins and multi-step tasks?
Yes. Define your high-level task, and the AI agent will break it down into steps.

Conclusion

With Python, Browser Use, Ollama, and DeepSeek, you can build AI agents that automate real browsers using natural language instructions. This stack suits API-driven teams who need reliable, private, and powerful automation, whether for QA, backend integration, or advanced testing.

Add Apidog to your workflow to validate and test the APIs your agents interact with, ensuring your automation always works as intended.

Ready to build intelligent browser agents? Start today and streamline your web automation with confidence.
