Run Qwen 3 Locally: Power Agentic Tasks with Ollama & MCP

Learn how to run Alibaba’s Qwen 3 LLM locally with Ollama, integrate MCP for tool-calling, and build agents that automate real tasks like reading PDFs or fetching data. Step-by-step guide for developers, including API workflow integration with Apidog.

Ashley Goolam

30 January 2026

Ready to build powerful AI agents that can reason, automate, and interact with real tools—right on your own hardware? This hands-on guide shows API developers and engineers how to run Alibaba’s Qwen 3 (30B-A3B) LLM locally using Ollama, integrate it with Model Context Protocol (MCP), and create agents that perform real-world tasks like reading PDFs and querying live data. Whether you’re optimizing workflows or building robust automation, you’ll see how Qwen 3’s tool-calling and reasoning can transform your development process.

💡 Designing or testing APIs in your projects? Try Apidog for seamless API design, testing, and documentation—ideal for teams working with LLM-powered agents and tool integrations.

What is Qwen 3? Why Devs Love It for MCP and Agentic Workflows

Qwen 3 is Alibaba’s open large language model series, purpose-built for efficiency and advanced agentic tasks. The 30B-A3B Mixture-of-Experts variant stands out because only about 3B of its roughly 30B parameters are active per token, giving you large-model reasoning and tool-calling at a fraction of the usual inference cost.

The developer community (e.g., r/LocalLLaMA) reports that Qwen 3 excels at tool-calling accuracy, handling real tasks like file I/O and database queries with speed and reliability. In practical tests, Qwen 3 can summarize PDFs, automate file operations, and fetch real-time data on demand.


Step 1: Setting Up Qwen 3 with Ollama (Local Deployment Guide)

Before leveraging Qwen 3’s agentic capabilities, you’ll need to set up a local environment. Here’s how to do it, step by step.

System Requirements

Hardware needs scale with the model you pick: the 30B variant is an ~18GB download and wants a strong GPU with ample VRAM, while the 0.6B–8B variants (roughly 0.4–5GB) run on laptops and modest desktops. Ollama itself runs on macOS, Linux, and Windows.

1. Install Ollama

Go to the official Ollama website and download the installer for your OS.

Or install via terminal:

curl -fsSL https://ollama.com/install.sh | sh

Check installation:

ollama --version

You should see something like 0.3.12 or newer. If not, ensure Ollama is in your PATH.

2. Download a Qwen 3 Model

For maximum capability (desktop with a strong GPU):

ollama pull qwen3:30b

This is an 18GB download—ensure you have space and time.

For lighter hardware, try:

ollama pull qwen3:0.6b  # (~0.4GB)
ollama pull qwen3:1.7b  # (~1GB)
ollama pull qwen3:8b    # (~5GB)
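If you’re not sure which tag fits your machine, the approximate download sizes above can be turned into a quick helper. A small sketch (the sizes are rough figures taken from the tags above, not exact numbers):

```python
from typing import Optional

# Approximate download sizes in GB, from the model tags above (rough figures).
MODEL_SIZES_GB = {
    "qwen3:0.6b": 0.4,
    "qwen3:1.7b": 1.0,
    "qwen3:8b": 5.0,
    "qwen3:30b": 18.0,
}

def largest_model_that_fits(free_gb: float) -> Optional[str]:
    """Return the biggest Qwen 3 tag whose download fits in free_gb, or None."""
    fitting = [(size, tag) for tag, size in MODEL_SIZES_GB.items() if size <= free_gb]
    return max(fitting)[1] if fitting else None

print(largest_model_that_fits(6))    # qwen3:8b
print(largest_model_that_fits(0.3))  # None
```

Leave some headroom beyond the raw download size; the model also needs RAM or VRAM to run.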

List available models:

ollama list

Look for your selected Qwen model (e.g., qwen3:30b, qwen3:8b).

3. Test the Model

Run the model:

ollama run qwen3:30b

Or for smaller models:

ollama run qwen3:0.6b

At the prompt (>>>), try:

Tell me a joke about computers.

Qwen 3 should respond quickly. Exit with /bye.
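Prefer scripting the smoke test instead of using the REPL? Ollama also exposes an OpenAI-compatible /v1/chat/completions endpoint on port 11434. A short sketch, assuming the Ollama server is running (the desktop app starts it automatically; otherwise run `ollama serve`); `build_payload` and `ask` are just illustrative helper names:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # OpenAI-compatible endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat request body for Ollama."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """Send one chat turn to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `ask("qwen3:0.6b", "Tell me a joke about computers.")` should return the same kind of reply you saw in the REPL.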

Need more help with Ollama? See this beginner-friendly tutorial for step-by-step setup.


Step 2: Build a Qwen 3 Agent with MCP & Tool-Calling

Now, let’s create a real agent that reads a PDF and answers questions—using Qwen 3, MCP, and the Qwen-Agent GitHub repo.

1. Create a Project Folder

mkdir qwen-agent-test
cd qwen-agent-test

2. Set Up a Python Virtual Environment

python3 -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows

3. Install Qwen-Agent with MCP & Tools

pip install -U "qwen-agent[gui,rag,code_interpreter,mcp]"

4. Configure Your Agent Script

Create testagent.py. Copy example code from Qwen-Agent’s GitHub, but update the LLM config for Ollama:

llm_cfg = {
    'model': 'qwen3:0.6b',                        # or your chosen model
    'model_server': 'http://localhost:11434/v1',  # Ollama's OpenAI-compatible endpoint
    'api_key': 'ollama',                          # any non-empty string; Ollama ignores it
    'generate_cfg': {'top_p': 0.8}
}

Save a PDF (e.g., a research paper or recipe) as AI-paper.pdf in your project directory.
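With the config in place, a minimal testagent.py skeleton might look like the following. This is a sketch based on Qwen-Agent’s Assistant class from the repo’s examples; `build_agent_config` and the question text are illustrative names of ours:

```python
def build_agent_config():
    """LLM config pointing Qwen-Agent at the local Ollama server."""
    return {
        'model': 'qwen3:0.6b',  # or your chosen model
        'model_server': 'http://localhost:11434/v1',
        'api_key': 'ollama',
        'generate_cfg': {'top_p': 0.8},
    }

def main():
    # Deferred import so the config helper can be reused even before
    # qwen-agent is installed in the current environment.
    from qwen_agent.agents import Assistant

    bot = Assistant(llm=build_agent_config(), files=['./AI-paper.pdf'])
    messages = [{'role': 'user', 'content': 'Summarize the key findings of this paper.'}]

    responses = []
    for responses in bot.run(messages=messages):  # run() streams partial replies
        pass
    if responses:
        print(responses[-1]['content'])
```

Add a `main()` call at the bottom (or a `__main__` guard) and run it with `python testagent.py` once the Ollama server from the next step is up.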

5. Start the Ollama API Server

In a new terminal:

ollama serve

This launches the API at http://localhost:11434.

6. Run the Agent

In your project folder:

python testagent.py

Ask questions about your PDF—Qwen 3 will summarize or extract key info. For a technical paper, you might see:
"The paper discusses CNN-based vision systems for real-time object recognition in robotics, with 95% accuracy."

7. Test MCP Functions (Time, Weather, More)

Enhance testagent.py by configuring MCP servers. Qwen-Agent launches each server as a subprocess, so each entry is the command used to start it; the example below uses uvx (part of the uv toolchain) to run the reference time and fetch servers:

tools = [
    'my_image_gen',      # custom tool from the Qwen-Agent examples
    'code_interpreter',  # built-in Python execution tool
    {
        'mcpServers': {
            'time': {
                'command': 'uvx',
                'args': ['mcp-server-time', '--local-timezone=America/New_York']
            },
            'fetch': {
                'command': 'uvx',
                'args': ['mcp-server-fetch']
            }
        }
    }
]
files = ['./AI-paper.pdf']

Ask:

What time is it right now?
Fetch https://example.com and summarize it.

Qwen 3 will select the right MCP server to fetch live data or system info. For details and more tool examples, see the Qwen-Agent GitHub repo.
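Since the gui extra was installed earlier, you can also put a browser chat front-end on the same agent. A sketch, assuming the WebUI class from qwen_agent.gui as used in the repo’s examples (`launch_chat_ui` is our own wrapper name):

```python
def launch_chat_ui(bot):
    """Serve a Qwen-Agent bot as a local Gradio chat app."""
    # Deferred import: only needed when actually launching the UI.
    from qwen_agent.gui import WebUI
    WebUI(bot).run()  # Gradio serves on http://localhost:7860 by default
```

Pass in the same Assistant instance you built in testagent.py to chat with your PDF and MCP tools from the browser.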


Advanced Tips: Maximizing Qwen 3 for Agentic Automation

In practical tests, Qwen 3 (even on 8B) handled recipe PDFs and file operations smoothly—ideal for local agents that interact with your real data and systems.


Conclusion: Accelerate Your Agentic Projects with Qwen 3, MCP, and Apidog

You’ve now seen how to run Qwen 3 locally, configure MCP and tool-calling, and build agents that can read documents and fetch real-time data. This workflow unlocks robust automation for API projects, coding tasks, and more—without cloud dependencies.

For API-focused teams, Apidog streamlines API design, testing, and documentation—making it easy to integrate, test, and document the endpoints your AI agents will use.

💡 Need beautiful API documentation, a collaborative platform that keeps your developer team productive, or a more affordable alternative to Postman? Apidog fits seamlessly into your LLM-agent pipelines.

button
