How to Use LiteLLM with Ollama for Local LLMs: A Practical Guide for API Developers

Learn how to use LiteLLM with Ollama to run large language models locally through a unified API—ideal for API developers and engineering teams needing low latency, privacy, and flexibility. This practical guide covers setup, streaming, configuration, troubleshooting, and how to scale.

Maurice Odida


31 January 2026


Running large language models (LLMs) locally is increasingly popular among API developers and backend engineers seeking to balance performance, privacy, and flexibility. Ollama makes it easy to run open-source LLMs like Llama 3, Mistral, and Phi-3 directly on your own machine. But integrating these models into your workflow, especially alongside cloud LLMs (OpenAI, Anthropic, etc.), can be cumbersome due to differing APIs.

LiteLLM solves this by providing a unified Python interface for over 100 LLM providers—including Ollama, OpenAI, Cohere, and more. With LiteLLM, you can write your application logic once and seamlessly switch between local and cloud models, simplifying everything from prototyping to production.

Looking for a powerful API testing tool with beautiful documentation and team collaboration features? Apidog streamlines your API development lifecycle and delivers a modern alternative to Postman—at a more affordable price. Compare Apidog vs. Postman →

Why Combine LiteLLM and Ollama?

What is LiteLLM?

LiteLLM is a lightweight Python library that standardizes how you interact with LLM APIs. Instead of juggling multiple SDKs and request formats, you use a simple, consistent function call (litellm.completion()), regardless of provider—cloud or local.
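For example, switching between a local Ollama model and a cloud model is just a different model string; the request shape never changes. A minimal sketch ("gpt-4o-mini" is only an illustration and would require an OpenAI key to actually run):

```python
# The request shape is identical for every provider LiteLLM supports;
# the model string alone decides where the call is routed.
def chat_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

local = chat_request("ollama/llama3", "Why is the sky blue?")  # local Ollama
cloud = chat_request("gpt-4o-mini", "Why is the sky blue?")    # OpenAI cloud

# Either dict can be passed straight to litellm.completion(**req).
print(local["model"], cloud["model"])
```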

Key Features:

- A single litellm.completion() call that works across 100+ providers, cloud or local
- OpenAI-compatible request and response formats, including streaming
- Built-in retries, timeouts, and logging callbacks
- An optional proxy server that exposes every configured model behind one REST API

Why Use LiteLLM with Ollama?

- Privacy: prompts and responses never leave your machine
- Low latency and zero per-token cost for local inference
- One code path for local and cloud models, so you can prototype on Ollama and move to a hosted provider without rewriting
- Easy side-by-side testing of open-source and commercial models


Prerequisites and Setup

Before you start, make sure you have:

1. Python 3.8 or newer. Check with:

bash
python --version
# or
python3 --version

If not installed, get it from python.org.

2. pip, Python's package manager. Check with:

bash
pip --version
# or
pip3 --version

3. Ollama. Download it from ollama.com and follow the platform instructions.


Pulling and Running Ollama Models

After installing Ollama, pull a model to use locally. For example, to download Llama 3:

bash
ollama pull llama3

Replace llama3 with any other supported model (mistral, phi3, gemma:2b, etc.) as needed. Find more options on the Ollama model library.

Verify Ollama is Running:

Ollama normally starts its server automatically after installation. If it isn't running, start it manually:

bash
ollama serve

Then, in another terminal, confirm the API responds:

bash
curl http://localhost:11434

You should see a status message ("Ollama is running"). If not, ensure Ollama is started and not blocked by firewall settings.
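The same health check can be scripted with only the Python standard library. A small sketch, assuming the bare root endpoint answers with HTTP 200 (Ollama's default behavior):

```python
import urllib.request
import urllib.error

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default endpoint

def check_ollama(base: str = OLLAMA_BASE, timeout: float = 3.0) -> bool:
    """Return True if an Ollama server answers at `base`."""
    try:
        with urllib.request.urlopen(base, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print("Ollama is running" if check_ollama() else "Ollama is not reachable")
```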


Installing and Configuring LiteLLM

Install LiteLLM using pip:

bash
pip install litellm
# or
pip3 install litellm

Verify Installation:

python
import litellm
print(litellm.__version__)

Making Your First LiteLLM Call to Ollama

Here's a minimal Python script to send a prompt to your local Llama 3 model via LiteLLM.

python
import litellm

model_name = "ollama/llama3"  # Prefix with "ollama/" for local models

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"}
]

response = litellm.completion(model=model_name, messages=messages)
print(response.choices[0].message.content)

Save the file as ollama_test.py and run it:

bash
python ollama_test.py

You should receive an answer from Llama 3, demonstrating that your local LLM is accessible with the same API as OpenAI.
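The returned object mirrors OpenAI's response schema, so the same accessors work regardless of provider. A minimal sketch, demonstrated offline with a stand-in object shaped like a real response:

```python
from types import SimpleNamespace

def summarize(response) -> dict:
    """Extract the commonly used fields from an OpenAI-shaped response."""
    return {
        "text": response.choices[0].message.content,
        "model": response.model,
        "total_tokens": response.usage.total_tokens,
    }

# Stand-in with the same shape litellm.completion() returns:
fake = SimpleNamespace(
    model="ollama/llama3",
    usage=SimpleNamespace(total_tokens=42),
    choices=[SimpleNamespace(message=SimpleNamespace(content="Rayleigh scattering..."))],
)
print(summarize(fake))
```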


Streaming Responses for Better UX

For chatbots and interactive apps, streaming lets users see tokens as they're generated. LiteLLM supports streaming out of the box.

Streaming Example:

python
import litellm

model_name = "ollama/llama3"
messages = [
    {"role": "system", "content": "You are a concise poet."},
    {"role": "user", "content": "Write a short haiku about a cat."}
]

response_stream = litellm.completion(
    model=model_name,
    messages=messages,
    stream=True
)

for chunk in response_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

You'll see the output appear token by token, just like with OpenAI's API.
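In an async service, the same pattern works with litellm.acompletion. A sketch, with the import deferred so the helper can be defined without litellm installed:

```python
import asyncio

async def stream_chat(model: str, prompt: str) -> str:
    """Stream a completion token by token and return the full text."""
    import litellm  # deferred so this module loads even without litellm installed
    stream = await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        token = chunk.choices[0].delta.content or ""
        print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)

# Usage (requires litellm and a running Ollama server):
#   asyncio.run(stream_chat("ollama/llama3", "Write a haiku about rain."))
```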


Customizing LiteLLM & Ollama Integration

Using Environment Variables

If Ollama isn't on the default localhost:11434, set the API endpoint. On Linux/macOS:

bash
export OLLAMA_API_BASE="http://192.168.1.100:11434"
python your_script.py

On Windows (cmd):

cmd
set OLLAMA_API_BASE=http://192.168.1.100:11434
python your_script.py
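Alternatively, the endpoint can be overridden per call through litellm.completion's api_base parameter, avoiding environment variables altogether. A sketch (the host and the completion_kwargs helper are illustrative):

```python
def completion_kwargs(host: str, model: str = "ollama/llama3") -> dict:
    """Build per-call overrides instead of setting OLLAMA_API_BASE."""
    return {"model": model, "api_base": f"http://{host}:11434"}

# Usage (requires litellm and a reachable Ollama server):
#   import litellm
#   response = litellm.completion(
#       messages=[{"role": "user", "content": "ping"}],
#       **completion_kwargs("192.168.1.100"),
#   )
print(completion_kwargs("192.168.1.100"))
```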

Using a config.yaml File

For multiple models or advanced routing, use a configuration file.

yaml
# config.yaml
model_list:
  - model_name: ollama/llama3
    litellm_params:
      model: ollama/llama3
      api_base: "http://localhost:11434"
  - model_name: ollama/mistral-remote
    litellm_params:
      model: ollama/mistral
      api_base: "http://192.168.1.100:11434"

Load the config in your script:

python
import litellm
litellm.load_config("config.yaml")

Now you can reference models by the names defined in model_list.
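The same model_list shape can also be built programmatically and handed to LiteLLM's Router, which routes requests across the configured entries. A sketch assuming the documented Router API (build_model_list is a hypothetical helper):

```python
def build_model_list(entries):
    """Convert (alias, model, api_base) tuples into LiteLLM's model_list shape."""
    return [
        {"model_name": alias, "litellm_params": {"model": model, "api_base": base}}
        for alias, model, base in entries
    ]

model_list = build_model_list([
    ("ollama/llama3", "ollama/llama3", "http://localhost:11434"),
    ("ollama/mistral-remote", "ollama/mistral", "http://192.168.1.100:11434"),
])

# Usage (requires litellm installed):
#   from litellm import Router
#   router = Router(model_list=model_list)
#   router.completion(model="ollama/llama3",
#                     messages=[{"role": "user", "content": "Hi"}])
print(model_list[0]["model_name"])
```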

Model Aliasing

Register model aliases programmatically for easier switching:

python
import litellm

litellm.register_model({
    "my-local-llm": {
        "model": "ollama/llama3"
    }
})

Advanced Features: Retries, Logging, and Callbacks

Retries:

python
response = litellm.completion(
    model="ollama/llama3",
    messages=messages,
    num_retries=3,
    timeout=10
)

Logging with Callbacks:

python
import logging
import litellm

def log_success(kwargs, response, start, end):
    logging.info(f"Success: {kwargs['model']} - {response.choices[0].message.content[:50]}...")

def log_failure(kwargs, exc, start, end):
    logging.error(f"Failure: {kwargs['model']} - {exc}")

# Register the callbacks so LiteLLM invokes them after each call
litellm.success_callback = [log_success]
litellm.failure_callback = [log_failure]


Using the LiteLLM Proxy for Scalable API Access

For team projects, microservices, or production environments, the LiteLLM Proxy provides a robust, OpenAI-compatible REST API for all your LLMs (local and cloud).

Why Use the Proxy?

- One OpenAI-compatible endpoint for every model, local or cloud
- Centralized API key management, so client code never handles provider credentials
- Routing across the models defined in your config
- Works with any OpenAI SDK or tool; clients don't need to import LiteLLM

Setting Up the Proxy

1. Install the proxy extras:

bash
pip install 'litellm[proxy]'

2. Create a proxy_config.yaml describing your models:

yaml
model_list:
  - model_name: local-llama3
    litellm_params:
      model: ollama/llama3
      api_base: "http://localhost:11434"
  - model_name: cloud-gpt4o
    litellm_params:
      model: gpt-4o-mini
      api_key: "os.environ/OPENAI_API_KEY"

3. Start the proxy:

bash
litellm --config proxy_config.yaml --port 8000

4. Call it from any OpenAI-compatible client:

python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000",
    api_key="sk-xxxx"  # Proxy ignores this for local models
)

response = client.chat.completions.create(
    model="local-llama3",
    messages=[{"role": "user", "content": "Hello from the proxy!"}]
)
print(response.choices[0].message.content)
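Because the proxy speaks the OpenAI wire format, plain curl works as well. A sketch, where local-llama3 must match a model_name entry in your proxy_config.yaml:

```shell
# OpenAI-style chat request body; the model must match proxy_config.yaml.
BODY='{"model": "local-llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

# Same path an OpenAI client would call, just pointed at the proxy.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxx" \
  -d "$BODY" || echo "Proxy not reachable on localhost:8000"
```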

Tip: Use Apidog to design, test, and document these API endpoints for your team.


Troubleshooting Common Issues

| Issue | Possible Cause | Solution |
| --- | --- | --- |
| Connection Refused | Ollama not running, wrong API base, firewall | Start Ollama, check the endpoint, update your config |
| Model Not Found | Model not pulled, typo, missing prefix | Run ollama pull, check spelling, use the ollama/ prefix |
| Timeout | Large model, slow hardware | Use a smaller model, increase the timeout |
| Dependency Conflicts | Python/pip version mismatch | Use a virtualenv, reinstall dependencies |
| Verbose Logs Needed | Debugging issues | Set litellm.set_verbose = True or enable it in your config |

Conclusion & Next Steps

By integrating LiteLLM with Ollama, API developers and backend engineers can:

- Run open-source models locally, keeping data private and avoiding per-token costs
- Use a single code path for local and cloud LLMs
- Stream responses for responsive chat interfaces
- Scale from a laptop prototype to a team-wide proxy deployment

Recommended Next Steps:

- Experiment with other Ollama models (mistral, phi3, gemma:2b)
- Stand up the LiteLLM Proxy for your team
- Explore the LiteLLM and Ollama documentation for advanced routing, callbacks, and deployment options
