Running large language models (LLMs) locally is increasingly popular among API developers and backend engineers seeking to balance performance, privacy, and flexibility. Ollama makes it easy to run open-source LLMs like Llama 3, Mistral, and Phi-3 directly on your own machine. But integrating these models into your workflow, especially alongside cloud LLMs (OpenAI, Anthropic, etc.), can be cumbersome due to differing APIs.
LiteLLM solves this by providing a unified Python interface for over 100 LLM providers—including Ollama, OpenAI, Cohere, and more. With LiteLLM, you can write your application logic once and seamlessly switch between local and cloud models, simplifying everything from prototyping to production.
Table of Contents
- Why Combine LiteLLM and Ollama?
- Prerequisites & Setup
- Pulling and Running Ollama Models
- Installing and Configuring LiteLLM
- Making Your First LiteLLM Call to Ollama
- Streaming Responses for Better UX
- Customizing LiteLLM & Ollama Integration
- Using the LiteLLM Proxy for Scalable API Access
- Troubleshooting Common Issues
- Conclusion & Next Steps
Why Combine LiteLLM and Ollama?
What is LiteLLM?
LiteLLM is a lightweight Python library that standardizes how you interact with LLM APIs. Instead of juggling multiple SDKs and request formats, you use a simple, consistent function call (litellm.completion()), regardless of provider—cloud or local.
Key Features:
- Unified Interface: Consistent API calls for over 100 LLMs
- Provider Agnostic: Easily switch between models (e.g., OpenAI GPT-4o, ollama/llama3)
- Built-in Reliability: Timeouts, retries, and fallbacks
- Observability: Logging, callbacks, and integration with monitoring tools
- Proxy Server: Centralized API key and routing management
- Cost Tracking: For managing API spend (where applicable)
Why Use LiteLLM with Ollama?
- Standardization: Write your API integration once; swap between local (Ollama) and remote (OpenAI, Anthropic, etc.) models by changing a single parameter.
- Local-First Development: Test and iterate privately and offline, then move to cloud models as needed.
- Advanced Features: Use retries, logging, streaming, and observability—locally as well as in the cloud.
- Hybrid Mode: Route requests intelligently (e.g., cost-sensitive tasks to local LLMs, high-accuracy jobs to cloud APIs).
- Enhanced Productivity: Combine with tools like Apidog to document and test your LLM-powered APIs efficiently.
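That single-parameter swap is easy to see in code. Here's a minimal sketch — the `pick_model` and `ask` helper names are our own, not part of LiteLLM:

```python
def pick_model(use_local: bool) -> str:
    # Swapping providers is just a different model string:
    # the "ollama/" prefix routes the call to the local Ollama server.
    return "ollama/llama3" if use_local else "gpt-4o-mini"

def ask(prompt: str, use_local: bool = True) -> str:
    import litellm  # imported lazily; only needed when a request is made

    # The call shape is identical for local and cloud back-ends.
    response = litellm.completion(
        model=pick_model(use_local),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Here, `ask("Why is the sky blue?")` answers from local Llama 3, while `ask("Why is the sky blue?", use_local=False)` goes to OpenAI (assuming `OPENAI_API_KEY` is set).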
Prerequisites and Setup
Before you start, make sure you have:
- Python ≥ 3.8 installed
Check with:

```bash
python --version
# or
python3 --version
```

If not installed, get it from python.org.
- pip (Python package installer)
Check with:

```bash
pip --version
# or
pip3 --version
```

- Ollama installed and running locally
Download from ollama.com and follow platform instructions.
- LiteLLM installed via pip (see below).
Pulling and Running Ollama Models
After installing Ollama, pull a model to use locally. For example, to download Llama 3:
```bash
ollama pull llama3
```

Replace llama3 with any other supported model (mistral, phi3, gemma:2b, etc.) as needed. Find more options in the Ollama model library.
Verify Ollama is Running:
- macOS/Windows: Ollama runs in the background (look for the icon).
- Linux/Manual: Start with:
```bash
ollama serve
```

- Check API status:

```bash
curl http://localhost:11434
```

You should see a status message. If not, ensure Ollama is started and not blocked by firewall settings.
Installing and Configuring LiteLLM
Install LiteLLM using pip:
```bash
pip install litellm
# or
pip3 install litellm
```

Verify Installation:
```python
import litellm
print(litellm.__version__)
```

Making Your First LiteLLM Call to Ollama
Here's a minimal Python script to send a prompt to your local Llama 3 model via LiteLLM.
```python
import litellm

model_name = "ollama/llama3"  # Prefix with "ollama/" for local models
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"}
]

response = litellm.completion(model=model_name, messages=messages)
print(response.choices[0].message.content)
```
Run the script:

```bash
python ollama_test.py
```

You should receive an answer from Llama 3, demonstrating that your local LLM is accessible with the same API as OpenAI.
Streaming Responses for Better UX
For chatbots and interactive apps, streaming lets users see tokens as they're generated. LiteLLM supports streaming out of the box.
Streaming Example:
```python
import litellm

model_name = "ollama/llama3"
messages = [
    {"role": "system", "content": "You are a concise poet."},
    {"role": "user", "content": "Write a short haiku about a cat."}
]

response_stream = litellm.completion(
    model=model_name,
    messages=messages,
    stream=True
)

for chunk in response_stream:
    # Each chunk carries the next token(s) in an OpenAI-style delta
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```
You'll see the output appear token by token, just like with OpenAI's API.
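If you also need the complete reply (say, to cache or log it), you can accumulate the deltas as they stream. A small sketch — `collect` is our own helper, written against the OpenAI-style chunk shape LiteLLM yields:

```python
def collect(stream) -> str:
    """Print a response as it streams and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Usage with LiteLLM:
#   full_text = collect(litellm.completion(
#       model="ollama/llama3", messages=messages, stream=True))
```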
Customizing LiteLLM & Ollama Integration
Using Environment Variables
If Ollama isn't on the default localhost:11434, set the API endpoint:
- Linux/macOS:
```bash
export OLLAMA_API_BASE="http://192.168.1.100:11434"
python your_script.py
```

- Windows:
```cmd
set OLLAMA_API_BASE=http://192.168.1.100:11434
python your_script.py
```

Using a config.yaml File
For multiple models or advanced routing, use a configuration file.
```yaml
# config.yaml
model_list:
  - model_name: ollama/llama3
    litellm_params:
      model: ollama/llama3
      api_base: "http://localhost:11434"
  - model_name: ollama/mistral-remote
    litellm_params:
      model: ollama/mistral
      api_base: "http://192.168.1.100:11434"
```

Load the config in your script:
```python
import litellm
litellm.load_config("config.yaml")
```

Now you can reference models by the names defined in model_list.
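If you'd rather not touch environment variables or a config file at all, litellm.completion() also accepts api_base as a per-call parameter. A minimal sketch — the `ask_remote` helper name is ours:

```python
def ask_remote(prompt: str, host: str = "http://192.168.1.100:11434") -> str:
    import litellm  # imported lazily; only needed when a request is made

    response = litellm.completion(
        model="ollama/llama3",
        messages=[{"role": "user", "content": prompt}],
        api_base=host,  # overrides OLLAMA_API_BASE for this call only
    )
    return response.choices[0].message.content
```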
Model Aliasing
Register model aliases programmatically for easier switching:
```python
import litellm

litellm.register_model({
    "my-local-llm": {
        "model": "ollama/llama3"
    }
})
```
Advanced Features: Retries, Logging, and Callbacks
Retries:
```python
response = litellm.completion(
    model="ollama/llama3",
    messages=messages,
    num_retries=3,  # retry transient failures up to 3 times
    timeout=10      # seconds per attempt before giving up
)
```

Logging with Callbacks:
```python
import logging
import litellm

logging.basicConfig(level=logging.INFO)

def log_success(kwargs, response, start, end):
    logging.info(f"Success: {kwargs['model']} - {response.choices[0].message.content[:50]}...")

def log_failure(kwargs, exc, start, end):
    logging.error(f"Failure: {kwargs['model']} - {exc}")

# Register the callbacks so LiteLLM invokes them after each call
litellm.success_callback = [log_success]
litellm.failure_callback = [log_failure]
```
Using the LiteLLM Proxy for Scalable API Access
For team projects, microservices, or production environments, the LiteLLM Proxy provides a robust, OpenAI-compatible REST API for all your LLMs (local and cloud).
Why Use the Proxy?
- Central management of model configs and API keys
- Standard OpenAI API for all clients (language-agnostic)
- Load balancing, routing, and logging built-in
- Secure key handling (cloud keys stay on the proxy server)
- Easy integration with API tools like Apidog
Setting Up the Proxy
- Install Proxy Dependencies
```bash
pip install 'litellm[proxy]'
```

- Create a Proxy Config File (proxy_config.yaml):
```yaml
model_list:
  - model_name: local-llama3
    litellm_params:
      model: ollama/llama3
      api_base: "http://localhost:11434"
  - model_name: cloud-gpt4o
    litellm_params:
      model: gpt-4o-mini
      api_key: "os.environ/OPENAI_API_KEY"
```

- Run the Proxy:
```bash
litellm --config proxy_config.yaml --port 8000
```

- Make OpenAI-Compatible Calls via Proxy:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000",
    api_key="sk-xxxx"  # Proxy ignores this for local models
)

response = client.chat.completions.create(
    model="local-llama3",  # name from proxy_config.yaml
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)
print(response.choices[0].message.content)
```
Tip: Use Apidog to design, test, and document these API endpoints for your team.
Troubleshooting Common Issues
| Issue | Possible Cause | Solution |
|---|---|---|
| Connection Refused | Ollama not running, wrong API base, firewall | Start Ollama, check endpoint, update config |
| Model Not Found | Model not pulled, typo, missing prefix | ollama pull, check spelling, use ollama/ prefix |
| Timeout | Large model, slow hardware | Use smaller model, increase timeout |
| Dependency Conflicts | Python/pip version mismatch | Use virtualenv, reinstall dependencies |
| Verbose Logs Needed | Debugging issues | Set litellm.set_verbose = True or enable in config |
Conclusion & Next Steps
By integrating LiteLLM with Ollama, API developers and backend engineers can:
- Run and test LLMs locally—saving on latency, privacy, and cost
- Switch between local and cloud models without changing application logic
- Leverage robust features like streaming, retries, logging, and proxy-based management
Recommended Next Steps:
- Try different Ollama models and experiment with their strengths
- Explore LiteLLM's documentation for advanced features (routing, observability, integrations)
- Use Apidog to create, test, and document your LLM-powered APIs—boosting your team's productivity and collaboration



