Introduction to Local LLMs with Ollama
The AI landscape is rapidly evolving, but one trend stands clear: developers increasingly want control, privacy, and flexibility over their AI implementations. Ollama delivers exactly that, offering a streamlined way to run powerful large language models locally on your hardware without the constraints of cloud-based APIs.
Why run models locally? Three compelling reasons: complete privacy for sensitive data, no network latency from remote API calls, and freedom from usage quotas or unexpected costs. When you're building applications that require consistent AI performance without sending user data to third parties, local inference becomes not just appealing but essential.
DeepSeek-R1 represents a significant advancement in open-source AI models, rivaling the capabilities of many commercial offerings. With strong step-by-step reasoning, solid code generation, and detailed explanations, it's an excellent all-around choice for developers looking to push the boundaries of what's possible with local AI.
Powerful LLMs deserve powerful API testing.
When building applications that integrate with local LLMs like DeepSeek through Ollama, you'll inevitably face the challenge of debugging streaming AI responses. That's where Apidog truly shines.

Unlike generic API tools, Apidog's specialized SSE debugging visualizes token-by-token generation in real-time—giving you unprecedented visibility into how your model thinks. Whether you're building a chatbot, content generator, or AI-powered search, Apidog makes working with Ollama's API endpoints remarkably painless.
I've personally found this combination to be game-changing for local LLM development.
Getting Started with Ollama
Installation
Installing Ollama is straightforward on every major operating system. On Linux, the official install script handles everything:
curl -fsSL https://ollama.com/install.sh | sh
On macOS and Windows, download the installer from https://ollama.com/download instead.
After installation, start the Ollama server with:
ollama serve
This command launches Ollama as a service that listens for requests on localhost:11434. Keep this terminal window running, or set up Ollama as a background service if you plan to use it continuously.
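On Linux, the install script usually registers Ollama as a systemd service, so one option is to let systemd manage it in the background instead of keeping a terminal open. A minimal sketch, assuming a systemd-based distribution:
# Run Ollama as a background service (systemd-based Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
# Quick sanity check: the server should answer on its default port
curl http://localhost:11434
If the last command prints a short confirmation message, the server is up and ready for everything that follows.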
System Requirements
For optimal performance with DeepSeek-R1:
- Minimum: 8GB RAM, modern CPU with 4+ cores
- Recommended: 16GB+ RAM, NVIDIA GPU with 8GB+ VRAM
- Storage: At least 10GB free space for the base model
Basic Commands
Check your installed version:
ollama --version
Get help on available commands:
ollama help
Managing Models
Discovering and Pulling Models
Before diving into model manipulation, let's see what's available:
ollama list
This command shows all locally installed models. When you're ready to download DeepSeek-R1:
ollama pull deepseek-r1
Ollama provides DeepSeek-R1 in several distilled sizes to match your hardware capabilities. For machines with limited resources, try a smaller variant:
ollama pull deepseek-r1:7b
For a slightly larger, Llama-based distillation:
ollama pull deepseek-r1:8b
Larger 14b, 32b, and 70b variants are also available if your hardware can handle them.
Running into content restrictions? Some developers prefer less filtered models:
ollama pull open-r1
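These downloads add up quickly, so it's worth pruning variants you no longer need. A quick housekeeping sketch:
# See what is installed and how much disk space each model uses
ollama list
# Remove a variant you no longer need
ollama rm deepseek-r1:7b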
Running Models Effectively
The true power of Ollama becomes apparent when you start interacting with models. Launch an interactive chat session:
ollama run deepseek-r1
This opens a real-time conversation where you can explore DeepSeek-R1's capabilities. Type your queries and press Enter, or use /help to see the special commands available during the session.
For quick, one-off queries without entering interactive mode:
ollama run deepseek-r1 "Explain quantum computing in simple terms"
Process text directly from files—incredibly useful for summarization, analysis, or transformation tasks:
ollama run deepseek-r1 "Summarize the content of this file in 50 words." < input.txt
Fine-tuning Model Parameters
DeepSeek-R1's behavior can be dramatically altered through parameter adjustments. Note that ollama run itself doesn't accept sampling flags; instead, adjust parameters inside an interactive session with the /set command. For creative, varied outputs:
/set parameter temperature 0.7
/set parameter top_p 0.9
For factual, deterministic responses better suited to coding or technical explanation:
/set parameter temperature 0.1
/set parameter top_p 1.0
You can also bake these values into a Modelfile (see the next section) or pass them per request through the API's options field.
Parameter Guide:
- temperature (typically 0.0-1.0): Lower values make responses more focused and deterministic; higher values introduce creativity and variety.
- top_p (0.0-1.0): Controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds this threshold.
- num_ctx (context window): Determines how much previous conversation the model can consider.
Advanced Uses and API Integration
Custom Modelfiles for Specialized Applications
Ollama's true flexibility emerges when you create custom Modelfiles to adapt DeepSeek-R1 for specific tasks:
FROM deepseek-r1:8b
PARAMETER temperature 0.3
PARAMETER top_p 0.95
SYSTEM You are a senior software developer specializing in Python. Provide clean, efficient code with helpful comments.
Save this as Modelfile and create your customized model:
ollama create python-expert -f Modelfile
Run it just like any other model:
ollama run python-expert "Write a function to find prime numbers in a given range"
REST API for Application Integration
While command-line usage is convenient for experimentation, real-world applications need API access. Ollama provides a simple REST API on port 11434:
# Basic completion request
curl -X POST http://localhost:11434/api/generate -d '{
"model": "deepseek-r1",
"prompt": "Write a recursive function to calculate Fibonacci numbers",
"stream": false
}'
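The sampling parameters discussed earlier can also be supplied per request through the optional options object; a minimal sketch (the values here are only illustrative):
# Completion request with explicit sampling parameters
curl -X POST http://localhost:11434/api/generate -d '{
"model": "deepseek-r1",
"prompt": "Write a recursive function to calculate Fibonacci numbers",
"stream": false,
"options": {"temperature": 0.2, "top_p": 0.9, "num_ctx": 4096}
}'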
For streaming responses (ideal for chat interfaces):
curl -X POST http://localhost:11434/api/generate -d '{
"model": "deepseek-r1",
"prompt": "Explain how neural networks learn in simple terms",
"stream": true
}'
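Each line of the stream is a small JSON object whose response field carries the next chunk of text, ending with an object where done is true. If you just want to watch the plain text arrive in a terminal, you can pipe the stream through jq (assuming jq is installed):
# Print only the generated text as it streams in
curl -sN -X POST http://localhost:11434/api/generate -d '{
"model": "deepseek-r1",
"prompt": "Explain how neural networks learn in simple terms",
"stream": true
}' | jq --unbuffered -j '.response'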
Testing API Endpoints with Apidog
When building applications that integrate with Ollama's API, testing and visualizing the streaming responses becomes crucial. Apidog excels at handling Server-Sent Events (SSE) like those produced by Ollama's streaming API:
- Create a new HTTP project in Apidog
- Add an endpoint with the URL http://localhost:11434/api/generate
- Set up a POST request with the JSON body:
{
"model": "deepseek-r1",
"prompt": "Write a story about a programmer who discovers an AI",
"stream": true
}
- Send the request and watch as Apidog's SSE debugger visualizes the token-by-token generation process in real-time
This visualization helps identify issues with response formatting, token generation, or unexpected model behavior that might be hard to debug otherwise.
Real-World Applications with DeepSeek-R1
DeepSeek-R1 excels in various practical scenarios:
Content Generation
Create professional-quality blog posts:
ollama run deepseek-r1 "Write a 500-word blog post about sustainable technology"
Information Extraction
Process and analyze documents to extract key information:
ollama run deepseek-r1 "Extract the key points from this research paper: " < paper.txt
Image Analysis
DeepSeek-R1 is a text-only model, so it can't interpret images itself. For image description or analysis, pull a multimodal model such as llava and reference the image path in your prompt:
ollama run llava "Describe the content of this image: ./image.jpg"
Code Generation and Explanation
Generate code solutions for specific problems:
ollama run deepseek-r1 "Write a Python function that implements a binary search algorithm with detailed comments"
Or explain complex code:
ollama run deepseek-r1 "Explain what this code does: " < complex_algorithm.py
Troubleshooting Common Issues
Memory and Performance Problems
If you encounter out-of-memory errors:
- Try a smaller model variant (7B instead of 8B)
- Reduce the context window size, for example with /set parameter num_ctx 2048 in an interactive session or PARAMETER num_ctx 2048 in a Modelfile
- Close other memory-intensive applications
- For CUDA users, ensure you have the latest NVIDIA drivers installed
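It also helps to see what is actually loaded. Recent Ollama releases include a ps subcommand that shows resident models, their memory footprint, and whether they are running on CPU or GPU:
# Show models currently loaded in memory
ollama ps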
API Connection Issues
If you can't connect to the API:
- Ensure Ollama is running with ollama serve
- Check whether the default port is blocked or already in use (lsof -i :11434)
- Verify firewall settings if connecting from another machine
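A quick reachability test is to hit one of the read-only endpoints; /api/tags returns the locally installed models and doubles as a health check:
# Should return a JSON list of installed models if the server is reachable
curl http://localhost:11434/api/tags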
Conclusion
Ollama with DeepSeek-R1 represents a significant step toward democratizing AI by putting powerful language models directly in developers' hands. The combination offers privacy, control, and impressive capabilities—all without reliance on external services.
As you build applications with these local LLMs, remember that proper testing of your API integrations is crucial for reliable performance. Tools like Apidog can help visualize and debug the streaming responses from Ollama, especially when you're building complex applications that need to process model outputs in real-time.
Whether you're generating content, building conversational interfaces, or creating code assistants, this powerful duo provides the foundation you need for sophisticated AI integration—right on your own hardware.