Ollama Cheatsheet - How to Run LLMs Locally with Ollama

Ashley Goolam

11 March 2025

Introduction to Local LLMs with Ollama

The AI landscape is rapidly evolving, but one trend stands clear: developers increasingly want control, privacy, and flexibility over their AI implementations. Ollama delivers exactly that, offering a streamlined way to run powerful large language models locally on your hardware without the constraints of cloud-based APIs.

Why run models locally? Three compelling reasons: complete privacy for sensitive data, zero latency issues from API calls, and freedom from usage quotas or unexpected costs. When you're building applications that require consistent AI performance without sending user data to third parties, local inference becomes not just appealing but essential.

DeepSeek-R1 represents a significant advancement in open-source AI models, rivaling the capabilities of many commercial offerings. With strong reasoning capabilities, code generation prowess, and solid performance on math and logic tasks, it's an excellent all-around choice for developers looking to push the boundaries of what's possible with local AI.


Powerful LLMs deserve powerful API testing.

When building applications that integrate with local LLMs like DeepSeek through Ollama, you'll inevitably face the challenge of debugging streaming AI responses. That's where Apidog truly shines.

Unlike generic API tools, Apidog's specialized SSE debugging visualizes token-by-token generation in real-time—giving you unprecedented visibility into how your model thinks. Whether you're building a chatbot, content generator, or AI-powered search, Apidog makes working with Ollama's API endpoints remarkably painless.

I've personally found this combination to be game-changing for local LLM development.

Getting Started with Ollama

Installation

Installing Ollama is remarkably straightforward. On Linux, a single command does it (macOS and Windows users can download the installer from ollama.com):

curl -fsSL https://ollama.com/install.sh | sh

After installation, start the Ollama server with:

ollama serve

This command launches Ollama as a service that listens for requests on localhost:11434. Keep this terminal window running, or set up Ollama as a background service if you plan to use it continuously.
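Before wiring anything else up, it can be handy to confirm that the server is actually listening. A minimal Python sketch using only the standard library (11434 is Ollama's default port; adjust if you've changed it):

```python
import socket

def ollama_reachable(host="localhost", port=11434, timeout=1.0):
    """Return True if something is listening on Ollama's default port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(ollama_reachable())
```

If this prints False, double-check that `ollama serve` is still running in its terminal.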

System Requirements

For optimal performance with DeepSeek-R1, plan around these rough guidelines:

RAM: about 8 GB for the smaller distilled variants (1.5B–7B parameters), 16 GB or more for the 8B–14B models, and 32 GB+ for the largest ones.

Disk: model downloads range from roughly 1 GB for the 1.5B variant to tens of GB for the biggest models.

GPU: optional, but a dedicated GPU with sufficient VRAM speeds up inference dramatically; CPU-only inference works, just more slowly.

Basic Commands

Check your installed version:

ollama --version

Get help on available commands:

ollama help

Managing Models

Discovering and Pulling Models

Before diving into model manipulation, let's see what's available:

ollama list

This command shows all locally installed models. When you're ready to download DeepSeek-R1:

ollama pull deepseek-r1

Ollama provides different model sizes to match your hardware capabilities. For machines with limited resources, try one of the smaller distilled variants:

ollama pull deepseek-r1:1.5b

For more powerful setups seeking enhanced capabilities:

ollama pull deepseek-r1:32b

Running into content restrictions? Some developers prefer less filtered models:

ollama pull open-r1

Running Models Effectively

The true power of Ollama becomes apparent when you start interacting with models. Launch an interactive chat session:

ollama run deepseek-r1

This opens a real-time conversation where you can explore DeepSeek-R1's capabilities. Type your queries and press Enter, or use /? to see the special commands available during the session.

For quick, one-off queries without entering interactive mode:

ollama run deepseek-r1 "Explain quantum computing in simple terms"

Process text directly from files—incredibly useful for summarization, analysis, or transformation tasks:

ollama run deepseek-r1 "Summarize the content of this file in 50 words." < input.txt

Fine-tuning Model Parameters

DeepSeek-R1's behavior can be dramatically altered through parameter adjustments. Note that ollama run does not accept sampling flags on the command line; instead, set parameters inside an interactive session. For creative, varied outputs:

ollama run deepseek-r1
>>> /set parameter temperature 0.7
>>> /set parameter top_p 0.9

For factual, deterministic responses better suited to coding or technical explanations:

>>> /set parameter temperature 0.1
>>> /set parameter top_p 1.0

For non-interactive use, bake parameters into a custom Modelfile (see below) or pass them in the API's options object.

Parameter Guide:

temperature (0.0 and up): lower values make output focused and deterministic; higher values increase randomness and creativity.

top_p (0.0–1.0): nucleus sampling; the model only considers tokens whose cumulative probability falls within this threshold. Lower values narrow the choices.
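When calling the model over Ollama's REST API rather than the CLI, the same sampling settings travel in an options object alongside the prompt. A minimal sketch of building such a payload (field names follow the Ollama API; the prompt text is just an example):

```python
import json

def build_generate_payload(prompt, model="deepseek-r1",
                           temperature=0.1, top_p=1.0, stream=False):
    # The REST API nests sampling settings under "options"
    # rather than taking them as top-level fields.
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "options": {"temperature": temperature, "top_p": top_p},
    }

# Creative settings, matching the interactive example above:
payload = build_generate_payload("Write a haiku about code",
                                 temperature=0.7, top_p=0.9)
print(json.dumps(payload, indent=2))
```

POST this JSON to http://localhost:11434/api/generate to apply the settings on a per-request basis.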

Advanced Uses and API Integration

Custom Modelfiles for Specialized Applications

Ollama's true flexibility emerges when you create custom Modelfiles to adapt DeepSeek-R1 for specific tasks:

FROM deepseek-r1:8b
PARAMETER temperature 0.3
PARAMETER top_p 0.95
SYSTEM You are a senior software developer specializing in Python. Provide clean, efficient code with helpful comments.

Save this as Modelfile and create your customized model:

ollama create python-expert -f Modelfile

Run it just like any other model:

ollama run python-expert "Write a function to find prime numbers in a given range"

REST API for Application Integration

While command-line usage is convenient for experimentation, real-world applications need API access. Ollama provides a simple REST API on port 11434:

# Basic completion request
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Write a recursive function to calculate Fibonacci numbers",
  "stream": false
}'
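The same request can be made from application code. A minimal Python sketch using only the standard library, which builds the request first so it can be inspected before sending (actually sending it requires a running ollama serve):

```python
import json
import urllib.request

def make_generate_request(prompt, model="deepseek-r1", stream=False):
    """Build (but don't yet send) a POST request to Ollama's
    /api/generate endpoint on the default local port."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_generate_request("Write a recursive Fibonacci function")
print(req.full_url)

# To actually send it (requires a running `ollama serve`):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```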

For streaming responses (ideal for chat interfaces):

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Explain how neural networks learn in simple terms",
  "stream": true
}'
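With "stream": true, the endpoint returns newline-delimited JSON: each line carries a partial response field, and the final line sets "done": true. A small sketch of assembling the full text from such a stream, shown here on simulated chunks so it runs without a live server:

```python
import json

def collect_stream(lines):
    """Assemble the full completion from Ollama's NDJSON stream,
    where each line is a JSON object with a partial 'response'
    and a 'done' flag on the final chunk."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated chunks, shaped like /api/generate streaming output:
sample = [
    '{"model":"deepseek-r1","response":"Neural networks ","done":false}',
    '{"model":"deepseek-r1","response":"learn by adjusting weights.","done":true}',
]
print(collect_stream(sample))
```

In a real client you would iterate over the HTTP response body line by line and feed each line to the same parsing logic.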

Testing API Endpoints with Apidog

When building applications that integrate with Ollama's API, testing and visualizing the streaming responses becomes crucial. Apidog excels at handling Server-Sent Events (SSE) like those produced by Ollama's streaming API:

  1. Create a new HTTP project in Apidog
  2. Add an endpoint with the URL http://localhost:11434/api/generate
  3. Set up a POST request with the JSON body:
{
  "model": "deepseek-r1",
  "prompt": "Write a story about a programmer who discovers an AI",
  "stream": true
}

4. Send the request and watch as Apidog's SSE debugger visualizes the token-by-token generation process in real-time

This visualization helps identify issues with response formatting, token generation, or unexpected model behavior that might be hard to debug otherwise.

Real-World Applications with DeepSeek-R1

DeepSeek-R1 excels in various practical scenarios:

Content Generation

Create professional-quality blog posts:

ollama run deepseek-r1 "Write a 500-word blog post about sustainable technology"

Information Extraction

Process and analyze documents to extract key information:

ollama run deepseek-r1 "Extract the key points from this research paper: " < paper.txt

Image Analysis

DeepSeek-R1 is a text-only model, so it can't interpret images directly. For image description or analysis, pull a multimodal model such as LLaVA and reference the image by its file path in the prompt:

ollama run llava "Analyze and describe the content of this image: ./image.jpg"

Code Generation and Explanation

Generate code solutions for specific problems:

ollama run deepseek-r1 "Write a Python function that implements a binary search algorithm with detailed comments"

Or explain complex code:

ollama run deepseek-r1 "Explain what this code does: " < complex_algorithm.py

Troubleshooting Common Issues

Memory and Performance Problems

If you encounter out-of-memory errors:

Switch to a smaller model variant (for example, deepseek-r1:1.5b).

Close other memory-intensive applications before starting Ollama.

Reduce the context window by setting PARAMETER num_ctx in a custom Modelfile.

API Connection Issues

If you can't connect to the API:

Make sure the server is running (ollama serve) and hasn't exited with an error.

Confirm you're targeting http://localhost:11434 and that no other process is occupying port 11434.

To accept connections from other machines, set the OLLAMA_HOST environment variable (for example, OLLAMA_HOST=0.0.0.0) before starting the server.

Conclusion

Ollama with DeepSeek-R1 represents a significant step toward democratizing AI by putting powerful language models directly in developers' hands. The combination offers privacy, control, and impressive capabilities—all without reliance on external services.

As you build applications with these local LLMs, remember that proper testing of your API integrations is crucial for reliable performance. Tools like Apidog can help visualize and debug the streaming responses from Ollama, especially when you're building complex applications that need to process model outputs in real-time.

Whether you're generating content, building conversational interfaces, or creating code assistants, this powerful duo provides the foundation you need for sophisticated AI integration—right on your own hardware.
