How to Download and Use Ollama to Run LLMs Locally

Ashley Goolam

14 April 2025


The world of Artificial Intelligence (AI) is evolving at breakneck speed, with Large Language Models (LLMs) like ChatGPT, Claude, and Gemini capturing imaginations worldwide. These powerful tools can write code, draft emails, answer complex questions, and even generate creative content. However, using these cloud-based services often comes with concerns about data privacy, potential costs, and the need for a constant internet connection.

Enter Ollama.

Ollama is a powerful, open-source tool designed to democratize access to large language models by enabling you to download, run, and manage them directly on your own computer. It simplifies the often complex process of setting up and interacting with state-of-the-art AI models locally.

Why Use Ollama?

Running LLMs locally with Ollama offers several compelling advantages:

  1. Privacy: Your prompts and the model's responses stay on your machine. No data is sent to external servers unless you explicitly configure it to do so. This is crucial for sensitive information or proprietary work.
  2. Offline Access: Once a model is downloaded, you can use it without an internet connection, making it perfect for travel, remote locations, or situations with unreliable connectivity.
  3. Customization: Ollama allows you to easily modify models using 'Modelfiles', letting you tailor their behavior, system prompts, and parameters to your specific needs.
  4. Cost-Effective: There are no subscription fees or per-token charges. Beyond the hardware you already own, the only running cost is electricity.
  5. Exploration & Learning: It provides a fantastic platform for experimenting with different open-source models, understanding their capabilities and limitations, and learning more about how LLMs work under the hood.

This article is designed for beginners who are comfortable using a command-line interface (like Terminal on macOS/Linux or Command Prompt/PowerShell on Windows) and want to start exploring the world of local LLMs with Ollama. We'll guide you through understanding the basics, installing Ollama, running your first model, interacting with it, and exploring basic customization.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands and replaces Postman at a much more affordable price!

Understanding the Basics: LLMs and Ollama

Before diving into the installation, let's clarify a few fundamental concepts.

What are Large Language Models (LLMs)?

Think of an LLM as an incredibly advanced autocomplete system trained on vast amounts of text and code from the internet. By analyzing patterns in this data, it learns grammar, facts, reasoning abilities, and different styles of writing. When you give it a prompt (input text), it predicts the most likely sequence of words to follow, generating a coherent and often insightful response. Different LLMs are trained with different datasets, sizes, and architectures, leading to variations in their strengths, weaknesses, and personalities.

How Does Ollama Work?

Ollama acts as a manager and runner for these LLMs on your local machine. Its core functions include:

  1. Model Downloading: It fetches pre-packaged LLM weights and configurations from a central library (similar to how Docker pulls container images).
  2. Model Execution: It loads the chosen model into your computer's memory (RAM) and potentially utilizes your graphics card (GPU) for acceleration.
  3. Providing Interfaces: It offers a simple command-line interface (CLI) for direct interaction and also runs a local web server that provides an API (Application Programming Interface) for other applications to communicate with the running LLM.
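As a concrete taste of those two interfaces (installation is covered below; the model name here is just an example), the same local server backs both the CLI and the HTTP API:

# Direct CLI: run a one-shot prompt against a local model
ollama run llama3:8b "Say hello in one sentence."

# The same model is reachable over the local HTTP API (Ollama listens on port 11434 by default)
curl http://localhost:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Say hello in one sentence.", "stream": false}'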

Hardware Requirements for Ollama: Can My Computer Run It?

Running LLMs locally can be demanding, primarily on your computer's RAM (Random Access Memory). The size of the model you want to run dictates the minimum RAM required. As a rough rule of thumb from the Ollama documentation, you should have at least 8 GB of RAM for 7B-parameter models, 16 GB for 13B models, and 32 GB for 33B models.

Other factors you might need to consider:

  1. GPU: Optional, but a supported GPU (e.g., NVIDIA on Linux/Windows, or Apple Silicon on macOS) dramatically speeds up responses.
  2. Disk Space: Each downloaded model typically occupies several gigabytes.
  3. CPU: Without a GPU, inference falls back to the CPU. It still works, just more slowly.

Recommendation for Beginners: Start with smaller models (like phi3, mistral, or llama3:8b) and ensure you have at least 16GB of RAM for a comfortable initial experience. Check the Ollama website or model library for specific RAM recommendations for each model.
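If you're not sure how much RAM your machine has, you can check from the terminal (commands differ per OS; the PowerShell line assumes Windows 10 or later):

# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable memory summary
free -h

# Windows (PowerShell): total physical memory
Get-ComputerInfo -Property CsTotalPhysicalMemory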

How to Install Ollama on Mac, Linux, and Windows (Using WSL)

Ollama supports macOS, Linux, and Windows (currently in preview, often requiring WSL).

Step 1: Prerequisites

Before installing, make sure you have: a supported operating system (macOS, Linux, or Windows with WSL2), enough free RAM and disk space for the models you plan to run (see above), and basic familiarity with a terminal.

Step 2: Downloading and Installing Ollama

The process varies slightly depending on your OS:

For macOS:

  1. Go to the official Ollama website: https://ollama.com
  2. Click the "Download" button, then select "Download for macOS".
  3. Once the .dmg file is downloaded, open it.
  4. Drag the Ollama application icon into your Applications folder.
  5. You might need to grant permissions the first time you run it.

For Linux:

The quickest way is usually via the official install script. Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

This command downloads the script and executes it, installing Ollama for your user. It will also attempt to detect and configure GPU support if applicable (NVIDIA drivers needed).

Follow any prompts displayed by the script. Manual installation instructions are also available on the Ollama GitHub repository if you prefer.
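On most Linux distributions, the install script also registers Ollama as a systemd service that starts automatically in the background. Assuming systemd, you can confirm the server is running with:

systemctl status ollama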

For Windows:

  1. Go to the official Ollama website: https://ollama.com
  2. Click the "Download" button, then select "Download for Windows (Preview)".
  3. Run the downloaded installer executable (.exe).
  4. Follow the installation wizard steps.
  5. Important Note: Ollama on Windows relies heavily on the Windows Subsystem for Linux (WSL2). The installer might prompt you to install or configure WSL2 if it's not already set up. GPU acceleration typically requires specific WSL configurations and NVIDIA drivers installed within the WSL environment. Using Ollama might feel more native within a WSL terminal.

Step 3: Verifying the Installation

Once installed, you need to verify that Ollama is working correctly.

Open your terminal or command prompt. (On Windows, using a WSL terminal is often recommended).

Type the following command and press Enter:

ollama --version

If the installation was successful, you should see output displaying the installed Ollama version number, like:

ollama version is 0.1.XX

If you see this, Ollama is installed and ready to go! If you encounter an error like "command not found," double-check the installation steps, ensure Ollama was added to your system's PATH (the installer usually handles this), or try restarting your terminal or computer.

Getting Started: Running Your First Model with Ollama

With Ollama installed, you can now download and interact with an LLM.

Concept: The Ollama Model Registry

Ollama maintains a library of readily available open-source models. When you ask Ollama to run a model it doesn't have locally, it automatically downloads it from this registry. Think of it like docker pull for LLMs. You can browse available models on the Ollama website's library section.
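For example, you can pre-download a model without starting a chat session, much like docker pull:

# Download (or update) a model without running it
ollama pull llama3:8b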

Choosing a Model

For beginners, it's best to start with a well-rounded and relatively small model. Good options include:

  1. llama3:8b: Meta's 8-billion-parameter Llama 3, a strong all-rounder for chat and general questions.
  2. mistral: Mistral AI's 7B model, fast and capable for its size.
  3. phi3: Microsoft's compact Phi-3 model, a good fit for machines with less RAM.

Check the Ollama library for details on each model's size, RAM requirements, and typical use cases.

Downloading and Running a Model (Command Line)

The primary command you'll use is ollama run.

Open your terminal.

Choose a model name (e.g., llama3:8b).

Type the command:

ollama run llama3:8b

Press Enter.

What Happens Next?

Download: If you don't have llama3:8b locally, Ollama will display progress bars as it downloads the model layers. This might take some time depending on your internet speed and the model size (often several gigabytes).

Loading: Once downloaded, Ollama loads the model into your computer's RAM (and GPU memory, if applicable). You might see messages about hardware detection.

Interactive Chat: After loading, you'll see a prompt like the one below. This means the model is ready! You are now in an interactive chat session directly with the LLM running on your machine.

>>> Send a message (/? for help)

Interacting with the Model:

Simply type your question or instruction at the >>> prompt and press Enter. The model will generate its response token by token. Type /? at any time to see the built-in commands (such as /set, /show, and /bye).

Example Interaction:


>>> Explain the concept of photosynthesis in simple terms.

Photosynthesis is like how plants make their own food! They take three main ingredients:

1. **Sunlight:** Energy from the sun.
2. **Water:** Absorbed through their roots.
3. **Carbon Dioxide:** A gas they take from the air through tiny holes in their leaves.

Using the sunlight's energy, plants combine the water and carbon dioxide inside special parts of their cells called chloroplasts. This process creates two things:

1. **Sugar (Glucose):** This is the plant's food, giving it energy to grow.
2. **Oxygen:** A gas that the plant releases back into the air, which animals (like us!) need to breathe.

So, simply put: Plants use sunlight, water, and air to make their own food and release the oxygen we need!

>>> Send a message (/? for help)

Exiting the Session:

When you're finished chatting, type /bye and press Enter. This will unload the model from memory and return you to your regular terminal prompt. You can also often use Ctrl+D.

Listing Downloaded Models

To see which models you have downloaded locally, use the ollama list command:

ollama list

The output will show the model names, their unique IDs, sizes, and when they were last modified:

NAME            ID              SIZE    MODIFIED
llama3:8b       871998b83999    4.7 GB  5 days ago
mistral:latest  8ab431d3a87a    4.1 GB  2 weeks ago
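To dig into any downloaded model's details (its parameters, template, and license), ollama show is useful:

ollama show llama3:8b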

Removing Models

Models take up disk space. If you no longer need a specific model, you can remove it using the ollama rm command followed by the model name:

ollama rm mistral:latest

Ollama will confirm the deletion. This only removes the downloaded files; you can always run ollama run mistral:latest again to re-download it later.

How to Get Better Results from Ollama

Running models is just the start. Here's how to get better results:

Understanding Prompts (Prompt Engineering Basics)

The quality of the model's output heavily depends on the quality of your input (the prompt). Be specific about what you want, provide relevant context, and state the desired format of the answer (length, tone, bullet points vs. prose).
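For example, compare a vague prompt with a specific one (here passed as one-shot prompts on the command line):

# Vague: the model has to guess the audience, scope, and format
ollama run llama3:8b "Tell me about Python."

# Specific: audience, scope, and format are all stated
ollama run llama3:8b "Explain Python list comprehensions to a beginner in three bullet points, each with a one-line example."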

Trying Different Models

Different models excel at different tasks: some are tuned for general chat, others for coding, summarization, or running comfortably on low-RAM machines.

Experiment! Run the same prompt through different models using ollama run <model_name> to see which one best suits your needs for a particular task.

System Prompts (Setting the Context)

You can guide the model's overall behavior or persona for a session using a "system prompt." This is like giving the AI background instructions before the conversation starts. While deeper customization involves Modelfiles (covered briefly next), you can set a simple system message directly when running a model:

# This feature might vary slightly; check `ollama run --help`
# Ollama might integrate this into the chat directly using /set system
# Or via Modelfiles, which is the more robust way.

# Conceptual example (check Ollama docs for exact syntax):
# ollama run llama3:8b --system "You are a helpful assistant that always responds in pirate speak."

A more common and flexible way is to define this in a Modelfile.
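You can also try setting a system prompt from inside an interactive session with the /set command. A quick sketch (exact behavior can vary between Ollama versions):

ollama run llama3:8b
>>> /set system You are a helpful assistant that always responds in pirate speak.
>>> Why is the sky blue?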

Interacting via API (A Quick Look)

Ollama isn't just for the command line. It runs a local web server (usually at http://localhost:11434) that exposes an API. This allows other programs and scripts to interact with your local LLMs.

You can test this with a tool like curl in your terminal:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

This sends a request to the Ollama API asking the llama3:8b model to respond to the prompt "Why is the sky blue?". Setting "stream": false waits for the full response instead of streaming it word by word.

You'll get back a JSON response containing the model's answer. This API is the key to integrating Ollama with text editors, custom applications, scripting workflows, and more. Exploring the full API is beyond this beginner's guide, but knowing it exists opens up many possibilities.
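For multi-turn conversations, the API also exposes a chat endpoint that accepts a list of messages. A minimal sketch, assuming llama3:8b is already downloaded:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'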

How to Customize Ollama Modelfiles

One of Ollama's most powerful features is the ability to customize models using Modelfiles. A Modelfile is a plain text file containing instructions for creating a new, customized version of an existing model. Think of it like a Dockerfile for LLMs.

What Can You Do with a Modelfile?

  1. FROM: Choose the base model to build on.
  2. SYSTEM: Bake in a system prompt that sets the model's persona or ground rules.
  3. PARAMETER: Adjust settings like temperature (randomness) or context window size.
  4. TEMPLATE: Change how prompts are formatted before being sent to the model.

Simple Modelfile Example:

Let's say you want to create a version of llama3:8b that always acts as a Sarcastic Assistant.

Create a file named Modelfile (no extension) in a directory.

Add the following content:

# Inherit from the base llama3 model
FROM llama3:8b

# Set a system prompt
SYSTEM """You are a highly sarcastic assistant. Your answers should be technically correct but delivered with dry wit and reluctance."""

# Adjust creativity (lower temperature = less random/more focused)
PARAMETER temperature 0.5

Creating the Custom Model:

Navigate to the directory containing your Modelfile in the terminal.

Run the ollama create command:

ollama create sarcastic-llama -f ./Modelfile

Ollama will process the instructions and create the new model. You can then run it like any other:

ollama run sarcastic-llama

Now, when you interact with sarcastic-llama, it will adopt the sarcastic persona defined in the SYSTEM prompt.

Modelfiles offer deep customization potential, allowing you to fine-tune models for specific tasks or behaviors without needing to retrain them from scratch. Explore the Ollama documentation for more details on available instructions and parameters.
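A handy trick while iterating: you can print the Modelfile behind any model, including your new one, to see exactly what it inherits:

ollama show sarcastic-llama --modelfile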

Fixing Common Ollama Errors

While Ollama aims for simplicity, you might encounter occasional hurdles:

Installation Fails: Re-run the installer with sufficient permissions (the Linux script may prompt for sudo), confirm your OS version is supported, and on Windows make sure WSL2 is installed and enabled.

Model Download Failures: Check your internet connection and available disk space, then retry the pull; retrying usually resumes the download. Corporate proxies or firewalls can also block access to the registry.

Ollama Slow Performance: The model may be too large for your RAM, forcing the system to swap to disk. Try a smaller model, close memory-hungry applications, and check that GPU acceleration is actually being used (correct drivers installed).

"Model not found" Errors:

Ollama Alternatives?

Several compelling alternatives to Ollama exist for running large language models locally, including LM Studio (a polished GUI application), llama.cpp (the lower-level inference engine many of these tools build on), and GPT4All. Each makes different trade-offs between ease of use, performance, and configurability, but Ollama remains one of the simplest ways to get started from the command line.

Conclusion: Your Journey into Local AI

Ollama throws open the doors to the fascinating world of large language models, allowing anyone with a reasonably modern computer to run powerful AI tools locally, privately, and without ongoing costs.

This is just the beginning. The real fun starts as you experiment with different models, tailor them to your specific needs using Modelfiles, integrate Ollama into your own scripts or applications via its API, and explore the rapidly growing ecosystem of open-source AI.

The ability to run sophisticated AI locally is transformative, empowering individuals and developers alike. Dive in, explore, ask questions, and enjoy having the power of large language models right at your fingertips with Ollama.

