How to Run Mistral Small 3.1 Locally Using Ollama: A Step-by-Step Guide

Learn how to run Mistral Small 3.1, a top open-source AI model, locally using Ollama. This easy guide covers setup, usage, and tips.

Ashley Innocent

Updated on March 19, 2025

Running advanced AI models locally offers developers and tech enthusiasts unparalleled control, privacy, and customization options. If you're eager to harness the power of cutting-edge artificial intelligence on your own machine, Mistral Small 3.1, combined with Ollama, provides an excellent solution. Mistral Small 3.1 is a state-of-the-art language model developed by Mistral AI, boasting 24 billion parameters and top-tier performance in its weight class. Meanwhile, Ollama simplifies the process of deploying such large language models (LLMs) locally, making it accessible even to those with modest technical setups. In this comprehensive guide, we’ll walk you through every step to get Mistral Small 3.1 running on your system using Ollama. Plus, we’ll show you how to enhance your experience by integrating Apidog, a powerful tool for API development and testing.

Why go local? By running Mistral Small 3.1 on your machine, you keep your data private, avoid cloud costs, and gain the flexibility to tweak the model for your needs, whether that's building a chatbot, generating code, or processing multilingual text.

💡
To make interacting with your local model even smoother, we recommend using Apidog. This free API tool lets you test and debug your model’s endpoints effortlessly. Download Apidog for free today and streamline your workflow as you explore Mistral Small 3.1’s capabilities!

Why Choose Mistral Small 3.1 and Ollama?

Before jumping into the setup, let’s explore why Mistral Small 3.1 and Ollama make such a compelling pair. Mistral Small 3.1, released under the open-source Apache 2.0 license, delivers exceptional performance for its size. With a 128k-token context window, it handles long conversations or documents with ease. It also supports multiple languages and multimodal inputs, making it versatile for tasks like text generation, translation, or even image-caption analysis. Developers love its efficiency, as it rivals larger models while running on relatively modest hardware.

Ollama, on the other hand, is a lightweight tool designed to run LLMs locally. It abstracts away much of the complexity, such as dependency management and GPU configuration, so you can focus on using the model rather than wrestling with setup hurdles. Together, Mistral Small 3.1 and Ollama empower you to deploy a high-performing AI model without relying on cloud services.

Installing Ollama on Your Machine

Ollama simplifies running LLMs locally, and installing it is straightforward. Follow these steps to get it up and running:

Install Ollama: Download it from Ollama’s official website and follow the prompts.

Verify Installation: Confirm Ollama is installed correctly by checking its version:

ollama --version

You should see a version number (e.g., 0.1.x). If not, troubleshoot by ensuring your PATH includes Ollama’s binary.
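
If the command isn't found, the binary is probably not on your PATH yet. The sketch below shows one way to check and fix this for the current shell session; the /usr/local/bin location is an assumption that varies by OS and install method.

# Check whether the ollama binary is visible on your PATH
which ollama

# If nothing is printed, add the install directory for this session
# (/usr/local/bin is a common location on macOS and Linux; adjust to your setup)
export PATH="$PATH:/usr/local/bin"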

After installing Ollama, you’re one step closer to running Mistral Small 3.1. Next, you need to fetch the model itself.

Downloading Mistral Small 3.1 Model Weights

Open up your terminal and type:

ollama pull cnjack/mistral-samll-3.1:24b-it-q4_K_S

This downloads the model weights for a community-published build of Mistral Small 3.1 to your local storage, using the same tag we run later in this guide. Link: https://ollama.com/cnjack/mistral-samll-3.1

Depending on your internet speed, this could take 15-30 minutes; quantized tags like the one above are several gigabytes, while full-precision weights run to roughly 50 GB.

Verify Download: Run ollama list. You should see cnjack/mistral-samll-3.1 (with its tag) listed, indicating it’s ready to use.
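
For a quicker sanity check, the commands below filter the model list and, on recent Ollama releases, print the model's metadata; the grep pattern is just a convenience.

# Filter the local model list for the model we just pulled
ollama list | grep mistral

# Inspect parameters, template, and other details (recent Ollama versions)
ollama show cnjack/mistral-samll-3.1:24b-it-q4_K_S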

Now that you have the model, let’s load it into Ollama and start exploring its capabilities.

Loading Mistral Small 3.1 into Ollama

Loading the model prepares it for inference. Ollama handles the heavy lifting, so this step is quick:

  1. Load the Model: Execute this command to load Mistral Small 3.1 into memory:
ollama run cnjack/mistral-samll-3.1:24b-it-q4_K_S

The first time you run this, Ollama initializes the model, which may take a few minutes depending on your hardware. Subsequent runs are faster.

  2. Test It Out: Once loaded, Ollama drops you into an interactive prompt. Type a simple query:
Hello, how does Mistral Small 3.1 work?

The model responds directly in the terminal, showcasing its text generation prowess.
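
When you're done experimenting, type the command below at the prompt (or press Ctrl+D) to leave the interactive session and return to your shell.

/bye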

At this point, Mistral Small 3.1 is operational. However, to unlock its full potential, especially for programmatic access, let’s explore how to interact with it further.

Interacting with Mistral Small 3.1 Locally

You can engage with Mistral Small 3.1 in two primary ways: direct command-line inference or via an API server. Both methods leverage Ollama’s flexibility, and we’ll tie in Apidog for the API approach.

Method 1: Direct Inference via Command Line

For quick tests or one-off generations, use Ollama’s run command with a prompt:

ollama run cnjack/mistral-samll-3.1:24b-it-q4_K_S "Write a short poem about AI."

The model processes the input and outputs a response, such as:

Artificial minds in circuits deep,
Learning patterns while we sleep,
Voices of code, they softly speak,
A future bright, their thoughts we keep.
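
You can also pipe a prompt from a file instead of quoting it on the command line, which avoids shell-escaping headaches for longer inputs. This is a small sketch that assumes a prompt.txt file in the current directory.

# Feed the contents of prompt.txt to the model as the prompt
cat prompt.txt | ollama run cnjack/mistral-samll-3.1:24b-it-q4_K_S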

This method suits experimentation but lacks scalability. For more robust applications, set up an API server.

Method 2: Running Mistral Small 3.1 as an API Server

To integrate Mistral Small 3.1 into projects or test it systematically, run it as a local API server:

  1. Start the Server: Launch Ollama in server mode:
ollama serve

This starts a REST API on http://localhost:11434 (Ollama’s default port). Keep this running in one terminal.

  2. Test the API: In a new terminal, use curl to send a request:
curl http://localhost:11434/api/generate -d '{"model": "cnjack/mistral-samll-3.1:24b-it-q4_K_S", "prompt": "Explain AI in one sentence.", "stream": false}'

The response might look like:

{
  "response": "AI is the simulation of human intelligence by machines, enabling them to learn, reason, and perform tasks autonomously."
}
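
Because "stream": false is set, the server returns a single JSON object rather than a stream of partial chunks. The same request body can also carry generation options; the sketch below sets a sampling temperature and a cap on generated tokens, with values that are purely illustrative.

curl http://localhost:11434/api/generate -d '{
  "model": "cnjack/mistral-samll-3.1:24b-it-q4_K_S",
  "prompt": "Explain AI in one sentence.",
  "stream": false,
  "options": {"temperature": 0.7, "num_predict": 120}
}'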

This API setup opens the door to advanced interactions, which is where Apidog shines.

Using Apidog to Interact with Mistral Small 3.1’s API

Apidog simplifies API testing and development, making it perfect for querying your local Mistral Small 3.1 server. Here’s how to set it up:

Install Apidog:

  • Head to Apidog’s website and download the free desktop app for your OS.
  • Install it following the on-screen instructions.

Create a New Project:

  • Open Apidog and click “New Project.”
  • Name it something like “Mistral Small 3.1 Local API.”

Add an API Request:

  • Click “New Request” and set the method to POST.
  • Enter the endpoint: http://localhost:11434/api/generate.

Configure the Request Body:

  • Switch to the “Body” tab, select “JSON,” and input:
{
  "model": "mistral-small-3.1",
  "prompt": "Generate a Python script to print 'Hello, AI!'",
  "stream": false
}

Send and Review:

  • Hit “Send.” Apidog displays the response, such as:
{
  "response": "print('Hello, AI!')"
}
  • Use Apidog’s tools to tweak parameters, save requests, or debug errors.

Apidog’s intuitive interface lets you experiment with prompts, monitor response times, and even automate tests, which is ideal for developers building on Mistral Small 3.1.

Troubleshooting Common Issues

Running a 24B-parameter model locally can hit snags. Here are solutions to frequent problems:

  • Out of Memory Errors: Ensure you have 32GB+ RAM and a GPU with sufficient VRAM (e.g., 24GB on an RTX 4090). Reduce the batch size or context length if needed.
  • Model Not Found: Verify the download completed (ollama list) and that the model name in your request matches the listed name exactly.
  • API Server Fails to Start: Check whether port 11434 is already in use (netstat -tuln | grep 11434) and either free it or point Ollama at a different port (see the commands after this list).
  • Slow Performance: Use a GPU with more VRAM, pick a smaller quantization, or shorten the context; when model layers spill over to the CPU, generation slows down considerably.
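
For the port conflict in particular, you can see what is listening and point Ollama at a different address with the OLLAMA_HOST environment variable; the alternative port below is an arbitrary choice.

# See whether something already occupies Ollama's default port
netstat -tuln | grep 11434

# Serve Ollama on a different port for this session
OLLAMA_HOST=127.0.0.1:11500 ollama serve

# Remember to target the new port in subsequent API calls
curl http://localhost:11500/api/generate -d '{"model": "cnjack/mistral-samll-3.1:24b-it-q4_K_S", "prompt": "ping", "stream": false}'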

With these tips, you can resolve most issues and keep Mistral Small 3.1 humming along.

Optimizing and Expanding Your Setup

Now that Mistral Small 3.1 runs locally, consider enhancing it:

  • Fine-Tuning: Use datasets specific to your domain (e.g., legal texts, code) to tailor the model’s outputs.
  • Scaling: Run multiple instances of Ollama for different models or tasks.
  • Integration: Hook the API into web apps, bots, or workflows using Apidog to prototype endpoints.

These steps unlock Mistral Small 3.1’s full potential, adapting it to your unique projects.
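
As a starting point for the integration idea above, here is a minimal shell wrapper around the local API. It assumes curl and jq are installed; the script name is a placeholder, and the model tag is the one used throughout this guide.

#!/usr/bin/env bash
# ask.sh - send a prompt to the local Mistral Small 3.1 server and print the reply
# Usage: ./ask.sh "Summarize the benefits of local LLMs in two sentences."
PROMPT="$1"

# Build the JSON body with jq so quotes inside the prompt are escaped safely
BODY=$(jq -n --arg p "$PROMPT" \
  '{model: "cnjack/mistral-samll-3.1:24b-it-q4_K_S", prompt: $p, stream: false}')

curl -s http://localhost:11434/api/generate -d "$BODY" | jq -r '.response'

The same call pattern carries over to web apps or bots with whatever HTTP client they already use, and Apidog is a convenient place to prototype the request before wiring it into code.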

Conclusion

Running Mistral Small 3.1 locally using Ollama is a game-changer for developers and AI enthusiasts. This guide has walked you through the process, from installing Ollama and downloading the model to interacting with it via the command line or API. By adding Apidog into the mix, you streamline API testing and open new possibilities for integration. With its 24 billion parameters, 128k-token context, and open-source flexibility, Mistral Small 3.1 offers immense power at your fingertips. Start experimenting today, and see how this duo can transform your projects.
