How to Run Mistral Small 3.1 Locally Using Ollama: A Step-by-Step Guide

Learn how to run Mistral Small 3.1, a top open-source AI model, locally using Ollama. This easy guide covers setup, usage, and tips.

Ashley Innocent

19 March 2025

Running advanced AI models locally offers developers and tech enthusiasts unparalleled control, privacy, and customization options. If you're eager to harness the power of cutting-edge artificial intelligence on your own machine, Mistral Small 3.1, combined with Ollama, provides an excellent solution. Mistral Small 3.1 is a state-of-the-art language model developed by Mistral AI, boasting 24 billion parameters and top-tier performance in its weight class. Meanwhile, Ollama simplifies the process of deploying such large language models (LLMs) locally, making it accessible even to those with modest technical setups. In this comprehensive guide, we’ll walk you through every step to get Mistral Small 3.1 running on your system using Ollama. Plus, we’ll show you how to enhance your experience by integrating Apidog, a powerful tool for API development and testing.

Why go local? By running Mistral Small 3.1 on your machine, you keep your data private, avoid cloud costs, and gain the flexibility to tweak the model for your needs, whether that’s building a chatbot, generating code, or processing multilingual text.

💡
To make interacting with your local model even smoother, we recommend using Apidog. This free API tool lets you test and debug your model’s endpoints effortlessly. Download Apidog for free today and streamline your workflow as you explore Mistral Small 3.1’s capabilities!

Why Choose Mistral Small 3.1 and Ollama?

Before jumping into the setup, let’s explore why Mistral Small 3.1 and Ollama make such a compelling pair. Mistral Small 3.1, released under the open-source Apache 2.0 license, delivers exceptional performance for its size. With a 128k-token context window, it handles long conversations or documents with ease. It also supports multiple languages and multimodal inputs, making it versatile for tasks like text generation, translation, or even image-caption analysis. Developers love its efficiency, as it rivals larger models while running on relatively modest hardware.

Ollama, on the other hand, is a lightweight tool designed to run LLMs locally. It abstracts away much of the complexity (think dependency management or GPU configuration) so you can focus on using the model rather than wrestling with setup hurdles. Together, Mistral Small 3.1 and Ollama empower you to deploy a high-performing AI model without relying on cloud services.

Installing Ollama on Your Machine

Ollama simplifies running LLMs locally, and installing it is straightforward. Follow these steps to get it up and running:

Install Ollama: Download the installer for your operating system from Ollama’s official website and follow the prompts.

Verify Installation: Confirm Ollama is installed correctly by checking its version:

ollama --version

You should see a version number (e.g., 0.1.x). If not, troubleshoot by ensuring your PATH includes Ollama’s binary.

After installing Ollama, you’re one step closer to running Mistral Small 3.1. Next, you need to fetch the model itself.

Downloading Mistral Small 3.1 Model Weights

Open up your terminal and type:

ollama pull cnjack/mistral-samll-3.1

This downloads the weights for a community-quantized build of Mistral Small 3.1 to your local storage. Note that the “samll” spelling is part of the uploader’s model name, so type it exactly as shown. Link: https://ollama.com/cnjack/mistral-samll-3.1

Depending on your internet speed, the download can take a while. The exact size depends on the quantization you pull: the q4_K_S build of this 24B model is on the order of 14 GB, while full-precision weights would approach 50 GB.

Verify Download: Run ollama list. You should see cnjack/mistral-samll-3.1 listed, indicating it’s ready to use.

Now that you have the model, let’s load it into Ollama and start exploring its capabilities.

Loading Mistral Small 3.1 into Ollama

Loading the model prepares it for inference. Ollama handles the heavy lifting, so this step is quick:

  1. Load the Model: Execute this command to load Mistral Small 3.1 into memory:
ollama run cnjack/mistral-samll-3.1:24b-it-q4_K_S

The first time you run this, Ollama initializes the model, which may take a few minutes depending on your hardware. Subsequent runs are faster.

  2. Test It Out: Once loaded, Ollama drops you into an interactive prompt. Type a simple query:
Hello, how does Mistral Small 3.1 work?

The model responds directly in the terminal, showcasing its text generation prowess.

At this point, Mistral Small 3.1 is operational. However, to unlock its full potential, especially for programmatic access, let’s explore how to interact with it further.

Interacting with Mistral Small 3.1 Locally

You can engage with Mistral Small 3.1 in two primary ways: direct command-line inference or via an API server. Both methods leverage Ollama’s flexibility, and we’ll tie in Apidog for the API approach.

Method 1: Direct Inference via Command Line

For quick tests or one-off generations, use Ollama’s run command with a prompt:

ollama run cnjack/mistral-samll-3.1:24b-it-q4_K_S "Write a short poem about AI."

The model processes the input and outputs a response, such as:

Artificial minds in circuits deep,
Learning patterns while we sleep,
Voices of code, they softly speak,
A future bright, their thoughts we keep.
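
If you want to drive this one-off mode from a script rather than typing prompts by hand, here is a minimal Python sketch using the standard subprocess module; it assumes the ollama binary is on your PATH and uses the community model tag pulled earlier.

import subprocess

# Run a single prompt through `ollama run` and capture the printed response.
# Assumes ollama is on PATH and the model has already been pulled.
MODEL = "cnjack/mistral-samll-3.1:24b-it-q4_K_S"

result = subprocess.run(
    ["ollama", "run", MODEL, "Write a short poem about AI."],
    capture_output=True, text=True, check=True,
)
print(result.stdout)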

This method suits experimentation but lacks scalability. For more robust applications, set up an API server.

Method 2: Running Mistral Small 3.1 as an API Server

To integrate Mistral Small 3.1 into projects or test it systematically, run it as a local API server:

  1. Start the Server: Launch Ollama in server mode:
ollama serve

This starts a REST API on http://localhost:11434 (Ollama’s default port). Keep this running in one terminal.

  2. Test the API: In a new terminal, use curl to send a request. Pass the exact model name you pulled, and set "stream": false so the server returns a single JSON object instead of a token stream:
curl http://localhost:11434/api/generate -d '{"model": "cnjack/mistral-samll-3.1:24b-it-q4_K_S", "prompt": "Explain AI in one sentence.", "stream": false}'

The response might look like:

{
  "response": "AI is the simulation of human intelligence by machines, enabling them to learn, reason, and perform tasks autonomously."
}
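
If you’d rather call the endpoint from code than from curl, here is a minimal Python sketch using the requests library; it assumes Ollama’s default port and the community model tag pulled earlier.

import requests

# Minimal client for Ollama's local /api/generate endpoint.
# Assumes `ollama serve` is listening on the default port 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "cnjack/mistral-samll-3.1:24b-it-q4_K_S"

def generate(prompt: str) -> str:
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a token stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Explain AI in one sentence."))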

This API setup opens the door to advanced interactions, which is where Apidog shines.

Using Apidog to Interact with Mistral Small 3.1’s API

Apidog simplifies API testing and development, making it perfect for querying your local Mistral Small 3.1 server. Here’s how to set it up:

Install Apidog: Download the free Apidog desktop app from its website and install it for your platform.

Create a New Project: Open Apidog and create a new project to hold your local model requests.

Add an API Request: Create a new request, set the method to POST, and point it at http://localhost:11434/api/generate.

Configure the Request Body: Select JSON as the body type and enter a payload like the following (the model field must match the tag you pulled):

{
  "model": "mistral-small-3.1",
  "prompt": "Generate a Python script to print 'Hello, AI!'",
  "stream": false
}

Send and Review: Click Send and inspect the JSON response, which should resemble:

{
  "response": "print('Hello, AI!')"
}

Apidog’s intuitive interface lets you experiment with prompts, monitor response times, and even automate tests, ideal for developers building on Mistral Small 3.1.

Troubleshooting Common Issues

Running a 24B-parameter model locally can hit snags. Here are solutions to frequent problems:

Out of memory or very slow responses: Even the q4_K_S quantization needs a machine with ample RAM (or VRAM, if offloading to a GPU). Close other heavy applications, or try a smaller quantization of the model.

Model not found: Ollama matches model names exactly, so double-check the tag, including the “samll” spelling in this community upload.

Connection refused when calling the API: Make sure ollama serve is running in another terminal and that nothing else is using port 11434.
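
If you’re unsure whether the server and model are actually in place, a quick scripted check can help. This sketch assumes the default port and queries Ollama’s /api/tags endpoint, which lists the models available locally.

import requests

# Health check: confirm the Ollama server is reachable and the model is present.
MODEL = "cnjack/mistral-samll-3.1:24b-it-q4_K_S"

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
except requests.ConnectionError:
    print("Ollama server not reachable: run `ollama serve` first.")
else:
    names = [m["name"] for m in resp.json().get("models", [])]
    if MODEL in names:
        print("Server up and model present.")
    else:
        print(f"Server up, but {MODEL} is not pulled. Local models: {names}")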

With these tips, you can resolve most issues and keep Mistral Small 3.1 humming along.

Optimizing and Expanding Your Setup

Now that Mistral Small 3.1 runs locally, consider enhancing it:

Tune generation parameters: Set options such as temperature or context length per request via the API’s options field, or bake them into a custom variant with a Modelfile and ollama create.

Stream responses: Leave "stream" at its default of true and consume tokens as they arrive for a more responsive experience (see the sketch after this list).

Build on the API: Wire the local endpoint into chatbots, scripts, or editor integrations, using Apidog to prototype and debug the requests first.
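
As a concrete example of the streaming option above, here is a minimal Python sketch that consumes Ollama’s default streamed output; each response line is a standalone JSON object, and the final one carries "done": true.

import json
import requests

# Stream tokens from the local Ollama server as they are generated.
# Assumes the default port and the community model tag pulled earlier.
payload = {
    "model": "cnjack/mistral-samll-3.1:24b-it-q4_K_S",
    "prompt": "Write a short poem about AI.",
    # "stream" defaults to true: each response line is its own JSON object
}

with requests.post("http://localhost:11434/api/generate", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()  # final newline once generation finishes
            break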

These steps unlock Mistral Small 3.1’s full potential, adapting it to your unique projects.

Conclusion

Running Mistral Small 3.1 locally using Ollama is a game-changer for developers and AI enthusiasts. This guide has walked you through the whole process: installing Ollama, downloading the model, and interacting with it via the command line or API. By adding Apidog into the mix, you streamline API testing and open new possibilities for integration. With its 24 billion parameters, 128k-token context, and open-source flexibility, Mistral Small 3.1 offers immense power at your fingertips. Start experimenting today, and see how this duo can transform your projects.
