How to Use the Mistral Medium 3.5 API?

Complete Mistral Medium 3.5 API guide: endpoints, auth, Python and Node examples, vision input, tool calling, JSON mode, streaming, and an Apidog workflow for testing.

Ashley Innocent

30 April 2026

Mistral released Medium 3.5 on April 29, 2026. The API model ID is mistral-medium-3.5, the endpoint is https://api.mistral.ai/v1/chat/completions, and the request shape is close enough to the OpenAI standard that migrating from another OpenAI-compatible provider mostly comes down to changing the base URL and the model ID. The headline numbers are a 256K context window, native vision, function calling, support for 24 languages, and 77.6% on SWE-Bench Verified. Those figures put it in the same conversation as GPT-5.5 and DeepSeek V4 for the agentic, code-heavy work most teams are wiring up right now.

This guide covers authentication, every parameter that matters, Python and Node examples, vision input, tool calling, JSON mode, streaming, error handling, and an Apidog workflow that keeps cost visible while you iterate on prompts. For comparable model guides, see how to use the DeepSeek V4 API and how to use the GPT-5.5 API.

What changed in Medium 3.5

Medium 3 shipped earlier in the year as a text-only model with a 128K context. Medium 3.5 is a different beast. It is Mistral’s first flagship merged model: instruction following, reasoning, and coding live in a single set of weights, so you no longer pick between a chat checkpoint and a reasoning checkpoint. Vision is native, the context doubles to 256K, and function calling is wired in at the model level instead of bolted on through a separate API surface.

Three numbers anchor the upgrade. SWE-Bench Verified at 77.6% lands in the same band as the top frontier models for code patching. τ³-Telecom at 91.4 puts it ahead of most generalist models on multi-turn agentic dialogue. The 256K context covers a full mid-sized codebase or a several-hour transcript without truncation. None of these are marketing rounding errors; they map directly to whether the model can finish your task without a second pass.

The pricing shift is the part to budget for. Medium 3 sat at $0.40 per million input tokens and $2.00 per million output tokens. Medium 3.5 jumps to $1.50 input and $7.50 output, a 3.75x increase on both sides, or roughly 4x. That premium pays for the merged checkpoint, vision, and the longer context. Treat the older Medium 3 as the bulk-throughput option and Medium 3.5 as the “I need this answer right” tier.
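To see what the jump means per request, here is a minimal sketch that prices one call under both tiers. The rates are the per-million-token figures quoted above, hard-coded as an assumption; check Mistral's current pricing page before budgeting with them. `PRICES` and `call_cost` are illustrative names.

```python
# Per-million-token prices (USD) as quoted in this guide.
# Assumption: verify against Mistral's current pricing before relying on them.
PRICES = {
    "mistral-medium-3": {"input": 0.40, "output": 2.00},
    "mistral-medium-3.5": {"input": 1.50, "output": 7.50},
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one call from its token counts."""
    price = PRICES[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


# Example workload: 10K input tokens and 1K output tokens per call.
for model in PRICES:
    print(f"{model}: ${call_cost(model, 10_000, 1_000):.4f} per call")
```

For that workload the sketch gives $0.0060 on Medium 3 and $0.0225 on Medium 3.5. Because both the input and output prices rose by the same factor, the ratio stays at 3.75x for any mix of input and output tokens.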

Prerequisites

Before the first call, line up four things.

Export the key once:

export MISTRAL_API_KEY="..."

Endpoint and authentication

Mistral’s La Plateforme exposes everything through one base URL.

POST https://api.mistral.ai/v1/chat/completions

Authentication is a bearer token on the Authorization header. The minimum viable request looks like this:

curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [
      {"role": "user", "content": "Explain dense merged checkpoints in two sentences."}
    ]
  }'

A successful response returns a JSON body with a choices array, a usage block broken down into prompt_tokens, completion_tokens, and total_tokens, and an id you can carry forward for tracing. Failures return an error envelope with code and message. The shape matches the OpenAI envelope closely enough that any error parser you already have works without modification.
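If you want the same call from Python without installing an SDK, the standard library is enough. This sketch assumes the response body carries the `choices` and `usage` fields described above. `build_request`, `extract_reply`, and `complete` are illustrative helper names, not part of any Mistral library.

```python
import json
import os
import urllib.request

API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str, model: str = "mistral-medium-3.5") -> urllib.request.Request:
    """Build (without sending) the same POST request as the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(body: dict) -> tuple[str, dict]:
    """Return the first choice's text and the usage block from a response body."""
    return body["choices"][0]["message"]["content"], body["usage"]


def complete(prompt: str) -> tuple[str, dict]:
    """Send one request. urlopen raises urllib.error.HTTPError on 4xx/5xx responses."""
    req = build_request(os.environ["MISTRAL_API_KEY"], prompt)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return extract_reply(json.load(resp))
```

Calling `complete("Explain dense merged checkpoints in two sentences.")` returns the reply text plus the usage block with `prompt_tokens`, `completion_tokens`, and `total_tokens`. Those three counts are all the cost math in this guide needs.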

Request parameters

Every field maps to either cost or behavior. Here is the map for mistral-medium-3.5.

Parameter | Type | Values | Notes
model | string | mistral-medium-3.5 | Required.
messages | array | role/content pairs | Required. Same schema as OpenAI.
temperature | float | 0 to 1.5 | Mistral recommends 0.7 for general use, 0.3 for code.
top_p | float | 0 to 1 | Default 1.0.
max_tokens | int | 1 to context limit | Caps output length.
stream | bool | true or false | Enables SSE streaming.
tools | array | OpenAI tool spec | Native function calling.
tool_choice | string or object | auto, any, none, or a specific tool | Controls tool use. Note: any instead of required.
response_format | object | {"type": "json_object"} or a JSON schema | Structured output.
random_seed | int | any int | For reproducibility. Note: not seed.
safe_prompt | bool | true or false | Adds Mistral’s safety preamble.
presence_penalty | float | -2 to 2 | Penalizes repeated topics.
frequency_penalty | float | -2 to 2 | Penalizes repeated tokens.

Two small differences trip people up when migrating from OpenAI: tool_choice="any" means “force a tool call” (OpenAI uses required), and the seed parameter is random_seed (OpenAI uses seed). Everything else lines up.
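If your code already builds OpenAI-style keyword arguments, a small shim can rename the two divergent fields before the call goes out. This is a sketch. `to_mistral_params` is a hypothetical helper that handles only the two differences named above and passes everything else through unchanged.

```python
def to_mistral_params(openai_params: dict) -> dict:
    """Translate OpenAI-style chat parameters to Mistral's names and values.

    Covers only the two known divergences:
    - `seed` becomes `random_seed`
    - tool_choice="required" becomes tool_choice="any"
    """
    params = dict(openai_params)  # shallow copy so the caller's dict is untouched
    if "seed" in params:
        params["random_seed"] = params.pop("seed")
    if params.get("tool_choice") == "required":
        params["tool_choice"] = "any"
    return params


print(to_mistral_params({"model": "mistral-medium-3.5", "seed": 42, "tool_choice": "required"}))
```

The translated dict can then be passed as keyword arguments to the native client, for example `client.chat.complete(messages=messages, **to_mistral_params(params))`.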

Python client

Mistral ships an official Python SDK that matches the API one-to-one.

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Reply in code only."},
        {"role": "user", "content": "Write a Rust function that debounces events."},
    ],
    temperature=0.3,
    max_tokens=2048,
)

print("Content:", response.choices[0].message.content)
print("Total tokens:", response.usage.total_tokens)
print("Cost estimate (USD):",
      response.usage.prompt_tokens * 1.5 / 1_000_000 +
      response.usage.completion_tokens * 7.5 / 1_000_000)

If you already have an OpenAI-shape codebase, the OpenAI Python SDK works against the Mistral endpoint with two changes: the base URL and the model ID.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MISTRAL_API_KEY"],
    base_url="https://api.mistral.ai/v1",
)

response = client.chat.completions.create(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Hello, Mistral."}],
)

The OpenAI SDK route is the path of least resistance for teams running provider-agnostic code; the native mistralai SDK is the path that exposes Mistral-specific features cleanly, so pick based on whether you plan to use vision and structured outputs heavily.

Node client

Same two-track choice on Node. The native SDK:

import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

const response = await client.chat.complete({
  model: "mistral-medium-3.5",
  messages: [
    { role: "user", content: "Explain dense merged checkpoints in plain English." },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
console.log("Usage:", response.usage);

The OpenAI SDK route, for parity with existing code:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.MISTRAL_API_KEY,
  baseURL: "https://api.mistral.ai/v1",
});

const response = await client.chat.completions.create({
  model: "mistral-medium-3.5",
  messages: [{ role: "user", content: "Hello, Mistral." }],
});

Streaming responses

Set stream: true on a raw request, or call client.chat.stream in the Python SDK, and iterate the SSE chunks. The chunk shape follows OpenAI's streaming format, and the cumulative reasoning trace is interleaved into choices[].delta.content instead of being separated into a sidecar field.

stream = client.chat.stream(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Stream a 300-word essay on merged checkpoints."}],
)

for chunk in stream:
    delta = chunk.data.choices[0].delta.content or ""
    print(delta, end="", flush=True)

In side-by-side runs through the Apidog response viewer, Mistral's stream pacing in the terminal was faster than DeepSeek V4-Pro on prompts of the same length and roughly even with GPT-5.5.
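When you consume the stream without the SDK (a raw HTTP client, a proxy, a replayed log), each event arrives as a `data: ` line. This sketch assumes OpenAI-style server-sent events, with text in `choices[].delta.content` and a `data: [DONE]` terminator. Confirm both against a captured stream before relying on them. `iter_content_deltas` is an illustrative name.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text deltas from raw SSE lines of a chat-completions stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if isinstance(content, str) and content:
                yield content


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Merged "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "checkpoints"}}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # prints: Merged checkpoints
```

With `urllib`, the response yields bytes, so decode each line first: `iter_content_deltas(line.decode("utf-8") for line in resp)`.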

Tool calling

Medium 3.5 ships with native function calling. Functions defined in the tools array become callable, and the model picks when to invoke them.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["c", "f"]},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{"role": "user", "content": "Weather in Lagos in Celsius?"}],
    tools=tools,
    tool_choice="auto",
)

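message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    print(tool_call.function.name, tool_call.function.arguments)
else:
    # With tool_choice="auto" the model may answer directly instead of calling a tool.
    print(message.content)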

From there, run the function locally, append the result as a role: "tool" message, and call the API again to continue the loop. The pattern is identical to the OpenAI tool-use loop. The agentic capability shows in the τ³-Telecom score; in practice, that translates to fewer wasted hops on multi-turn workflows where the model has to decide between calling a tool, asking the user, and answering directly.
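The local half of that loop (parse the arguments, run the function, build the tool message) can be sketched without the SDK. Assumptions: `get_weather` is a stub, `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, the tool call is a plain dict mirroring the response shape, and the reply message uses the OpenAI-style `tool_call_id`, `name`, and `content` fields.

```python
import json
from typing import Any, Callable


def get_weather(city: str, unit: str = "c") -> dict:
    """Hypothetical stub standing in for a real weather lookup."""
    return {"city": city, "unit": unit, "temperature": 31}


# Maps tool names the model may emit to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Run one tool call locally and build the role="tool" reply message."""
    fn = tool_call["function"]
    args = fn["arguments"]
    # Arguments may arrive as a JSON string or as an already-parsed dict.
    if isinstance(args, str):
        args = json.loads(args) if args.strip() else {}
    result = TOOL_REGISTRY[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "name": fn["name"],
        "content": json.dumps(result),
    }


call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Lagos", "unit": "c"}'},
}
print(run_tool_call(call))
```

A full loop appends the assistant message that carried the tool calls, then one such tool message per call. It then calls the API again with the same tools and repeats until the reply has no tool_calls. With the SDK's typed objects, read the same fields as attributes (`tool_call.id`, `tool_call.function.name`, `tool_call.function.arguments`).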

JSON mode and structured output

For schema-validated output, pass a JSON schema in response_format.

schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "release_note",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "date": {"type": "string"},
                "bullets": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "date", "bullets"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[
        {"role": "system", "content": "Reply with a single JSON object matching the schema."},
        {"role": "user", "content": "Summarize today's Mistral Medium 3.5 release."},
    ],
    response_format=schema,
)

Strict mode enforces the schema at decode time, so a successful response should already match it, and you can usually skip a heavy Pydantic or Zod validation layer on the client. A response that cannot satisfy the schema should fail with a structured error rather than return malformed output. For lower-friction cases where you only need valid JSON of any shape, set response_format={"type": "json_object"} and validate on the client side.
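Even with strict mode on, a parse-and-check step on the client is cheap insurance, for example against a reply cut off when max_tokens runs out. This stdlib sketch mirrors the `release_note` schema above. `parse_release_note` is an illustrative name, and a dedicated validator such as the jsonschema package is the better choice once schemas grow.

```python
import json

# Expected top-level fields and their Python types, mirroring the schema above.
REQUIRED_FIELDS = {"title": str, "date": str, "bullets": list}


def parse_release_note(raw: str) -> dict:
    """Parse the model's JSON reply and check it against the release_note shape.

    Raises ValueError (json.JSONDecodeError is a subclass) on any mismatch.
    """
    note = json.loads(raw)
    if not isinstance(note, dict):
        raise ValueError("expected a JSON object")
    extra = set(note) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")  # additionalProperties: false
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(note.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    if not all(isinstance(bullet, str) for bullet in note["bullets"]):
        raise ValueError("every bullet must be a string")
    return note
```

Usage: `note = parse_release_note(response.choices[0].message.content)`, catching ValueError to log the raw reply and retry or fall back.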

Vision input

Medium 3.5’s vision encoder was trained from scratch to handle variable image sizes and aspect ratios; you do not need to pre-resize anything. Pass image content alongside text in the messages array.

response = client.chat.complete(
    model="mistral-medium-3.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image and what is it doing wrong?"},
            {"type": "image_url", "image_url": "https://example.com/diagram.png"},
        ],
    }],
)

Image inputs are billed as input tokens at the same $1.5 per million rate; the exact token count per image varies with resolution and is reported in the usage.prompt_tokens field. For high-volume image workloads, log the per-image token cost early and decide whether to compress, crop, or skip frames before scaling.
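For local files, Mistral's vision docs describe passing a base64 data URL instead of a public link. This is a stdlib sketch under that assumption. `image_data_url` and `image_message` are hypothetical helpers, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt: str, path: str) -> dict:
    """Build a user message pairing a text prompt with one local image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": image_data_url(path)},
        ],
    }
```

Base64 inflates the request body by about a third. Billing still follows the image's token count as reported in usage.prompt_tokens, not the size of the encoded payload.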

Build the collection in Apidog

Replaying requests from the terminal burns credits and hides the diff between runs. The workflow that survives real use:

  1. Download Apidog and create a project.
  2. Add an environment with BASE_URL set to https://api.mistral.ai/v1 and MISTRAL_API_KEY stored as a secret variable so the key never lands in shared exports.
  3. Save a POST request to {{BASE_URL}}/chat/completions with the Authorization: Bearer {{MISTRAL_API_KEY}} header.
  4. Parameterize model, temperature, and tool_choice so you can A/B across variants without duplicating requests.
  5. Use the response viewer to inspect usage on every run. Add a small post-response script that multiplies prompt_tokens * 1.5 / 1_000_000 + completion_tokens * 7.5 / 1_000_000 so the per-call cost shows up next to every result.

Teams already running the matching DeepSeek V4 API collection in Apidog can duplicate it, swap the base URL to https://api.mistral.ai/v1, change the model ID to mistral-medium-3.5, and run head-to-head prompts across both providers in minutes. The same pattern applies for comparing against GPT-5.5.

Error handling

The error envelope follows OpenAI conventions closely. The codes you will hit first:

Code | Meaning | Fix
400 | Bad request | Validate JSON schema, especially messages and tools.
401 | Invalid key | Regenerate at console.mistral.ai.
402 | Payment required | Top up the account or add a card.
403 | Model not allowed | Check the key’s project scope and the model ID spelling.
422 | Parameter out of range | max_tokens exceeds context, or tool_choice is malformed.
429 | Rate limit | Back off, then retry with exponential jitter.
500 | Server error | Retry once. If it repeats, check the status page.
503 | Overloaded | Fall back to Mistral Medium 3 or wait 30 seconds.

Wrap calls in a retry helper that handles 429 and 5xx with exponential backoff. Do not automatically retry other 4xx errors; those are logic bugs, not transient failures. Apidog’s response viewer makes it easy to spot a malformed tools payload because the offending field is highlighted in the request body next to the error.
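Here is a minimal sketch of that retry policy. It assumes your HTTP layer raises an exception that carries the status code. `APIStatusError` is a hypothetical stand-in for whatever your client actually raises, so adapt the `except` clause to your SDK's error type. The `sleep` parameter is injectable so the helper can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class APIStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed API call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def is_retryable(status: int) -> bool:
    """429 and 5xx are treated as transient; other 4xx codes are request bugs."""
    return status == 429 or 500 <= status < 600


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except APIStatusError as err:
            if not is_retryable(err.status) or attempt == max_attempts - 1:
                raise  # non-transient error, or out of attempts
            # Full jitter: wait a random time between 0 and the exponential cap.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("not reached: every attempt returns or raises")
```

Full jitter spreads out retries from many workers so they do not hit the rate limiter in lockstep. In practice you would wrap a request as `with_retries(lambda: client.chat.complete(...))` once the except clause matches your client's exception.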

Cost control patterns

The 4x price jump from Medium 3 to Medium 3.5 punishes lazy routing. Five patterns keep the bill predictable.

  1. Default to Medium 3, escalate to Medium 3.5. Run a cheap first pass on Medium 3 and route hard prompts to 3.5 only when the cheap pass returns low confidence or fails a validator.
  2. Cap max_tokens. Most answers fit in 2,000 output tokens. The 256K context window is for input bulk, not output bulk; output is the expensive side at $7.5 per million.
  3. Keep system prompts lean. Every system-prompt token is billed on every call; trimming a 2K-token preamble to 500 tokens cuts the system-prompt share of your input bill by 75% on a high-volume endpoint.
  4. Log usage on every call. Ship prompt_tokens, completion_tokens, and the per-call USD estimate to your observability stack. An alert on a sudden output-token spike catches prompts that drifted into chain-of-thought territory.
  5. Use vision selectively. Image tokens add up fast. Crop to the relevant region before sending, and downscale to the lowest resolution that still answers the question.
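Pattern 1 is the only one that needs control flow. Below is a sketch of the escalation step. `call_model` stands in for whatever function you already use to hit the API, and `is_acceptable` for your own validator (a JSON parse, a unit test, a confidence threshold). Both are hypothetical placeholders, and the model IDs follow the tier table in the next section.

```python
from typing import Callable

# (model_id, prompt) -> reply text; a placeholder for your own API wrapper.
CallModel = Callable[[str, str], str]


def answer_with_escalation(
    prompt: str,
    call_model: CallModel,
    is_acceptable: Callable[[str], bool],
    cheap_model: str = "mistral-medium-3",
    strong_model: str = "mistral-medium-3.5",
) -> tuple[str, str]:
    """Try the cheap tier first; escalate only if its answer fails validation.

    Returns (model_used, reply).
    """
    reply = call_model(cheap_model, prompt)
    if is_acceptable(reply):
        return cheap_model, reply
    return strong_model, call_model(strong_model, prompt)
```

Escalated prompts pay for both calls because the cheap pass always runs. The pattern therefore saves money only when most prompts pass the validator on the first try, so log the escalation rate alongside usage.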

Comparing Medium 3.5 to other Mistral tiers

Mistral’s lineup as of late April 2026:

Model | Context | Input $/M | Output $/M | Vision | Best for
mistral-small | 32K | $0.10 | $0.30 | No | High-volume classification, light chat
mistral-medium-3 | 128K | $0.40 | $2.00 | No | Bulk throughput, longer chat
mistral-medium-3.5 | 256K | $1.50 | $7.50 | Yes | Reasoning, code, vision, agents
mistral-large | 128K | $2.00 | $6.00 | Limited | Frontier-tier text reasoning

Medium 3.5 is the only tier that combines the long context, vision, and merged reasoning capabilities. Large-tier offers a different cost curve (cheaper output, more expensive input) and beats 3.5 on a few text-only benchmarks; pick by workload, not by tier name.

Migrating from another provider

The migration is mostly a base-URL change.

From OpenAI:

- base_url="https://api.openai.com/v1"
- model="gpt-5.5"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

From DeepSeek:

- base_url="https://api.deepseek.com/v1"
- model="deepseek-v4-pro"
+ base_url="https://api.mistral.ai/v1"
+ model="mistral-medium-3.5"

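Two gotchas to watch: to force a tool call, use tool_choice="any" where OpenAI uses "required", and pass random_seed where OpenAI uses seed.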

Run the diff through your existing test suite before flipping production traffic. Better yet, mirror traffic to Mistral in shadow mode for a day, log both responses, and diff them in Apidog before promoting.

Real-world use cases

A few patterns where Medium 3.5 already pays for itself:

FAQ

What is the model ID for Mistral Medium 3.5 on the API?

mistral-medium-3.5. The Hugging Face checkpoint is published as mistralai/Mistral-Medium-3.5-128B. If you serve the open weights yourself with vLLM or Unsloth, use the Hugging Face ID. For the hosted API, use the short ID.

Is Medium 3.5 OpenAI-compatible?

Close, but not identical. The endpoint shape, headers, and most parameters match OpenAI exactly, so the OpenAI Python and Node SDKs work with a base URL override. The two divergences are tool_choice="any" (vs OpenAI’s required) and random_seed (vs OpenAI’s seed).

Can I run Medium 3.5 locally?

Yes. The weights are open under a Modified MIT License with a large-revenue carve-out. At 128B parameters you need significant memory even after quantization. The GGUF builds from unsloth/Mistral-Medium-3.5-128B-GGUF bring it within reach of high-end workstation hardware, but at common quantization levels the weights exceed a single consumer GPU's memory, so plan for multiple cards or offloading layers to system RAM. The patterns from how to run DeepSeek V4 locally translate directly.
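The memory math behind that answer fits in one line: weight storage is roughly parameter count × bits per weight ÷ 8 bytes, before the KV cache and runtime overhead. This sketch assumes the 128B figure from the checkpoint name. Real GGUF files mix bit widths across layers, so check the size of the file you actually download.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), excluding KV cache.

    billions of params x bits per weight / 8 bits per byte = gigabytes.
    """
    return params_billions * bits_per_weight / 8


for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(128, bits):.0f} GB")
```

At 4 bits per weight the weights alone come to about 64 GB. Compare that figure against your GPU memory before picking a quant.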

Does it support streaming with tool calls?

Yes. Streaming returns tool-call argument fragments incrementally on delta.tool_calls, in the same shape as OpenAI’s streamed tool-call format. Concatenate the fragments for each call on the client; once the stream closes, they form a complete JSON arguments string.
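Below is a sketch of that client-side accumulation over plain dicts. It assumes OpenAI-style fragments that carry an `index`, an optional `id` and `function.name`, and a piece of `function.arguments`. It also tolerates a call that arrives whole in a single chunk, since providers differ in how finely they split arguments. `accumulate_tool_calls` is an illustrative name.

```python
import json
from typing import Any


def accumulate_tool_calls(chunks: list[list[dict]]) -> list[dict]:
    """Merge streamed delta.tool_calls fragments into complete tool calls.

    `chunks` holds one list of fragments per stream chunk, in arrival order.
    Fragments sharing an `index` belong to the same call; their argument
    strings are concatenated and parsed once the stream has ended.
    """
    calls: dict[int, dict[str, Any]] = {}
    for fragments in chunks:
        for frag in fragments:
            call = calls.setdefault(frag.get("index", 0), {"id": None, "name": "", "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            args = fn.get("arguments") or ""
            if isinstance(args, dict):
                args = json.dumps(args)  # some chunks may carry already-parsed arguments
            call["arguments"] += args
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]
```

With the SDK, convert each chunk's `delta.tool_calls` objects to dicts first, or adapt the key lookups to attribute access.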

How do I count input tokens before sending?

Use the tokenizer in the mistral-common Python package. It implements the same tokenization the API uses, so its counts for a chat request should match usage.prompt_tokens on the response.

What context length should I plan for in production?

The 256K window is the cap, and input cost scales linearly with prompt length: a 200K-token prompt costs $0.30 in input alone (200,000 × $1.50 per million) before the model generates anything. Most production workloads fit comfortably under 32K; reach for the long context only when the task genuinely needs it.

Is there a free tier?

Mistral does not advertise a permanent free tier, but new accounts typically come with a small trial credit. For sustained free experimentation on similar tier models, see how to use the DeepSeek V4 API for free.
