How to Run Phi-4 Reasoning (with Free API, Locally with Ollama)

Emmanuel Mumba


Updated on May 2, 2025

The field of Artificial Intelligence is rapidly evolving, with large language models (LLMs) often taking center stage. However, a parallel revolution is happening in the realm of Small Language Models (SLMs). Microsoft Research has been a key player in this space, notably with their Phi series. Building on the success of models like Phi-3, Microsoft recently unveiled two new powerhouses: Phi-4-reasoning and Phi-4-reasoning-plus. These models represent a significant leap forward, demonstrating that smaller, more efficient models can rival their larger counterparts in complex reasoning tasks.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers on all these demands and replaces Postman at a much more affordable price!

Phi-4 Has Reasoning Models Now

The journey began with Phi-4, a 14-billion parameter dense decoder-only Transformer model. While already capable, Microsoft sought to imbue it with stronger reasoning abilities, particularly in math, science, and coding domains. This led to the development of Phi-4-reasoning and its enhanced variant, Phi-4-reasoning-plus.

Both models share the Phi-4 architecture but undergo specialized post-training focused on reasoning. The key differentiator lies in the training methodology:

  1. Phi-4-reasoning: This model is created by supervised fine-tuning (SFT) Phi-4 on a meticulously curated dataset. This dataset blends high-quality filtered public data with synthetic prompts, focusing specifically on chain-of-thought (CoT) traces. CoT reasoning involves breaking down complex problems into intermediate steps, mimicking a more human-like thought process. The SFT dataset also incorporates alignment data to ensure safety and responsible AI practices. Microsoft leveraged reasoning demonstrations from OpenAI's o3-mini as part of this curated data.
  2. Phi-4-reasoning-plus: This model takes Phi-4-reasoning a step further by incorporating Reinforcement Learning (RL). The RL phase allows the model to learn to utilize more inference-time compute, generating more detailed and often longer reasoning chains (approximately 1.5 times more tokens than the base Phi-4-reasoning). This additional computational effort translates directly into higher accuracy on complex tasks, albeit with a potential increase in latency.

Both models boast a 32k token context length, enabling them to handle complex prompts and generate extensive reasoning processes. Interestingly, the model card for Phi-4-reasoning-plus notes promising results when extending the context window to 64k tokens during experiments, maintaining coherence over longer sequences.

Phi-4-Reasoning, Phi-4-Reasoning-Plus & Phi-4-Reasoning-Mini Benchmarks

Phi-4-Reasoning & Phi-4-Reasoning-Plus Benchmarks
Phi-4-Reasoning-Mini Benchmarks

The true measure of these models lies in their performance. Microsoft evaluated them against a suite of challenging benchmarks, particularly those focused on reasoning:

  • Mathematical Reasoning: AIME (American Invitational Mathematics Examination) qualifiers from 2022-2025, OmniMath (a collection of over 4000 olympiad-level problems).
  • Scientific Reasoning: GPQA-Diamond (graduate-level science questions).
  • Coding & Algorithmic Problem Solving: LiveCodeBench (competitive coding contest problems), 3SAT (Satisfiability), TSP (Traveling Salesman Problem).
  • Planning & Spatial Understanding: BA Calendar, Maze, SpatialMap.

The results, as presented in the technical reports and model cards, are impressive:

| Model | AIME 24 | AIME 25 | OmniMath | GPQA-D | LiveCodeBench (8/1/24–2/1/25) |
|---|---|---|---|---|---|
| Phi-4-reasoning | 75.3 | 62.9 | 76.6 | 65.8 | 53.8 |
| Phi-4-reasoning-plus | 81.3 | 78.0 | 81.9 | 68.9 | 53.1 |
| OpenThinker2-32B | 58.0 | 58.0 | – | 64.1 | – |
| QwQ 32B | 79.5 | 65.8 | – | 59.5 | 63.4 |
| EXAONE-Deep-32B | 72.1 | 65.8 | – | 66.1 | 59.5 |
| DeepSeek-R1-Distill-70B | 69.3 | 51.5 | 63.4 | 66.2 | 57.5 |
| DeepSeek-R1 | 78.7 | 70.4 | 85.0 | 73.0 | 62.8 |
| o1-mini | 63.6 | 54.8 | – | 60.0 | 53.8 |
| o1 | 74.6 | 75.3 | 67.5 | 76.7 | 71.0 |
| o3-mini | 88.0 | 78.0 | 74.6 | 77.7 | 69.5 |
| Claude-3.7-Sonnet | 55.3 | 58.7 | 54.6 | 76.8 | – |
| Gemini-2.5-Pro | 92.0 | 86.7 | 61.1 | 84.0 | 69.2 |

(Table data sourced from the Hugging Face model cards; a dash indicates a score that was not reported.)

Key takeaways from the benchmarks:

  • Outperforming Larger Models: Both Phi-4-reasoning models significantly outperform much larger open-weight models like the DeepSeek-R1-Distill-70B (which is 5x larger) on many reasoning benchmarks.
  • Competitive with Giants: They approach or even surpass the performance of models like the full DeepSeek-R1 (a 671B MoE model) and OpenAI's o1-mini and o1 on specific tasks (e.g., AIME 25).
  • Reasoning-Plus Advantage: Phi-4-reasoning-plus consistently scores higher than Phi-4-reasoning across the board, validating the effectiveness of the additional RL training for accuracy.
  • General Capabilities: While trained for reasoning, the models also show significant improvements over the base Phi-4 on general benchmarks like instruction following (IFEval), coding (HumanEvalPlus), and even safety (ToxiGen), indicating strong generalization.

These results underscore Microsoft's central thesis: high-quality, reasoning-focused data and targeted fine-tuning can allow smaller models to achieve remarkable reasoning capabilities previously thought exclusive to massive models.

Running Phi-4-reasoning Locally with Ollama (Step-by-Step)

One of the major advantages of SLMs is their potential for local execution. Ollama, a popular platform for running LLMs locally, provides out-of-the-box support for the Phi-4 reasoning family.

Follow these steps to run them on your machine:

Step 1: Install Ollama
If you haven't already, go to ollama.com and download the installer for your operating system (macOS, Windows, or Linux). Run the installer.

Step 2: Pull the Models via Terminal
Open your command prompt or terminal application. Use the appropriate command below to download the desired model. This might take some time depending on your internet speed.

  • To download Phi-4-reasoning:
    ollama pull phi4-reasoning
  • To download Phi-4-reasoning-plus:
    ollama pull phi4-reasoning:plus
    (Note: The plus variant is specified using a tag after the colon.)

Step 3: Run the Model for Interaction
Once the download is complete, you can start chatting with the model directly from your terminal:

  • To run Phi-4-reasoning:
    ollama run phi4-reasoning
  • To run Phi-4-reasoning-plus:
    ollama run phi4-reasoning:plus

After running the command, you'll see a prompt (like >>> or Send a message...) where you can type your questions.

Step 4: Use the Recommended Prompt Structure (Crucial!)
These models perform best when guided by a specific system prompt and structure. When interacting (especially for complex tasks), structure your input like this:

  • Start with the System Prompt: Before your actual question, provide the system prompt that tells the model how to reason.
  • Use ChatML Format: Although Ollama's run command simplifies this, internally the model expects <|im_start|>system, <|im_start|>user, <|im_start|>assistant tags.
  • Expect <think> and <solution>: The model is trained to output its reasoning process within <think>...</think> tags and the final answer within <solution>...</solution> tags.
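Because every reply follows this tag convention, you can separate the reasoning trace from the final answer with a few lines of code. A minimal sketch (the `split_reasoning` helper is our own illustration, not part of any library; it also handles replies where the model omits the `<solution>` tags and simply writes the answer after `</think>`):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think> trace from the final answer in a Phi-4-reasoning reply."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    sol = re.search(r"<solution>(.*?)</solution>", text, re.DOTALL)
    thought = think.group(1).strip() if think else ""
    if sol:
        solution = sol.group(1).strip()
    else:
        # Fall back to everything outside the <think> block.
        solution = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thought, solution

reply = "<think>2 + 2 is basic arithmetic.</think> <solution>The answer is 4.</solution>"
thought, answer = split_reasoning(reply)
print(answer)  # The answer is 4.
```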

Recommended System Prompt:

Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:

(While you can't easily prefix the system prompt in the basic ollama run command, be aware of this structure when interpreting outputs or using Ollama's API/libraries where you can set system prompts explicitly.)
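Through Ollama's local REST API, the system prompt can be set explicitly. Here is a minimal sketch using only the Python standard library; it assumes Ollama is serving on its default port, and the system prompt is abbreviated (paste the full recommended text in practice):

```python
# Sketch: chatting with phi4-reasoning through Ollama's local REST API,
# with the recommended system prompt set explicitly.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint
SYSTEM_PROMPT = (
    "Your role as an assistant involves thoroughly exploring questions through "
    "a systematic thinking process..."  # abbreviated; use the full prompt above
)

def build_request(question: str) -> dict:
    """Assemble the non-streaming chat payload Ollama expects."""
    return {
        "model": "phi4-reasoning",
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    }

if __name__ == "__main__":
    body = json.dumps(build_request("How many primes are there below 50?")).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["message"]["content"])
    except OSError:
        print("Could not reach Ollama; make sure it is running locally.")
```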

Hardware Considerations: Remember that 14B models need substantial RAM/VRAM. The default quantized versions (~11GB) help, but check Ollama's resource requirements.

Accessing Phi-4-reasoning via Free API using OpenRouter (Step-by-Step)

For cloud-based access or integration into applications without local hardware constraints, OpenRouter offers a free API tier for Phi-4-reasoning.

Here’s how to use it:

Step 1: Get an OpenRouter API Key

  • Go to openrouter.ai.
  • Sign up or log in.
  • Navigate to your settings/API keys section and create a new API key. Copy it securely.

Step 2: Install the OpenAI Python Library
If you don't have it, install the library using pip:
pip install openai

Step 3: Set Up Apidog for Testing

Apidog, a robust API testing platform, simplifies interacting with the Phi-4-reasoning APIs. Its intuitive interface lets you send requests, view responses, and debug issues efficiently. Follow these steps to configure it.


Start by downloading Apidog and installing it on your system. Launch the application and create a new project.

Inside this project, add a new request. Set the method to POST and input the OpenRouter endpoint: https://openrouter.ai/api/v1/chat/completions.

Next, configure the headers. Add an “Authorization” header with the value Bearer YOUR_API_KEY, replacing YOUR_API_KEY with the key from OpenRouter. This authenticates your request. Then, switch to the body tab, select JSON format, and craft your request payload. Here’s an example for microsoft/phi-4-reasoning:free:

{
  "model": "microsoft/phi-4-reasoning:free",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}

Click “Send” in Apidog to execute the request. The response pane will display the model’s output, typically including generated text and metadata like token usage. Apidog’s features, such as saving requests or organizing them into collections, enhance your workflow. With this setup, you can now explore the Phi-4 reasoning models’ capabilities.
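For reference, the request Apidog sends can be reproduced with Python's standard library, which is handy once you want to script the same call (`YOUR_API_KEY` is a placeholder, exactly as in the header configuration above):

```python
# Sketch: the same OpenRouter chat-completions request Apidog sends,
# built with only the Python standard library.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; substitute your OpenRouter key
payload = {
    "model": "microsoft/phi-4-reasoning:free",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}
request = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

if __name__ == "__main__" and API_KEY != "YOUR_API_KEY":
    with urllib.request.urlopen(request) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```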

Conclusion

Phi-4-reasoning and Phi-4-reasoning-plus mark a significant advancement in the capabilities of small language models. By focusing on high-quality reasoning data and employing sophisticated fine-tuning techniques like SFT and RL, Microsoft has demonstrated that remarkable reasoning performance is achievable without resorting to massive parameter counts. Their availability through platforms like Ollama for local use and OpenRouter for free API access democratizes access to powerful reasoning tools. As the development of SLMs continues, the Phi-4 reasoning family stands out as a testament to the power of efficient, focused AI.
