
Skywork-OR1-32B: Open Source SOTA Model Better Than Deepseek R1

Ashley Innocent


Updated on April 13, 2025

On April 13, 2025, SkyworkAI released the Skywork-OR1 (Open Reasoner 1) series, comprising three models: Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview.

  • These models are trained using large-scale rule-based reinforcement learning specifically targeting mathematical and code reasoning capabilities.
  • The models are built upon DeepSeek's distilled architectures: the 7B variants use DeepSeek-R1-Distill-Qwen-7B as their base, while the 32B model builds upon DeepSeek-R1-Distill-Qwen-32B.
💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog meets all your demands and replaces Postman at a much more affordable price!

Skywork-OR1-32B: Not Just Another Open Source Reasoning Model

The Skywork-OR1-32B-Preview model contains 32.8 billion parameters and utilizes the BF16 tensor type for numerical precision. The model is distributed in the safetensors format and is based on the Qwen2 architecture. According to the model repository, it maintains the same architecture as the DeepSeek-R1-Distill-Qwen-32B base model, but with specialized training for mathematical and coding reasoning tasks.

Let's take a look at the basic technical specifications of each model in the Skywork-OR1 family:

Skywork-OR1-32B-Preview

  • Parameter count: 32.8 billion
  • Base model: DeepSeek-R1-Distill-Qwen-32B
  • Tensor type: BF16
  • Specialization: General-purpose reasoning
  • Key performance:
    • AIME24: 79.7 (Avg@32)
    • AIME25: 69.0 (Avg@32)
    • LiveCodeBench: 63.9 (Avg@4)

The 32B model demonstrates a 6.8-point improvement on AIME24 and a 10.0-point improvement on AIME25 over its base model. It is remarkably parameter-efficient, delivering performance comparable to the 671B-parameter DeepSeek-R1 with only 4.9% of its parameters.

Skywork-OR1-Math-7B

  • Parameter count: 7.62 billion
  • Base model: DeepSeek-R1-Distill-Qwen-7B
  • Tensor type: BF16
  • Specialization: Mathematical reasoning
  • Key performance:
    • AIME24: 69.8 (Avg@32)
    • AIME25: 52.3 (Avg@32)
    • LiveCodeBench: 43.6 (Avg@4)

The model outperforms the base DeepSeek-R1-Distill-Qwen-7B significantly on mathematical tasks (69.8 vs. 55.5 on AIME24, 52.3 vs. 39.2 on AIME25), demonstrating the effectiveness of the specialized training approach.

Skywork-OR1-7B-Preview

  • Parameter count: 7.62 billion
  • Base model: DeepSeek-R1-Distill-Qwen-7B
  • Tensor type: BF16
  • Specialization: General-purpose reasoning
  • Key performance:
    • AIME24: 63.6 (Avg@32)
    • AIME25: 45.8 (Avg@32)
    • LiveCodeBench: 43.9 (Avg@4)

While showing less mathematical specialization than the Math-7B variant, this model offers better balanced performance between mathematical and coding tasks.

Training Dataset of Skywork-OR1

The Skywork-OR1 training dataset contains:

  • 110,000 verifiable and diverse mathematical problems
  • 14,000 coding questions
  • All sourced from open-source datasets

Data Processing Pipeline

  1. Model-aware Difficulty Estimation: Each problem undergoes difficulty scoring relative to the model's current capabilities, allowing for targeted training.
  2. Quality Assessment: Rigorous filtering is applied prior to training to ensure dataset quality.
  3. Offline and Online Filtering: A two-stage filtering process is implemented to:
  • Remove suboptimal examples before training (offline)
  • Dynamically adjust problem selection during training (online)

  4. Rejection Sampling: This technique is employed to control the distribution of training examples, helping maintain an optimal learning curve (a minimal sketch of this filtering idea follows below).
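
The exact filtering code has not been released, but the idea behind model-aware difficulty estimation and offline filtering can be illustrated with a short sketch. Difficulty is approximated by the model's empirical pass rate over k sampled solutions, and problems the model always or never solves are dropped. The sampling and verification helpers here are hypothetical stand-ins, not Skywork's pipeline:

import random

def sample_solutions(model, problem, k):
    # Hypothetical stand-in: a real pipeline would generate k candidate
    # solutions with the model being trained.
    return [random.random() for _ in range(k)]

def is_correct(solution, problem):
    # Hypothetical stand-in: a real pipeline would run a rule-based
    # verifier (answer matching for math, unit tests for code).
    return solution > 0.5

def pass_rate(model, problem, k=16):
    # Model-aware difficulty: the fraction of k samples the model solves.
    solutions = sample_solutions(model, problem, k)
    return sum(is_correct(s, problem) for s in solutions) / k

def offline_filter(model, problems, low=0.05, high=0.95):
    # Drop problems that are trivially easy (rate above high) or currently
    # hopeless (rate below low); the rest carry useful learning signal.
    return [p for p in problems if low <= pass_rate(model, p) <= high]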

Advanced Reinforcement Learning Training Pipeline

The models utilize a customized version of GRPO (Group Relative Policy Optimization) with several technical enhancements (a minimal sketch of the core GRPO update follows after this list):

  1. Multi-stage Training Pipeline: The training proceeds through distinct phases, each building on previously acquired capabilities. The GitHub repository includes a graph plotting AIME24 scores against training steps, demonstrating clear performance improvements at each stage.
  2. Adaptive Entropy Control: This technique dynamically adjusts the exploration-exploitation trade-off during training, encouraging broader exploration while maintaining convergence stability.
  3. Custom Fork of VERL Framework: The models are trained using a modified version of the VERL project, specifically adapted for reasoning tasks.
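
Skywork's exact GRPO modifications have not been published yet, but the core idea, sampling a group of solutions per problem, scoring them with a rule-based reward, and normalizing each reward against its own group to obtain advantages, can be sketched as follows. This is a minimal illustration of the group-relative advantage computation, not the team's implementation:

import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantages: each sampled solution is scored relative to
    # the other samples for the same problem, with no learned critic.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 8 solutions sampled for one problem, rule-based reward
# of 1.0 for a verified-correct answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
# Correct solutions receive positive advantages, incorrect ones negative;
# the policy gradient then reinforces the correct reasoning traces.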

You can read the full paper here.

Skywork-OR1-32B Benchmarks

Technical specifications:

  • Parameter count: 32.8 billion
  • Tensor type: BF16
  • Model format: Safetensors
  • Architecture family: Qwen2
  • Base model: DeepSeek-R1-Distill-Qwen-32B

The Skywork-OR1 series introduces Avg@K as their primary evaluation metric instead of the conventional Pass@1. This metric calculates average performance across multiple independent attempts (32 for AIME tests, 4 for LiveCodeBench), reducing variance and providing a more reliable measure of reasoning consistency.
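
Concretely, Avg@K is just the mean correctness over K independent attempts per problem, averaged across the whole benchmark. A minimal sketch of the computation (the correctness matrix here is illustrative, not real benchmark data):

import numpy as np

def avg_at_k(correct):
    # correct: (num_problems, K) array of 0/1 flags, one row per problem
    # and one column per independent attempt. Avg@K is the overall mean.
    return float(np.asarray(correct, dtype=np.float64).mean())

# Example: 3 problems, K=4 attempts each (illustrative flags only)
flags = [[1, 0, 1, 1],
         [0, 0, 1, 0],
         [1, 1, 1, 1]]
print(avg_at_k(flags))  # 0.6667 -> reported as 66.7 on a 0-100 scale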

Below are the exact benchmark results for all models in the series:

| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (8/1/24-2/1/25) (Avg@4) |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| Skywork-OR1-Math-7B | 69.8 | 52.3 | 43.6 |
| Skywork-OR1-7B-Preview | 63.6 | 45.8 | 43.9 |
| Skywork-OR1-32B-Preview | 79.7 | 69.0 | 63.9 |

The data shows that Skywork-OR1-32B-Preview performs at near-parity with DeepSeek-R1 (79.7 vs. 79.8 on AIME24, 69.0 vs. 70.0 on AIME25, and 63.9 vs. 65.9 on LiveCodeBench), despite the latter having 20 times more parameters (671B vs. 32.8B).

With those benchmark numbers in mind, here is how you can run the models and reproduce the evaluations yourself.

How to Test the Skywork-OR1 Models

Here are the Skywork-OR1-32B, Skywork-OR1-7B, and Skywork-OR1-Math-7B Hugging Face model cards:

Skywork/Skywork-OR1-32B-Preview · Hugging Face
Skywork/Skywork-OR1-7B-Preview · Hugging Face
Skywork/Skywork-OR1-Math-7B · Hugging Face
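
The model cards do not mandate a specific inference stack, but since the checkpoints ship as standard safetensors on the Qwen2 architecture, they load directly with the Hugging Face transformers library. A minimal sketch, assuming transformers is installed and a GPU with enough memory for the chosen variant (the prompt is purely illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-OR1-7B-Preview"  # swap in the 32B repo id if your hardware allows
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the BF16 weights as shipped
    device_map="auto",    # places layers on available GPUs
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))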

To run the evaluation scripts, first set up an environment using either Docker or Conda.

Docker Environment:

# Pull the VERL training/evaluation image referenced by the Skywork-OR1 repository
docker pull whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
# Start a GPU container; replace <path> with your working directory and
# image:tag with the image pulled above
docker run --runtime=nvidia -it --rm --shm-size=10g --cap-add=SYS_ADMIN -v <path>:<path> image:tag

Conda Environment Setup:

conda create -n verl python==3.10
conda activate verl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/SkyworkAI/Skywork-OR1.git
cd Skywork-OR1
pip3 install -e .

For reproducing AIME24 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-32B-Preview \
DATA_PATH=or1_data/eval/aime24.parquet \
SAMPLES=32 \
TASK_NAME=Aime24_Avg-Skywork_OR1_32B_Preview \
bash ./or1_script/eval/eval_32b.sh

For AIME25 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/aime25.parquet \
SAMPLES=32 \
TASK_NAME=Aime25_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

For LiveCodeBench evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/livecodebench/livecodebench_2408_2502.parquet \
SAMPLES=4 \
TASK_NAME=LiveCodeBench_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

The current Skywork-OR1 models are labeled as "Preview" versions, with final releases scheduled to be available within two weeks of the initial announcement. The developers have indicated that additional technical documentation will be released, including:

  1. A comprehensive technical report detailing the training methodology
  2. The Skywork-OR1-RL-Data dataset
  3. Additional training scripts

The GitHub repository notes that the training scripts are "currently being organized and will be available in 1-2 days."

GitHub - SkyworkAI/Skywork-OR1

Conclusion: Technical Assessment of Skywork-OR1-32B

The Skywork-OR1-32B-Preview model represents a significant advancement in parameter-efficient reasoning models. With 32.8 billion parameters, it achieves performance metrics nearly identical to the 671 billion parameter DeepSeek-R1 model across multiple benchmarks.

Though these results have not yet been independently verified, they indicate that for practical applications requiring advanced reasoning capabilities, Skywork-OR1-32B-Preview offers a viable alternative to significantly larger models, with substantially reduced computational requirements.

Additionally, the open-source nature of these models, along with their evaluation scripts and forthcoming training data, provides valuable technical resources for researchers and practitioners working on reasoning capabilities in language models.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog meets all your demands and replaces Postman at a much more affordable price!