Skywork-OR1-32B: Open Source SOTA Model Better Than Deepseek R1

Ashley Innocent

26 June 2025

On April 13, 2025, SkyworkAI released the Skywork-OR1 (Open Reasoner 1) series, comprising three models: Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

Skywork-OR1-32B: Not Just Another Open Source Reasoning Model

The Skywork-OR1-32B-Preview model contains 32.8 billion parameters and utilizes the BF16 tensor type for numerical precision. The model is distributed in the safetensors format and is based on the Qwen2 architecture. According to the model repository, it maintains the same architecture as the DeepSeek-R1-Distill-Qwen-32B base model, but with specialized training for mathematical and coding reasoning tasks.

Let's take a look at the basic technical details for each model in the Skywork-OR1 family:

Skywork-OR1-32B-Preview

The 32B model demonstrates a 6.8-point improvement on AIME24 and a 10.0-point improvement on AIME25 over its base model. It achieves parameter efficiency by delivering performance comparable to the 671B parameter DeepSeek-R1 with only 4.9% of the parameters.

Skywork-OR1-Math-7B

The model outperforms the base DeepSeek-R1-Distill-Qwen-7B significantly on mathematical tasks (69.8 vs. 55.5 on AIME24, 52.3 vs. 39.2 on AIME25), demonstrating the effectiveness of the specialized training approach.

Skywork-OR1-7B-Preview

While showing less mathematical specialization than the Math-7B variant, this model offers more balanced performance across mathematical and coding tasks.

Training Dataset of Skywork-OR1-32B

The Skywork-OR1 training dataset consists of math and coding problems curated through the pipeline described below.

Data Processing Pipeline

  1. Model-aware Difficulty Estimation: Each problem undergoes difficulty scoring relative to the model's current capabilities, allowing for targeted training.
  2. Quality Assessment: Rigorous filtering is applied prior to training to ensure dataset quality.
  3. Offline and Online Filtering: A two-stage filtering process removes problems that are either trivially easy or unsolvable for the current model, both before training (offline) and during training (online).
  4. Rejection Sampling: This technique is employed to control the distribution of training examples, helping maintain an optimal learning curve.
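The difficulty-based filtering and rejection-sampling steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `solve_rate` field, the thresholds, and the acceptance rule are all assumptions for demonstration purposes.

```python
import random

def offline_filter(problems, low=0.0, high=1.0):
    """Drop problems the model always solves (too easy) or never solves
    (too hard); solve_rate would be estimated by sampling the current model."""
    return [p for p in problems if low < p["solve_rate"] < high]

def rejection_sample(problems, target_rate=0.5, keep_width=0.4, rng=random):
    """Keep problems near a target difficulty with higher probability,
    flattening the difficulty distribution of the training batch."""
    kept = []
    for p in problems:
        # Acceptance probability peaks when solve_rate == target_rate
        # and falls off linearly as difficulty moves away from it.
        accept = max(0.0, 1.0 - abs(p["solve_rate"] - target_rate) / keep_width)
        if rng.random() < accept:
            kept.append(p)
    return kept

# Eleven toy problems spanning the full difficulty range.
problems = [{"id": i, "solve_rate": i / 10} for i in range(11)]
trainable = offline_filter(problems)  # removes solve_rate 0.0 and 1.0
batch = rejection_sample(trainable, rng=random.Random(0))
```

The key idea is that filtering decisions are made relative to the model's own solve rate, so the surviving problems carry a useful learning signal at every stage of training.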

Advanced Reinforcement Learning Training Pipeline

The models utilize a customized version of GRPO (Group Relative Policy Optimization) with several technical enhancements:

  1. Multi-stage Training Pipeline: The training proceeds through distinct phases, each building on previously acquired capabilities. The GitHub repository includes a graph plotting AIME24 scores against training steps, demonstrating clear performance improvements at each stage.
  2. Adaptive Entropy Control: This technique dynamically adjusts the exploration-exploitation trade-off during training, encouraging broader exploration while maintaining convergence stability.
  3. Custom Fork of VERL Framework: The models are trained using a modified version of the VERL project, specifically adapted for reasoning tasks.
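As a rough illustration of the adaptive entropy control idea (a sketch under assumed mechanics, not Skywork's actual schedule), the entropy-bonus coefficient can be nudged each update so that the policy's entropy tracks a target value:

```python
def adaptive_entropy_coeff(current_entropy, target_entropy,
                           coeff, step_size=0.01, min_coeff=0.0):
    """Adjust the entropy-bonus coefficient toward a target policy entropy:
    raise it when entropy falls below target (policy is collapsing, so
    encourage exploration), lower it when entropy is already sufficient
    (let the policy converge)."""
    if current_entropy < target_entropy:
        coeff += step_size
    else:
        coeff = max(min_coeff, coeff - step_size)
    return coeff

# Entropy below target -> coefficient increases, pushing exploration.
exploring = adaptive_entropy_coeff(current_entropy=0.2,
                                   target_entropy=0.5, coeff=0.1)
# Entropy above target -> coefficient decays (floored at min_coeff).
converging = adaptive_entropy_coeff(current_entropy=0.8,
                                    target_entropy=0.5, coeff=0.005)
```

This dynamic adjustment is what lets training balance the exploration-exploitation trade-off mentioned above rather than fixing a single entropy bonus for the whole run.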

You can read the full paper here.

Skywork-OR1-32B Benchmarks

Evaluation methodology:

The Skywork-OR1 series introduces Avg@K as their primary evaluation metric instead of the conventional Pass@1. This metric calculates average performance across multiple independent attempts (32 for AIME tests, 4 for LiveCodeBench), reducing variance and providing a more reliable measure of reasoning consistency.
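To make the metric concrete, here is a small hypothetical implementation of Avg@K over per-problem binary outcomes (the function name and data layout are illustrative, not from the Skywork codebase):

```python
def avg_at_k(attempt_results):
    """Avg@K: for each problem, take the success rate over K independent
    attempts; then average across problems and scale to a 0-100 score.
    attempt_results is a list of lists of 0/1 outcomes, one inner list
    (length K) per problem."""
    per_problem = [sum(attempts) / len(attempts) for attempts in attempt_results]
    return 100 * sum(per_problem) / len(per_problem)

# Two problems, K=4 attempts each: 3/4 and 1/4 solved -> (0.75 + 0.25) / 2 = 50.0
score = avg_at_k([[1, 1, 1, 0], [0, 1, 0, 0]])
```

Unlike Pass@1 measured from a single generation, averaging over K attempts damps the run-to-run variance that sampling-based decoding introduces, which matters on small test sets like AIME's 30 problems.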

Below are the exact benchmark results for all models in the series:

| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (8/1/24-2/1/25) (Avg@4) |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| Skywork-OR1-Math-7B | 69.8 | 52.3 | 43.6 |
| Skywork-OR1-7B-Preview | 63.6 | 45.8 | 43.9 |
| Skywork-OR1-32B-Preview | 79.7 | 69.0 | 63.9 |

The data shows that Skywork-OR1-32B-Preview performs at near-parity with DeepSeek-R1 (79.7 vs. 79.8 on AIME24, 69.0 vs. 70.0 on AIME25, and 63.9 vs. 65.9 on LiveCodeBench), despite the latter having 20 times more parameters (671B vs. 32.8B).
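As a quick arithmetic check, the gaps and the parameter ratio cited above can be recomputed directly from the table:

```python
# Benchmark scores copied from the table above (Avg@K, higher is better).
skywork_32b = {"AIME24": 79.7, "AIME25": 69.0, "LiveCodeBench": 63.9}
deepseek_r1 = {"AIME24": 79.8, "AIME25": 70.0, "LiveCodeBench": 65.9}

# Per-benchmark gap in points: at most 2.0 across all three benchmarks.
gaps = {k: round(deepseek_r1[k] - skywork_32b[k], 1) for k in skywork_32b}

# Parameter ratio: 32.8B / 671B ~= 0.049, i.e. about 4.9% of the parameters,
# or roughly a 20x size difference.
param_ratio = 32.8 / 671
```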

The following section describes how to run the models locally and reproduce the evaluation results.

How to Test the Skywork-OR1 Models

Here are the Skywork-OR1-32B, Skywork-OR1-7B, and Skywork-OR1-Math-7B Hugging Face model cards:

  * Skywork/Skywork-OR1-32B-Preview · Hugging Face
  * Skywork/Skywork-OR1-7B-Preview · Hugging Face
  * Skywork/Skywork-OR1-Math-7B · Hugging Face

To run the Evaluation Scripts, take the following steps. First:

Docker Environment:

docker pull whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
docker run --runtime=nvidia -it --rm --shm-size=10g --cap-add=SYS_ADMIN -v <path>:<path> whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6

Conda Environment Setup:

conda create -n verl python==3.10
conda activate verl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/SkyworkAI/Skywork-OR1.git
cd Skywork-OR1
pip3 install -e .

For reproducing AIME24 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-32B-Preview \
DATA_PATH=or1_data/eval/aime24.parquet \
SAMPLES=32 \
TASK_NAME=Aime24_Avg-Skywork_OR1_32B_Preview \
bash ./or1_script/eval/eval_32b.sh

For AIME25 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/aime25.parquet \
SAMPLES=32 \
TASK_NAME=Aime25_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

For LiveCodeBench evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/livecodebench/livecodebench_2408_2502.parquet \
SAMPLES=4 \
TASK_NAME=LiveCodeBench_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

The current Skywork-OR1 models are labeled as "Preview" versions, with final releases scheduled to be available within two weeks of the initial announcement. The developers have indicated that additional technical documentation will be released, including:

  1. A comprehensive technical report detailing the training methodology
  2. The Skywork-OR1-RL-Data dataset
  3. Additional training scripts

The GitHub repository notes that the training scripts are "currently being organized and will be available in 1-2 days."

GitHub - SkyworkAI/Skywork-OR1

Conclusion: Technical Assessment of Skywork-OR1-32B

The Skywork-OR1-32B-Preview model represents a significant advancement in parameter-efficient reasoning models. With 32.8 billion parameters, it achieves performance metrics nearly identical to the 671 billion parameter DeepSeek-R1 model across multiple benchmarks.

While these results await independent verification, they indicate that for practical applications requiring advanced reasoning capabilities, Skywork-OR1-32B-Preview offers a viable alternative to significantly larger models, with substantially reduced computational requirements.

Additionally, the open-source nature of these models, along with their evaluation scripts and forthcoming training data, provides valuable technical resources for researchers and practitioners working on reasoning capabilities in language models.

