Skywork-OR1-32B: Open Source SOTA Model Better Than Deepseek R1

Ashley Innocent

13 April 2025

On April 13, 2025, SkyworkAI released the Skywork-OR1 (Open Reasoner 1) series, comprising three models: Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview.

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

Skywork-OR1-32B: Not Just Another Open Source Reasoning Model

The Skywork-OR1-32B-Preview model contains 32.8 billion parameters and utilizes the BF16 tensor type for numerical precision. The model is distributed in the safetensors format and is based on the Qwen2 architecture. According to the model repository, it maintains the same architecture as the DeepSeek-R1-Distill-Qwen-32B base model, but with specialized training for mathematical and coding reasoning tasks.
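A quick way to verify these details locally is to inspect the checkpoint's configuration with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming a standard transformers installation; it is not an official example from the model card.

# Minimal sketch: inspecting the released checkpoint's configuration to
# confirm the architecture and precision described above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Skywork/Skywork-OR1-32B-Preview")
print(config.model_type)                             # expected: "qwen2"
print(config.torch_dtype)                            # expected: bfloat16
print(config.num_hidden_layers, config.hidden_size)  # layer count and hidden size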

Here is the basic technical information for each model in the Skywork-OR1 family:

Skywork-OR1-32B-Preview

The 32B model demonstrates a 6.8-point improvement on AIME24 and a 10.0-point improvement on AIME25 over its base model. It achieves parameter efficiency by delivering performance comparable to the 671B parameter DeepSeek-R1 with only 4.9% of the parameters.

Skywork-OR1-Math-7B

The model outperforms the base DeepSeek-R1-Distill-Qwen-7B significantly on mathematical tasks (69.8 vs. 55.5 on AIME24, 52.3 vs. 39.2 on AIME25), demonstrating the effectiveness of the specialized training approach.

Skywork-OR1-7B-Preview

While showing less mathematical specialization than the Math-7B variant, this model offers better balanced performance between mathematical and coding tasks.

Training Dataset of Skywork-OR1-32B

The Skywork-OR1 training dataset consists of verifiable math and coding problems curated through the following pipeline:

Data Processing Pipeline

  1. Model-aware Difficulty Estimation: Each problem undergoes difficulty scoring relative to the model's current capabilities, allowing for targeted training.
  2. Quality Assessment: Rigorous filtering is applied prior to training to ensure dataset quality.
  3. Offline and Online Filtering: A two-stage filtering process removes unsuitable problems both before training (offline) and during training (online).
  4. Rejection Sampling: This technique is employed to control the distribution of training examples, helping maintain an optimal learning curve (see the sketch after this list).
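Skywork has not published the exact sampling logic yet, so the following is only an illustrative sketch of how difficulty-aware rejection sampling might work: estimate a per-problem solve rate with the current model, then reject problems whose solve rate falls outside a target band so the remaining pool stays at a useful difficulty. All names and thresholds below are assumptions, not the released implementation.

import random

def estimate_solve_rate(problem, sample_fn, n_samples=8):
    # Hypothetical difficulty estimate: fraction of sampled solutions
    # that a verifier accepts, using the current model.
    correct = sum(sample_fn(problem) for _ in range(n_samples))
    return correct / n_samples

def rejection_sample(problems, sample_fn, low=0.1, high=0.9):
    # Keep only problems whose estimated solve rate lies in (low, high);
    # too-easy and effectively-unsolvable items are rejected so the
    # training distribution stays informative for RL.
    kept = []
    for problem in problems:
        rate = estimate_solve_rate(problem, sample_fn)
        if low < rate < high:
            kept.append(problem)
    return kept

# Toy usage with a stand-in "model" that solves each problem at a fixed rate.
problems = [{"id": i, "true_rate": random.random()} for i in range(100)]
fake_model = lambda p: random.random() < p["true_rate"]
print(f"kept {len(rejection_sample(problems, fake_model))} of {len(problems)} problems")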

Advanced Reinforcement Learning Training Pipeline

The models utilize a customized version of GRPO (Group Relative Policy Optimization) with several technical enhancements:

  1. Multi-stage Training Pipeline: The training proceeds through distinct phases, each building on previously acquired capabilities. The GitHub repository includes a graph plotting AIME24 scores against training steps, demonstrating clear performance improvements at each stage.
  2. Adaptive Entropy Control: This technique dynamically adjusts the exploration-exploitation trade-off during training, encouraging broader exploration while maintaining convergence stability (a rough sketch follows this list).
  3. Custom Fork of VERL Framework: The models are trained using a modified version of the VERL project, specifically adapted for reasoning tasks.
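The exact mechanism behind the adaptive entropy control has not been detailed yet, so here is a hedged, illustrative sketch of one common way to implement it: raise the entropy-bonus coefficient when policy entropy drops below a target (to push exploration) and lower it when entropy overshoots (to let training converge). The class name, target value, and update rule are assumptions for illustration only.

class AdaptiveEntropyController:
    # Illustrative controller: keeps policy entropy near a target value by
    # adjusting the entropy-bonus coefficient used in the RL objective.

    def __init__(self, target_entropy=0.6, coef=1e-3, step=1e-4,
                 min_coef=0.0, max_coef=1e-2):
        self.target_entropy = target_entropy
        self.coef = coef
        self.step = step
        self.min_coef = min_coef
        self.max_coef = max_coef

    def update(self, measured_entropy):
        # If entropy collapses below the target, raise the bonus to encourage
        # exploration; if it is too high, lower the bonus to aid convergence.
        error = self.target_entropy - measured_entropy
        self.coef = min(max(self.coef + self.step * error, self.min_coef), self.max_coef)
        return self.coef

# Hypothetical use inside a training loop:
#   entropy_coef = controller.update(batch_entropy)
#   loss = policy_loss - entropy_coef * batch_entropy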

You can read the full paper here.

Skywork-OR1-32B Benchmarks

The Skywork-OR1 series uses Avg@K as its primary evaluation metric instead of the conventional Pass@1. This metric averages performance across multiple independent attempts (32 for the AIME tests, 4 for LiveCodeBench), reducing variance and providing a more reliable measure of reasoning consistency.
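As a concrete illustration of the metric, a minimal Avg@K computation could look like the following. The data layout is an assumption made for the example, not the official evaluation harness.

def avg_at_k(attempt_scores):
    # Avg@K: mean correctness over K independent attempts per problem,
    # then averaged over problems and scaled to a 0-100 score.
    # attempt_scores is a list of lists, one inner list (length K) of
    # 0/1 correctness values per problem.
    per_problem = [sum(scores) / len(scores) for scores in attempt_scores]
    return 100.0 * sum(per_problem) / len(per_problem)

# Example: 3 problems, K=4 attempts each.
scores = [
    [1, 1, 0, 1],  # solved in 3 of 4 attempts
    [0, 0, 0, 0],  # never solved
    [1, 1, 1, 1],  # always solved
]
print(f"Avg@4 = {avg_at_k(scores):.1f}")  # 58.3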

Below are the exact benchmark results for all models in the series:

| Model                        | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (8/1/24-2/1/25) (Avg@4) |
|------------------------------|-----------------|-----------------|---------------------------------------|
| DeepSeek-R1-Distill-Qwen-7B  | 55.5            | 39.2            | 37.6                                  |
| Light-R1-7B-DS               | 59.1            | 44.3            | 39.5                                  |
| DeepSeek-R1-Distill-Qwen-32B | 72.9            | 59.0            | 57.2                                  |
| TinyR1-32B-Preview           | 78.1            | 65.3            | 61.6                                  |
| QwQ-32B                      | 79.5            | 65.3            | 61.6                                  |
| DeepSeek-R1                  | 79.8            | 70.0            | 65.9                                  |
| Skywork-OR1-Math-7B          | 69.8            | 52.3            | 43.6                                  |
| Skywork-OR1-7B-Preview       | 63.6            | 45.8            | 43.9                                  |
| Skywork-OR1-32B-Preview      | 79.7            | 69.0            | 63.9                                  |

The data shows that Skywork-OR1-32B-Preview performs at near-parity with DeepSeek-R1 (79.7 vs. 79.8 on AIME24, 69.0 vs. 70.0 on AIME25, and 63.9 vs. 65.9 on LiveCodeBench), despite the latter having 20 times more parameters (671B vs. 32.8B).
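The headline figures quoted earlier follow directly from this table; a trivial check (rounded to one decimal place) confirms them:

# Cross-checking the cited improvements and parameter ratio against the table above.
base_32b = {"AIME24": 72.9, "AIME25": 59.0}   # DeepSeek-R1-Distill-Qwen-32B
or1_32b  = {"AIME24": 79.7, "AIME25": 69.0}   # Skywork-OR1-32B-Preview
for bench in base_32b:
    print(bench, "improvement:", round(or1_32b[bench] - base_32b[bench], 1))
# AIME24 improvement: 6.8, AIME25 improvement: 10.0

print("parameter ratio:", round(32.8 / 671 * 100, 1), "%")  # about 4.9%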

The Skywork-OR1 models can be downloaded and evaluated locally using the resources below.

How to Test the Skywork-OR1 Models

Here are the Skywork-OR1-32B, Skywork-OR1-7B, and Skywork-OR1-Math-7B Hugging Face model cards:

Skywork/Skywork-OR1-32B-Preview · Hugging Face
Skywork/Skywork-OR1-7B-Preview · Hugging Face
Skywork/Skywork-OR1-Math-7B · Hugging Face
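For a quick local test, the preview checkpoints can be served with vLLM (the same inference engine bundled in the Docker image below). The following is a minimal sketch assuming a working vLLM installation and enough GPU memory; it is not an official example from the repository, and the sampling settings are placeholders.

# Minimal sketch: sampling from Skywork-OR1-32B-Preview with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Skywork/Skywork-OR1-32B-Preview",
    dtype="bfloat16",
    tensor_parallel_size=2,  # adjust to the number of available GPUs
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
prompts = ["Find all integer solutions of x^2 - y^2 = 2024."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)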

To run the evaluation scripts, first set up the environment using either Docker or Conda.

Docker Environment:

docker pull whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
docker run --runtime=nvidia -it --rm --shm-size=10g --cap-add=SYS_ADMIN -v <path>:<path> image:tag

Conda Environment Setup:

conda create -n verl python==3.10
conda activate verl
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
git clone https://github.com/SkyworkAI/Skywork-OR1.git
cd Skywork-OR1
pip3 install -e .
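After the install, a brief sanity check can confirm that CUDA-enabled PyTorch and flash-attn are importable from the new environment. This check is only a suggestion, not part of the official setup instructions.

# Optional sanity check for the freshly created environment.
import torch
import flash_attn

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)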

For reproducing AIME24 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-32B-Preview \
DATA_PATH=or1_data/eval/aime24.parquet \
SAMPLES=32 \
TASK_NAME=Aime24_Avg-Skywork_OR1_32B_Preview \
bash ./or1_script/eval/eval_32b.sh

For AIME25 evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/aime25.parquet \
SAMPLES=32 \
TASK_NAME=Aime25_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

For LiveCodeBench evaluation:

MODEL_PATH=Skywork/Skywork-OR1-Math-7B \
DATA_PATH=or1_data/eval/livecodebench/livecodebench_2408_2502.parquet \
SAMPLES=4 \
TASK_NAME=LiveCodeBench_Avg-Skywork_OR1_Math_7B \
bash ./or1_script/eval/eval_7b.sh

The current Skywork-OR1 models are labeled as "Preview" versions, with final releases scheduled to be available within two weeks of the initial announcement. The developers have indicated that additional technical documentation will be released, including:

  1. A comprehensive technical report detailing the training methodology
  2. The Skywork-OR1-RL-Data dataset
  3. Additional training scripts

The GitHub repository notes that the training scripts are "currently being organized and will be available in 1-2 days."

GitHub - SkyworkAI/Skywork-OR1

Conclusion: Technical Assessment of Skywork-OR1-32B

The Skywork-OR1-32B-Preview model represents a significant advancement in parameter-efficient reasoning models. With 32.8 billion parameters, it achieves performance metrics nearly identical to the 671 billion parameter DeepSeek-R1 model across multiple benchmarks.

Though independent verification is still pending, these results indicate that for practical applications requiring advanced reasoning capabilities, Skywork-OR1-32B-Preview offers a viable alternative to significantly larger models, with substantially reduced computational requirements.

Additionally, the open-source nature of these models, along with their evaluation scripts and forthcoming training data, provides valuable technical resources for researchers and practitioners working on reasoning capabilities in language models.
