How to Use OpenAI’s GPT-OSS-120B with the API

Discover GPT-OSS-120B, OpenAI’s open-weight model. Learn its benchmarks, pricing, and how to integrate it with Cursor or Cline using the OpenRouter API for coding.

Ashley Goolam

7 August 2025

Hey, AI enthusiasts! Buckle up, because OpenAI just dropped a bombshell with its new open-weight model, GPT-OSS-120B, and it’s turning heads in the AI community. Released under the Apache 2.0 license, this powerhouse is designed for reasoning, coding, and agentic tasks, all while running on a single GPU. In this guide, we’ll dive into what makes GPT-OSS-120B special, its stellar benchmarks, affordable pricing, and how you can use it via the OpenRouter API. Let’s explore this open-source gem and get you coding with it in no time!

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

What Is GPT-OSS-120B?

OpenAI’s GPT-OSS-120B is a 117-billion-parameter language model (with 5.1 billion active parameters per token) that’s part of the new open-weight GPT-OSS series, alongside the smaller GPT-OSS-20B. Released on August 5, 2025, it’s a Mixture-of-Experts (MoE) model optimized for efficiency, running on a single NVIDIA H100 GPU or even consumer hardware with MXFP4 quantization. It’s built for tasks like complex reasoning, code generation, and tool use, with a massive 128K-token context window—think 300–400 pages of text! Under the Apache 2.0 license, you can customize, deploy, or even commercialize it, making it a dream for developers and businesses craving control and privacy.
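A quick back-of-the-envelope check on that “300–400 pages” claim, using common rough estimates (~0.75 English words per token, ~300 words per page):

```javascript
// Rough sanity check: how many pages fit in a 128K-token context window?
const contextTokens = 128_000;
const wordsPerToken = 0.75; // rough average for English text
const wordsPerPage = 300;   // typical manuscript page

const words = contextTokens * wordsPerToken;    // 96,000 words
const pages = Math.round(words / wordsPerPage); // ~320 pages

console.log(`~${words.toLocaleString()} words, ~${pages} pages`);
```

At ~320 pages, the estimate lands comfortably inside the 300–400 range quoted above.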

Benchmarks: How Does GPT-OSS-120B Stack Up?

GPT-OSS-120B is no slouch when it comes to performance. OpenAI’s benchmarks show it’s a serious contender against proprietary models like its own o4-mini and even Claude 3.5 Sonnet. Here’s the lowdown:

(Image: GPT-OSS-120B health benchmark results)

These numbers put GPT-OSS-120B at near-parity with top-tier proprietary models while remaining open and customizable. It’s a beast for math, coding, and general problem-solving, with safety baked in through adversarial fine-tuning to keep risks low.

Pricing: Affordable and Transparent

One of the best parts about GPT-OSS-120B? It’s cost-effective, especially compared to proprietary models. Recent pricing for the full 131K-token context window is consistently low across major providers.

With GPT-OSS-120B, you get high performance at a fraction of GPT-4’s cost (~$20.00/M tokens), with providers like Groq and Cerebras offering blazing-fast throughput for real-time applications.
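To make the comparison concrete, here’s a tiny cost helper. The GPT-4 rate comes from the figure above; the GPT-OSS-120B rate is a placeholder for illustration only, so check your provider’s model page for live numbers:

```javascript
// Per-token cost helper; rates are USD per million tokens.
function costUSD(ratePerMTok, tokens) {
  return (tokens / 1_000_000) * ratePerMTok;
}

const gpt4Rate = 20.0; // ~$20/M tokens, as cited above
const ossRate = 0.5;   // hypothetical blended rate, for illustration only

// Cost of processing 2M tokens under each rate:
console.log(costUSD(gpt4Rate, 2_000_000)); // 40
console.log(costUSD(ossRate, 2_000_000));  // 1
```

Even with a generous placeholder rate, the gap is an order of magnitude or more, which is why open-weight hosting is so attractive for high-volume workloads.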

How to Use GPT-OSS-120B with Cline via OpenRouter

Want to harness the power of GPT-OSS-120B for your coding projects? Claude Desktop and Claude Code can’t integrate OpenAI models like GPT-OSS-120B directly, since they’re tied to Anthropic’s ecosystem, but you can easily use the model with Cline, a free, open-source VS Code extension, via the OpenRouter API. Cursor, meanwhile, has recently restricted its Bring Your Own Key (BYOK) option for non-Pro users, locking features like Agent and Edit modes behind a $20/month subscription, which makes Cline the more flexible choice for BYOK users. Here’s how to set up GPT-OSS-120B with Cline and OpenRouter, step by step.

Step 1: Get an OpenRouter API Key

1. Sign up with OpenRouter: Head to openrouter.ai and create a free account.

2. Find GPT-OSS-120B: Search the model list for openai/gpt-oss-120b and open its model page to check current pricing and available providers.

3. Generate an API key: Open the Keys page in your account settings, create a new key, and copy it somewhere safe; you’ll paste it into Cline in the next step.
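Before wiring the key into an editor, you can verify it works with a direct call to OpenRouter’s OpenAI-compatible chat endpoint. A minimal sketch (assumes your key is in the OPENROUTER_API_KEY environment variable):

```javascript
// Build a request for OpenRouter's OpenAI-compatible chat completions API.
function buildRequest(prompt) {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "openai/gpt-oss-120b", // OpenRouter model ID
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Usage (requires a valid key and network access):
// const { url, options } = buildRequest("Write a haiku about open weights.");
// const res = await fetch(url, options);
// const data = await res.json();
// console.log(data.choices[0].message.content);
```

If this round-trips successfully, any editor that speaks the OpenAI-compatible API, including Cline below, should work with the same key and model ID.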

Step 2: Use Cline in VS Code with BYOK

For unrestricted BYOK access, Cline is a fantastic Cursor alternative. It supports GPT-OSS-120B via OpenRouter without feature lockouts. Here’s how to set it up:

1. Install Cline: Open the Extensions view in VS Code, search for “Cline”, and install the extension.

2. Configure OpenRouter: Open Cline’s settings, select OpenRouter as the API provider, paste in your API key, and set the model to openai/gpt-oss-120b.

3. Save and test: Save your settings, then try a prompt such as “Generate a JavaScript function to parse JSON data.” Cline should respond with something along these lines:

function parseJSON(data) {
  try {
    return JSON.parse(data);      // parsed object on success
  } catch (e) {
    console.error("Invalid JSON:", e.message);
    return null;                  // null signals a parse failure
  }
}

You can also point it at your own codebase with a prompt like “Summarize src/api/server.js”.

Why Cline Over Cursor or Claude?

Claude Desktop and Claude Code are tied to Anthropic’s models, so they can’t talk to GPT-OSS-120B at all. Cursor supports custom models but now locks its BYOK features like Agent and Edit modes behind the $20/month Pro plan. Cline is free, open source, and lets you use your own OpenRouter key with no feature lockouts.

Troubleshooting Tips

If responses fail, double-check that your OpenRouter API key is pasted correctly and your account has available credits, that the model ID is exactly openai/gpt-oss-120b, and that Cline’s API provider is set to OpenRouter. A quick test request from OpenRouter’s web playground can confirm whether the problem is the key or the editor setup.

Why Use GPT-OSS-120B?

The GPT-OSS-120B model is a game-changer for developers and businesses, offering a compelling mix of performance, flexibility, and cost-efficiency. Here’s why it stands out:

(Images: AIME 2024 benchmark results and a chain-of-thought reasoning example)

Community buzz on X highlights its speed (up to 1,515 tokens/sec on Cerebras) and coding prowess, with developers loving its ability to handle multi-file projects and its open-weight nature for customization. Whether you’re building AI agents or fine-tuning for niche tasks, GPT-OSS-120B delivers unmatched value.

Conclusion

OpenAI’s GPT-OSS-120B is a revolutionary open-weight model, blending top-tier performance with cost-effective deployment. Its benchmarks rival proprietary models, its pricing is wallet-friendly, and it’s easy to integrate with editors like Cline (or Cursor, if you’re on Pro) via OpenRouter’s API. Whether you’re coding, debugging, or reasoning through complex problems, this model delivers. Try it out, experiment with its 128K context window, and let us know your cool use cases in the comments—I’m all ears!

For more details, check out the repo at github.com/openai/gpt-oss or OpenAI’s announcement at openai.com.
