How to Use OpenAI’s GPT-OSS-120B via API

Discover GPT-OSS-120B, OpenAI’s open-weight model. Learn about its benchmarks and pricing, and how to integrate it with Cline for coding using the OpenRouter API.

Ashley Goolam

7 August 2025

Hey, AI enthusiasts! Buckle up, because OpenAI just dropped a bombshell with its new open-weight model, GPT-OSS-120B, and it’s turning heads in the AI community. Released under the Apache 2.0 license, this powerhouse is designed for reasoning, coding, and agentic tasks, all while running on a single GPU. In this guide, we’ll dive into what makes GPT-OSS-120B special, its stellar benchmarks, affordable pricing, and how you can use it via the OpenRouter API. Let’s explore this open-weight gem and get you coding with it in no time!

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

What Is GPT-OSS-120B?

OpenAI’s GPT-OSS-120B is a 117-billion-parameter language model (with 5.1 billion parameters active per token) in its new open-weight GPT-OSS series, alongside the smaller GPT-OSS-20B. Released on August 5, 2025, it’s a Mixture-of-Experts (MoE) model optimized for efficiency: native MXFP4 quantization lets it run on a single 80 GB NVIDIA H100 GPU (its smaller sibling, GPT-OSS-20B, targets consumer hardware). It’s built for tasks like complex reasoning, code generation, and tool use, with a massive 128K token context window (think 300–400 pages of text!). Under the Apache 2.0 license, you can customize, deploy, or even commercialize it, making it a dream for developers and businesses craving control and privacy.
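
Because the weights are open, you can also serve the model yourself behind an OpenAI-compatible endpoint (runtimes such as vLLM or Ollama expose one). Here’s a minimal sketch of what querying such a deployment might look like; the base URL, port, model name, and the askLocalModel helper are illustrative assumptions, not fixed values:

// Minimal sketch: query a self-hosted gpt-oss-120b through an OpenAI-compatible
// chat completions endpoint (e.g. one served by vLLM or Ollama).
// The base URL below is a placeholder; point it at wherever your server listens.
const BASE_URL = "http://localhost:8000/v1"; // hypothetical local deployment

async function askLocalModel(prompt) {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-oss-120b",                      // model name as registered on your server
      messages: [{ role: "user", content: prompt }],
      max_tokens: 512,
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

askLocalModel("Explain mixture-of-experts in two sentences.")
  .then(console.log)
  .catch(console.error);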

[Image: GPT-OSS-120B model overview]

Benchmarks: How Does GPT-OSS-120B Stack Up?

GPT-OSS-120B is no slouch when it comes to performance. OpenAI’s benchmarks show it’s a serious contender against proprietary models like its own o4-mini and even Claude 3.5 Sonnet. Here’s the lowdown:

[Image: benchmark comparison chart, including HealthBench results]

These stats show GPT-OSS-120B is near-parity with top-tier proprietary models while being open and customizable. It’s a beast for math, coding, and general problem-solving, with safety baked in through adversarial fine-tuning to keep risks low.

Pricing: Affordable and Transparent

One of the best parts about GPT-OSS-120B? It’s cost-effective, especially compared to proprietary models. Here’s how it breaks down across major providers, based on recent data for a 131K context window:

With GPT-OSS-120B, you get high performance at a fraction of GPT-4’s cost (~$20.00/M tokens), with providers like Groq and Cerebras offering blazing-fast throughput for real-time applications.
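
To get a feel for how per-million-token pricing translates into a per-request bill, here’s a tiny back-of-the-envelope sketch; the estimateCost helper and the rates in the example are placeholders for illustration, not actual provider prices:

// Rough cost estimator for per-million-token pricing.
// The rates passed in below are PLACEHOLDER values; check your
// provider's current pricing before relying on any numbers.
function estimateCost(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  // Convert token counts to millions and multiply by the per-million rate.
  return (inputTokens / 1_000_000) * inputPricePerM
       + (outputTokens / 1_000_000) * outputPricePerM;
}

// Example: 10K input tokens and 2K output tokens at hypothetical rates
// of $0.10 per million input tokens and $0.50 per million output tokens.
console.log(estimateCost(10_000, 2_000, 0.10, 0.50).toFixed(4)); // "0.0020" (about a fifth of a cent)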

How to Use GPT-OSS-120B with Cline via OpenRouter

Want to harness the power of GPT-OSS-120B for your coding projects? While Claude Desktop and Claude Code do not support direct integration with OpenAI models like GPT-OSS-120B due to their reliance on Anthropic’s ecosystem, you can easily use this model with Cline, a free, open-source VS Code extension, via the OpenRouter API. Additionally, Cursor has recently restricted its Bring Your Own Key (BYOK) option for non-Pro users, locking features like Agent and Edit modes behind a $20/month subscription, making Cline a more flexible alternative for BYOK users. Here’s how to set up GPT-OSS-120B with Cline and OpenRouter, step by step.

Step 1: Get an OpenRouter API Key

1. Sign Up with OpenRouter: Head to openrouter.ai and create a free account.

[Image: sign up with OpenRouter]

2. Find GPT-OSS-120B: In the Models section, search for “GPT-OSS-120B”; the model ID you’ll need is openai/gpt-oss-120b.

[Image: find the gpt-oss-120b model]

3. Generate an API Key: Open the Keys page, create a new key, and copy it somewhere safe. You can verify it works with the quick snippet below.

[Image: create an API key]
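
Before wiring the key into an editor, it’s worth a quick sanity check. OpenRouter exposes an OpenAI-compatible chat completions endpoint, so a minimal call like the sketch below (the testOpenRouterKey helper and the prompt are just illustrative) should confirm that your key and the openai/gpt-oss-120b model ID work:

// Quick sanity check: call GPT-OSS-120B through OpenRouter's
// OpenAI-compatible chat completions endpoint.
// Replace the placeholder with the API key you just generated.
const OPENROUTER_API_KEY = "sk-or-...your-key-here...";

async function testOpenRouterKey() {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-oss-120b",
      messages: [{ role: "user", content: "Say hello in one short sentence." }],
    }),
  });
  if (!res.ok) throw new Error(`OpenRouter returned ${res.status}`);
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

testOpenRouterKey().catch(console.error);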

Step 2: Use Cline in VS Code with BYOK

For unrestricted BYOK access, Cline (an open-source VS Code extension) is a fantastic Cursor alternative. It supports GPT-OSS-120B via OpenRouter without feature lockouts. Here’s how to set it up:

1. Install Cline: In VS Code, open the Extensions view, search for “Cline”, and install the extension.

[Image: install Cline]

2. Configure OpenRouter: In Cline’s settings, pick OpenRouter as the API provider, paste your OpenRouter API key, and select openai/gpt-oss-120b as the model.

[Image: configure Cline to use gpt-oss-120b]

3. Save and Test: Save the settings, then try a prompt like “Generate a JavaScript function to parse JSON data.” Cline should produce something along these lines (a quick check of the helper follows below):

function parseJSON(data) {
  // Return the parsed object, or null (after logging the error) if the input isn't valid JSON.
  try {
    return JSON.parse(data);
  } catch (e) {
    console.error("Invalid JSON:", e.message);
    return null;
  }
}

You can also point Cline at your own codebase with prompts like “Summarize src/api/server.js”.
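
To confirm the generated helper behaves as expected, a quick check might look like this (assuming the parseJSON function above is in scope):

// Valid JSON parses to an object; invalid input logs the error and returns null.
console.log(parseJSON('{"name": "Ada", "role": "engineer"}')); // { name: 'Ada', role: 'engineer' }
console.log(parseJSON("not json at all"));                     // logs "Invalid JSON: ..." and prints null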

Why Cline Over Cursor or Claude?

To recap the trade-offs above: Claude Desktop and Claude Code are tied to Anthropic’s ecosystem, so they can’t call OpenAI models like GPT-OSS-120B directly; Cursor now locks Bring Your Own Key usage (and features like Agent and Edit modes) behind its $20/month Pro plan; and Cline is free, open source, and lets you plug in your OpenRouter key with no feature lockouts.

Troubleshooting Tips

If Cline doesn’t respond as expected, double-check that your OpenRouter API key is pasted correctly and that the model ID is exactly openai/gpt-oss-120b, make sure your OpenRouter account has credits, and reload VS Code after changing Cline’s settings.

Why Use GPT-OSS-120B?

The GPT-OSS-120B model is a game-changer for developers and businesses, offering a compelling mix of performance, flexibility, and cost-efficiency. Here’s why it stands out: its weights are open under Apache 2.0, so you can customize, self-host, or even commercialize it; its benchmarks sit near parity with proprietary models like o4-mini on reasoning and coding; the MoE design and MXFP4 quantization let it run on a single H100-class GPU; the 128K token context window handles multi-file projects and long documents; and hosted pricing comes in well below closed models, with fast providers like Groq and Cerebras.

[Image: AIME 2024 benchmark results]
[Image: chain-of-thought reasoning example]

Community buzz on X highlights its speed (up to 1,515 tokens/sec on Cerebras) and coding prowess, with developers loving its ability to handle multi-file projects and its open-weight nature for customization. Whether you’re building AI agents or fine-tuning for niche tasks, GPT-OSS-120B delivers unmatched value.

Conclusion

OpenAI’s GPT-OSS-120B is a revolutionary open-weight model, blending top-tier performance with cost-effective deployment. Its benchmarks rival proprietary models, its pricing is wallet-friendly, and it’s easy to integrate with tools like Cline via OpenRouter’s API. Whether you’re coding, debugging, or reasoning through complex problems, this model delivers. Try it out, experiment with its 128K context window, and let us know your cool use cases in the comments. I’m all ears!

For more details, check out the repo at github.com/openai/gpt-oss or OpenAI’s announcement at openai.com.
