How to Use gpt-oss with Claude Code

Integrate GPT-OSS with Claude Code for free or low-cost AI coding. This guide covers setup with Hugging Face, OpenRouter, or LiteLLM for powerful workflows.

Ashley Goolam


8 August 2025


Want to supercharge your coding workflow with GPT-OSS, OpenAI’s open-weight model, right inside Claude Code? You’re in for a treat! Released in August 2025, GPT-OSS (in 20B and 120B variants) is a powerhouse for coding and reasoning, and you can pair it with Claude Code’s sleek CLI for free or low-cost setups. In this conversational guide, we’ll walk you through three paths to integrate GPT-OSS with Claude Code: Hugging Face, OpenRouter, or LiteLLM. Let’s dive in and get your AI coding sidekick up and running!

💡
Want a great API Testing tool that generates beautiful API Documentation?

Want an integrated, All-in-One platform for your Developer Team to work together with maximum productivity?

Apidog delivers all your demands, and replaces Postman at a much more affordable price!

What Is GPT-OSS and Why Use It with Claude Code?

GPT-OSS is OpenAI’s open-weight model family, with 20B and 120B variants offering stellar performance for coding, reasoning, and agentic tasks. With a 128K-token context window and an Apache 2.0 license, it’s perfect for developers who want flexibility and control. Claude Code, Anthropic’s CLI tool (version 0.5.3+), is a developer favorite for its conversational coding capabilities. By routing Claude Code to GPT-OSS via OpenAI-compatible APIs, you can enjoy Claude’s familiar interface while leveraging GPT-OSS’s open-source power, without Anthropic’s subscription costs. Ready to make it happen? Let’s explore the setup options!
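All three paths below work the same way under the hood: the backend exposes an OpenAI-style /v1/chat/completions endpoint, and Claude Code’s traffic is routed onto it. As a rough sketch of the request shape that travels over the wire (the base URL and token below are placeholders, not real credentials):

```python
import json

# Placeholder endpoint and token -- substitute your own values.
BASE_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"

# The OpenAI-compatible chat payload that TGI, OpenRouter, and LiteLLM all accept.
payload = {
    "model": "gpt-oss-20b",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a hello-world in Python."},
    ],
    "stream": True,  # Claude Code streams tokens as they arrive
}

headers = {
    "Authorization": "Bearer hf_xxxxxxxxxxxxxxxxx",  # placeholder token
    "Content-Type": "application/json",
}

body = json.dumps(payload)  # this JSON body is what gets POSTed to /v1/chat/completions
```

Because every backend in this guide speaks this one schema, switching between Hugging Face, OpenRouter, and LiteLLM is mostly a matter of changing the base URL and token.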


Prerequisites for Using GPT-OSS with Claude Code

Before we start, ensure you have:

  1. Claude Code installed (version 0.5.3 or later) and working in your terminal.
  2. A Hugging Face or OpenRouter account with an API key, depending on the path you pick.
  3. Python and pip installed if you plan to use LiteLLM (Path C).
  4. Basic comfort with environment variables and the command line.

Path A: Self-Host GPT-OSS on Hugging Face

Want full control? Host GPT-OSS on Hugging Face’s Inference Endpoints for a private, scalable setup. Here’s how:

Step 1: Grab the Model

  1. Visit the GPT-OSS repo on Hugging Face (openai/gpt-oss-20b or openai/gpt-oss-120b).
  2. Accept the Apache 2.0 license to access the model.
  3. Alternatively, try Qwen3-Coder-480B-A35B-Instruct (Qwen/Qwen3-Coder-480B-A35B-Instruct) for a coding-focused model (use a GGUF version for lighter hardware).

Step 2: Deploy a Text Generation Inference Endpoint

  1. On the model page, click Deploy > Inference Endpoint.
  2. Select the Text Generation Inference (TGI) template (≥ v1.4.0).
  3. Enable OpenAI compatibility by checking Enable OpenAI compatibility or adding --enable-openai in advanced settings.
  4. Choose hardware: A10G or CPU for 20B, A100 for 120B. Create the endpoint.

Step 3: Collect Credentials

  1. Once the endpoint status is Running, copy the endpoint URL (e.g., https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud) and your Hugging Face access token (hf_xxxxxxxxxxxxxxxxx).
  2. Note the model ID (e.g., gpt-oss-20b or gpt-oss-120b).

Step 4: Configure Claude Code

  1. Set environment variables in your terminal:
export ANTHROPIC_BASE_URL="https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_xxxxxxxxxxxxxxxxx"
export ANTHROPIC_MODEL="gpt-oss-20b"  # or gpt-oss-120b

Replace <your-endpoint> and hf_xxxxxxxxxxxxxxxxx with your values.

  2. Test the setup:

claude --model gpt-oss-20b

Claude Code routes to your GPT-OSS endpoint, streaming responses via TGI’s /v1/chat/completions API, mimicking OpenAI’s schema.
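Those streamed responses arrive as server-sent events in OpenAI’s chunk format. A minimal sketch of extracting the text deltas from such a stream (the sample chunks below are fabricated for illustration, shaped like TGI/OpenAI output):

```python
import json

def extract_deltas(sse_lines):
    """Pull the incremental text out of OpenAI-style streaming chunks."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comments
        data = line[len("data: "):]
        if data.strip() == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        text.append(delta)
    return "".join(text)

# Fabricated sample stream:
sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(extract_deltas(sample))  # -> Hello world
```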

Step 5: Cost and Scaling Notes

Hugging Face Inference Endpoints bill by the hour while running, so pause or delete yours when idle (or enable scale-to-zero in the endpoint settings). For a zero-cost local alternative, you can run TGI in Docker on your own GPU:

docker run --name tgi -p 8080:80 -e HF_TOKEN=hf_xxxxxxxxxxxxxxxxx ghcr.io/huggingface/text-generation-inference:latest --model-id openai/gpt-oss-20b --enable-openai

Then set ANTHROPIC_BASE_URL="http://localhost:8080".
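One common snag when pointing tools at a local or hosted endpoint is the base URL: TGI serves the OpenAI route at /v1/chat/completions, so the base URL should not already include /v1. A small sketch of normalizing it (the helper name is ours, not part of any tool):

```python
def chat_completions_url(base_url: str) -> str:
    """Build the OpenAI-compatible chat endpoint from a base URL,
    tolerating trailing slashes and an already-present /v1 suffix."""
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]  # avoid doubling up the /v1 segment
    return base + "/v1/chat/completions"

print(chat_completions_url("http://localhost:8080"))
print(chat_completions_url("http://localhost:8080/v1/"))
# Both print: http://localhost:8080/v1/chat/completions
```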

Path B: Proxy GPT-OSS Through OpenRouter

No DevOps? No problem! Use OpenRouter to access GPT-OSS with minimal setup. It’s fast and handles billing for you.

Step 1: Register and Pick a Model

  1. Sign up at openrouter.ai and copy your API key from the Keys section.
  2. Choose a model slug: openai/gpt-oss-20b for the lighter, cheaper model, or openai/gpt-oss-120b for stronger reasoning.

Step 2: Configure Claude Code

  1. Set environment variables:
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="or_xxxxxxxxx"
export ANTHROPIC_MODEL="openai/gpt-oss-20b"

Replace or_xxxxxxxxx with your OpenRouter API key.

  2. Test it:

claude --model openai/gpt-oss-20b

Claude Code connects to GPT-OSS via OpenRouter’s unified API, with streaming and fallback support.
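OpenRouter also accepts optional attribution headers (HTTP-Referer and X-Title) that identify your app on the OpenRouter dashboard. A sketch of the headers a request to its unified API can carry (the key and app details below are placeholders):

```python
# Required bearer token plus OpenRouter's optional attribution headers.
# All values here are placeholders -- substitute your own.
headers = {
    "Authorization": "Bearer or_xxxxxxxxx",        # your OpenRouter API key
    "Content-Type": "application/json",
    "HTTP-Referer": "https://example.com/my-app",  # optional: your app's URL
    "X-Title": "My Claude Code setup",             # optional: display name
}
```

The attribution headers are purely optional; the Authorization header is the only one OpenRouter strictly requires.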

Step 3: Cost Notes

OpenRouter has no subscription; you pay per token at the rates listed on each model’s page at openrouter.ai, and GPT-OSS rates are typically a fraction of proprietary-model pricing. Keep an eye on your dashboard usage if you run long agentic sessions.

Path C: Use LiteLLM for Mixed Model Fleets

Want to juggle GPT-OSS, Qwen, and Anthropic models in one workflow? LiteLLM acts as a proxy to hot-swap models seamlessly.

Step 1: Install and Configure LiteLLM

  1. Install LiteLLM:
pip install litellm

  2. Create a config file (litellm.yaml):

model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      model: openai/gpt-oss-20b
      api_key: or_xxxxxxxxx  # OpenRouter key
      api_base: https://openrouter.ai/api/v1
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: or_xxxxxxxxx
      api_base: https://openrouter.ai/api/v1

Replace or_xxxxxxxxx with your OpenRouter key.

  3. Start the proxy:

litellm --config litellm.yaml
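The YAML above maps one proxy-facing model_name to the provider routing details behind it. The same structure can be expressed in Python, which is handy if you generate configs programmatically (the Router usage is commented out since it needs the litellm package installed; the key is a placeholder):

```python
# Python-side equivalent of litellm.yaml's model_list.
model_list = [
    {
        "model_name": "gpt-oss-20b",  # the name Claude Code will request
        "litellm_params": {
            "model": "openai/gpt-oss-20b",
            "api_key": "or_xxxxxxxxx",  # placeholder OpenRouter key
            "api_base": "https://openrouter.ai/api/v1",
        },
    },
    {
        "model_name": "qwen3-coder",
        "litellm_params": {
            "model": "openrouter/qwen/qwen3-coder",
            "api_key": "or_xxxxxxxxx",
            "api_base": "https://openrouter.ai/api/v1",
        },
    },
]

# from litellm import Router
# router = Router(model_list=model_list)  # requires: pip install litellm
```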

Step 2: Point Claude Code to LiteLLM

  1. Set environment variables:
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="litellm_master"
export ANTHROPIC_MODEL="gpt-oss-20b"

  2. Test it:

claude --model gpt-oss-20b

LiteLLM routes requests to GPT-OSS via OpenRouter, with cost logging and simple-shuffle routing for reliability.
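The “simple-shuffle” routing mentioned above just picks randomly among the backends registered under a model name, spreading load and riding out a flaky deployment. A toy sketch of the idea (this is our illustration, not LiteLLM’s actual implementation; the local fallback URL is hypothetical):

```python
import random

# Backends registered for each proxy-facing model name.
deployments = {
    "gpt-oss-20b": [
        "https://openrouter.ai/api/v1",  # primary: OpenRouter
        "http://localhost:8080",         # hypothetical local TGI fallback
    ],
}

def pick_deployment(model_name: str, rng: random.Random) -> str:
    """Simple-shuffle: uniformly pick one of the registered backends."""
    return rng.choice(deployments[model_name])

rng = random.Random(0)  # seeded for reproducibility
print(pick_deployment("gpt-oss-20b", rng))
```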

Step 3: Notes

LiteLLM keeps provider keys server-side, so Claude Code only ever talks to the proxy. If you set a master_key in litellm.yaml’s general_settings, use that value as ANTHROPIC_AUTH_TOKEN. To add or swap models, edit litellm.yaml and restart the proxy; no Claude Code changes are needed.


Testing GPT-OSS with Claude Code

Let’s make sure GPT-OSS is working! Open Claude Code and try these commands:

Code Generation:

claude --model gpt-oss-20b "Write a Python REST API with Flask"

Expect a response like:

from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/api', methods=['GET'])
def get_data():
    return jsonify({"message": "Hello from GPT-OSS!"})
if __name__ == '__main__':
    app.run(debug=True)

Codebase Analysis:

claude --model gpt-oss-20b "Summarize src/server.js"

GPT-OSS leverages its 128K context window to analyze your JavaScript file and return a summary.
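128K tokens is roomy, but very large files can still overflow it. A back-of-the-envelope check using the common rough heuristic of ~4 characters per token (actual tokenization varies by content and tokenizer):

```python
CONTEXT_WINDOW = 128_000  # GPT-OSS context size in tokens
CHARS_PER_TOKEN = 4       # rough heuristic; real tokenizers vary

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Estimate whether a file fits in the context window,
    leaving headroom for the model's response."""
    est_tokens = len(text) // CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context("x" * 100_000))   # ~25K tokens: True
print(fits_in_context("x" * 600_000))   # ~150K tokens: False
```

If a file fails this check, split it or summarize sections separately before asking for a whole-file analysis.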

Debugging:

claude --model gpt-oss-20b "Debug this buggy Python code: [paste code]"

With an 87.3% HumanEval pass rate, GPT-OSS should spot and fix issues accurately.
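For a concrete trial, paste in a snippet with a classic Python bug, such as a mutable default argument, and see whether the model flags it. The buggy version below silently shares one list across every call; the fixed version is the kind of repair you’d expect GPT-OSS to suggest:

```python
# Buggy: the default list is created once and shared across every call.
def append_item_buggy(item, items=[]):
    items.append(item)
    return items

# Fixed: use None as the sentinel and create a fresh list per call.
def append_item_fixed(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(append_item_buggy("a"), append_item_buggy("b"))  # ['a', 'b'] ['a', 'b'] -- shared list!
print(append_item_fixed("a"), append_item_fixed("b"))  # ['a'] ['b']
```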

Troubleshooting Tips

Hitting snags? Run through these quick checks:

  1. 401/403 errors: verify ANTHROPIC_AUTH_TOKEN matches the provider behind ANTHROPIC_BASE_URL (Hugging Face token, OpenRouter key, or LiteLLM key).
  2. Model-not-found errors: make sure ANTHROPIC_MODEL uses the exact ID or slug for your path (gpt-oss-20b for Hugging Face or LiteLLM, openai/gpt-oss-20b for OpenRouter).
  3. Timeouts or empty responses on Hugging Face: confirm the endpoint status is Running and that OpenAI compatibility is enabled; cold starts can take a minute or two.
  4. Connection refused locally: check that TGI or LiteLLM is actually listening on the port in ANTHROPIC_BASE_URL.

Why Use GPT-OSS with Claude Code?

Pairing GPT-OSS with Claude Code is a developer’s dream. You get:

  1. Low or zero cost: self-host for free (hardware aside) or pay per token through a proxy, with no Anthropic subscription.
  2. Open weights: the Apache 2.0 license lets you inspect, fine-tune, and deploy the model on your own terms.
  3. A 128K-token context window: roomy enough for multi-file codebases.
  4. Claude Code’s polished CLI: the conversational workflow you already like, pointed at an open model.

Users are raving about GPT-OSS’s coding prowess, calling it “a budget-friendly beast for multi-file projects.” Whether you self-host or proxy through OpenRouter, this setup keeps costs low and productivity high.

Conclusion

You’re now ready to rock GPT-OSS with Claude Code! Whether you self-host on Hugging Face, proxy via OpenRouter, or use LiteLLM for model juggling, you’ve got a powerful, cost-effective coding setup. From generating REST APIs to debugging code, GPT-OSS delivers, and Claude Code makes it feel effortless. Try it out, share your favorite prompts in the comments, and let’s geek out over AI coding!

