Want to supercharge your coding workflow with GPT-OSS, OpenAI's open-weight model family, right inside Claude Code? You're in for a treat! Released in August 2025, GPT-OSS (in 20B and 120B variants) is a powerhouse for coding and reasoning, and you can pair it with Claude Code's sleek CLI for free or low-cost setups. In this conversational guide, we'll walk you through three paths to integrate GPT-OSS with Claude Code: Hugging Face, OpenRouter, or LiteLLM. Let's dive in and get your AI coding sidekick up and running!
What Is GPT-OSS and Why Use It with Claude Code?
GPT-OSS is OpenAI's open-weight model family, with the 20B and 120B variants offering stellar performance for coding, reasoning, and agentic tasks. With a 128K-token context window and an Apache 2.0 license, it's perfect for developers who want flexibility and control. Claude Code, Anthropic's CLI tool (version 0.5.3+), is a developer favorite for its conversational coding capabilities. By routing Claude Code to GPT-OSS via OpenAI-compatible APIs, you keep Claude's familiar interface while leveraging GPT-OSS's open-source power, without Anthropic's subscription costs. Ready to make it happen? Let's explore the setup options!

Prerequisites for Using GPT-OSS with Claude Code
Before we start, ensure you have:
- Claude Code ≥ 0.5.3: Check with `claude --version`. Install via `npm install -g @anthropic-ai/claude-code` or update with `npm install -g @anthropic-ai/claude-code@latest`.
- Hugging Face Account: Sign up at huggingface.co and create a read/write token (Settings > Access Tokens).
- OpenRouter API Key: Optional, for Path B. Get one at openrouter.ai.
- Python 3.10+ and Docker: For local setups or LiteLLM (Path C).
- Basic CLI Knowledge: Familiarity with environment variables and terminal commands helps.
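With those in place, a quick terminal check confirms the tooling is ready before you pick a path:

```bash
# Quick sanity check before starting any of the three paths
claude --version    # expect 0.5.3 or later
python3 --version   # 3.10+ is needed for LiteLLM (Path C)
docker --version    # needed only for the local TGI option (Path A)
```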

Path A: Self-Host GPT-OSS on Hugging Face
Want full control? Host GPT-OSS on Hugging Face’s Inference Endpoints for a private, scalable setup. Here’s how:
Step 1: Grab the Model
- Visit the GPT-OSS repo on Hugging Face (openai/gpt-oss-20b or openai/gpt-oss-120b).
- Accept the Apache 2.0 license to access the model.
- Alternatively, try Qwen3-Coder-480B-A35B-Instruct (Qwen/Qwen3-Coder-480B-A35B-Instruct) for a coding-focused model (use a GGUF version for lighter hardware).
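If you plan to run the model locally rather than on a managed endpoint (see Step 5 below), you can pre-fetch the weights with the Hugging Face CLI. A minimal sketch, using the model ID from the repo above:

```bash
# Optional: pre-download weights for local serving (large download!)
pip install -U huggingface_hub
huggingface-cli login                       # paste your hf_ token when prompted
huggingface-cli download openai/gpt-oss-20b
```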

Step 2: Deploy a Text Generation Inference Endpoint
- On the model page, click Deploy > Inference Endpoint.
- Select the Text Generation Inference (TGI) template (≥ v1.4.0).
- Enable OpenAI compatibility by checking Enable OpenAI compatibility or adding `--enable-openai` in the advanced settings.
- Choose hardware: A10G or CPU for the 20B model, A100 for the 120B. Create the endpoint.
Step 3: Collect Credentials
1. Once the endpoint status is Running, copy:
   - ENDPOINT_URL: Looks like `https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud`.
   - HF_API_TOKEN: Your Hugging Face token from Settings > Access Tokens.
2. Note the model ID (e.g., `gpt-oss-20b` or `gpt-oss-120b`).
Step 4: Configure Claude Code
1. Set environment variables in your terminal:
```bash
export ANTHROPIC_BASE_URL="https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_xxxxxxxxxxxxxxxxx"
export ANTHROPIC_MODEL="gpt-oss-20b" # or gpt-oss-120b
```
Replace `<your-endpoint>` and `hf_xxxxxxxxxxxxxxxxx` with your values.
2. Test the setup:
```bash
claude --model gpt-oss-20b
```
Claude Code now routes to your GPT-OSS endpoint, streaming responses via TGI's `/v1/chat/completions` API, which mimics OpenAI's schema.
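If the test hangs or errors out, it helps to hit the endpoint directly and rule out the Claude Code side. A minimal smoke test with curl, reusing the same placeholder URL and token from Step 4:

```bash
# Direct smoke test against the TGI endpoint's OpenAI-style route
curl -s "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
  -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16
      }'
```

A JSON response here means the endpoint is healthy, so any remaining issue lies in the Claude Code environment variables.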
Step 5: Cost and Scaling Notes
- Hugging Face Costs: Inference Endpoints auto-scale, so monitor usage to avoid credit burn. A10G costs ~$0.60/hour, A100 ~$3/hour.
- Local Option: For zero cloud costs, run TGI locally with Docker:
```bash
docker run --name tgi -p 8080:80 -e HF_TOKEN=hf_xxxxxxxxxxxxxxxxx \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id openai/gpt-oss-20b --enable-openai
```
Then set `ANTHROPIC_BASE_URL="http://localhost:8080"`.
Path B: Proxy GPT-OSS Through OpenRouter
No DevOps? No problem! Use OpenRouter to access GPT-OSS with minimal setup. It’s fast and handles billing for you.
Step 1: Register and Pick a Model
- Sign up at openrouter.ai and copy your API key from the Keys section.
- Choose a model slug:
  - `openai/gpt-oss-20b`
  - `openai/gpt-oss-120b`
  - `qwen/qwen3-coder-480b` (for Qwen's coder model)

Step 2: Configure Claude Code
1. Set environment variables:
```bash
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="or_xxxxxxxxx"
export ANTHROPIC_MODEL="openai/gpt-oss-20b"
```
Replace `or_xxxxxxxxx` with your OpenRouter API key.
2. Test it:
```bash
claude --model openai/gpt-oss-20b
```
Claude Code connects to GPT-OSS via OpenRouter's unified API, with streaming and fallback support.
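You can also verify your key and model slug directly against OpenRouter's chat completions endpoint, independent of Claude Code:

```bash
# Verify your OpenRouter key and model slug with a direct request
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer or_xxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "ping"}]
      }'
```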
Step 3: Cost Notes
- OpenRouter Pricing: ~$0.50/M input tokens, ~$2.00/M output tokens for GPT-OSS-120B, significantly cheaper than proprietary models like GPT-4 (~$20.00/M).
- Billing: OpenRouter manages usage, so you only pay for what you use.
Path C: Use LiteLLM for Mixed Model Fleets
Want to juggle GPT-OSS, Qwen, and Anthropic models in one workflow? LiteLLM acts as a proxy to hot-swap models seamlessly.
Step 1: Install and Configure LiteLLM
1. Install LiteLLM with the proxy extra:
```bash
pip install 'litellm[proxy]'
```
2. Create a config file (`litellm.yaml`):
```yaml
model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      model: openai/gpt-oss-20b
      api_key: or_xxxxxxxxx  # OpenRouter key
      api_base: https://openrouter.ai/api/v1
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: or_xxxxxxxxx
      api_base: https://openrouter.ai/api/v1
```
Replace `or_xxxxxxxxx` with your OpenRouter key.
3. Start the proxy:
```bash
litellm --config litellm.yaml
```
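Before pointing Claude Code at the proxy, confirm it answers locally. A minimal sketch, assuming the default port 4000 and that you configured `litellm_master` as the proxy's master key (the same value you'll export as `ANTHROPIC_AUTH_TOKEN` in the next step):

```bash
# Smoke test against the local LiteLLM proxy (default port 4000)
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer litellm_master" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "ping"}]}'
```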
Step 2: Point Claude Code to LiteLLM
1. Set environment variables:
```bash
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="litellm_master"
export ANTHROPIC_MODEL="gpt-oss-20b"
```
2. Test it:
```bash
claude --model gpt-oss-20b
```
LiteLLM routes requests to GPT-OSS via OpenRouter, with cost logging and simple-shuffle routing for reliability.
Step 3: Notes
- Avoid Latency Routing: Use simple-shuffle mode in LiteLLM to prevent issues with Anthropic models (see the config sketch after this list).
- Cost Tracking: LiteLLM logs usage for transparency.
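To pin simple-shuffle routing explicitly rather than rely on defaults, you can add a `router_settings` block to `litellm.yaml`. The key names below follow LiteLLM's router configuration and are worth double-checking against your installed version:

```bash
# Append router settings to litellm.yaml (key names assumed from LiteLLM's docs)
cat >> litellm.yaml <<'EOF'
router_settings:
  routing_strategy: simple-shuffle
EOF
```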

Testing GPT-OSS with Claude Code
Let’s make sure GPT-OSS is working! Open Claude Code and try these commands:
Code Generation:
```bash
claude --model gpt-oss-20b "Write a Python REST API with Flask"
```
Expect a response like:
```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api', methods=['GET'])
def get_data():
    return jsonify({"message": "Hello from GPT-OSS!"})

if __name__ == '__main__':
    app.run(debug=True)
```
Codebase Analysis:
```bash
claude --model gpt-oss-20b "Summarize src/server.js"
```
GPT-OSS leverages its 128K context window to analyze your JavaScript file and return a summary.
Debugging:
```bash
claude --model gpt-oss-20b "Debug this buggy Python code: [paste code]"
```
With an 87.3% HumanEval pass rate, GPT-OSS should spot and fix issues accurately.
Troubleshooting Tips
- 404 on /v1/chat/completions? Ensure `--enable-openai` is active in TGI (Path A) or check OpenRouter's model availability (Path B).
- Empty Responses? Verify `ANTHROPIC_MODEL` matches the slug (e.g., `gpt-oss-20b`).
- 400 Error After Model Swap? Use simple-shuffle routing in LiteLLM (Path C).
- Slow First Token? Warm up Hugging Face endpoints with a small prompt after scaling to zero (see the warm-up sketch after this list).
- Claude Code Crashing? Update to ≥ 0.5.3 and ensure environment variables are set correctly.
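For that cold-start warm-up, a one-token request is enough to spin a scaled-to-zero endpoint back up; reuse the placeholder URL and token from Path A:

```bash
# Warm a scaled-to-zero Hugging Face endpoint with a one-token request
curl -s "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
  -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1}' \
  > /dev/null
```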
Why Use GPT-OSS with Claude Code?
Pairing GPT-OSS with Claude Code is a developer’s dream. You get:
- Cost Savings: OpenRouter’s $0.50/M input tokens beat proprietary models, and local TGI setups are free after hardware costs.
- Open-Source Power: GPT-OSS’s Apache 2.0 license lets you customize or deploy privately.
- Seamless Workflow: Claude Code’s CLI feels like chatting with a coding buddy, while GPT-OSS handles heavy lifting with 94.2% MMLU and 96.6% AIME scores.
- Flexibility: Swap between GPT-OSS, Qwen, or Anthropic models with LiteLLM or OpenRouter.
Users are raving about GPT-OSS’s coding prowess, calling it “a budget-friendly beast for multi-file projects.” Whether you self-host or proxy through OpenRouter, this setup keeps costs low and productivity high.
Conclusion
You’re now ready to rock GPT-OSS with Claude Code! Whether you self-host on Hugging Face, proxy via OpenRouter, or use LiteLLM for model juggling, you’ve got a powerful, cost-effective coding setup. From generating REST APIs to debugging code, GPT-OSS delivers, and Claude Code makes it feel effortless. Try it out, share your favorite prompts in the comments, and let’s geek out over AI coding!