Want to supercharge your coding workflow with GPT-OSS, OpenAI's open-weight model family, right inside Claude Code? You're in for a treat! Released in August 2025, GPT-OSS (in 20B and 120B variants) is a powerhouse for coding and reasoning, and you can pair it with Claude Code's sleek CLI for free or low-cost setups. In this conversational guide, we'll walk you through three paths to integrate GPT-OSS with Claude Code: Hugging Face, OpenRouter, or LiteLLM. Let's dive in and get your AI coding sidekick up and running!
What Is GPT-OSS and Why Use It with Claude Code?
GPT-OSS is OpenAI's open-weight model family, with the 20B and 120B variants offering stellar performance for coding, reasoning, and agentic tasks. With a 128K-token context window and an Apache 2.0 license, it's perfect for developers who want flexibility and control. Claude Code, Anthropic's CLI tool (version 0.5.3+), is a developer favorite for its conversational coding capabilities. By routing Claude Code to GPT-OSS via OpenAI-compatible APIs, you keep Claude's familiar interface while leveraging GPT-OSS's open-source power, without Anthropic's subscription costs. Ready to make it happen? Let's explore the setup options!

Prerequisites for Using GPT-OSS with Claude Code
Before we start, ensure you have:
- Claude Code ≥ 0.5.3: Check with `claude --version`. Install via `npm install -g @anthropic-ai/claude-code` or update with `npm install -g @anthropic-ai/claude-code@latest`.
- Hugging Face Account: Sign up at huggingface.co and create a read/write token (Settings > Access Tokens).
- OpenRouter API Key: Optional, for Path B. Get one at openrouter.ai.
- Python 3.10+ and Docker: For local setups or LiteLLM (Path C).
- Basic CLI Knowledge: Familiarity with environment variables and terminal commands helps.
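With those in place, a quick terminal check confirms the tooling is ready before you pick a path:

```bash
# Quick sanity check before starting any of the three paths
claude --version    # expect 0.5.3 or later
python3 --version   # 3.10+ is needed for LiteLLM (Path C)
docker --version    # needed only for the local TGI option (Path A)
```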

Path A: Self-Host GPT-OSS on Hugging Face
Want full control? Host GPT-OSS on Hugging Face’s Inference Endpoints for a private, scalable setup. Here’s how:
Step 1: Grab the Model
- Visit the GPT-OSS repo on Hugging Face (openai/gpt-oss-20b or openai/gpt-oss-120b).
- Accept the Apache 2.0 license to access the model.
- Alternatively, try Qwen3-Coder-480B-A35B-Instruct (Qwen/Qwen3-Coder-480B-A35B-Instruct) for a coding-focused model (use a GGUF version for lighter hardware).
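If you plan to run the model locally rather than on a managed endpoint (see Step 5 below), you can pre-fetch the weights with the Hugging Face CLI. A minimal sketch, using the model ID from the repo above:

```bash
# Optional: pre-download weights for local serving (large download!)
pip install -U huggingface_hub
huggingface-cli login                       # paste your hf_ token when prompted
huggingface-cli download openai/gpt-oss-20b
```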

Step 2: Deploy a Text Generation Inference Endpoint
- On the model page, click Deploy > Inference Endpoint.
- Select the Text Generation Inference (TGI) template (≥ v1.4.0).
- Enable OpenAI compatibility by checking Enable OpenAI compatibility or adding `--enable-openai` in the advanced settings.
- Choose hardware: A10G or CPU for the 20B model, A100 for the 120B. Create the endpoint.
Step 3: Collect Credentials
1. Once the endpoint status is Running, copy:
   - ENDPOINT_URL: Looks like `https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud`.
   - HF_API_TOKEN: Your Hugging Face token from Settings > Access Tokens.
2. Note the model ID (e.g., `gpt-oss-20b` or `gpt-oss-120b`).
Step 4: Configure Claude Code
1. Set environment variables in your terminal:
```bash
export ANTHROPIC_BASE_URL="https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_xxxxxxxxxxxxxxxxx"
export ANTHROPIC_MODEL="gpt-oss-20b" # or gpt-oss-120b
```
Replace `<your-endpoint>` and `hf_xxxxxxxxxxxxxxxxx` with your values.
2. Test the setup:
```bash
claude --model gpt-oss-20b
```
Claude Code now routes to your GPT-OSS endpoint, streaming responses via TGI's `/v1/chat/completions` API, which mimics OpenAI's schema.
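If the test hangs or errors out, it helps to hit the endpoint directly and rule out the Claude Code side. A minimal smoke test with curl, reusing the same placeholder URL and token from Step 4:

```bash
# Direct smoke test against the TGI endpoint's OpenAI-style route
curl -s "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
  -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "max_tokens": 16
      }'
```

A JSON response here means the endpoint is healthy, so any remaining issue lies in the Claude Code environment variables.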
Step 5: Cost and Scaling Notes
- Hugging Face Costs: Inference Endpoints auto-scale, so monitor usage to avoid credit burn. A10G costs ~$0.60/hour, A100 ~$3/hour.
- Local Option: For zero cloud costs, run TGI locally with Docker:
```bash
docker run --name tgi -p 8080:80 -e HF_TOKEN=hf_xxxxxxxxxxxxxxxxx \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id openai/gpt-oss-20b --enable-openai
```
Then set `ANTHROPIC_BASE_URL="http://localhost:8080"`.
Path B: Proxy GPT-OSS Through OpenRouter
No DevOps? No problem! Use OpenRouter to access GPT-OSS with minimal setup. It’s fast and handles billing for you.
Step 1: Register and Pick a Model
- Sign up at openrouter.ai and copy your API key from the Keys section.
- Choose a model slug:
  - `openai/gpt-oss-20b`
  - `openai/gpt-oss-120b`
  - `qwen/qwen3-coder-480b` (for Qwen's coder model)

Step 2: Configure Claude Code
1. Set environment variables:
```bash
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="or_xxxxxxxxx"
export ANTHROPIC_MODEL="openai/gpt-oss-20b"
```
Replace `or_xxxxxxxxx` with your OpenRouter API key.
2. Test it:
```bash
claude --model openai/gpt-oss-20b
```
Claude Code connects to GPT-OSS via OpenRouter's unified API, with streaming and fallback support.
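You can also verify your key and model slug directly against OpenRouter's chat completions endpoint, independent of Claude Code:

```bash
# Verify your OpenRouter key and model slug with a direct request
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer or_xxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": "ping"}]
      }'
```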
Step 3: Cost Notes
- OpenRouter Pricing: ~$0.50/M input tokens, ~$2.00/M output tokens for GPT-OSS-120B, significantly cheaper than proprietary models like GPT-4 (~$20.00/M).
- Billing: OpenRouter manages usage, so you only pay for what you use.
Path C: Use LiteLLM for Mixed Model Fleets
Want to juggle GPT-OSS, Qwen, and Anthropic models in one workflow? LiteLLM acts as a proxy to hot-swap models seamlessly.
Step 1: Install and Configure LiteLLM
1. Install LiteLLM with the proxy extra:
```bash
pip install 'litellm[proxy]'
```
2. Create a config file (`litellm.yaml`):
```yaml
model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      model: openai/gpt-oss-20b
      api_key: or_xxxxxxxxx  # OpenRouter key
      api_base: https://openrouter.ai/api/v1
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder
      api_key: or_xxxxxxxxx
      api_base: https://openrouter.ai/api/v1
```
Replace `or_xxxxxxxxx` with your OpenRouter key.
3. Start the proxy:
```bash
litellm --config litellm.yaml
```
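Before pointing Claude Code at the proxy, confirm it answers locally. A minimal sketch, assuming the default port 4000 and that you configured `litellm_master` as the proxy's master key (the same value you'll export as `ANTHROPIC_AUTH_TOKEN` in the next step):

```bash
# Smoke test against the local LiteLLM proxy (default port 4000)
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer litellm_master" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "ping"}]}'
```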
Step 2: Point Claude Code to LiteLLM
1. Set environment variables:
```bash
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="litellm_master"
export ANTHROPIC_MODEL="gpt-oss-20b"
```
2. Test it:
```bash
claude --model gpt-oss-20b
```
LiteLLM routes requests to GPT-OSS via OpenRouter, with cost logging and simple-shuffle routing for reliability.
Step 3: Notes
- Avoid Latency Routing: Use simple-shuffle mode in LiteLLM to prevent issues with Anthropic models (see the config sketch after this list).
- Cost Tracking: LiteLLM logs usage for transparency.
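To pin simple-shuffle routing explicitly rather than rely on defaults, you can add a `router_settings` block to `litellm.yaml`. The key names below follow LiteLLM's router configuration and are worth double-checking against your installed version:

```bash
# Append router settings to litellm.yaml (key names assumed from LiteLLM's docs)
cat >> litellm.yaml <<'EOF'
router_settings:
  routing_strategy: simple-shuffle
EOF
```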

Testing GPT-OSS with Claude Code
Let’s make sure GPT-OSS is working! Open Claude Code and try these commands:
Code Generation:
```bash
claude --model gpt-oss-20b "Write a Python REST API with Flask"
```
Expect a response like:
```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api', methods=['GET'])
def get_data():
    return jsonify({"message": "Hello from GPT-OSS!"})

if __name__ == '__main__':
    app.run(debug=True)
```
Codebase Analysis:
```bash
claude --model gpt-oss-20b "Summarize src/server.js"
```
GPT-OSS leverages its 128K context window to analyze your JavaScript file and return a summary.
Debugging:
```bash
claude --model gpt-oss-20b "Debug this buggy Python code: [paste code]"
```
With an 87.3% HumanEval pass rate, GPT-OSS should spot and fix issues accurately.
Troubleshooting Tips
- 404 on /v1/chat/completions? Ensure `--enable-openai` is active in TGI (Path A) or check OpenRouter's model availability (Path B).
- Empty Responses? Verify `ANTHROPIC_MODEL` matches the slug (e.g., `gpt-oss-20b`).
- 400 Error After Model Swap? Use simple-shuffle routing in LiteLLM (Path C).
- Slow First Token? Warm up Hugging Face endpoints with a small prompt after scaling to zero (see the warm-up sketch after this list).
- Claude Code Crashing? Update to ≥ 0.5.3 and ensure environment variables are set correctly.
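For that cold-start warm-up, a one-token request is enough to spin a scaled-to-zero endpoint back up; reuse the placeholder URL and token from Path A:

```bash
# Warm a scaled-to-zero Hugging Face endpoint with a one-token request
curl -s "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud/v1/chat/completions" \
  -H "Authorization: Bearer hf_xxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "hi"}], "max_tokens": 1}' \
  > /dev/null
```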
Why Use GPT-OSS with Claude Code?
Pairing GPT-OSS with Claude Code is a developer’s dream. You get:
- Cost Savings: OpenRouter’s $0.50/M input tokens beat proprietary models, and local TGI setups are free after hardware costs.
- Open-Source Power: GPT-OSS’s Apache 2.0 license lets you customize or deploy privately.
- Seamless Workflow: Claude Code’s CLI feels like chatting with a coding buddy, while GPT-OSS handles heavy lifting with 94.2% MMLU and 96.6% AIME scores.
- Flexibility: Swap between GPT-OSS, Qwen, or Anthropic models with LiteLLM or OpenRouter.
Users are raving about GPT-OSS’s coding prowess, calling it “a budget-friendly beast for multi-file projects.” Whether you self-host or proxy through OpenRouter, this setup keeps costs low and productivity high.
Conclusion
You’re now ready to rock GPT-OSS with Claude Code! Whether you self-host on Hugging Face, proxy via OpenRouter, or use LiteLLM for model juggling, you’ve got a powerful, cost-effective coding setup. From generating REST APIs to debugging code, GPT-OSS delivers, and Claude Code makes it feel effortless. Try it out, share your favorite prompts in the comments, and let’s geek out over AI coding!