The Claude Opus 4.8 API went live with the model launch on May 28, 2026. The model ID is claude-opus-4-8, and it runs on the same Messages API you already know. This guide walks through the full setup: getting a key, your first call, the new effort parameter, adaptive thinking, streaming, tool use, and testing the whole thing in Apidog.
If you’ve called any Claude model before, the only string that changes is the model name. The one new concept is effort control, and it’s worth ten minutes to understand because it replaces the old thinking-budget pattern. New to the Claude API? You can be making working Opus 4.8 calls in about ten minutes. For background on the model itself, see what is Claude Opus 4.8.
What you get with the Opus 4.8 API
The numbers that shape your integration:
- claude-opus-4-8: 1M token input context, 128K token output
- Same Messages endpoint: drop-in for projects already calling Opus 4.7
- effort control: five levels from low to max, set per request
- Adaptive thinking: the model decides how deeply to reason
- Standard pricing: $5 per million input tokens, $25 per million output tokens
For the full cost math and fast-mode rates, see the Opus 4.8 pricing guide. If you don’t have a paid plan yet, the free access guide covers your options.
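To make the standard rates concrete, here is a minimal sketch of a per-request cost estimate. It uses only the $5 and $25 per-million-token figures quoted above; the function and constant names are illustrative, and fast-mode or batch rates are not modeled.

```python
# Standard rates quoted in this guide for claude-opus-4-8 (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the standard rates above."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A 200K-token prompt with an 8K-token reply.
    print(f"${estimate_cost_usd(200_000, 8_000):.2f}")
```

Feeding it the token counts from a real response gives you a quick running total while you experiment.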
Step 1: Get your Claude API key
- Go to console.anthropic.com
- Sign in or create an account
- Open Settings, then API Keys
- Click Create Key, name it, and copy it
Store the key in an environment variable so it never lands in your code:
export ANTHROPIC_API_KEY="sk-ant-..."
New accounts get trial credits to test against before you add billing. The key works against claude-opus-4-8 immediately.
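If you want a missing key to fail loudly at startup rather than on the first request, a small guard like the one below can help. This is a sketch using only the standard library; the helper name require_api_key is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it before running.")
    return key
```

Call it once when your program starts, before you construct the client.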
Step 2: Install the SDK
Anthropic ships official SDKs for Python, TypeScript, Go, Java, C#, Ruby, and PHP. Pick your language:
# Python
pip install anthropic
# Node.js / TypeScript
npm install @anthropic-ai/sdk
You can skip the SDK entirely and call the REST endpoint with curl, shown below. The Python SDK source is the reference if you need exact types.
Step 3: Make your first Opus 4.8 call
Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."}
    ],
)
print(message.content[0].text)
Node.js
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    { role: "user", content: "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs." },
  ],
});
console.log(message.content[0].text);
curl
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "messages": [
      {"role": "user", "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."}
    ]
  }'
That’s the happy path. From here you layer on the features you need.
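If you call the REST endpoint directly, you get a JSON body back instead of an SDK object. The sketch below shows one way to pull out the text and check whether the reply was cut off by max_tokens. The sample body is hand-written for illustration and the helper name is hypothetical; the fields it reads (content, stop_reason, usage) are the standard Messages response fields.

```python
import json


def summarize_response(body: str) -> dict:
    """Extract the text, truncation flag, and token usage from a Messages response."""
    data = json.loads(body)
    # A response can hold several content blocks; keep only the text ones.
    text = "".join(
        block.get("text", "")
        for block in data.get("content", [])
        if block.get("type") == "text"
    )
    return {
        "text": text,
        # stop_reason "max_tokens" means the reply hit the output cap.
        "truncated": data.get("stop_reason") == "max_tokens",
        "usage": data.get("usage", {}),
    }


# Hand-written sample body, for illustration only.
SAMPLE_BODY = json.dumps({
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "PKCE protects the authorization code."}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 21, "output_tokens": 9},
})

if __name__ == "__main__":
    print(summarize_response(SAMPLE_BODY))
```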
Effort control: the one new parameter
The effort parameter controls how many tokens Opus 4.8 spends across the entire response: text, tool calls, and reasoning. It lives inside output_config and accepts low, medium, high, xhigh, and max. The default is high, so omitting it gives you high behavior.
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Refactor this 600-line module for testability."}],
    output_config={"effort": "xhigh"},
)
Node:
const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: [{ role: "user", content: "Refactor this 600-line module for testability." }],
  output_config: { effort: "xhigh" },
});
How to choose, per Anthropic’s effort docs:
| Level | Use it for |
|---|---|
| low | Classification, quick lookups, high-volume jobs, subagents |
| medium | Balanced agentic work where cost matters |
| high | Default. Complex reasoning where quality beats speed |
| xhigh | Coding and long-horizon agentic tasks; the recommended starting point |
| max | Genuinely frontier problems where you’ve measured headroom |
Two practical rules. Start at xhigh for coding and agentic loops. When you run xhigh or max, set a large max_tokens (64K is a reasonable starting point) so the model has room to think and act.
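One way to apply those two rules consistently is to centralize the choice in a small helper. The sketch below maps a task label to an effort level from the table and raises max_tokens for xhigh and max. The task labels and the 4096 fallback are assumptions made for this example, not API values.

```python
# Effort levels per use case, taken from the table above.
# The task labels (dictionary keys) are hypothetical names for this sketch.
EFFORT_BY_TASK = {
    "classification": "low",
    "balanced_agentic": "medium",
    "complex_reasoning": "high",
    "coding": "xhigh",
    "frontier": "max",
}


def build_request_kwargs(task: str, prompt: str) -> dict:
    """Build keyword arguments for client.messages.create()."""
    # Fall back to "high", which this guide names as the API default.
    effort = EFFORT_BY_TASK.get(task, "high")
    # 64K follows the guide's rule for xhigh and max; 4096 for the other
    # levels is an assumption borrowed from the earlier examples.
    max_tokens = 64_000 if effort in ("xhigh", "max") else 4096
    return {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

You would then call client.messages.create(**build_request_kwargs("coding", prompt)) and tune the mapping once you have measured real usage.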
Adaptive thinking
Opus 4.8 uses adaptive thinking. Set thinking: {type: "adaptive"} and the model decides when and how much to reason. Without it, requests run with no thinking.
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Find the race condition in this scheduler."}],
)

for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)
One migration trap: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If you carried that over from Opus 4.5 or earlier, delete the budget_tokens field and use adaptive thinking with effort instead.
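If you have many stored request configs, the fix can be scripted. The following is a minimal sketch that rewrites a request dictionary carrying the older manual thinking shape (type "enabled" with budget_tokens) into the adaptive form described above. The function name is hypothetical, and it assumes your requests are plain dictionaries.

```python
import copy
from typing import Optional


def migrate_thinking(request: dict, effort: Optional[str] = None) -> dict:
    """Return a copy of the request with budget_tokens thinking replaced by adaptive."""
    migrated = copy.deepcopy(request)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
    if effort is not None:
        # Effort takes over the role the token budget used to play.
        migrated.setdefault("output_config", {})["effort"] = effort
    return migrated


if __name__ == "__main__":
    old = {
        "model": "claude-opus-4-8",
        "max_tokens": 16000,
        "thinking": {"type": "enabled", "budget_tokens": 10000},
        "messages": [{"role": "user", "content": "Find the race condition."}],
    }
    print(migrate_thinking(old, effort="xhigh"))
```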
Streaming responses
Streaming makes Opus 4.8 feel fast in a UI. The SDK gives you a helper:
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a 5-step guide to writing a REST client in Go."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Node:
const stream = client.messages.stream({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [{ role: "user", content: "Write a 5-step guide to writing a REST client in Go." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
For raw REST, add "stream": true to the request body and read the server-sent events.
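If you read the event stream yourself, you need to pick the text deltas out of the data: lines. The sketch below does that over any iterable of already-decoded lines, so it works with whatever HTTP client you use. The sample lines are hand-written for illustration, and the helper name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the lines of a Messages API event stream."""
    for raw in lines:
        line = raw.strip()
        # Only "data:" lines carry JSON payloads; skip "event:" lines and blanks.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hand-written sample stream, for illustration only.
SAMPLE_LINES = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"content": []}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

if __name__ == "__main__":
    print("".join(iter_text_deltas(SAMPLE_LINES)))
```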
Tool use and function calling
Opus 4.8 calls tools more efficiently than 4.7, and the effort level shapes how many calls it makes. Define a tool with an input_schema:
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Singapore right now?"}],
)

for block in message.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}")
        print(f"Args: {block.input}")
You run the tool locally, append a tool_result block, and call again to continue. Lower effort makes Claude batch operations into fewer calls; higher effort makes it explain its plan first. If you’re building multi-agent systems, our managed agents vs Agent SDK guide covers the architecture choices.
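The "run, append, call again" step can be sketched as below. It works on content blocks as plain dictionaries, which is the shape you get from the raw JSON response, and returns the messages list for the follow-up request. The get_weather implementation and the handler registry are hypothetical stand-ins for your own tools.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temperature": 31}


# Maps tool names from the request's tools list to local functions.
TOOL_HANDLERS = {"get_weather": get_weather}


def append_tool_results(messages: list, assistant_content: list) -> list:
    """Return a new messages list with the assistant turn and tool results added."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue
        result = {"type": "tool_result", "tool_use_id": block["id"]}
        handler = TOOL_HANDLERS.get(block["name"])
        if handler is None:
            # Report the failure to the model instead of crashing the loop.
            result["content"] = f"Unknown tool: {block['name']}"
            result["is_error"] = True
        else:
            result["content"] = json.dumps(handler(**block["input"]))
        results.append(result)
    # The assistant turn is echoed back, followed by a user turn with the results.
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": results},
    ]
```

Pass the returned list as messages in the next request, together with the same tools definition, and repeat until the response contains no tool_use blocks.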
Mid-conversation system messages
Opus 4.8 ships with a Messages API change: you can now place a system entry partway through the messages array, not only at the start. That lets you inject new instructions or permissions mid-task, which is the foundation for Claude Code’s Dynamic Workflows. If you’re orchestrating subagents through the API, read the Dynamic Workflows deep-dive for the full pattern.
Testing your Opus 4.8 integration with Apidog
A working SDK call is step one. Production integrations have to handle the messy parts: streamed chunks, tool-call validation, the new output_config shape, and adaptive-thinking blocks in the response. That’s where a real testing setup pays back.
Apidog handles the full Messages API surface in one workspace:
- Save the endpoint as a request: paste https://api.anthropic.com/v1/messages, attach your x-api-key and anthropic-version headers, hit Send
- Replay across model versions: swap claude-opus-4-7 for claude-opus-4-8 on the same request and diff outputs
- Stream responses inline: Apidog renders streamed chunks as they arrive, with per-chunk timings
- Validate response shape: add assertions that catch drift when you change effort levels or toggle thinking
- Mock the endpoint: generate a mock Messages response so you can test downstream code without spending credits
- Build agent-loop scenarios: chain calls with tool-call validation between steps
To start, download Apidog, create a request pointing at the Messages endpoint, and import the curl snippet from earlier. Setup takes about two minutes. The same flow works for the Gemini 3.5 API and Qwen 3.7 API if you run more than one provider.
Error handling and rate limits
Claude’s error model is consistent. The codes that matter:
- 400 invalid_request_error: malformed body, often budget_tokens on Opus 4.8 or a bad effort value
- 401 authentication_error: bad or missing API key
- 403 permission_error: your key can’t access the model
- 429 rate_limit_error: back off and retry
- 500 api_error: server side, retry with backoff
- 529 overloaded_error: the API is temporarily overloaded, retry with backoff
Wrap calls with a retry loop and exponential backoff:
import time
import anthropic

client = anthropic.Anthropic()

def call_with_retry(prompt, max_retries=4):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-opus-4-8",
                max_tokens=4096,
                messages=[{"role": "user", "content": prompt}],
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
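The loop above retries only on rate limits. A variant that also covers the 500 and 529 cases from the list, and adds jitter so parallel workers don't retry in lockstep, might look like the sketch below. It is SDK-independent: ApiStatusError is a hypothetical stand-in for whatever exception your client raises (the official Python SDK exposes the HTTP status as status_code on anthropic.APIStatusError), and send is any zero-argument callable that performs the request. Note that the official SDKs also retry some failures on their own by default, so check your client's max_retries setting before stacking a loop on top.

```python
import random
import time

# Status codes this guide marks as worth retrying.
RETRYABLE_STATUS = {429, 500, 529}


class ApiStatusError(Exception):
    """Hypothetical stand-in for an HTTP error raised by your API client."""

    def __init__(self, status_code: int):
        super().__init__(f"API returned status {status_code}")
        self.status_code = status_code


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: a random delay up to min(cap, base * 2^attempt)."""
    return random.random() * min(cap, base * 2 ** attempt)


def call_with_backoff(send, max_retries: int = 4, sleep=time.sleep):
    """Call send(), retrying retryable statuses up to max_retries attempts in total."""
    for attempt in range(max_retries):
        try:
            return send()
        except ApiStatusError as err:
            is_last_attempt = attempt == max_retries - 1
            if err.status_code not in RETRYABLE_STATUS or is_last_attempt:
                raise  # non-retryable (e.g. 400, 401) or out of attempts
            sleep(backoff_delay(attempt))
```

The sleep parameter is injectable so you can test the retry logic without waiting on real delays.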
Rate limits scale with your usage tier. For high-throughput batch jobs that don’t need real-time latency, the Batch API also unlocks up to 300K output tokens with a beta header.
Migrating from Opus 4.7 to 4.8
Most projects change exactly one string:
# Before
model="claude-opus-4-7"
# After
model="claude-opus-4-8"
What to verify after the swap:
- Effort levels: behavior is the same range as 4.7, but rerun your evals at the level you use
- Thinking config: if you ever set budget_tokens, remove it; Opus 4.8 rejects it with a 400
- Tool schemas: they carry forward, but rerun your tool-use eval
- Cost: identical per-token rates to 4.7, so no billing surprise
FAQ
What is the Claude Opus 4.8 API model ID? claude-opus-4-8 on the Claude API and Vertex AI, and anthropic.claude-opus-4-8 on AWS Bedrock.
Is there a free tier for the Opus 4.8 API? No standing free API tier, but new accounts get trial credits. See the free access guide for other low-cost paths.
How do I set the effort level? Pass output_config: {"effort": "xhigh"} (or low, medium, high, max) in the request. The default is high.
Why does my request return a 400 about budget_tokens? Opus 4.8 doesn’t support manual extended thinking. Remove budget_tokens and use thinking: {type: "adaptive"} with the effort parameter.
Does Opus 4.8 work with the OpenAI-compatible SDK? Anthropic provides a compatibility layer for the OpenAI SDK. Point the base URL at the Anthropic endpoint and use your Anthropic key; keep the model string claude-opus-4-8.
What max_tokens should I set for agentic work? Start at 64K when running xhigh or max effort so the model has room to think and chain tool calls. Tune down once you see real usage.
How do I test streaming responses in Apidog? Open the request, enable streaming in the body, and Apidog renders the server-sent event chunks as they arrive, which makes incomplete responses easy to spot.
