How to Use the Claude Opus 4.8 API?

Complete Claude Opus 4.8 API guide: get an API key, make your first call in Python/Node/curl, use the effort parameter and adaptive thinking, handle streaming, tool use, and errors.

Ashley Innocent

29 May 2026

The Claude Opus 4.8 API went live with the model launch on May 28, 2026. The model ID is claude-opus-4-8, and it runs on the same Messages API you already know. This guide walks through the full setup: getting a key, your first call, the new effort parameter, adaptive thinking, streaming, tool use, and testing the whole thing in Apidog.

If you’ve called any Claude model before, the only string that changes is the model name. The one new concept is effort control, and it’s worth taking the time to understand because it replaces the old thinking-budget pattern. If you’re new to the Claude API, you can be making working Opus 4.8 calls in about ten minutes. For background on the model itself, see what is Claude Opus 4.8.

What you get with the Opus 4.8 API

The numbers that shape your integration:

For the full cost math and fast-mode rates, see the Opus 4.8 pricing guide. If you don’t have a paid plan yet, the free access guide covers your options.

Step 1: Get your Claude API key

  1. Go to console.anthropic.com
  2. Sign in or create an account
  3. Open Settings, then API Keys
  4. Click Create Key, name it, and copy it

Store the key in an environment variable so it never lands in your code:

export ANTHROPIC_API_KEY="sk-ant-..."

New accounts get trial credits to test against before you add billing. The key works against claude-opus-4-8 immediately.
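
As a small sketch of the environment-variable approach, the helper below reads the key at startup and fails with a clear message when it is missing. The name load_api_key is hypothetical, not part of any SDK; the official SDK clients read ANTHROPIC_API_KEY on their own, so this is mainly useful if you call the REST endpoint directly.

```python
import os


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the official SDKs already read
    ANTHROPIC_API_KEY without any extra code.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the app."
        )
    return key
```

Failing at startup tends to be easier to debug than a 401 surfacing later from the first API call.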

Step 2: Install the SDK

Anthropic ships official SDKs for Python, TypeScript, Go, Java, C#, Ruby, and PHP. Pick your language:

# Python
pip install anthropic

# Node.js / TypeScript
npm install @anthropic-ai/sdk

You can skip the SDK entirely and call the REST endpoint with curl, shown below. The Python SDK source is the reference if you need exact types.

Step 3: Make your first Opus 4.8 call

Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."}
    ],
)

print(message.content[0].text)

Node.js

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    { role: "user", content: "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs." },
  ],
});

console.log(message.content[0].text);

curl

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "messages": [
      {"role": "user", "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."}
    ]
  }'
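
If you want the same raw REST call from Python without installing the SDK, a minimal standard-library sketch might look like the following. The URL, headers, and body mirror the curl command above; build_request and extract_text are hypothetical helper names, and extract_text assumes the response body carries a content list of typed blocks, as the SDK examples suggest.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, prompt: str, max_tokens: int = 4096) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def extract_text(response_body: dict) -> str:
    """Join the text of every text block in a decoded response body."""
    return "".join(
        block["text"]
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )
```

To send it, pass the request to urllib.request.urlopen, decode the result with json.load, and hand the decoded dict to extract_text.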

That’s the happy path. From here you layer on the features you need.

Effort control: the one new parameter

The effort parameter controls how many tokens Opus 4.8 spends across the entire response: text, tool calls, and reasoning. It lives inside output_config and accepts low, medium, high, xhigh, and max. The default is high, so omitting it gives you high behavior.

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Refactor this 600-line module for testability."}],
    output_config={"effort": "xhigh"},
)

Node:

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: [{ role: "user", content: "Refactor this 600-line module for testability." }],
  output_config: { effort: "xhigh" },
});

How to choose, per Anthropic’s effort docs:

Level  | Use it for
-------|-----------
low    | Classification, quick lookups, high-volume jobs, subagents
medium | Balanced agentic work where cost matters
high   | Default. Complex reasoning where quality beats speed
xhigh  | Coding and long-horizon agentic tasks; the recommended starting point
max    | Genuinely frontier problems where you’ve measured headroom

Two practical rules. Start at xhigh for coding and agentic loops. When you run xhigh or max, set a large max_tokens (64K is a reasonable starting point) so the model has room to think and act.
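
One way to keep these choices consistent across a codebase is a small lookup that turns a task category into request options. This is only a sketch: the category names and the request_options helper are hypothetical, and the 4096-token fallback is an assumption borrowed from the earlier examples rather than a documented recommendation.

```python
# Hypothetical task categories mapped to the effort levels in the table above.
EFFORT_BY_TASK = {
    "classification": "low",
    "lookup": "low",
    "subagent": "low",
    "budget_agentic": "medium",
    "reasoning": "high",
    "coding": "xhigh",
    "agentic": "xhigh",
    "frontier": "max",
}


def request_options(task: str) -> dict:
    """Return max_tokens and output_config for a task category.

    Unknown categories fall back to "high", the API default.
    """
    effort = EFFORT_BY_TASK.get(task, "high")
    # Give xhigh and max room to think and act (64K starting point).
    max_tokens = 64_000 if effort in ("xhigh", "max") else 4096
    return {"max_tokens": max_tokens, "output_config": {"effort": effort}}
```

The returned dict can be unpacked into the call, for example client.messages.create(model="claude-opus-4-8", messages=msgs, **request_options("coding")).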

Adaptive thinking

Opus 4.8 uses adaptive thinking. Set thinking: {type: "adaptive"} and the model decides when and how much to reason. Without it, requests run with no thinking.

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Find the race condition in this scheduler."}],
)

for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)

One migration trap: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If you carried that over from Opus 4.5 or earlier, delete the budget_tokens field and use adaptive thinking with effort instead.
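
If budget_tokens is scattered across several call sites, a small shim can rewrite old request parameters before they reach the API. The sketch below assumes the older shape was thinking={"type": "enabled", "budget_tokens": N}; migrate_thinking is a hypothetical helper, and the effort it fills in is a guess you should tune against your own evals.

```python
import copy


def migrate_thinking(params: dict, effort: str = "high") -> dict:
    """Return a copy of params with manual thinking swapped for adaptive.

    Only requests that carry budget_tokens are changed; an effort level is
    added unless the caller already set one.
    """
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        migrated["thinking"] = {"type": "adaptive"}
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated
```

Because the function works on a copy, the original parameters stay available for logging or comparison.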

Streaming responses

Streaming makes Opus 4.8 feel fast in a UI. The SDK gives you a helper:

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a 5-step guide to writing a REST client in Go."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Node:

const stream = client.messages.stream({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [{ role: "user", content: "Write a 5-step guide to writing a REST client in Go." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

For raw REST, add "stream": true to the request body and read the server-sent events.
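
For the raw REST path, reading the event stream mostly comes down to picking out the data lines and decoding their JSON. The sketch below assumes each data line holds one complete JSON event and that text arrives in content_block_delta events with a text_delta payload, the same shape the Node example checks for; iter_text_deltas is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an iterable of server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") != "content_block_delta":
            continue  # ignore message_start, ping, message_stop, and so on
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")
```

Any source of decoded lines works as input, such as a streaming HTTP response iterated line by line.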

Tool use and function calling

Opus 4.8 calls tools more efficiently than 4.7, and the effort level shapes how many calls it makes. Define a tool with an input_schema:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }
]

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Singapore right now?"}],
)

for block in message.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}")
        print(f"Args: {block.input}")

You run the tool locally, append the assistant’s tool_use turn followed by a user message containing a tool_result block (matched to the call by its tool_use_id), and call again to continue. Lower effort makes Claude batch operations into fewer calls; higher effort makes it explain its plan first. If you’re building multi-agent systems, our managed agents vs Agent SDK guide covers the architecture choices.
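
The continuation step can be sketched with plain dicts, as you would see them in a raw JSON response. Everything here is illustrative: the get_weather stub returns made-up data, TOOL_HANDLERS and continue_with_tool_results are hypothetical names, and with the SDK you would read block.name, block.input, and block.id from objects instead of dict keys.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool; a real one would call a weather service."""
    return {"city": city, "unit": unit, "temperature": 31}


# Maps tool names from the request's tools list to local callables.
TOOL_HANDLERS = {"get_weather": get_weather}


def continue_with_tool_results(messages: list, assistant_content: list) -> list:
    """Run each requested tool and return the messages for the next call."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue
        output = TOOL_HANDLERS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to its call
            "content": json.dumps(output),
        })
    # Echo the assistant turn, then answer it with the tool results.
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": results},
    ]
```

Pass the returned list as messages in the next client.messages.create call, keeping the same tools argument, and repeat until the response contains no tool_use blocks.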

Mid-conversation system messages

Opus 4.8 ships with a Messages API change: you can now place a system entry partway through the messages array, not only at the start. That lets you inject new instructions or permissions mid-task, which is the foundation for Claude Code’s Dynamic Workflows. If you’re orchestrating subagents through the API, read the Dynamic Workflows deep-dive for the full pattern.

Testing your Opus 4.8 integration with Apidog

A working SDK call is step one. Production integrations have to handle the messy parts: streamed chunks, tool-call validation, the new output_config shape, and adaptive-thinking blocks in the response. That’s where a real testing setup pays back.

Apidog handles the full Messages API surface in one workspace:

To start, download Apidog, create a request pointing at the Messages endpoint, and import the curl snippet from earlier. Setup takes about two minutes. The same flow works for the Gemini 3.5 API and Qwen 3.7 API if you run more than one provider.

Error handling and rate limits

Claude’s error model is consistent. The codes that matter:

Wrap calls with a retry loop and exponential backoff:

import time
import anthropic

client = anthropic.Anthropic()

def call_with_retry(prompt, max_retries=4):
    """Call Opus 4.8, retrying on rate limits with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-opus-4-8",
                max_tokens=4096,
                messages=[{"role": "user", "content": prompt}],
            )
        except anthropic.RateLimitError:
            # Out of attempts: surface the 429 to the caller.
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... before the next attempt.
            time.sleep(2 ** attempt)
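
When many workers hit a rate limit at the same moment, fixed delays make them all retry together. A common refinement is to randomize the wait ("full jitter") and to retry only on statuses that are likely to be transient. The sketch below treats 429 (rate limit), 500 (API error), and 529 (overloaded) as retryable; the helper names, the 1-second base, and the 30-second cap are assumptions to adjust for your workload.

```python
import random

# Statuses treated as transient in this sketch; 4xx validation and auth
# errors are not retried because resending the same request cannot succeed.
RETRYABLE_STATUS = {429, 500, 529}


def is_retryable(status_code: int) -> bool:
    """Return True if a request with this HTTP status is worth retrying."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng=random.random) -> float:
    """Exponential backoff with full jitter.

    The ceiling doubles with each attempt up to cap; the actual delay is a
    random fraction of that ceiling so concurrent clients spread out.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

In a retry loop, time.sleep(backoff_delay(attempt)) would take the place of the fixed time.sleep(2 ** attempt).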

Rate limits scale with your usage tier. For high-throughput batch jobs that don’t need real-time latency, use the Batch API, which also unlocks up to 300K output tokens with a beta header.

Migrating from Opus 4.7 to 4.8

Most projects change exactly one string:

# Before
model="claude-opus-4-7"

# After
model="claude-opus-4-8"

What to verify after the swap:

  1. Effort levels: the available range is the same as in 4.7, but rerun your evals at the level you use
  2. Thinking config: if you ever set budget_tokens, remove it; Opus 4.8 rejects it with a 400
  3. Tool schemas: they carry forward, but rerun your tool-use eval
  4. Cost: identical per-token rates to 4.7, so no billing surprise

FAQ

What is the Claude Opus 4.8 API model ID?
claude-opus-4-8 on the Claude API and Vertex AI, and anthropic.claude-opus-4-8 on AWS Bedrock.

Is there a free tier for the Opus 4.8 API?
No standing free API tier, but new accounts get trial credits. See the free access guide for other low-cost paths.

How do I set the effort level?
Pass output_config: {"effort": "xhigh"} (or low, medium, high, max) in the request. The default is high.

Why does my request return a 400 about budget_tokens?
Opus 4.8 doesn’t support manual extended thinking. Remove budget_tokens and use thinking: {type: "adaptive"} with the effort parameter.

Does Opus 4.8 work with the OpenAI-compatible SDK?
Anthropic provides a compatibility layer for the OpenAI SDK. Point the base URL at the Anthropic endpoint and use your Anthropic key; keep the model string claude-opus-4-8.

What max_tokens should I set for agentic work?
Start at 64K when running xhigh or max effort so the model has room to think and chain tool calls. Tune down once you see real usage.

How do I test streaming responses in Apidog?
Open the request, enable streaming in the body, and Apidog renders the server-sent event chunks as they arrive, which makes incomplete responses easy to spot.
