TL;DR / Quick Answer
Claude Sonnet 4.6 is Anthropic's latest mid-tier model, combining frontier-level coding performance with a 1M token context window (beta) at just $3/$15 per million input/output tokens. To start using the API: 1) Get an API key from console.anthropic.com, 2) Install the SDK (pip install anthropic), 3) Use model ID claude-sonnet-4-6, and 4) Switch to adaptive thinking (thinking: {type: "adaptive"}) for best results. Early testers preferred it over Sonnet 4.5 in 70% of comparisons, and even over Opus 4.5 in 59%.
Introduction
Anthropic released Claude Sonnet 4.6, and it immediately reshapes the mid-tier AI model category. This isn't an incremental update—it's a model that early adopters preferred over the previous premium-tier Opus 4.5 in 59% of head-to-head comparisons, all while keeping Sonnet's price tag.

The headline changes: a 1M token context window entering beta, a new adaptive thinking mode that replaces the old binary extended thinking approach, and a suite of tools—web search, code execution, memory, and tool search—graduating to general availability. For developers building agentic applications, Sonnet 4.6 delivers the capabilities previously reserved for expensive frontier models at roughly a third of the cost.
The coding improvements are tangible. Users report better instruction-following in code generation, smarter context comprehension before making modifications, and reduced code duplication through automatic logic consolidation. Computer use reaches 94% accuracy on complex insurance workflows. The SWE-bench Verified score lands at 79.6%.
This guide covers everything you need to start building with the Claude Sonnet 4.6 API today: authentication, practical code examples in Python and JavaScript, the new adaptive thinking parameter, how to unlock the 1M context window, and how to test your integration with Apidog's visual API client.
What's New in Claude Sonnet 4.6
Adaptive Thinking Mode
The old thinking: {type: "enabled", budget_tokens: N} pattern is deprecated on Sonnet 4.6. The replacement is adaptive thinking: thinking: {type: "adaptive"}. Claude now decides dynamically how much reasoning a task needs.
Pair adaptive thinking with the effort parameter (now generally available) to tune cost versus performance:
- effort: "high" (default) — Claude almost always thinks; best for complex problems
- effort: "medium" — recommended for most Sonnet 4.6 use cases; balances speed and quality
- effort: "low" — minimal thinking; fastest responses for simple tasks
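In practice, the switch is a single parameter change. A quick sketch of the request shape (the Adaptive Thinking section later in this guide walks through a full example):

import anthropic

client = anthropic.Anthropic()

# Minimal adaptive thinking request: effort tunes how much reasoning Claude applies
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    effort="medium",
    messages=[{"role": "user", "content": "Summarize the tradeoffs of optimistic vs pessimistic locking."}]
)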
Improved Coding Performance
Sonnet 4.6 brings three concrete improvements to code generation:
- Better instruction-following — generates code matching specifications more precisely
- Context comprehension — reads and understands existing code before modifying it, reducing regressions
- Logic consolidation — identifies duplicate patterns and suggests shared abstractions
Early testers running coding benchmarks reported preferring Sonnet 4.6 outputs over Sonnet 4.5 in 70% of cases and over Opus 4.5 in 59% of cases.
Computer Use Improvements
Computer use accuracy reaches 72.5% on OSWorld-Verified (within 0.2% of Opus 4.5), up significantly from Sonnet 4.5. The model shows 94% accuracy on insurance workflows requiring UI navigation, spreadsheet manipulation, and multi-step form completion. It's also more resistant to prompt injection attacks during automated tasks.
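If you want to experiment with computer use yourself, the sketch below shows the general request shape. The computer_20250124 tool type and computer-use-2025-01-24 beta header are carried over from earlier Claude models and are assumptions here; confirm the identifiers current for Sonnet 4.6 in Anthropic's docs:

import anthropic

client = anthropic.Anthropic()

# Sketch: tool version and beta header are assumptions carried over from
# earlier Claude models -- verify against current docs before relying on this.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    betas=["computer-use-2025-01-24"],
    tools=[{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the spreadsheet and total column B."}]
)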

ARC-AGI-2 Breakthrough
The most striking benchmark number: ARC-AGI-2 performance jumps from 13.6% to 58.3% — a 4.3x improvement. This measures novel problem-solving on tasks the model hasn't seen patterns for, suggesting genuine reasoning improvements rather than memorization.
API Specs and Pricing
| Feature | Value |
|---|---|
| API model ID | claude-sonnet-4-6 |
| AWS Bedrock ID | anthropic.claude-sonnet-4-6 |
| GCP Vertex AI ID | claude-sonnet-4-6 |
| Context window | 200K tokens (1M beta with header) |
| Max output tokens | 64K |
| Input pricing | $3 / million tokens |
| Output pricing | $15 / million tokens |
| Prompt caching savings | Up to 90% |
| Batch API savings | Up to 50% |
| Knowledge cutoff (reliable) | August 2025 |
| Training data cutoff | January 2026 |
| Extended thinking | Yes |
| Adaptive thinking | Yes |
| Priority Tier | Yes |
Cost reduction options:
- Prompt caching: Cache static portions of your system prompt and save up to 90%
- Batch API: Process requests asynchronously for 50% off
- Long context pricing: Requests exceeding 200K tokens use a separate long-context rate
For production budgets: a million-token conversation in adaptive thinking mode at effort: "medium" costs roughly $3 in input tokens. Most single API calls fall well under a cent.
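The Batch API discount comes from submitting requests asynchronously and collecting results later. A minimal sketch using the Python SDK's batches interface (polling and result retrieval are abbreviated here):

import anthropic

client = anthropic.Anthropic()

# Each request carries a custom_id so you can match results back to inputs
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize RFC 6749 in three bullets."}],
            },
        }
    ]
)
print(batch.id, batch.processing_status)  # poll until processing_status == "ended"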
Getting Started with the Claude Sonnet 4.6 API
Step 1: Get Your API Key
- Log into console.anthropic.com
- Navigate to API Keys in the settings
- Click Create Key and copy the value immediately (it's only shown once)

Store your key as an environment variable—never hardcode it:
export ANTHROPIC_API_KEY="sk-ant-..."
Step 2: Install the SDK
Python:
pip install anthropic
JavaScript/Node.js:
npm install @anthropic-ai/sdk
Step 3: Make Your First Request
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the difference between async/await and promises in JavaScript."}
    ]
)

print(response.content[0].text)
That's the minimum viable call. The response object includes usage stats (input tokens, output tokens), stop reason, and model version.
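For instance, pulling those fields off the response object:

# Metadata returned alongside the generated text
print(response.model)        # exact model version that served the request
print(response.stop_reason)  # e.g. "end_turn" or "max_tokens"
print(response.usage.input_tokens, response.usage.output_tokens)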
Python Code Examples
Basic Text Generation
import anthropic

client = anthropic.Anthropic()

def ask_claude(question: str, system: str | None = None) -> str:
    """Simple wrapper for Claude Sonnet 4.6 text generation."""
    messages = [{"role": "user", "content": question}]
    kwargs = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 2048,
        "messages": messages,
    }
    if system:
        kwargs["system"] = system
    response = client.messages.create(**kwargs)
    return response.content[0].text

# Example usage
answer = ask_claude(
    "Review this Python function for performance issues:\n\ndef find_duplicates(lst):\n    return [x for x in lst if lst.count(x) > 1]",
    system="You are a senior Python engineer. Be specific and provide corrected code."
)
print(answer)
Streaming Responses
For long outputs or real-time UX, use streaming:
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Write a complete REST API handler in FastAPI for user authentication with JWT."
    }]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get final message with usage stats after stream completes
    message = stream.get_final_message()
    print(f"\n\nTokens used: {message.usage.input_tokens} in, {message.usage.output_tokens} out")
Tool Calling / Function Use
import anthropic
import json

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "get_repository_info",
        "description": "Fetch information about a GitHub repository including stars, forks, and recent commits.",
        "input_schema": {
            "type": "object",
            "properties": {
                "owner": {
                    "type": "string",
                    "description": "Repository owner or organization name"
                },
                "repo": {
                    "type": "string",
                    "description": "Repository name"
                }
            },
            "required": ["owner", "repo"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "What can you tell me about the anthropics/anthropic-sdk-python repository?"
    }]
)

# Handle tool use response
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool called: {block.name}")
        print(f"Arguments: {json.dumps(block.input, indent=2)}")
        # In production, call your actual implementation here
        # result = get_repository_info(block.input["owner"], block.input["repo"])
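To close the loop, run the tool yourself and send the output back as a tool_result block in a follow-up user turn. A sketch of that second call (the hard-coded result stands in for your real lookup):

# Send the tool's output back so Claude can produce a final answer
tool_block = next(b for b in response.content if b.type == "tool_use")
result = {"stars": 1234, "forks": 210}  # placeholder for your real implementation

follow_up = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What can you tell me about the anthropics/anthropic-sdk-python repository?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_block.id,
            "content": json.dumps(result),
        }]},
    ]
)
print(follow_up.content[0].text)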
Vision and Image Analysis
import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

def analyze_image(image_path: str, question: str) -> str:
    """Analyze an image with Claude Sonnet 4.6."""
    image_data = base64.standard_b64encode(Path(image_path).read_bytes()).decode("utf-8")

    # Detect media type from extension
    ext = Path(image_path).suffix.lower()
    media_type_map = {
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".png": "image/png",
        ".gif": "image/gif",
        ".webp": "image/webp"
    }
    media_type = media_type_map.get(ext, "image/jpeg")

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": question
                }
            ],
        }]
    )
    return response.content[0].text

# Example: analyze a UI screenshot for accessibility issues
result = analyze_image(
    "screenshot.png",
    "Identify any accessibility issues in this UI design. Check contrast ratios, missing alt text indicators, and keyboard navigation concerns."
)
print(result)
JavaScript/Node.js Examples
Basic Setup and Request
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // default, shown explicitly for clarity
});

async function askClaude(userMessage, systemPrompt = null) {
  const params = {
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    messages: [{ role: "user", content: userMessage }],
  };
  if (systemPrompt) {
    params.system = systemPrompt;
  }
  const response = await client.messages.create(params);
  return response.content[0].text;
}

// Usage
const answer = await askClaude(
  "Refactor this Express route to use async/await:\n\napp.get('/users', (req, res) => {\n  User.find({}, (err, users) => {\n    if (err) return res.status(500).send(err);\n    res.json(users);\n  });\n});",
  "You are a senior Node.js developer. Return only the refactored code with a brief explanation."
);
console.log(answer);
Streaming with TypeScript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamCodeReview(codeSnippet: string): Promise<void> {
  const stream = client.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    messages: [
      {
        role: "user",
        content: `Perform a thorough code review of this TypeScript function:\n\n\`\`\`typescript\n${codeSnippet}\n\`\`\`\n\nFocus on: type safety, edge cases, performance, and security.`,
      },
    ],
  });

  // Stream text as it arrives
  stream.on("text", (text) => {
    process.stdout.write(text);
  });

  // Get final stats
  const finalMessage = await stream.finalMessage();
  console.log(
    `\n\n---\nTotal tokens: ${finalMessage.usage.input_tokens + finalMessage.usage.output_tokens}`
  );
}
Multi-turn Conversation
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

class ConversationManager {
  constructor(systemPrompt = null) {
    this.messages = [];
    this.systemPrompt = systemPrompt;
  }

  async chat(userMessage) {
    this.messages.push({ role: "user", content: userMessage });
    const params = {
      model: "claude-sonnet-4-6",
      max_tokens: 2048,
      messages: this.messages,
    };
    if (this.systemPrompt) {
      params.system = this.systemPrompt;
    }
    const response = await client.messages.create(params);
    const assistantMessage = response.content[0].text;
    // Maintain conversation history
    this.messages.push({ role: "assistant", content: assistantMessage });
    return assistantMessage;
  }
}

// Example: multi-turn debugging session
const debugSession = new ConversationManager(
  "You are an expert debugger. Ask clarifying questions and walk through issues step by step."
);
console.log(await debugSession.chat("My API keeps returning 401 errors."));
console.log(await debugSession.chat("I'm including the Authorization header."));
console.log(
  await debugSession.chat("The token is coming from localStorage after login.")
);
Adaptive Thinking: The New Extended Thinking
Adaptive thinking replaces the old extended thinking model on Sonnet 4.6. The key difference: instead of setting a fixed token budget for thinking, you set an effort level and Claude determines how much reasoning the problem actually warrants.
How to Use Adaptive Thinking
import anthropic

client = anthropic.Anthropic()

# Recommended: adaptive thinking with medium effort for most use cases
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    effort="medium",  # options: "low", "medium", "high" (default: high)
    messages=[{
        "role": "user",
        "content": """
Design a database schema for a SaaS analytics platform that needs to:
- Track events from millions of users
- Support real-time queries on the last 24 hours
- Archive historical data cost-effectively
- Handle tenant isolation for enterprise customers
"""
    }]
)

# Thinking blocks appear before the text response
for block in response.content:
    if block.type == "thinking":
        print(f"[Claude's reasoning - {len(block.thinking)} chars]")
    elif block.type == "text":
        print(block.text)
Effort Levels in Practice
| Effort | Best For | Relative Cost | Relative Speed |
|---|---|---|---|
| low | Classification, simple Q&A, formatting | 1x | Fastest |
| medium | Code generation, analysis, most tasks | 1.5-2x | Fast |
| high | Architecture decisions, complex debugging, math | 3-5x | Moderate |
Migration note: If you're using thinking: {type: "enabled", budget_tokens: N}, that syntax still works on Sonnet 4.6 but is deprecated. Migrate to thinking: {type: "adaptive"} with effort before the next major release removes it.
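Side by side, the migration is a two-line change:

# Before (deprecated on Sonnet 4.6): fixed thinking budget
# thinking={"type": "enabled", "budget_tokens": 8000}

# After: adaptive thinking plus an effort level
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "adaptive"},
    effort="medium",
    messages=[{"role": "user", "content": "Trace this deadlock and propose a fix."}]
)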
1M Token Context Window
The 1M token context window lets you feed Claude entire codebases, extensive document sets, or months of conversation history. That's roughly 750,000 words or the equivalent of 5–10 full codebases in a single request.
How to Enable 1M Context
Pass the context-1m-2025-08-07 beta header in your request:
import anthropic

client = anthropic.Anthropic()

# Read an entire large codebase
with open("large_codebase.txt", "r") as f:
    codebase_content = f.read()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["context-1m-2025-08-07"],
    messages=[{
        "role": "user",
        "content": f"""
Here is our entire backend codebase:

{codebase_content}

Find all database queries that could cause N+1 problems and suggest fixes.
"""
    }]
)

print(response.content[0].text)
// JavaScript equivalent
const response = await client.beta.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  betas: ["context-1m-2025-08-07"],
  messages: [
    {
      role: "user",
      content: `Review this entire codebase for security vulnerabilities:\n\n${codebaseContent}`,
    },
  ],
});
What 1M Tokens Enables
- Full codebase analysis: Send your entire repo and ask Claude to find bugs, suggest refactors, or generate tests
- Long document processing: Analyze full legal contracts, financial reports, or research papers
- Extended agentic sessions: Keep full history of long multi-step tasks without losing context
- Cross-file dependency tracing: Find all usages of a function or class across a large project
Pricing note: Requests exceeding 200K tokens use long-context pricing. Plan accordingly for high-volume use.
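To avoid crossing the 200K threshold unknowingly, you can measure a request before sending it. A sketch using the SDK's token-counting endpoint (reusing codebase_content from the example above):

# Count tokens first so long-context pricing never comes as a surprise
count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": codebase_content}],
)
if count.input_tokens > 200_000:
    print(f"{count.input_tokens} tokens: long-context rate applies")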
Web Search and Dynamic Filtering
Web search and web fetch tools now support dynamic filtering in public beta on Sonnet 4.6. Claude writes and executes code to filter search results before they enter the context window—keeping only relevant information and cutting token usage significantly.
Code execution is free when used alongside web search or web fetch tools (no separate billing).
Setting Up Dynamic Web Search
import anthropic
client = anthropic.Anthropic()
response = client.beta.messages.create(
model="claude-sonnet-4-6",
max_tokens=4096,
betas=["code-execution-web-tools-2026-02-09"],
tools=[
{
"type": "web_search_20260209", # Use this version for dynamic filtering
"name": "web_search",
}
],
messages=[{
"role": "user",
"content": "Find the latest CVEs for Apache Log4j published in the last 30 days and summarize the severity levels."
}]
)
for block in response.content:
if hasattr(block, "text"):
print(block.text)
Web Fetch with Dynamic Filtering
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    betas=["code-execution-web-tools-2026-02-09"],
    tools=[
        {
            "type": "web_fetch_20260209",  # New version with dynamic filtering
            "name": "web_fetch",
        }
    ],
    messages=[{
        "role": "user",
        "content": "Fetch the Anthropic pricing page and extract only the Claude Sonnet pricing rows."
    }]
)

print(response.content[-1].text)
Why dynamic filtering matters: Without filtering, fetching a full web page might consume 100K tokens for a page where you only need 2K tokens of relevant content. Dynamic filtering lets Claude parse the page in code and return only what's needed, reducing costs by 90%+ on content-heavy pages.
Context Compaction API
Context compaction handles long-running agentic sessions where context accumulates beyond the window limit. The API automatically summarizes older parts of the conversation server-side when approaching the limit, enabling effectively unlimited conversation length.
import anthropic

client = anthropic.Anthropic()

# Enable context compaction via beta header
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    betas=["interleaved-thinking-2025-05-14"],  # Enables compaction in beta
    system="You are a long-running software development agent. Maintain context about the codebase changes made during this session.",
    messages=[
        {"role": "user", "content": "Start a refactoring session for our authentication module."},
        # ... many more turns would go here in a real session
    ]
)

# Usage reports cached input tokens; compaction details appear when summarization occurred
if hasattr(response, "usage") and hasattr(response.usage, "cache_read_input_tokens"):
    print(f"Context tokens saved via caching: {response.usage.cache_read_input_tokens}")
When to use compaction: Any agentic workflow that runs for more than a few minutes and accumulates history—CI/CD agents, long coding sessions, customer support threads, or multi-step research tasks.
Testing Claude Sonnet 4.6 with Apidog
Before writing SDK code, use Apidog to explore the Claude Sonnet 4.6 API visually. Apidog speeds up development by letting you configure headers, build request bodies, and inspect streaming responses without boilerplate.

Setting Up the Anthropic API in Apidog
- Open Apidog and create a new HTTP request
- Set the method to POST and the URL to https://api.anthropic.com/v1/messages
- Add these headers:
| Header | Value |
|---|---|
| x-api-key | {{ANTHROPIC_API_KEY}} |
| anthropic-version | 2023-06-01 |
| Content-Type | application/json |
- Set the request body to JSON:
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "What are the key improvements in Claude Sonnet 4.6?"
    }
  ]
}
Testing Adaptive Thinking
To test adaptive thinking mode in Apidog, add the thinking and effort fields to your request body:
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 8192,
  "thinking": {
    "type": "adaptive"
  },
  "effort": "medium",
  "messages": [
    {
      "role": "user",
      "content": "Design a rate limiting strategy for a public API serving 10M requests per day."
    }
  ]
}
Testing Beta Features in Apidog
For beta features (1M context, dynamic web search, context compaction), add the beta header:
| Header | Value |
|---|---|
| anthropic-beta | context-1m-2025-08-07 |
Or for web search dynamic filtering:
| Header | Value |
|---|---|
| anthropic-beta | code-execution-web-tools-2026-02-09 |
Apidog lets you save these configurations as presets and share them with your team, so everyone has consistent API testing environments.
New Tools Now in GA
Several tools that were previously in beta are now generally available on Sonnet 4.6, meaning no special beta headers required:
Code Execution Tool
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250522", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": "Calculate the compound interest on $10,000 at 5% annual rate over 10 years, compounding monthly. Show the year-by-year breakdown."
    }]
)

for block in response.content:
    if block.type == "tool_result":
        print("Execution output:", block.content)
    elif hasattr(block, "text"):
        print(block.text)
Memory Tool
The memory tool lets Claude persist information across conversation sessions, useful for building stateful assistants:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=[{"type": "memory_20250416", "name": "memory"}],
    messages=[{
        "role": "user",
        "content": "Remember that our API base URL is https://api.company.com/v2 and we require Bearer token auth on all endpoints."
    }]
)
Programmatic Tool Calling
Programmatic tool calling lets Claude generate structured API calls directly:
import anthropic

client = anthropic.Anthropic()

# Claude can call tools programmatically without human-in-the-loop
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=[
        {
            "name": "execute_sql",
            "description": "Execute a SQL query and return results",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The SQL query to execute"},
                    "database": {"type": "string", "description": "Database name"}
                },
                "required": ["query", "database"]
            }
        }
    ],
    messages=[{
        "role": "user",
        "content": "Find all users who signed up in January 2026 and haven't made a purchase."
    }]
)
Sonnet 4.6 vs Previous Models
| | Claude Sonnet 4.6 | Claude Sonnet 4.5 | Claude Opus 4.5 |
|---|---|---|---|
| API ID | claude-sonnet-4-6 | claude-sonnet-4-5-20250929 | claude-opus-4-5-20251101 |
| Input price | $3/MTok | $3/MTok | $5/MTok |
| Output price | $15/MTok | $15/MTok | $25/MTok |
| Context window | 200K / 1M (beta) | 200K / 1M (beta) | 200K |
| Max output | 64K | 64K | 64K |
| Adaptive thinking | Yes | No | No |
| Extended thinking | Yes (deprecated) | Yes | Yes |
| SWE-bench score | 79.6% | ~72% | ~76% |
| OSWorld (computer use) | 72.5% | ~65% | ~72.7% |
| Web search filtering | Yes (beta) | No | No |
| GA tools | Code exec, web fetch, memory, tool search | Fewer in GA | Full suite |
| Preferred by users vs Sonnet 4.5 | 70% | — | — |
| Preferred by users vs Opus 4.5 | 59% | — | — |
Bottom line: If you're on Sonnet 4.5, upgrading to Sonnet 4.6 is a no-brainer—same price, meaningfully better coding performance, and adaptive thinking. If you're on Opus 4.5, Sonnet 4.6 now matches or beats it in most use cases at 60% of the cost.
Best Practices and Tips
1. Use Adaptive Thinking at Medium Effort by Default
For most Sonnet 4.6 use cases, effort: "medium" provides the best cost-performance balance. Reserve effort: "high" for genuinely complex tasks like architectural design, multi-step reasoning chains, or mathematical proofs.
# Good default pattern for Sonnet 4.6
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    effort="medium",  # Sweet spot for most tasks
    messages=[...]
)
2. Use Streaming for Outputs Over 1K Tokens
Large responses benefit from streaming to avoid HTTP timeouts and give users faster perceived response times:
# For any response expected to be long
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    messages=[...]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
3. Enable Prompt Caching for Repeated System Prompts
If you use the same system prompt across many calls, cache it to save up to 90%:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your lengthy system prompt here...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": user_input}]
)
4. Handle Rate Limits Gracefully
Sonnet 4.6 has generous rate limits, but production systems should implement exponential backoff:
import anthropic
import time

def create_with_retry(client, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = (2 ** attempt) + 1  # 2s, 3s, 5s
            time.sleep(wait_time)
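Usage is a drop-in replacement for client.messages.create:

# Same kwargs as client.messages.create, routed through the retry wrapper
client = anthropic.Anthropic()
response = create_with_retry(
    client,
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Ping"}],
)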
5. Migrate Away from Prefill on Sonnet 4.6
Prefilling assistant messages (last-turn prefills) is not supported on Claude 4.6 models. Use structured outputs instead:
# Old approach (breaks on Claude 4.6):
# messages = [
#     {"role": "user", "content": "Give me JSON..."},
#     {"role": "assistant", "content": "{"}  # Prefill
# ]

# New approach: use structured outputs
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "score": {"type": "number"}
                }
            }
        }
    },
    messages=[{"role": "user", "content": "Generate a sample user object."}]
)
6. Specify AWS or GCP for Data Residency
If your compliance requirements need US-only inference, use the inference_geo parameter:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    inference_geo="us",  # Ensures inference runs in US data centers
    messages=[...]
)
US-only inference is priced at 1.1x the standard rate on Sonnet 4.6.
Ready to build with Claude Sonnet 4.6? Download Apidog free to test your API calls visually, collaborate with your team on request configurations, and auto-generate SDK code—no credit card required.



