TL;DR
Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens—the same price as Sonnet 4.5, while delivering near-Opus performance. With prompt caching, cache reads drop to $0.30/MTok (90% savings). Batch API cuts costs in half to $1.50/$7.50 per MTok. The 1M token context window (beta) triggers long-context pricing at $6/$22.50 per MTok for requests over 200K tokens.
Claude Sonnet 4.6 Base Pricing
Claude Sonnet 4.6 keeps the same price point as its predecessor while delivering meaningfully better results. Here's the core pricing at a glance:
| Pricing Tier | Input Tokens | Output Tokens |
|---|---|---|
| Standard | $3.00 / MTok | $15.00 / MTok |
| Batch API | $1.50 / MTok | $7.50 / MTok |
| Cache writes (5-min) | $3.75 / MTok | — |
| Cache writes (1-hour) | $6.00 / MTok | — |
| Cache reads | $0.30 / MTok | — |
| Long context >200K (standard) | $6.00 / MTok | $22.50 / MTok |
| Long context >200K (batch) | $3.00 / MTok | $11.25 / MTok |
MTok = million tokens. All prices in USD.
The value story here is hard to ignore. Early testers preferred Sonnet 4.6 over Opus 4.5, the previous premium model, in 59% of head-to-head comparisons, at 60% of the cost.

For most coding, analysis, and agentic tasks, you no longer need to pay Opus prices to get Opus-level results.
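If you'd rather keep these rates in code, here's a minimal lookup sketch built from the table above. The numbers are hard-coded from this article, not pulled from any API, so verify them against Anthropic's pricing page before using them for billing.

```python
# Sonnet 4.6 rates in USD per million tokens, transcribed from the
# table above -- verify against Anthropic's pricing page before use.
RATES = {
    "standard":           {"input": 3.00, "output": 15.00},
    "batch":              {"input": 1.50, "output": 7.50},
    "long_context":       {"input": 6.00, "output": 22.50},
    "long_context_batch": {"input": 3.00, "output": 11.25},
}

def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost of one request at the given pricing tier."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Example: 500 input tokens, 300 output tokens at standard rates
print(f"${estimate_cost(500, 300):.4f}")  # $0.0060
```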
Full Pricing Breakdown by Feature
Standard API Pricing
The standard rates apply to all synchronous API calls made through the Anthropic API:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document."}]
)

# Check exact token usage
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

# Calculate cost at standard rates ($3/MTok in, $15/MTok out)
input_cost = response.usage.input_tokens / 1_000_000 * 3.00
output_cost = response.usage.output_tokens / 1_000_000 * 15.00
print(f"Request cost: ${input_cost + output_cost:.6f}")
```
For a typical API call with a 500-token input and a 300-token output, the cost is roughly $0.0060. That's less than a cent per request at standard rates.
Prompt Caching Pricing
Prompt caching is Sonnet 4.6's most impactful cost lever. It stores portions of your prompt server-side and charges dramatically less on cache hits.
Cache write rates:
- 5-minute cache: $3.75/MTok (1.25× base input price)
- 1-hour cache: $6.00/MTok (2× base input price)

Cache read rate:
- $0.30/MTok, one-tenth of the standard input price
If your system prompt is 10,000 tokens and you process 1,000 requests per day:
- Without caching: 10,000 tokens × 1,000 requests × $3/MTok = $30.00/day
- With caching (write once, read 999×): (10,000 × $3.75/MTok) + (999 × 10,000 × $0.30/MTok) ≈ $3.03/day
That's a 90% reduction for a static system prompt alone.
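You can reproduce that arithmetic in a few lines:

```python
# Caching break-even: 10,000-token system prompt, 1,000 requests/day,
# 5-minute cache written once and read by the other 999 requests.
PROMPT_TOKENS = 10_000
REQUESTS_PER_DAY = 1_000

without_cache = PROMPT_TOKENS * REQUESTS_PER_DAY * 3.00 / 1_000_000
write_cost = PROMPT_TOKENS * 3.75 / 1_000_000
read_cost = (REQUESTS_PER_DAY - 1) * PROMPT_TOKENS * 0.30 / 1_000_000

print(f"Without caching: ${without_cache:.2f}/day")           # $30.00/day
print(f"With caching:    ${write_cost + read_cost:.2f}/day")  # ~$3.03/day
```

Marking content for the cache takes one extra field: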
```python
import anthropic

client = anthropic.Anthropic()

# Mark expensive static content for caching
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer specializing in Python, FastAPI, and distributed systems. Here are our coding standards and review guidelines: [large block of standards text]...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review this pull request: [PR content]"}]
)

# Check what came from cache vs fresh tokens
usage = response.usage
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Uncached tokens: {usage.input_tokens}")
```
When to use which cache duration:
- 5-minute cache: high-frequency calls, bursty traffic, short conversation windows
- 1-hour cache: background processing pipelines, batch jobs with longer gaps, agent loops (see the sketch below)
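Requesting the 1-hour tier is a small variation on the example above. A minimal sketch, assuming the extended-TTL beta (the `extended-cache-ttl-2025-04-11` header was current at the time of writing; check Anthropic's docs for the current name):

```python
import anthropic

client = anthropic.Anthropic()

# 1-hour cache: the ttl field selects the $6/MTok write tier.
# Assumes the extended-cache-ttl-2025-04-11 beta is still the
# correct header; verify against current Anthropic docs.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["extended-cache-ttl-2025-04-11"],
    system=[
        {
            "type": "text",
            "text": "[large static instructions for a background pipeline]...",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Process the next batch item."}],
)
```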
Batch API Pricing
The Batch API offers a flat 50% discount on both input and output tokens in exchange for asynchronous processing (results available within 24 hours, typically much sooner).
| | Standard | Batch API |
|---|---|---|
| Input | $3.00/MTok | $1.50/MTok |
| Output | $15.00/MTok | $7.50/MTok |
Best use cases for Batch API:
- Content moderation pipelines
- Document classification at scale
- Overnight data enrichment
- Generating embeddings or summaries for large datasets
- Any non-interactive processing where latency doesn't matter
At $1.50/$7.50 per MTok, processing one million documents, each with 500 input tokens and 100 output tokens, costs:
- Input: 500M tokens × $1.50/MTok = $750
- Output: 100M tokens × $7.50/MTok = $750
- Total: $1,500 for 1 million documents (~$0.0015 per document)
Batch API: 50% Discount for Non-Real-Time Workloads
Batch processing is straightforward: submit a set of requests, poll until processing ends, and collect the results at half price.
```python
import anthropic, time

client = anthropic.Anthropic()

def batch_classify(texts: list[str]) -> list[str]:
    """Classify a list of texts at Batch API rates."""
    # Submit batch
    requests = [
        {
            "custom_id": f"item-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 20,
                "messages": [{
                    "role": "user",
                    "content": f"Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word only.\n\n{text}"
                }]
            }
        }
        for i, text in enumerate(texts)
    ]
    batch = client.messages.batches.create(requests=requests)

    # Poll until complete
    while True:
        status = client.messages.batches.retrieve(batch.id)
        if status.processing_status == "ended":
            break
        time.sleep(60)

    # Collect results keyed by custom_id, then return in input order
    results = {}
    for result in client.messages.batches.results(batch.id):
        if result.result.type == "succeeded":
            results[result.custom_id] = result.result.message.content[0].text.strip()
    return [results.get(f"item-{i}", "ERROR") for i in range(len(texts))]
```
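Calling it is then a one-liner per dataset; at batch rates, the classification above costs half of what the synchronous API would charge:

```python
labels = batch_classify([
    "The checkout flow is flawless.",
    "Support never replied to my ticket.",
    "The update shipped on schedule.",
])
print(labels)  # e.g. ['POSITIVE', 'NEGATIVE', 'NEUTRAL']
```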
Long Context (1M Token) Pricing
When you enable the 1M token context window via the `context-1m-2025-08-07` beta header, requests exceeding 200K input tokens are charged at a higher rate.
Long Context Rate Table
| Input Tokens | Input Price | Output Price |
|---|---|---|
| ≤ 200K | $3.00/MTok | $15.00/MTok |
| > 200K | $6.00/MTok | $22.50/MTok |
The 200K threshold is based on total input tokens, which includes:
- `input_tokens` (standard input)
- `cache_creation_input_tokens` (if using prompt caching)
- `cache_read_input_tokens` (if using prompt caching)
If the total exceeds 200K, all tokens in that request are charged at the higher rate.
Long Context + Batch API
The Batch API 50% discount stacks with long-context pricing:
| Scenario | Input Rate | Output Rate |
|---|---|---|
| Standard | $3.00/MTok | $15.00/MTok |
| Long context (>200K) | $6.00/MTok | $22.50/MTok |
| Batch API | $1.50/MTok | $7.50/MTok |
| Long context + Batch | $3.00/MTok | $11.25/MTok |
Processing large documents in bulk via Batch API keeps long-context costs manageable.
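Here's a sketch of how the tier selection plays out in code. The rates are hard-coded from the tables above, and for simplicity it prices all input-side tokens at the flat input rate; real invoices bill cache writes and reads at their own rates.

```python
def long_context_cost(usage: dict, batch: bool = False) -> float:
    """Estimate USD cost for one request under the 1M-context beta.

    The 200K threshold is evaluated on the sum of all input-side
    token counts; crossing it reprices the entire request.
    Simplification: cache writes/reads actually bill at their own
    rates, but this sketch uses the flat input rate throughout.
    """
    total_input = (
        usage["input_tokens"]
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )
    if total_input > 200_000:
        in_rate, out_rate = (3.00, 11.25) if batch else (6.00, 22.50)
    else:
        in_rate, out_rate = (1.50, 7.50) if batch else (3.00, 15.00)
    return (total_input * in_rate + usage["output_tokens"] * out_rate) / 1_000_000

# 250K input tokens crosses the threshold: whole request bills at $6/$22.50
print(long_context_cost({"input_tokens": 250_000, "output_tokens": 2_000}))  # ~$1.545
```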
Tool and Feature Pricing
Several tools carry separate charges beyond token costs.
Web Search Tool
$10.00 per 1,000 searches, plus standard token costs for search-generated content.
Each web search call counts as one use regardless of how many results are returned. No charge if the search errors out.
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["code-execution-web-tools-2026-02-09"],
    tools=[{"type": "web_search_20260209", "name": "web_search"}],
    messages=[{"role": "user", "content": "What's the latest LLM benchmark news from this week?"}]
)

# Server-side tool usage is reported separately from token counts
server_tool_use = getattr(response.usage, "server_tool_use", None)
searches = server_tool_use.web_search_requests if server_tool_use else 0
print(f"Web searches used: {searches}")  # Each search: $0.01
```
Code Execution Tool
Free when bundled with web search or web fetch (using the web_search_20260209 or web_fetch_20260209 tool versions).
When used standalone:
- 1,550 free hours per organization per month
- $0.05 per hour per container beyond the free tier
- Minimum billing unit: 5 minutes
For most development and testing workloads, the free tier is more than sufficient.
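If you do expect to exceed the free tier, a rough overage estimate is easy to sketch (using the published numbers above; exact billing granularity may differ):

```python
FREE_HOURS_PER_MONTH = 1_550
RATE_PER_HOUR = 0.05   # USD per container-hour beyond the free tier
MIN_BILL_MINUTES = 5   # minimum billing unit per session

def standalone_execution_cost(session_minutes: list[float]) -> float:
    """Estimate monthly standalone code-execution overage in USD."""
    billed_minutes = sum(max(m, MIN_BILL_MINUTES) for m in session_minutes)
    overage_hours = max(0.0, billed_minutes / 60 - FREE_HOURS_PER_MONTH)
    return overage_hours * RATE_PER_HOUR

# 2,000 one-hour sessions in a month -> 450 hours over the free tier
print(f"${standalone_execution_cost([60] * 2_000):.2f}")  # $22.50
```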
Web Fetch Tool
No additional charges. You only pay standard token costs for content that enters the conversation.
| Tool | Additional Cost | Notes |
|---|---|---|
| Web search | $10/1K searches | Per-search fee |
| Web fetch | Free | Token costs only |
| Code execution (with web tools) | Free | Bundled |
| Code execution (standalone) | $0.05/hr after 1,550 free hrs/mo | Per container |
| Computer use overhead | ~735 extra input tokens | Per tool definition |
| Text editor overhead | ~700 extra input tokens | Per tool definition |
Computer Use Overhead
Computer use adds fixed token overhead:
- System prompt addition: 466–499 tokens
- Tool definition tokens: 735 tokens per tool (Claude 4.x models)
For a computer use session with 100 turns at 200 tokens/turn plus screenshots:
- Tool overhead: 735 tokens × $3/MTok = $0.0022 (negligible)
- Screenshot tokens depend on resolution; plan for ~2,000–5,000 tokens per screenshot
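Putting those numbers together, a back-of-the-envelope session estimate looks like this (the per-turn and screenshot figures are assumptions for illustration, and the sketch ignores history accumulating across turns, which prompt caching can largely offset):

```python
# Rough cost model for a 100-turn computer use session.
TURNS = 100
TOKENS_PER_TURN = 200          # prose context per turn (from the example above)
SCREENSHOT_TOKENS = 3_000      # mid-range of the 2,000-5,000 estimate
TOOL_OVERHEAD = 735            # tool definition tokens, sent with every request
OUTPUT_TOKENS_PER_TURN = 150   # assumed size of each model action

input_tokens = TURNS * (TOKENS_PER_TURN + SCREENSHOT_TOKENS + TOOL_OVERHEAD)
output_tokens = TURNS * OUTPUT_TOKENS_PER_TURN

cost = (input_tokens * 3.00 + output_tokens * 15.00) / 1_000_000
print(f"Estimated session cost: ${cost:.2f}")  # ~$1.41
```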
Claude Sonnet 4.6 vs All Models: Full Comparison
Current Model Pricing
| Model | Input | Output | Cache Read | Batch Input | Batch Output |
|---|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | $2.50 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | $12.50 |
| Claude Opus 4.5 | $5.00 | $25.00 | $0.50 | $2.50 | $12.50 |
| Claude Opus 4.1 | $15.00 | $75.00 | $1.50 | $7.50 | $37.50 |
All prices in USD per million tokens.
Sonnet 4.6 vs Opus 4.6: The Value Question
| | Claude Sonnet 4.6 | Claude Opus 4.6 |
|---|---|---|
| Input price | $3/MTok | $5/MTok |
| Output price | $15/MTok | $25/MTok |
| Relative cost | 1× | 1.67× |
| SWE-bench Verified | 79.6% | ~80.8% |
| OSWorld (computer use) | 72.5% | 72.7% |
| User preference vs Sonnet 4.5 | 70% | N/A |
| User preference vs Opus 4.5 | 59% | N/A |
| 1M context window | Yes (beta) | Yes (beta) |
| Adaptive thinking | Yes | Yes |
| Max output | 64K tokens | 128K tokens |
For the vast majority of tasks—coding, analysis, document processing, agentic workflows—Sonnet 4.6 matches Opus performance at 60% of the price. Opus 4.6 is worth the premium when you need 128K output tokens or the absolute maximum on novel reasoning tasks.
Sonnet 4.6 vs Haiku 4.5: When to Use Each
| Use Case | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|
| Complex code generation | ✅ | ⚠️ |
| Simple classification | ⚠️ Overkill | ✅ |
| Document summarization | ✅ | ✅ |
| Multi-step agentic tasks | ✅ | ❌ |
| High-volume low-complexity | ❌ Expensive | ✅ |
| Tool calling / function use | ✅ | ✅ |
| Long reasoning chains | ✅ | ❌ |
| Latency-sensitive apps | ✅ Fast | ✅ Fastest |
The smart pattern: use Haiku 4.5 for routing, classification, and simple extraction; send complex tasks to Sonnet 4.6. This hybrid approach typically costs 60–80% less than running everything on Sonnet 4.6.
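A minimal sketch of that routing pattern (the triage prompt and the SIMPLE/COMPLEX split are illustrative choices, not a prescribed recipe):

```python
import anthropic

client = anthropic.Anthropic()

def route_and_answer(task: str) -> str:
    """Triage with Haiku 4.5; escalate complex tasks to Sonnet 4.6."""
    triage = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": (
                "Reply SIMPLE or COMPLEX only. Is this task simple "
                "(classification/extraction) or complex (multi-step "
                f"reasoning or coding)?\n\n{task}"
            ),
        }],
    )
    verdict = triage.content[0].text.strip().upper()
    model = "claude-haiku-4-5" if verdict.startswith("SIMPLE") else "claude-sonnet-4-6"

    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text
```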
Testing Costs with Apidog Before Going Live
Before deploying to production, you want to know exactly what each request costs. Apidog's visual API client lets you test Claude Sonnet 4.6 calls, inspect the full response including the usage object, and track token counts per request.

Set Up Cost Visibility in Apidog
- Create a new POST request to `https://api.anthropic.com/v1/messages`
- Add headers: `x-api-key`, `anthropic-version: 2023-06-01`, `Content-Type: application/json`
- Set the body with your model and messages
- Run the request; the response `usage` object shows exact token counts
```json
{
  "usage": {
    "input_tokens": 523,
    "cache_creation_input_tokens": 5000,
    "cache_read_input_tokens": 0,
    "output_tokens": 312
  }
}
```
From those numbers, calculate actual cost:
- Input: 523 tokens × $3/MTok = $0.00157
- Cache write: 5,000 tokens × $3.75/MTok = $0.01875
- Output: 312 tokens × $15/MTok = $0.00468
- Total first call: ~$0.025 (subsequent calls that hit the cache: ~$0.008, since the 5,000 cached tokens bill at $0.30/MTok)
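The same arithmetic as a reusable helper (standard-tier rates hard-coded from this article; the dict mirrors the `usage` object above):

```python
def request_cost(usage: dict) -> float:
    """USD cost of one standard-tier request, including 5-min cache writes."""
    return (
        usage.get("input_tokens", 0) * 3.00
        + usage.get("cache_creation_input_tokens", 0) * 3.75
        + usage.get("cache_read_input_tokens", 0) * 0.30
        + usage.get("output_tokens", 0) * 15.00
    ) / 1_000_000

usage = {"input_tokens": 523, "cache_creation_input_tokens": 5000,
         "cache_read_input_tokens": 0, "output_tokens": 312}
print(f"${request_cost(usage):.4f}")  # $0.0250
```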
You can save these requests as a collection in Apidog, share them with your team, and run cost estimates across different prompt variations before finalizing your production design.
Ready to start building? Download Apidog free to test Claude Sonnet 4.6 API calls visually, inspect token usage per request, and size your costs accurately before deploying.



