TL;DR
Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens—the same price as Sonnet 4.5, while delivering near-Opus performance. With prompt caching, cache reads drop to $0.30/MTok (90% savings). Batch API cuts costs in half to $1.50/$7.50 per MTok. The 1M token context window (beta) triggers long-context pricing at $6/$22.50 per MTok for requests over 200K tokens.
Claude Sonnet 4.6 Base Pricing
Claude Sonnet 4.6 keeps the same price point as its predecessor while delivering meaningfully better results. Here's the core pricing at a glance:
| Pricing Tier | Input Tokens | Output Tokens |
|---|---|---|
| Standard | $3.00 / MTok | $15.00 / MTok |
| Batch API | $1.50 / MTok | $7.50 / MTok |
| Cache writes (5-min) | $3.75 / MTok | — |
| Cache writes (1-hour) | $6.00 / MTok | — |
| Cache reads | $0.30 / MTok | — |
| Long context >200K (standard) | $6.00 / MTok | $22.50 / MTok |
| Long context >200K (batch) | $3.00 / MTok | $11.25 / MTok |
MTok = million tokens. All prices in USD.
The value story here is hard to ignore. Early testers preferred Sonnet 4.6 over Opus 4.5, the previous premium model, in 59% of head-to-head comparisons, at 60% of the cost.

For most coding, analysis, and agentic tasks, you no longer need to pay Opus prices to get Opus-level results.
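If you'd rather keep these rates in code, here's a minimal lookup sketch built from the table above. The numbers are hard-coded from this article, not pulled from any API, so verify them against Anthropic's pricing page before using them for billing.

```python
# Sonnet 4.6 rates in USD per million tokens, transcribed from the
# table above -- verify against Anthropic's pricing page before use.
RATES = {
    "standard":           {"input": 3.00, "output": 15.00},
    "batch":              {"input": 1.50, "output": 7.50},
    "long_context":       {"input": 6.00, "output": 22.50},
    "long_context_batch": {"input": 3.00, "output": 11.25},
}

def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost of one request at the given pricing tier."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Example: 500 input tokens, 300 output tokens at standard rates
print(f"${estimate_cost(500, 300):.4f}")  # $0.0060
```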
Full Pricing Breakdown by Feature
Standard API Pricing
The standard rates apply to all synchronous API calls made through the Anthropic API:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document."}]
)

# Check exact token usage
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")

# Calculate cost at standard rates ($3/MTok in, $15/MTok out)
input_cost = response.usage.input_tokens / 1_000_000 * 3.00
output_cost = response.usage.output_tokens / 1_000_000 * 15.00
print(f"Request cost: ${input_cost + output_cost:.6f}")
```
For a typical API call with a 500-token input and a 300-token output, the cost is roughly $0.0060. That's less than a cent per request at standard rates.
Prompt Caching Pricing
Prompt caching is Sonnet 4.6's most impactful cost lever. It stores portions of your prompt server-side and charges dramatically less on cache hits.
Cache write rates:
- 5-minute cache: $3.75/MTok (1.25× base input price)
- 1-hour cache: $6.00/MTok (2× base input price)

Cache read rate:
- $0.30/MTok, one-tenth of the standard input price
If your system prompt is 10,000 tokens and you process 1,000 requests per day:
- Without caching: 10,000 tokens × 1,000 requests × $3/MTok = $30.00/day
- With caching (write once, read 999×): (10,000 × $3.75/MTok) + (999 × 10,000 × $0.30/MTok) ≈ $3.03/day
That's a 90% reduction for a static system prompt alone.
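You can reproduce that arithmetic in a few lines:

```python
# Caching break-even: 10,000-token system prompt, 1,000 requests/day,
# 5-minute cache written once and read by the other 999 requests.
PROMPT_TOKENS = 10_000
REQUESTS_PER_DAY = 1_000

without_cache = PROMPT_TOKENS * REQUESTS_PER_DAY * 3.00 / 1_000_000
write_cost = PROMPT_TOKENS * 3.75 / 1_000_000
read_cost = (REQUESTS_PER_DAY - 1) * PROMPT_TOKENS * 0.30 / 1_000_000

print(f"Without caching: ${without_cache:.2f}/day")           # $30.00/day
print(f"With caching:    ${write_cost + read_cost:.2f}/day")  # ~$3.03/day
```

Marking content for the cache takes one extra field: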
```python
import anthropic

client = anthropic.Anthropic()

# Mark expensive static content for caching
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a senior code reviewer specializing in Python, FastAPI, and distributed systems. Here are our coding standards and review guidelines: [large block of standards text]...",
            "cache_control": {"type": "ephemeral"}  # Cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review this pull request: [PR content]"}]
)

# Check what came from cache vs fresh tokens
usage = response.usage
print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Uncached tokens: {usage.input_tokens}")
```
When to use which cache duration:
- 5-minute cache: high-frequency calls, bursty traffic, short conversation windows
- 1-hour cache: background processing pipelines, batch jobs with longer gaps, agent loops (see the sketch below)
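Requesting the 1-hour tier is a small variation on the example above. A minimal sketch, assuming the extended-TTL beta (the `extended-cache-ttl-2025-04-11` header was current at the time of writing; check Anthropic's docs for the current name):

```python
import anthropic

client = anthropic.Anthropic()

# 1-hour cache: the ttl field selects the $6/MTok write tier.
# Assumes the extended-cache-ttl-2025-04-11 beta is still the
# correct header; verify against current Anthropic docs.
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["extended-cache-ttl-2025-04-11"],
    system=[
        {
            "type": "text",
            "text": "[large static instructions for a background pipeline]...",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Process the next batch item."}],
)
```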
Batch API Pricing
The Batch API offers a flat 50% discount on both input and output tokens in exchange for asynchronous processing (results available within 24 hours, typically much sooner).
| | Standard | Batch API |
|---|---|---|
| Input | $3.00/MTok | $1.50/MTok |
| Output | $15.00/MTok | $7.50/MTok |
Best use cases for Batch API:
- Content moderation pipelines
- Document classification at scale
- Overnight data enrichment
- Generating embeddings or summaries for large datasets
- Any non-interactive processing where latency doesn't matter
At $1.50/$7.50 per MTok, processing one million documents, each with 500 input tokens and 100 output tokens, costs:
- Input: 500M tokens × $1.50/MTok = $750
- Output: 100M tokens × $7.50/MTok = $750
- Total: $1,500 for 1 million documents (~$0.0015 per document)
Batch API: 50% Discount for Non-Real-Time Workloads
Batch processing is straightforward: submit a set of requests, poll until processing ends, and collect the results at half price.
```python
import anthropic, time

client = anthropic.Anthropic()

def batch_classify(texts: list[str]) -> list[str]:
    """Classify a list of texts at Batch API rates."""
    # Submit batch
    requests = [
        {
            "custom_id": f"item-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 20,
                "messages": [{
                    "role": "user",
                    "content": f"Classify as POSITIVE, NEGATIVE, or NEUTRAL. Reply with one word only.\n\n{text}"
                }]
            }
        }
        for i, text in enumerate(texts)
    ]
    batch = client.messages.batches.create(requests=requests)

    # Poll until complete
    while True:
        status = client.messages.batches.retrieve(batch.id)
        if status.processing_status == "ended":
            break
        time.sleep(60)

    # Collect results keyed by custom_id, then return in input order
    results = {}
    for result in client.messages.batches.results(batch.id):
        if result.result.type == "succeeded":
            results[result.custom_id] = result.result.message.content[0].text.strip()
    return [results.get(f"item-{i}", "ERROR") for i in range(len(texts))]
```
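Calling it is then a one-liner per dataset; at batch rates, the classification above costs half of what the synchronous API would charge:

```python
labels = batch_classify([
    "The checkout flow is flawless.",
    "Support never replied to my ticket.",
    "The update shipped on schedule.",
])
print(labels)  # e.g. ['POSITIVE', 'NEGATIVE', 'NEUTRAL']
```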
Long Context (1M Token) Pricing
When you enable the 1M token context window via the `context-1m-2025-08-07` beta header, requests exceeding 200K input tokens are charged at a higher rate.
Long Context Rate Table
| Input Tokens | Input Price | Output Price |
|---|---|---|
| ≤ 200K | $3.00/MTok | $15.00/MTok |
| > 200K | $6.00/MTok | $22.50/MTok |
The 200K threshold is based on total input tokens, which includes:
- `input_tokens` (standard input)
- `cache_creation_input_tokens` (if using prompt caching)
- `cache_read_input_tokens` (if using prompt caching)
If the total exceeds 200K, all tokens in that request are charged at the higher rate.
Long Context + Batch API
The Batch API 50% discount stacks with long-context pricing:
| Scenario | Input Rate | Output Rate |
|---|---|---|
| Standard | $3.00/MTok | $15.00/MTok |
| Long context (>200K) | $6.00/MTok | $22.50/MTok |
| Batch API | $1.50/MTok | $7.50/MTok |
| Long context + Batch | $3.00/MTok | $11.25/MTok |
Processing large documents in bulk via Batch API keeps long-context costs manageable.
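Here's a sketch of how the tier selection plays out in code. The rates are hard-coded from the tables above, and for simplicity it prices all input-side tokens at the flat input rate; real invoices bill cache writes and reads at their own rates.

```python
def long_context_cost(usage: dict, batch: bool = False) -> float:
    """Estimate USD cost for one request under the 1M-context beta.

    The 200K threshold is evaluated on the sum of all input-side
    token counts; crossing it reprices the entire request.
    Simplification: cache writes/reads actually bill at their own
    rates, but this sketch uses the flat input rate throughout.
    """
    total_input = (
        usage["input_tokens"]
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )
    if total_input > 200_000:
        in_rate, out_rate = (3.00, 11.25) if batch else (6.00, 22.50)
    else:
        in_rate, out_rate = (1.50, 7.50) if batch else (3.00, 15.00)
    return (total_input * in_rate + usage["output_tokens"] * out_rate) / 1_000_000

# 250K input tokens crosses the threshold: whole request bills at $6/$22.50
print(long_context_cost({"input_tokens": 250_000, "output_tokens": 2_000}))  # ~$1.545
```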
Tool and Feature Pricing
Several tools carry separate charges beyond token costs.
Web Search Tool
$10.00 per 1,000 searches, plus standard token costs for search-generated content.
Each web search call counts as one use regardless of how many results are returned. No charge if the search errors out.
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    betas=["code-execution-web-tools-2026-02-09"],
    tools=[{"type": "web_search_20260209", "name": "web_search"}],
    messages=[{"role": "user", "content": "What's the latest LLM benchmark news from this week?"}]
)

# Server-side tool usage is reported separately from token counts
server_tool_use = getattr(response.usage, "server_tool_use", None)
searches = server_tool_use.web_search_requests if server_tool_use else 0
print(f"Web searches used: {searches}")  # Each search: $0.01
```
Code Execution Tool
Free when bundled with web search or web fetch (using the web_search_20260209 or web_fetch_20260209 tool versions).
When used standalone:
- 1,550 free hours per organization per month
- $0.05 per hour per container beyond the free tier
- Minimum billing unit: 5 minutes
For most development and testing workloads, the free tier is more than sufficient.
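If you do expect to exceed the free tier, a rough overage estimate is easy to sketch (using the published numbers above; exact billing granularity may differ):

```python
FREE_HOURS_PER_MONTH = 1_550
RATE_PER_HOUR = 0.05   # USD per container-hour beyond the free tier
MIN_BILL_MINUTES = 5   # minimum billing unit per session

def standalone_execution_cost(session_minutes: list[float]) -> float:
    """Estimate monthly standalone code-execution overage in USD."""
    billed_minutes = sum(max(m, MIN_BILL_MINUTES) for m in session_minutes)
    overage_hours = max(0.0, billed_minutes / 60 - FREE_HOURS_PER_MONTH)
    return overage_hours * RATE_PER_HOUR

# 2,000 one-hour sessions in a month -> 450 hours over the free tier
print(f"${standalone_execution_cost([60] * 2_000):.2f}")  # $22.50
```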
Web Fetch Tool
No additional charges. You only pay standard token costs for content that enters the conversation.
| Tool | Additional Cost | Notes |
|---|---|---|
| Web search | $10/1K searches | Per-search fee |
| Web fetch | Free | Token costs only |
| Code execution (with web tools) | Free | Bundled |
| Code execution (standalone) | $0.05/hr after 1,550 free hrs/mo | Per container |
| Computer use overhead | ~735 extra input tokens | Per tool definition |
| Text editor overhead | ~700 extra input tokens | Per tool definition |
Computer Use Overhead
Computer use adds fixed token overhead:
- System prompt addition: 466–499 tokens
- Tool definition tokens: 735 tokens per tool (Claude 4.x models)
For a computer use session with 100 turns at 200 tokens/turn plus screenshots:
- Tool overhead: 735 tokens × $3/MTok = $0.0022 (negligible)
- Screenshot tokens depend on resolution; plan for ~2,000–5,000 tokens per screenshot
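Putting those numbers together, a back-of-the-envelope session estimate looks like this (the per-turn and screenshot figures are assumptions for illustration, and the sketch ignores history accumulating across turns, which prompt caching can largely offset):

```python
# Rough cost model for a 100-turn computer use session.
TURNS = 100
TOKENS_PER_TURN = 200          # prose context per turn (from the example above)
SCREENSHOT_TOKENS = 3_000      # mid-range of the 2,000-5,000 estimate
TOOL_OVERHEAD = 735            # tool definition tokens, sent with every request
OUTPUT_TOKENS_PER_TURN = 150   # assumed size of each model action

input_tokens = TURNS * (TOKENS_PER_TURN + SCREENSHOT_TOKENS + TOOL_OVERHEAD)
output_tokens = TURNS * OUTPUT_TOKENS_PER_TURN

cost = (input_tokens * 3.00 + output_tokens * 15.00) / 1_000_000
print(f"Estimated session cost: ${cost:.2f}")  # ~$1.41
```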
Claude Sonnet 4.6 vs All Models: Full Comparison
Current Model Pricing
| Model | Input | Output | Cache Read | Batch Input | Batch Output |
|---|---|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 | $2.50 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $2.50 | $12.50 |
| Claude Opus 4.5 | $5.00 | $25.00 | $0.50 | $2.50 | $12.50 |
| Claude Opus 4.1 | $15.00 | $75.00 | $1.50 | $7.50 | $37.50 |
All prices in USD per million tokens.
Sonnet 4.6 vs Opus 4.6: The Value Question
| | Claude Sonnet 4.6 | Claude Opus 4.6 |
|---|---|---|
| Input price | $3/MTok | $5/MTok |
| Output price | $15/MTok | $25/MTok |
| Relative cost | 1× | 1.67× |
| SWE-bench Verified | 79.6% | ~80.8% |
| OSWorld (computer use) | 72.5% | 72.7% |
| User preference vs Sonnet 4.5 | 70% | N/A |
| User preference vs Opus 4.5 | 59% | N/A |
| 1M context window | Yes (beta) | Yes (beta) |
| Adaptive thinking | Yes | Yes |
| Max output | 64K tokens | 128K tokens |
For the vast majority of tasks—coding, analysis, document processing, agentic workflows—Sonnet 4.6 matches Opus performance at 60% of the price. Opus 4.6 is worth the premium when you need 128K output tokens or the absolute maximum on novel reasoning tasks.
Sonnet 4.6 vs Haiku 4.5: When to Use Each
| Use Case | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|
| Complex code generation | ✅ | ⚠️ |
| Simple classification | ⚠️ Overkill | ✅ |
| Document summarization | ✅ | ✅ |
| Multi-step agentic tasks | ✅ | ❌ |
| High-volume low-complexity | ❌ Expensive | ✅ |
| Tool calling / function use | ✅ | ✅ |
| Long reasoning chains | ✅ | ❌ |
| Latency-sensitive apps | ✅ Fast | ✅ Fastest |
The smart pattern: use Haiku 4.5 for routing, classification, and simple extraction; send complex tasks to Sonnet 4.6. This hybrid approach typically costs 60–80% less than running everything on Sonnet 4.6.
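A minimal sketch of that routing pattern (the triage prompt and the SIMPLE/COMPLEX split are illustrative choices, not a prescribed recipe):

```python
import anthropic

client = anthropic.Anthropic()

def route_and_answer(task: str) -> str:
    """Triage with Haiku 4.5; escalate complex tasks to Sonnet 4.6."""
    triage = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": (
                "Reply SIMPLE or COMPLEX only. Is this task simple "
                "(classification/extraction) or complex (multi-step "
                f"reasoning or coding)?\n\n{task}"
            ),
        }],
    )
    verdict = triage.content[0].text.strip().upper()
    model = "claude-haiku-4-5" if verdict.startswith("SIMPLE") else "claude-sonnet-4-6"

    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text
```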
Testing Costs with Apidog Before Going Live
Before deploying to production, you want to know exactly what each request costs. Apidog's visual API client lets you test Claude Sonnet 4.6 calls, inspect the full response including the usage object, and track token counts per request.

Set Up Cost Visibility in Apidog
- Create a new POST request to `https://api.anthropic.com/v1/messages`
- Add headers: `x-api-key`, `anthropic-version: 2023-06-01`, `Content-Type: application/json`
- Set the body with your model and messages
- Run the request; the response `usage` object shows exact token counts
```json
{
  "usage": {
    "input_tokens": 523,
    "cache_creation_input_tokens": 5000,
    "cache_read_input_tokens": 0,
    "output_tokens": 312
  }
}
```
From those numbers, calculate actual cost:
- Input: 523 tokens × $3/MTok = $0.00157
- Cache write: 5,000 tokens × $3.75/MTok = $0.01875
- Output: 312 tokens × $15/MTok = $0.00468
- Total first call: ~$0.025 (subsequent calls that hit the cache: ~$0.008, since the 5,000 cached tokens bill at $0.30/MTok)
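The same arithmetic as a reusable helper (standard-tier rates hard-coded from this article; the dict mirrors the `usage` object above):

```python
def request_cost(usage: dict) -> float:
    """USD cost of one standard-tier request, including 5-min cache writes."""
    return (
        usage.get("input_tokens", 0) * 3.00
        + usage.get("cache_creation_input_tokens", 0) * 3.75
        + usage.get("cache_read_input_tokens", 0) * 0.30
        + usage.get("output_tokens", 0) * 15.00
    ) / 1_000_000

usage = {"input_tokens": 523, "cache_creation_input_tokens": 5000,
         "cache_read_input_tokens": 0, "output_tokens": 312}
print(f"${request_cost(usage):.4f}")  # $0.0250
```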
You can save these requests as a collection in Apidog, share them with your team, and run cost estimates across different prompt variations before finalizing your production design.
Ready to start building? Download Apidog free to test Claude Sonnet 4.6 API calls visually, inspect token usage per request, and size your costs accurately before deploying.



