Claude Opus 4.8 Pricing: The Full Cost Breakdown

Claude Opus 4.8 pricing explained: $5/$25 standard and $10/$50 fast mode per million tokens, worked cost examples, and how effort control, caching, and batch mode lower costs.

Ashley Innocent

Ashley Innocent

29 May 2026

Claude Opus 4.8 Pricing: The Full Cost Breakdown

Apidog for Enterprise

On-Premises Deploy

SSO & RBAC

SOC 2 Compliant

Explore Apidog Enterprise

Claude Opus 4.8 costs $5 per million input tokens and $25 per million output tokens in standard mode. That’s the same rate as Opus 4.7, so if you’re already budgeting for 4.7, nothing changes when you upgrade. The interesting part is everything around that headline number: a faster mode, a token-spend dial, caching, and batch discounts that move your real bill far more than the base rate.

This guide breaks down what you actually pay, with worked examples. For the model overview, see what is Claude Opus 4.8. To start building, see the API guide.

The rate card

Mode Input (per 1M tokens) Output (per 1M tokens) Speed
Standard $5 $25 baseline
Fast $10 $50 2.5x faster output

Two things stand out. First, output tokens cost five times more than input tokens, so the length of Claude’s responses drives your bill, not the size of your prompts. Second, fast mode doubles the rate for 2.5x faster output. Anthropic notes that fast mode is about three times cheaper than the equivalent was on previous models, so the premium for speed has dropped generation over generation.

You can confirm current rates in Anthropic’s pricing docs.

What fast mode is for

Standard mode is the default and the right choice for most workloads. Fast mode exists for the cases where latency is the product: live coding assistants, interactive agents, anything where a user is watching the cursor. You pay double per token for output that streams 2.5x faster.

The decision is simple. If a human is waiting on the response in real time, fast mode can be worth it. If the work runs in the background, an agent loop, a batch job, a scheduled task, stay on standard and keep the money.

How effort changes your bill

This is the lever most teams miss. Opus 4.8’s effort parameter controls how many tokens the model spends across the whole response, including tool calls. Because output is the expensive half, lowering effort on work that doesn’t need deep reasoning cuts cost directly.

The five levels, from cheapest to most expensive in token terms:

A classification task at low effort might use a tenth of the output tokens it would at high. Same model, same rate, a fraction of the bill. Anthropic’s effort guidance covers where each level holds quality. The takeaway: match effort to the task instead of paying for high everywhere.

Worked cost scenarios

All figures use standard pricing ($5 input, $25 output per million tokens). They’re illustrative; your real token counts will vary.

Scenario 1: a chatbot turn. 1,000 input tokens, 500 output tokens.

At low effort the output shrinks, pulling the per-turn cost under a cent.

Scenario 2: an agentic coding task. 50,000 input tokens of repo context, 8,000 output tokens at xhigh.

If that 50K context repeats across calls, prompt caching drops the input cost to roughly $0.025, cutting the total to about $0.23.

Scenario 3: an overnight batch job. 1,000,000 input tokens, 200,000 output tokens, run through the Batch API at a 50% discount.

For comparison shopping against cheaper models, see the Gemini 3.5 Flash pricing breakdown and Xiaomi MiMo v2.5 API cost.

Prompt caching: the biggest single saving

If you send the same system prompt, document, or codebase on every call, you’re paying full input price for tokens the model has already seen. Prompt caching fixes that. Cached input reads are charged at a fraction of the normal input rate, roughly a tenth, after the initial cache write.

Long-context agents save the most. A 50K-token system prompt billed at full rate on every call is expensive; cached, the repeated portion costs almost nothing. The first call writes the cache, every call after reads it cheap.

Batch API and large outputs

The Batch API runs jobs at a discount when you don’t need a real-time answer. Submit a set of requests, get results back within the batch window, pay less per token. It also raises the output ceiling: Opus 4.8 supports up to 300K output tokens through the Batch API with the output-300k-2026-03-24 beta header, versus 128K on the synchronous endpoint.

Use it for evals, bulk summarization, data labeling, and any pipeline where minutes of latency don’t matter.

Opus pricing across generations

Opus 4.8 holds the line on price. The story is how far the line dropped two generations ago:

Model Input (per 1M) Output (per 1M)
Opus 4.1 $15 $75
Opus 4.5 $5 $25
Opus 4.6 $5 $25
Opus 4.7 $5 $25
Opus 4.8 $5 $25

Opus dropped from $15/$75 to $5/$25 at the 4.5 generation and has stayed there since, while the model behind the price keeps improving. You’re getting 4.8’s quality at 4.5’s rate. For a head-to-head against other vendors’ flagships, see Opus 4.8 vs GPT-5.5 vs Gemini 3.5.

A cost-optimization checklist

Before you scale Opus 4.8, work through this list:

Track your real spend with Apidog

Estimated cost and actual cost diverge fast once you’re in production, because real responses vary in length and tool-call count. The way to stay honest is to inspect the usage object that every Messages API response returns, which reports input and output token counts per call.

Apidog makes that visible:

Download Apidog, point a request at the Messages endpoint, and run the same prompt at low, high, and xhigh. The token counts tell you exactly what each effort level costs before you commit to it in production.

FAQ

How much does Claude Opus 4.8 cost? $5 per million input tokens and $25 per million output tokens in standard mode. Fast mode is $10 and $50 for 2.5x faster output.

Is Opus 4.8 more expensive than Opus 4.7? No. The per-token rates are identical, so upgrading from 4.7 doesn’t change your bill.

What’s the difference between standard and fast mode pricing? Fast mode doubles the per-token rate in exchange for output that streams about 2.5x faster. Use it only when latency matters to a waiting user.

How do I lower my Opus 4.8 costs? Drop the effort level on simpler tasks, cache repeated prompt content, batch non-urgent jobs, and keep max_tokens tight. Output tokens are the main cost driver.

Does prompt caching really save money? Yes. After the first call writes the cache, repeated input is read at roughly a tenth of the normal input rate. Long-context agents save the most.

How many output tokens can Opus 4.8 produce? Up to 128K on the synchronous Messages API, and up to 300K through the Batch API with the output-300k-2026-03-24 beta header.

Where do I see token usage per call? In the usage object on every Messages API response. Tools like Apidog surface it so you can compare cost across effort levels.

Explore more

What is CubeSandbox for AI Agents? Isolation Explained

What is CubeSandbox for AI Agents? Isolation Explained

What is CubeSandbox for AI agents? A clear look at Tencent's open-source KVM sandbox, why agents need isolation, and how it compares to E2B.

26 May 2026

DeepSeek V4-Pro 75% Price Cut Is Now Permanent: What It Means for Developers (2026)

DeepSeek V4-Pro 75% Price Cut Is Now Permanent: What It Means for Developers (2026)

DeepSeek V4-Pro pricing is now permanently 75% off: $0.435 input, $0.87 output, $0.003625 cache hit per 1M tokens. What it means for developers in 2026.

25 May 2026

What is an Agent2Agent (A2A) Debugger? And Why You Need One

What is an Agent2Agent (A2A) Debugger? And Why You Need One

An A2A debugger connects to an Agent2Agent agent, sends test messages, and shows the full request and response so you can debug agent integrations fast.

22 May 2026

Practice API Design-first in Apidog

Discover an easier way to build and use APIs

Claude Opus 4.8 Pricing: The Full Cost Breakdown