Claude Managed Agents vs Agent SDK (2026): Which to Choose

Claude Managed Agents vs Agent SDK in 2026: compare control, cost, ops, observability, and data residency, plus how to test the APIs your agents call.

Ashley Innocent

Ashley Innocent

19 May 2026

Claude Managed Agents vs Agent SDK (2026): Which to Choose

You’ve decided to ship a production AI agent on Claude. Now you hit the first real fork in the road: do you let Anthropic run the agent loop and sandbox for you with Claude Managed Agents, or do you keep the loop inside your own process with the Claude Agent SDK? The two options look similar from a demo, but they pull your architecture, your cost model, and your on-call rotation in different directions. This guide walks through the trade-offs the way you’d actually reason about them on a whiteboard, with a payments-refund agent and a support-ticket agent as running examples.

button

TL;DR

Choose Claude Managed Agents when you want Anthropic to host the agent loop, sandbox, and session state for long-running or asynchronous work and you’d rather pay a runtime fee than run that infrastructure. Choose the Claude Agent SDK when you need the loop inside your own process, full control over tools, data residency, and cost. Both speak MCP and Claude models.

Introduction

In 2026, “build an AI agent” stopped meaning “wire up a while loop around a chat completion.” Anthropic now gives you two distinct ways to run an agent in production, and the choice shapes more than code. It decides where customer data sits, who gets paged at 2am when a tool call hangs, and how your finance team forecasts spend.

The Claude Agent SDK is a library: you import it into a Python or TypeScript service, and the agent loop, context management, and built-in tools run inside your own process and infrastructure. Claude Managed Agents is the opposite shape: a hosted REST API where Anthropic runs the loop and a per-session sandbox, and your application sends events and streams results back. Same models underneath, very different operational contracts.

Most production agents do real work by calling APIs: charging a card, creating a Zendesk ticket, querying an inventory service, hitting an internal pricing endpoint. That means the reliability of your agent is mostly the reliability of the APIs and tools it calls. Before you pick a hosting model, you need a way to design, mock, and test those endpoints under agent-shaped traffic. That’s where a platform like Apidog fits: you can mock the dependencies your agent hits, run contract tests against them, and exercise an MCP server the same way the agent will. We’ll come back to that. First, let’s get both options straight, because picking the wrong one is expensive to unwind. If you want a deeper primer on the hosted side specifically, see our Claude Managed Agents guide.

What Claude Managed Agents actually is

Claude Managed Agents is a pre-built, configurable agent harness that runs in Anthropic-managed infrastructure. Instead of writing your own agent loop, sandbox, and tool execution layer, you describe an agent and let Anthropic run it. It launched in public beta in April 2026 and currently requires the managed-agents-2026-04-01 beta header on every request, which the SDK sets for you.

The product is built around four concepts, and they map cleanly onto how you’d think about a job runner:

The flow is: create an agent, configure an environment, start a session, send user messages as events, and stream responses. You can steer the agent mid-run by sending more events, or interrupt it to change direction. The event history is stored on Anthropic’s side and you can fetch it in full, which matters for audit and debugging.

Managed Agents gives Claude a set of built-in tools out of the box: Bash, file operations (read, write, edit, glob, grep), web search and fetch, and MCP server connections for everything else. Anthropic’s framing is that this option is best for workloads that need long-running execution (minutes to hours, many tool calls), secure cloud containers with network access, minimal infrastructure on your side, and stateful sessions that persist across interactions. It’s also available on Claude Platform on AWS with some differences in feature availability and session behavior, which is worth checking if you’re constrained to a specific cloud.

Two things to keep in mind. First, custom tools work differently here: Claude decides to call a tool, but your application executes it and returns the result over the event stream. The execution still happens in your world; only the loop and sandbox are hosted. Second, certain features (outcomes and multi-agent) are gated as a research preview behind a separate access request, so don’t assume every capability is available the moment you turn it on. For the broader pattern behind all this, our write-up on agentic AI architecture covers how the loop, tools, and memory fit together.

What the Claude Agent SDK actually is

The Claude Agent SDK is a library that gives you the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript. It was previously called the Claude Code SDK; the rename signaled a broader scope than coding tasks. You pip install claude-agent-sdk or npm install @anthropic-ai/claude-agent-sdk, point it at an API key, and the loop runs inside your process.

A minimal agent is small. In Python you call query() with a prompt and an options object listing the tools the agent may use, then iterate the streamed messages. Claude reads files, runs commands, and edits code without you implementing a tool-execution loop. That’s the core difference from the plain Client SDK, where you write the while response.stop_reason == "tool_use" loop yourself and execute every tool call by hand.

The SDK ships the machinery you’d otherwise build:

Because the loop runs in your process, the SDK also reads Claude Code’s filesystem configuration: skills in .claude/skills/, slash commands, a CLAUDE.md for project context, and plugins. Authentication supports the direct Anthropic API plus Amazon Bedrock, Claude Platform on AWS, Google Vertex AI, and Azure AI Foundry, so you can keep inference inside an existing cloud contract. If you want a hands-on path, our guide on setting up the Claude Agent SDK with a Claude plan and the walkthrough on building your own Claude Code both start from a working loop.

One billing change you should plan around: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive usage limits. If your forecast assumed SDK calls shared the same pool as interactive Claude usage, revisit it. Check Anthropic’s current terms directly rather than trusting a number you read in a blog post, including this one.

Head-to-head: Managed Agents vs Agent SDK

Here’s the comparison the way it tends to come up in an architecture review. Treat the cost row as directional; confirm live numbers against Anthropic’s pricing page and the Managed Agents docs before you commit a budget.

Dimension Claude Managed Agents Claude Agent SDK
Where the loop runs Anthropic-managed infrastructure Your process, your infrastructure
Interface REST API + SSE event stream Python or TypeScript library
Control over the loop Configured, not coded; you steer via events Full: hooks, custom permissions, in-process logic
Cost model Standard Claude token rates plus a per-session-hour runtime fee for active agent time Standard Claude token rates plus the compute you run it on
Ops burden Low: no sandbox, scaling, or session store to operate Higher: you run, scale, and monitor the service and sandbox
Observability Anthropic-hosted event log, fetchable in full; built-in monitoring Whatever you instrument: hooks, your logs, your tracing stack
Latency profile Network hop to hosted runtime; tuned for long async work In-process loop; you control proximity to your data and tools
Data residency Sandbox and session state live in Anthropic infra (AWS option available) Files, state, and tool execution stay on your infrastructure
Custom tool execution Claude requests; your app executes and returns over the stream In-process Python or TypeScript functions
Best fit Long-running, asynchronous, infra-light production agents Local prototyping, agents close to your filesystem and services, strict data control

A few rows deserve a sentence of nuance.

Cost. The shapes differ, not the model price. Managed Agents charges standard token rates plus a runtime fee for active session time, so an agent that thinks for an hour costs you for that hour even between tool calls. The SDK has no per-hour Anthropic runtime fee, but you pay for the servers, autoscaling, and the engineers who keep them up. Cheaper on paper isn’t cheaper once you price an on-call rotation.

Ops burden. This is the clearest split. Managed Agents removes the sandbox, the session store, and the scaling logic from your plate. The SDK gives you control of all three, which is exactly what you want when an agent must run inside a VPC next to a private database, and exactly what you don’t want when a two-person team just needs an async worker.

Data residency. With the SDK, tool execution and session state never leave your infrastructure; only model inference goes to Claude. With Managed Agents, the sandbox and event log live in Anthropic’s environment (or AWS, with caveats). For regulated data this row often decides the whole question on its own.

Observability. Managed Agents hands you a hosted, fetchable event log for free. The SDK hands you hooks and expects you to wire them into your tracing stack. Different ergonomics, similar end state if you do the work.

Testing and debugging the APIs your agents call

Whichever hosting model you pick, your agent’s reliability is dominated by the tools and APIs it calls. A refund agent that reasons perfectly but calls a flaky payments endpoint is a flaky refund agent. So treat the dependencies as first-class test targets, not afterthoughts.

Three layers are worth testing before you ship.

The API contracts. Every tool your agent calls is an API with a schema. Mock those endpoints and assert on request and response shapes so a backend change doesn’t silently break the agent in production. With Apidog you can stand up a mock for the payments or ticketing service, define the exact schema the agent expects, and run contract tests on a schedule. When the real service drifts, the contract test fails before a customer’s refund does. For a structured approach to this, our guide on how to test AI agents that call APIs goes through the failure modes that matter.

The MCP servers. Both options route external tools through MCP. An MCP server is itself a service with tools, inputs, and outputs, and it’s a common place for agents to break: a tool returns a slightly different payload, a timeout isn’t handled, an error path returns prose instead of structured data. Test the MCP server directly, the way the agent will hit it, before you connect it to a live agent. Our walkthrough on MCP server testing with Apidog covers how to enumerate the tools a server exposes and exercise each one. Apidog also includes an AI agent and A2A debugger so you can watch the request and response traffic an agent generates, not just guess at it.

The agent’s own request behavior. Agents call APIs in patterns humans don’t: bursts of retries, partial reads, the same endpoint hit ten times in a loop while the model reasons. Replay that traffic against your mocks and watch what the agent actually sends. This is where a debugger that captures live agent and A2A traffic earns its keep; you find the off-by-one retry storm in staging instead of on the incident bridge.

The point isn’t tooling for its own sake. It’s that the hosting decision and the testing strategy are linked. Managed Agents hides the loop, so your visibility into failures comes through its event log plus your own API-level tests. The SDK exposes the loop, so you instrument it with hooks but still need the same API-level tests underneath. Either way, Download Apidog and put the agent’s dependencies under test before the agent goes near a real customer.

A decision framework

Skip the feature-by-feature agonizing and answer these in order. The first strong yes points you to an option.

Choose Claude Managed Agents if:

Choose the Claude Agent SDK if:

A common path: prototype with the Agent SDK locally because the loop is right there and the iteration cycle is tight, then move to Managed Agents for production if the operational savings outweigh the loss of control. That migration is real work, not a config flip, so make the call deliberately rather than defaulting. If you’re also weighing models or coding agents alongside this, our Claude vs Codex comparison for 2026 is a useful companion read.

Real-world use cases

A payments refund agent

A fintech support team wants an agent that processes refund requests end to end: read the ticket, look up the transaction, check the refund policy, call the payments API to issue the refund, and write a summary back to the ticket. This agent touches money, so every API call needs a tested contract and a clear audit trail.

The SDK is the natural fit here. The agent should run inside the VPC next to the payments service, session state must not leave the company’s infrastructure, and PreToolUse hooks can enforce a hard rule that any refund over a threshold requires human approval. Before launch, the team mocks the payments and ledger endpoints in Apidog, writes contract tests for the refund and lookup calls, and replays a week of historical tickets against the mocks to see exactly what the agent sends. The retry-storm bug they find (the agent re-issuing a refund call after a 504 that actually succeeded) is the entire reason this testing layer exists.

An asynchronous support-ticket triage agent

A SaaS company gets thousands of support tickets a day and wants an agent to triage them: classify, pull related logs, draft a response, and either resolve or escalate. Tickets arrive at all hours, each one takes a few minutes of tool calls, and the data involved is low-sensitivity.

Managed Agents fits this shape well. The work is long-running and asynchronous, the team is small and doesn’t want to run an autoscaling worker fleet, and the hosted event log gives them a per-ticket trace for free. They still test the dependencies: the logging API and the ticket-system MCP server get mocked and contract-tested in Apidog so a schema change in the log service doesn’t quietly degrade triage quality. The hosting is managed; the API correctness is still their job.

An internal data-ops agent behind the firewall

A platform team wants an agent that responds to internal requests like “back-fill yesterday’s failed ETL partitions” by querying an internal job API, running a remediation script, and reporting status. The internal APIs aren’t on the public internet and the data is sensitive.

The SDK wins by default. The agent must run where it can reach private services, and nothing about session state can sit in a third-party sandbox. The team connects internal services as MCP servers, tests each MCP tool in isolation first, and uses SDK hooks to log every command the agent runs to their existing audit pipeline. This is the case where the SDK’s “runs in your process” property isn’t a preference; it’s a requirement. For background on why agents are becoming primary API consumers, see our piece on AI agents as the new API consumers.

Conclusion

The Managed Agents versus Agent SDK decision is an operational and data-governance decision wearing an API-design costume. Here’s what to carry away:

Next step: before you wire an agent to anything that touches a customer, put its API and MCP dependencies under test. Download Apidog to mock those endpoints, run contract tests, and debug the agent’s actual request traffic, so the hosting model you pick is built on dependencies you’ve already proven.

FAQ

What’s the core difference between Claude Managed Agents and the Claude Agent SDK?

Managed Agents is a hosted REST API where Anthropic runs the agent loop and a per-session sandbox; you send events and stream results back. The Agent SDK is a Python or TypeScript library that runs the same loop inside your own process and infrastructure. Same Claude models, different operational ownership.

Is the Claude Agent SDK the same as the old Claude Code SDK?

Yes. The Claude Code SDK was renamed to the Claude Agent SDK to reflect a broader scope beyond coding tasks. The agent loop, built-in tools, and context management it exposes are the same machinery that powers Claude Code, now packaged as a general-purpose agent library.

Which option is cheaper?

It depends on workload shape. Managed Agents charges standard Claude token rates plus a runtime fee for active session time, so long-thinking agents accrue runtime cost. The SDK has no per-hour Anthropic runtime fee but you pay for and operate the compute. Confirm current rates on Anthropic’s pricing page; don’t budget from a number in a blog post.

Can I use MCP servers with both?

Yes. Both route external tools through the Model Context Protocol. That’s also why testing your MCP servers matters before connecting them to either option; our MCP server testing with Apidog guide walks through exercising each tool a server exposes the way an agent will hit it.

How do I keep customer data out of Anthropic’s infrastructure?

Use the Agent SDK and run the loop inside your own environment. With the SDK, tool execution and session state stay on your infrastructure and only model inference goes to Claude. With Managed Agents the sandbox and event log live in Anthropic’s environment (an AWS option exists with caveats), which may not satisfy strict residency rules.

Is Claude Managed Agents production-ready?

It launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header on every request. Core session functionality is generally available to API accounts, while some features such as outcomes and multi-agent are gated behind a separate research-preview request. Treat it as beta and check the docs for current status.

How do I test an agent before it touches real APIs?

Mock every API and MCP server the agent calls, write contract tests on the request and response schemas, and replay realistic traffic against the mocks to see what the agent actually sends. Apidog covers all three, including an AI agent and A2A debugger for inspecting live agent traffic. Our how to test AI agents that call APIs guide details the failure modes.

Can I start on one and switch to the other later?

You can, and a common path is prototyping on the Agent SDK locally then moving to Managed Agents for production. It isn’t a config switch though: the interfaces differ (library versus REST plus events), custom tool execution works differently, and session state moves from your filesystem to a hosted log. Plan it as a migration project.

button

Explore more

Cursor Composer 2.5 vs Opus 4.7 vs GPT-5.5: Which Coding Model Should You Use?

Cursor Composer 2.5 vs Opus 4.7 vs GPT-5.5: Which Coding Model Should You Use?

Composer 2.5 matches Opus 4.7 and GPT-5.5 on SWE-bench and CursorBench at a tenth of the cost. Full benchmark, speed, and cost comparison plus which to pick.

19 May 2026

Cursor Composer 2.5: What It Is, How to Use It, and How to Access It

Cursor Composer 2.5: What It Is, How to Use It, and How to Access It

Cursor Composer 2.5 matches Opus 4.7 and GPT-5.5 at under $1 per task. Benchmarks, pricing, how to access it in Cursor, and how to use it with your API workflow.

19 May 2026

7 Best API Management Tools in 2026, Ranked by G2

7 Best API Management Tools in 2026, Ranked by G2

G2 Spring 2026 named Apidog and viaSocket Leaders in API Management. Honest, hands-on comparison of the 7 ranked tools and who each one fits.

15 May 2026

Practice API Design-first in Apidog

Discover an easier way to build and use APIs