You’ve decided to ship a production AI agent on Claude. Now you hit the first real fork in the road: do you let Anthropic run the agent loop and sandbox for you with Claude Managed Agents, or do you keep the loop inside your own process with the Claude Agent SDK? The two options look similar from a demo, but they pull your architecture, your cost model, and your on-call rotation in different directions. This guide walks through the trade-offs the way you’d actually reason about them on a whiteboard, with a payments-refund agent and a support-ticket agent as running examples.
TL;DR
Choose Claude Managed Agents when you want Anthropic to host the agent loop, sandbox, and session state for long-running or asynchronous work and you’d rather pay a runtime fee than run that infrastructure. Choose the Claude Agent SDK when you need the loop inside your own process, full control over tools, data residency, and cost. Both speak MCP and Claude models.
Introduction
In 2026, “build an AI agent” stopped meaning “wire up a while loop around a chat completion.” Anthropic now gives you two distinct ways to run an agent in production, and the choice shapes more than code. It decides where customer data sits, who gets paged at 2am when a tool call hangs, and how your finance team forecasts spend.
The Claude Agent SDK is a library: you import it into a Python or TypeScript service, and the agent loop, context management, and built-in tools run inside your own process and infrastructure. Claude Managed Agents is the opposite shape: a hosted REST API where Anthropic runs the loop and a per-session sandbox, and your application sends events and streams results back. Same models underneath, very different operational contracts.
Most production agents do real work by calling APIs: charging a card, creating a Zendesk ticket, querying an inventory service, hitting an internal pricing endpoint. That means the reliability of your agent is mostly the reliability of the APIs and tools it calls. Before you pick a hosting model, you need a way to design, mock, and test those endpoints under agent-shaped traffic. That’s where a platform like Apidog fits: you can mock the dependencies your agent hits, run contract tests against them, and exercise an MCP server the same way the agent will. We’ll come back to that. First, let’s get both options straight, because picking the wrong one is expensive to unwind. If you want a deeper primer on the hosted side specifically, see our Claude Managed Agents guide.
What Claude Managed Agents actually is
Claude Managed Agents is a pre-built, configurable agent harness that runs in Anthropic-managed infrastructure. Instead of writing your own agent loop, sandbox, and tool execution layer, you describe an agent and let Anthropic run it. It launched in public beta in April 2026 and currently requires the managed-agents-2026-04-01 beta header on every request, which the SDK sets for you.
The product is built around four concepts, and they map cleanly onto how you’d think about a job runner:
- Agent: the model, system prompt, tools, MCP servers, and skills. You create it once and reference it by ID across many sessions.
- Environment: a configured container template with pre-installed packages (Python, Node.js, Go, and others) and network access rules.
- Session: a running agent instance inside an environment, doing one task and producing outputs. It has a persistent filesystem and conversation history.
- Events: the messages flowing between your app and the agent (user turns, tool results, status updates), streamed back over server-sent events and persisted server-side.
The flow is: create an agent, configure an environment, start a session, send user messages as events, and stream responses. You can steer the agent mid-run by sending more events, or interrupt it to change direction. The event history is stored on Anthropic’s side and you can fetch it in full, which matters for audit and debugging.
Managed Agents gives Claude a set of built-in tools out of the box: Bash, file operations (read, write, edit, glob, grep), web search and fetch, and MCP server connections for everything else. Anthropic’s framing is that this option is best for workloads that need long-running execution (minutes to hours, many tool calls), secure cloud containers with network access, minimal infrastructure on your side, and stateful sessions that persist across interactions. It’s also available on Claude Platform on AWS with some differences in feature availability and session behavior, which is worth checking if you’re constrained to a specific cloud.
Two things to keep in mind. First, custom tools work differently here: Claude decides to call a tool, but your application executes it and returns the result over the event stream. The execution still happens in your world; only the loop and sandbox are hosted. Second, certain features (outcomes and multi-agent) are gated as a research preview behind a separate access request, so don’t assume every capability is available the moment you turn it on. For the broader pattern behind all this, our write-up on agentic AI architecture covers how the loop, tools, and memory fit together.
What the Claude Agent SDK actually is
The Claude Agent SDK is a library that gives you the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript. It was previously called the Claude Code SDK; the rename signaled a broader scope than coding tasks. You pip install claude-agent-sdk or npm install @anthropic-ai/claude-agent-sdk, point it at an API key, and the loop runs inside your process.
A minimal agent is small. In Python you call query() with a prompt and an options object listing the tools the agent may use, then iterate the streamed messages. Claude reads files, runs commands, and edits code without you implementing a tool-execution loop. That’s the core difference from the plain Client SDK, where you write the while response.stop_reason == "tool_use" loop yourself and execute every tool call by hand.
The SDK ships the machinery you’d otherwise build:
- Built-in tools: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, a Monitor tool for watching background scripts, and an AskUserQuestion tool for clarifying questions.
- Hooks: callbacks at lifecycle points (
PreToolUse,PostToolUse,Stop,SessionStart,SessionEnd,UserPromptSubmit, and more) so you can validate, log, block, or transform behavior. This is how you build an audit trail of every file or API change. - Subagents: spawn specialized agents for focused subtasks; messages carry a
parent_tool_use_idso you can trace which subagent did what. - MCP: connect databases, browsers, and APIs over the Model Context Protocol, the same standard Managed Agents uses.
- Permissions: pre-approve safe tools, block dangerous ones, or require approval for sensitive actions. A read-only analysis agent is one option string.
- Sessions: capture a session ID, resume later with full context, or fork to explore alternatives. State is JSONL on your filesystem, so you own it.
Because the loop runs in your process, the SDK also reads Claude Code’s filesystem configuration: skills in .claude/skills/, slash commands, a CLAUDE.md for project context, and plugins. Authentication supports the direct Anthropic API plus Amazon Bedrock, Claude Platform on AWS, Google Vertex AI, and Azure AI Foundry, so you can keep inference inside an existing cloud contract. If you want a hands-on path, our guide on setting up the Claude Agent SDK with a Claude plan and the walkthrough on building your own Claude Code both start from a working loop.
One billing change you should plan around: starting June 15, 2026, Agent SDK and claude -p usage on subscription plans draws from a separate monthly Agent SDK credit, distinct from interactive usage limits. If your forecast assumed SDK calls shared the same pool as interactive Claude usage, revisit it. Check Anthropic’s current terms directly rather than trusting a number you read in a blog post, including this one.
Head-to-head: Managed Agents vs Agent SDK
Here’s the comparison the way it tends to come up in an architecture review. Treat the cost row as directional; confirm live numbers against Anthropic’s pricing page and the Managed Agents docs before you commit a budget.
| Dimension | Claude Managed Agents | Claude Agent SDK |
|---|---|---|
| Where the loop runs | Anthropic-managed infrastructure | Your process, your infrastructure |
| Interface | REST API + SSE event stream | Python or TypeScript library |
| Control over the loop | Configured, not coded; you steer via events | Full: hooks, custom permissions, in-process logic |
| Cost model | Standard Claude token rates plus a per-session-hour runtime fee for active agent time | Standard Claude token rates plus the compute you run it on |
| Ops burden | Low: no sandbox, scaling, or session store to operate | Higher: you run, scale, and monitor the service and sandbox |
| Observability | Anthropic-hosted event log, fetchable in full; built-in monitoring | Whatever you instrument: hooks, your logs, your tracing stack |
| Latency profile | Network hop to hosted runtime; tuned for long async work | In-process loop; you control proximity to your data and tools |
| Data residency | Sandbox and session state live in Anthropic infra (AWS option available) | Files, state, and tool execution stay on your infrastructure |
| Custom tool execution | Claude requests; your app executes and returns over the stream | In-process Python or TypeScript functions |
| Best fit | Long-running, asynchronous, infra-light production agents | Local prototyping, agents close to your filesystem and services, strict data control |
A few rows deserve a sentence of nuance.
Cost. The shapes differ, not the model price. Managed Agents charges standard token rates plus a runtime fee for active session time, so an agent that thinks for an hour costs you for that hour even between tool calls. The SDK has no per-hour Anthropic runtime fee, but you pay for the servers, autoscaling, and the engineers who keep them up. Cheaper on paper isn’t cheaper once you price an on-call rotation.
Ops burden. This is the clearest split. Managed Agents removes the sandbox, the session store, and the scaling logic from your plate. The SDK gives you control of all three, which is exactly what you want when an agent must run inside a VPC next to a private database, and exactly what you don’t want when a two-person team just needs an async worker.
Data residency. With the SDK, tool execution and session state never leave your infrastructure; only model inference goes to Claude. With Managed Agents, the sandbox and event log live in Anthropic’s environment (or AWS, with caveats). For regulated data this row often decides the whole question on its own.
Observability. Managed Agents hands you a hosted, fetchable event log for free. The SDK hands you hooks and expects you to wire them into your tracing stack. Different ergonomics, similar end state if you do the work.
Testing and debugging the APIs your agents call
Whichever hosting model you pick, your agent’s reliability is dominated by the tools and APIs it calls. A refund agent that reasons perfectly but calls a flaky payments endpoint is a flaky refund agent. So treat the dependencies as first-class test targets, not afterthoughts.
Three layers are worth testing before you ship.
The API contracts. Every tool your agent calls is an API with a schema. Mock those endpoints and assert on request and response shapes so a backend change doesn’t silently break the agent in production. With Apidog you can stand up a mock for the payments or ticketing service, define the exact schema the agent expects, and run contract tests on a schedule. When the real service drifts, the contract test fails before a customer’s refund does. For a structured approach to this, our guide on how to test AI agents that call APIs goes through the failure modes that matter.
The MCP servers. Both options route external tools through MCP. An MCP server is itself a service with tools, inputs, and outputs, and it’s a common place for agents to break: a tool returns a slightly different payload, a timeout isn’t handled, an error path returns prose instead of structured data. Test the MCP server directly, the way the agent will hit it, before you connect it to a live agent. Our walkthrough on MCP server testing with Apidog covers how to enumerate the tools a server exposes and exercise each one. Apidog also includes an AI agent and A2A debugger so you can watch the request and response traffic an agent generates, not just guess at it.
The agent’s own request behavior. Agents call APIs in patterns humans don’t: bursts of retries, partial reads, the same endpoint hit ten times in a loop while the model reasons. Replay that traffic against your mocks and watch what the agent actually sends. This is where a debugger that captures live agent and A2A traffic earns its keep; you find the off-by-one retry storm in staging instead of on the incident bridge.
The point isn’t tooling for its own sake. It’s that the hosting decision and the testing strategy are linked. Managed Agents hides the loop, so your visibility into failures comes through its event log plus your own API-level tests. The SDK exposes the loop, so you instrument it with hooks but still need the same API-level tests underneath. Either way, Download Apidog and put the agent’s dependencies under test before the agent goes near a real customer.
A decision framework
Skip the feature-by-feature agonizing and answer these in order. The first strong yes points you to an option.
Choose Claude Managed Agents if:
- Your agent runs long or asynchronously (minutes to hours, many tool calls) and you don’t want to operate a job runner, sandbox, and session store.
- You’re a small team and ops headcount is the binding constraint, not control.
- You want a hosted, fetchable event log without building observability from scratch.
- Your data and compliance posture allows the sandbox and session state to live in Anthropic’s (or the AWS) environment.
- You’re fine being in a beta with some features gated behind a research-preview request.
Choose the Claude Agent SDK if:
- The agent must run inside your VPC, next to a private database or internal service, with no third party holding session state.
- You need fine-grained control of the loop: custom permissions, hooks for audit and policy, in-process tool logic.
- Data residency or regulatory constraints rule out a hosted sandbox.
- You want inference billed through an existing Bedrock, Vertex, or Azure contract while keeping the loop in-house.
- You’re prototyping locally and want the agent working directly on your filesystem today.
A common path: prototype with the Agent SDK locally because the loop is right there and the iteration cycle is tight, then move to Managed Agents for production if the operational savings outweigh the loss of control. That migration is real work, not a config flip, so make the call deliberately rather than defaulting. If you’re also weighing models or coding agents alongside this, our Claude vs Codex comparison for 2026 is a useful companion read.
Real-world use cases
A payments refund agent
A fintech support team wants an agent that processes refund requests end to end: read the ticket, look up the transaction, check the refund policy, call the payments API to issue the refund, and write a summary back to the ticket. This agent touches money, so every API call needs a tested contract and a clear audit trail.
The SDK is the natural fit here. The agent should run inside the VPC next to the payments service, session state must not leave the company’s infrastructure, and PreToolUse hooks can enforce a hard rule that any refund over a threshold requires human approval. Before launch, the team mocks the payments and ledger endpoints in Apidog, writes contract tests for the refund and lookup calls, and replays a week of historical tickets against the mocks to see exactly what the agent sends. The retry-storm bug they find (the agent re-issuing a refund call after a 504 that actually succeeded) is the entire reason this testing layer exists.
An asynchronous support-ticket triage agent
A SaaS company gets thousands of support tickets a day and wants an agent to triage them: classify, pull related logs, draft a response, and either resolve or escalate. Tickets arrive at all hours, each one takes a few minutes of tool calls, and the data involved is low-sensitivity.
Managed Agents fits this shape well. The work is long-running and asynchronous, the team is small and doesn’t want to run an autoscaling worker fleet, and the hosted event log gives them a per-ticket trace for free. They still test the dependencies: the logging API and the ticket-system MCP server get mocked and contract-tested in Apidog so a schema change in the log service doesn’t quietly degrade triage quality. The hosting is managed; the API correctness is still their job.
An internal data-ops agent behind the firewall
A platform team wants an agent that responds to internal requests like “back-fill yesterday’s failed ETL partitions” by querying an internal job API, running a remediation script, and reporting status. The internal APIs aren’t on the public internet and the data is sensitive.
The SDK wins by default. The agent must run where it can reach private services, and nothing about session state can sit in a third-party sandbox. The team connects internal services as MCP servers, tests each MCP tool in isolation first, and uses SDK hooks to log every command the agent runs to their existing audit pipeline. This is the case where the SDK’s “runs in your process” property isn’t a preference; it’s a requirement. For background on why agents are becoming primary API consumers, see our piece on AI agents as the new API consumers.
Conclusion
The Managed Agents versus Agent SDK decision is an operational and data-governance decision wearing an API-design costume. Here’s what to carry away:
- Managed Agents hosts the loop and sandbox; the SDK runs them in your process. That single fact drives most of the trade-offs.
- Cost is a shape, not a number: Managed Agents adds a per-session-hour runtime fee; the SDK shifts that cost to infrastructure and on-call you operate.
- Data residency often decides it: regulated or VPC-bound data points to the SDK; low-sensitivity async work points to Managed Agents.
- Ops headcount is the other deciding factor: small teams gain the most from a managed runtime and hosted event log.
- Test the dependencies regardless of hosting: the agent is only as reliable as the APIs and MCP servers it calls.
- Prototype on the SDK, productionize on Managed Agents is a reasonable path, but treat the migration as a project.
- Verify pricing and beta status at the source before you commit; both are evolving in 2026.
Next step: before you wire an agent to anything that touches a customer, put its API and MCP dependencies under test. Download Apidog to mock those endpoints, run contract tests, and debug the agent’s actual request traffic, so the hosting model you pick is built on dependencies you’ve already proven.
FAQ
What’s the core difference between Claude Managed Agents and the Claude Agent SDK?
Managed Agents is a hosted REST API where Anthropic runs the agent loop and a per-session sandbox; you send events and stream results back. The Agent SDK is a Python or TypeScript library that runs the same loop inside your own process and infrastructure. Same Claude models, different operational ownership.
Is the Claude Agent SDK the same as the old Claude Code SDK?
Yes. The Claude Code SDK was renamed to the Claude Agent SDK to reflect a broader scope beyond coding tasks. The agent loop, built-in tools, and context management it exposes are the same machinery that powers Claude Code, now packaged as a general-purpose agent library.
Which option is cheaper?
It depends on workload shape. Managed Agents charges standard Claude token rates plus a runtime fee for active session time, so long-thinking agents accrue runtime cost. The SDK has no per-hour Anthropic runtime fee but you pay for and operate the compute. Confirm current rates on Anthropic’s pricing page; don’t budget from a number in a blog post.
Can I use MCP servers with both?
Yes. Both route external tools through the Model Context Protocol. That’s also why testing your MCP servers matters before connecting them to either option; our MCP server testing with Apidog guide walks through exercising each tool a server exposes the way an agent will hit it.
How do I keep customer data out of Anthropic’s infrastructure?
Use the Agent SDK and run the loop inside your own environment. With the SDK, tool execution and session state stay on your infrastructure and only model inference goes to Claude. With Managed Agents the sandbox and event log live in Anthropic’s environment (an AWS option exists with caveats), which may not satisfy strict residency rules.
Is Claude Managed Agents production-ready?
It launched in public beta in April 2026 and requires the managed-agents-2026-04-01 beta header on every request. Core session functionality is generally available to API accounts, while some features such as outcomes and multi-agent are gated behind a separate research-preview request. Treat it as beta and check the docs for current status.
How do I test an agent before it touches real APIs?
Mock every API and MCP server the agent calls, write contract tests on the request and response schemas, and replay realistic traffic against the mocks to see what the agent actually sends. Apidog covers all three, including an AI agent and A2A debugger for inspecting live agent traffic. Our how to test AI agents that call APIs guide details the failure modes.
Can I start on one and switch to the other later?
You can, and a common path is prototyping on the Agent SDK locally then moving to Managed Agents for production. It isn’t a config switch though: the interfaces differ (library versus REST plus events), custom tool execution works differently, and session state moves from your filesystem to a hosted log. Plan it as a migration project.



