If you’ve ever shipped an LLM feature and watched it return malformed JSON in production, PydanticAI is built for you. It’s the Python agent framework from the team behind Pydantic, and it puts type-safe, validated outputs at the center of agent development. This guide explains what PydanticAI is, why type-safety matters for agents, the core concepts you’ll actually use, and how it stacks up against other Python frameworks like LangGraph.
What PydanticAI is
PydanticAI is an open-source, provider-agnostic agent framework for Python. It’s maintained by the same team that builds Pydantic Validation and Pydantic Logfire, so it inherits a strong validation foundation and a clear design goal: bring “that FastAPI feeling” to building agents.
In plain terms, you describe what your agent should do, what tools it can call, and what shape its output must take. PydanticAI handles the model calls, validates everything against your Pydantic models, and retries when the model returns something that doesn’t fit.
The project reached a stable v2.0.0 release on June 23, 2026, after a run of betas. V2 leans into a harness-first design where an agent’s tools, hooks, instructions, and model settings compose as reusable units. You can install it with pip install pydantic-ai or uv add pydantic-ai.
Why type-safety matters for agents
LLMs are non-deterministic. Ask the same question twice and you can get two different shapes of answer. That’s fine for a chat box, but it breaks the moment you wire model output into real code: a database write, an API call, a billing calculation.
Most agent bugs come from this gap. The model “mostly” returns valid JSON, your parser works in testing, then a production response drops a field or wraps the answer in prose and your pipeline throws. You end up writing defensive parsing, regex cleanup, and retry loops by hand.
PydanticAI closes the gap by making the output contract part of the framework. You define a Pydantic model, pass it as the output type, and the framework guarantees the value you get back matches that model. If the model returns something invalid, PydanticAI sends the validation error back to the LLM and asks it to try again, up to a configurable retry limit; if the retries run out, the run raises an error instead of handing you bad data. Your downstream code receives typed objects, not hopeful strings.
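To make the reprompt loop concrete, here is a minimal standard-library sketch of the validate-and-retry idea. It is not PydanticAI's implementation: `fake_model`, `validate_ticket`, and `run_with_retries` are hypothetical names, and the "model" is a stub that returns a bad reply until it has been shown a validation error.

```python
import json

REQUIRED_FIELDS = {"category": str, "priority": int, "summary": str}


def validate_ticket(raw: str) -> dict:
    """Parse raw model text and check it against the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} must be {expected.__name__}")
    return data


def fake_model(messages: list[str]) -> str:
    """Stand-in for an LLM call: replies badly until it has seen an error."""
    if any(m.startswith("Validation error") for m in messages):
        return '{"category": "billing", "priority": 2, "summary": "Payment failed"}'
    return '{"category": "billing", "priority": "high"}'


def run_with_retries(prompt: str, max_retries: int = 2) -> dict:
    """Call the model, validate, and feed errors back until valid or out of retries."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        reply = fake_model(messages)
        try:
            return validate_ticket(reply)
        except ValueError as exc:
            messages.append(f"Validation error: {exc}. Please try again.")
    raise RuntimeError("model never produced a valid ticket")


ticket = run_with_retries("My payment failed three times today.")
print(ticket["priority"])
```

The point of the sketch is the control flow: the error message becomes part of the next request, and the caller only ever sees a value that passed validation or an exception.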
That same idea extends to tool arguments. When the model calls one of your tools, PydanticAI validates the arguments against your function’s type hints before the function runs. Bad arguments never reach your business logic.
Core concepts
PydanticAI keeps its surface area small. Five ideas cover most of what you’ll build.
Agents
The Agent class is the main entry point. You create one with a model identifier and optional instructions. The class is generic over two type parameters: the dependencies type and the output type, which is what gives your editor and type checker real visibility into your agent.
```python
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    instructions='Be concise, reply with one sentence.',
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.output)
```
That model string is all you change to switch providers, which keeps your code portable.
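One way to take advantage of this is to keep the model string in configuration rather than in code. The sketch below assumes a hypothetical environment variable named `AGENT_MODEL`; `resolve_model_id` and `split_model_id` are illustrative helpers, not part of PydanticAI, which parses the string itself.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "openai:gpt-4o"


def resolve_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the model identifier from AGENT_MODEL (a hypothetical variable name)."""
    source = os.environ if env is None else env
    return source.get("AGENT_MODEL", DEFAULT_MODEL)


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider:model-name' into its two parts, e.g. for logging."""
    provider, sep, name = model_id.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, name


# Passing a dict here keeps the example deterministic; omit it to read os.environ.
model_id = resolve_model_id({"AGENT_MODEL": "anthropic:claude-sonnet-4-6"})
print(split_model_id(model_id))
```

You would then pass `model_id` as the first argument to `Agent`, so each environment can pick its own provider without a code change.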
Typed outputs
Pass a Pydantic model as the output_type and the agent’s result is validated against it. You get a typed object back, and your IDE knows every field. Here’s a structured-output sketch:
```python
from pydantic import BaseModel
from pydantic_ai import Agent


class SupportTicket(BaseModel):
    category: str
    priority: int
    summary: str


agent = Agent('openai:gpt-4o', output_type=SupportTicket)

result = agent.run_sync('My payment failed three times today.')
print(result.output.priority)  # an int, validated, not a guess
```
If the model returns a priority as non-numeric text (for example “high”) or omits the summary, validation fails and the framework reprompts. You never parse the raw response yourself.
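To see what a failed validation looks like at the Pydantic level, the sketch below validates a hand-written dictionary directly, with no LLM involved. It assumes Pydantic v2 (`model_validate`); the `bad_reply` payload is made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class SupportTicket(BaseModel):
    category: str
    priority: int
    summary: str


# A reply with a non-numeric priority and no summary, as a model might produce.
bad_reply = {"category": "billing", "priority": "high"}

try:
    SupportTicket.model_validate(bad_reply)
except ValidationError as exc:
    # Each error records the location of the field that failed.
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)

# In Pydantic's default (lax) mode, a numeric string is coerced to an int.
ok = SupportTicket.model_validate(
    {"category": "billing", "priority": "2", "summary": "Payment failed"}
)
print(ok.priority)
```

Error details like these are what the framework can hand back to the model when it reprompts.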
Tools
Tools let the model reach outside itself: query a database, hit a REST API, run a calculation. You register a tool with the @agent.tool decorator. PydanticAI reads the function’s type hints and docstring to build the schema the model sees, then validates every call against it.
```python
from pydantic_ai import Agent, RunContext

agent = Agent('openai:gpt-4o', deps_type=str)


@agent.tool
async def get_user_balance(ctx: RunContext[str], account_id: str) -> float:
    """Return the current balance for an account."""
    # ctx.deps holds your injected dependency.
    # lookup_balance is a placeholder for your own async lookup function.
    return await lookup_balance(ctx.deps, account_id)
```
The model decides when to call the tool. Your function only runs with arguments that already passed validation.
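As a rough illustration of how type hints and a docstring can become a tool description, here is a standard-library sketch. It is deliberately simplified and is not how PydanticAI builds its schema: `describe_tool` and `check_arguments` are hypothetical helpers, they only handle plain classes such as `str` or `float`, and the example tool omits the `RunContext` parameter.

```python
import inspect
from typing import Any, Callable, get_type_hints


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a simplified tool description from a function's hints and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {name: tp.__name__ for name, tp in hints.items()},
    }


def check_arguments(func: Callable[..., Any], args: dict[str, Any]) -> None:
    """Reject calls whose arguments are missing, unknown, or of the wrong type."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    if set(args) != set(hints):
        raise TypeError(f"expected arguments {sorted(hints)}, got {sorted(args)}")
    for name, expected in hints.items():
        if not isinstance(args[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")


def get_user_balance(account_id: str) -> float:
    """Return the current balance for an account."""
    return 42.0  # fixed value so the example has no external dependency


schema = describe_tool(get_user_balance)
check_arguments(get_user_balance, {"account_id": "acct_123"})  # passes silently
print(schema)
```

Running the check before calling the function is what keeps malformed arguments out of the function body.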
Dependencies
Real agents need context: a database connection, an HTTP client, the current user, an API key. PydanticAI handles this with dependency injection. You declare a deps_type on the agent, then read it through RunContext inside tools and dynamic instructions. The whole chain stays type-safe, and testing gets easier because you can swap real dependencies for fakes.
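The testing benefit can be sketched without any framework at all. In the example below, `BalanceStore`, `FakeBalanceStore`, and `Deps` are hypothetical names; the tool body is a plain async function that takes the dependencies directly, whereas a real PydanticAI tool would receive them through `ctx.deps`.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class BalanceStore(Protocol):
    """What the tool needs from its dependency (hypothetical interface)."""

    async def balance_for(self, account_id: str) -> float: ...


@dataclass
class FakeBalanceStore:
    """In-memory fake used in tests instead of a real database client."""

    balances: dict[str, float]

    async def balance_for(self, account_id: str) -> float:
        return self.balances[account_id]


@dataclass
class Deps:
    """Container you could declare as the agent's deps_type."""

    store: BalanceStore
    current_user: str


async def get_user_balance(deps: Deps, account_id: str) -> float:
    """Tool body written against the interface, so any store can be injected."""
    return await deps.store.balance_for(account_id)


fake_deps = Deps(store=FakeBalanceStore({"acct_123": 19.5}), current_user="ada")
balance = asyncio.run(get_user_balance(fake_deps, "acct_123"))
print(balance)
```

Because the tool only depends on the `BalanceStore` interface, the same function works with a production client or the in-memory fake.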
Model-agnostic providers and streaming
PydanticAI supports a long list of providers: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, Perplexity, plus cloud options like Azure AI Foundry and Amazon Bedrock and self-hosted models. Switching is usually a one-line change to the model string.
It also streams structured output with validation applied as data arrives, so you can render partial results without giving up the type guarantees. And because the team also builds Pydantic Logfire, observability is built in: tracing, debugging, and cost tracking for every run.
How PydanticAI compares to other Python agent frameworks
There’s no single “best” framework. They optimize for different things. Here’s an honest read on where PydanticAI fits.
| Framework | Core strength | Best when you want |
|---|---|---|
| PydanticAI | Type-safe, validated outputs and tool args | Production reliability and clean typed data flow |
| LangGraph | Explicit stateful graphs and control flow | Long-running, branching, multi-step workflows |
| Google ADK | Multi-agent orchestration in Google’s ecosystem | Deep Gemini and Vertex AI integration |
| OpenAI Agents SDK | Tight OpenAI integration with handoffs | An OpenAI-first stack and quick setup |
PydanticAI’s edge is the validation layer. If your agent feeds typed data into other systems, the guarantee that output matches a Pydantic model removes a whole class of runtime errors. LangGraph gives you finer control over state machines and complex flows. The OpenAI Agents SDK is a natural fit if you’re already committed to OpenAI and want features like agent handoffs and MCP server support.
You can also mix them. PydanticAI plays well as the typed-output layer inside a larger orchestration.
When to use PydanticAI
Reach for PydanticAI when:
- Your agent’s output goes into code, not just a chat window, and the shape has to be correct.
- You want your type checker and IDE to understand your agent end to end.
- You already use Pydantic in your codebase, so the model definitions feel native.
- You need provider flexibility and don’t want to rewrite your agent to switch models.
- Observability matters and Logfire’s built-in tracing is appealing.
Look elsewhere when you need heavy graph-based orchestration with complex branching, where a state-machine framework gives you more direct control.
Testing and mocking the APIs behind your agent
A PydanticAI agent is only as reliable as the APIs it depends on. Every run calls an LLM provider, and most useful agents also call your own REST endpoints or third-party tools. Those calls are where flaky behavior, surprise costs, and shape mismatches creep in. PydanticAI validates the model’s output, but it can’t validate that the upstream tool API you’re calling returns what you expect.

This is where Apidog fits, and it’s a different job from the framework. Apidog is an API platform where you test and mock the underlying APIs your agent talks to.
A few concrete uses:
- Mock the LLM or a tool endpoint. During development, point a tool at a mock API that returns deterministic responses. You stop burning tokens on every test run and you sidestep provider rate limits while iterating.
- Assert response shapes. Before you wire a REST endpoint into an @agent.tool function, use API assertions to confirm the real response matches the structure your tool expects. Catch a missing field at the API layer, not deep inside an agent run.
- Manage keys per environment. Keep provider keys and base URLs in separate Apidog environments so local, staging, and CI runs hit the right targets without code changes.
- Verify the LLM endpoint directly. If you call a provider over HTTP, you can test the ChatGPT API with Apidog to confirm auth, streaming, and tool-call formats before your agent depends on them.
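The first two ideas in the list, a deterministic mock endpoint and a response-shape assertion, can be sketched with the Python standard library alone. This is a local stand-in for illustration, not Apidog's API: the `/balance/...` path, the payload fields, and the `shape_errors` helper are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_FIELDS = {"account_id": str, "balance": float}


class MockBalanceHandler(BaseHTTPRequestHandler):
    """Serves one fixed JSON payload for every GET request."""

    def do_GET(self) -> None:
        body = json.dumps({"account_id": "acct_123", "balance": 19.5}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep test output quiet


def shape_errors(payload: dict) -> list[str]:
    """Return a list of shape problems; an empty list means the payload matches."""
    errors = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


server = HTTPServer(("127.0.0.1", 0), MockBalanceHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    # An empty ProxyHandler ignores any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_port}/balance/acct_123"
    with opener.open(url, timeout=5) as response:
        payload = json.loads(response.read())
finally:
    server.shutdown()
    server.server_close()

print(shape_errors(payload))
```

Pointing a tool's base URL at a mock like this during development gives you repeatable responses, and the shape check fails early if the real endpoint later drifts from what the tool expects.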
Apidog doesn’t build or orchestrate agents, and it isn’t an alternative to PydanticAI. It’s the bench where you test and mock the API surface your agent runs on. If you want to try it, download Apidog and mock one of your tool endpoints first.
Frequently asked questions
Is PydanticAI free and open source?
Yes. PydanticAI is open source and you install it from PyPI with pip install pydantic-ai or uv add pydantic-ai. You’ll still pay for whatever LLM provider you use, since the framework calls those APIs on your behalf. To keep those provider costs down while you build, you can mock the API responses during testing instead of hitting the live model on every run.
What models does PydanticAI work with?
It’s provider-agnostic. The docs list OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, and Perplexity, plus cloud options like Azure AI Foundry and Amazon Bedrock and self-hosted models. You select a model by passing a string like 'anthropic:claude-sonnet-4-6' or 'openai:gpt-4o' to the Agent constructor, and switching is usually a one-line change.
How is PydanticAI different from LangChain or LangGraph?
PydanticAI centers on type safety: validated structured outputs and validated tool arguments backed by Pydantic models. LangGraph centers on explicit stateful graphs for multi-step, branching workflows. If your priority is guaranteed output shapes and a clean typed data flow, PydanticAI fits well. If you need fine-grained control over a complex state machine, a graph framework gives you more direct levers.
Do I need to know Pydantic to use it?
It helps, but the basics are quick to pick up. You define data shapes as classes that inherit from BaseModel, and PydanticAI uses those for outputs and tool schemas. If you’ve used Python for API testing or worked with FastAPI, the mental model will feel familiar.
Conclusion
PydanticAI brings something practical to agent development: a guarantee that your model’s output and tool calls match the types you declared. That removes a real source of production bugs and keeps your data flow clean. Pick it when reliability and typed outputs matter more than heavy graph orchestration.
Whichever framework you choose, the APIs underneath your agent still need testing. Mock your LLM and tool endpoints, assert their response shapes, and manage keys per environment in Apidog so your agent runs on a foundation you’ve actually verified.



