Building AI agents that actually work in production is a completely different animal from getting a chatbot to respond in a Jupyter notebook. You need validated outputs, type-safe tool contracts, testable dependencies, and observability from day one — not as afterthoughts you bolt on later. PydanticAI is the agent framework designed specifically for this, and honestly, it brings the same kind of developer experience that FastAPI brought to web development into the world of generative AI.
This guide walks you through building production-grade AI agents with PydanticAI, covering everything from structured outputs and tool registration to dependency injection, MCP integration, and observability with Logfire. Every example uses working code against PydanticAI v1, which was released in September 2025 and is now at v1.63.0 as of February 2026.
Why PydanticAI for Production AI Agents
Most agent frameworks prioritize getting something running quickly. PydanticAI takes the opposite approach — it prioritizes keeping things running correctly.
The framework is built on three pillars that genuinely matter once you're past the prototype stage:
- Type safety at every boundary — agent inputs, outputs, tool parameters, and dependency types are all validated by Pydantic models. Errors surface at development time, not at 3 AM in production when you're half-asleep debugging a malformed JSON response.
- Model agnosticism — swap between OpenAI, Anthropic, Google Gemini, DeepSeek, Ollama, or any of 15+ providers without changing your agent logic. Your schemas and tools stay the same.
- Production primitives built in — durable execution (via Temporal, Prefect, or DBOS), streaming with validation, MCP and A2A protocol support, and OpenTelemetry-native observability through Logfire.
In one 2026 benchmark comparison of agent frameworks, PydanticAI scored 8/10 for developer experience — ahead of LangChain (5/10) and CrewAI (6/10). In that same evaluation, its type safety alone caught 23 bugs during development that would have reached production in other frameworks. That's not a trivial win.
Installation and Project Setup
Getting started is straightforward. Create a project and install PydanticAI with the model provider you need:
python -m venv venv
source venv/bin/activate
# Core package with OpenAI support
pip install "pydantic-ai[openai]"
# Or with Anthropic support
pip install "pydantic-ai[anthropic]"
# Or install all providers
pip install "pydantic-ai[all]"
Then set your API key as an environment variable:
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
PydanticAI reads API keys from standard environment variables by convention, so you never end up hardcoding secrets in your agent code. Simple, but it's one less thing to mess up.
Building Your First Agent with Structured Output
The core abstraction in PydanticAI is the Agent class. Unlike raw LLM calls that return unpredictable strings, a PydanticAI agent enforces a contract on what the model returns. This is a big deal — it means your downstream code can actually trust the data it receives.
Here's a practical example. Let's say you need an agent that extracts structured vulnerability data from security advisories:
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from typing import Optional
from enum import Enum

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class VulnerabilityReport(BaseModel):
    cve_id: Optional[str] = Field(None, description="CVE identifier if available")
    severity: Severity = Field(description="Assessed severity level")
    affected_component: str = Field(description="Software component affected")
    summary: str = Field(max_length=200, description="Brief description")
    remediation: str = Field(description="Recommended fix or mitigation")

agent = Agent(
    "openai:gpt-4o",
    system_prompt=(
        "You are a security analyst. Extract structured vulnerability "
        "information from the provided advisory text."
    ),
    output_type=VulnerabilityReport,
)

result = agent.run_sync(
    "A critical buffer overflow in OpenSSL 3.1 allows remote code "
    "execution via crafted TLS handshakes. Update to 3.1.1 immediately."
)
report = result.output
print(f"{report.severity.value}: {report.affected_component}")
print(f"Fix: {report.remediation}")
What's happening under the hood: PydanticAI automatically generates a JSON schema from your VulnerabilityReport model, sends it to the LLM, and validates the response. If validation fails, it retries with the error message — the model sees exactly what went wrong and corrects itself. No manual parsing, no fragile regex extraction.
Multiple Output Types
Sometimes your agent needs to return different types depending on context. You can pass a list of types to output_type:
from pydantic import BaseModel
from pydantic_ai import Agent

class SuccessResult(BaseModel):
    data: dict
    confidence: float

class ErrorResult(BaseModel):
    error_code: str
    message: str
    retryable: bool

agent = Agent(
    "openai:gpt-4o",
    output_type=[SuccessResult, ErrorResult, str],
    system_prompt=(
        "Process the request. Return structured data on success, "
        "an error object on failure, or ask for clarification."
    ),
)
The agent picks the appropriate output type based on context, and static type checkers will correctly infer the union type on result.output. It's a small detail that makes a real difference when you're building complex pipelines.
Native vs. Tool-Based Structured Output
PydanticAI supports three modes for structured output. ToolOutput (the default) uses function-calling to extract structured data — it's the most reliable across providers. NativeOutput uses a model's built-in JSON Schema response format when available (like OpenAI's Structured Outputs). PromptedOutput works with all models but is less strict, relying on the prompt to guide the format.
My recommendation? Start with the default tool output mode. You'll only need to switch if you have a specific reason to.
Registering Tools for Real-World Actions
Tools are what turn an LLM from a text generator into an agent that can actually do things. In PydanticAI, tools are type-safe Python functions that the model can call during a conversation:
import httpx
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass

@dataclass
class ApiDeps:
    http_client: httpx.AsyncClient
    api_base_url: str

agent = Agent(
    "openai:gpt-4o",
    deps_type=ApiDeps,
    system_prompt="You help users check the status of their deployments.",
)

@agent.tool
async def get_deployment_status(
    ctx: RunContext[ApiDeps], deployment_id: str
) -> str:
    """Get the current status of a deployment by its ID."""
    response = await ctx.deps.http_client.get(
        f"{ctx.deps.api_base_url}/deployments/{deployment_id}"
    )
    response.raise_for_status()
    data = response.json()
    return f"Deployment {deployment_id}: {data['status']} (last updated: {data['updated_at']})"

@agent.tool
async def list_recent_deployments(
    ctx: RunContext[ApiDeps], limit: int = 5
) -> str:
    """List the most recent deployments."""
    response = await ctx.deps.http_client.get(
        f"{ctx.deps.api_base_url}/deployments",
        params={"limit": limit, "sort": "-created_at"},
    )
    response.raise_for_status()
    deployments = response.json()
    return "\n".join(
        f"- {d['id']}: {d['name']} ({d['status']})" for d in deployments
    )
A few things worth noting here: tool functions receive a RunContext typed with your dependency class, parameters have type annotations that PydanticAI converts to JSON Schema for the model, and — this is important — docstrings become the tool descriptions the model uses to decide when to call them. Well-written docstrings directly improve how accurately the model uses your tools, so don't skip them.
Dependency Injection: The Key to Testable Agents
This is where PydanticAI really separates itself from the pack. Instead of hardcoding database connections, API clients, or configuration inside tools, you declare them as a typed dependency:
from dataclasses import dataclass
from typing import Any

from pydantic_ai import Agent, RunContext

@dataclass
class SupportDeps:
    customer_id: int
    db_connection: Any  # Your database connection type, e.g. an asyncpg pool
    support_tier: str

agent = Agent(
    "openai:gpt-4o",
    deps_type=SupportDeps,
    system_prompt="You are a customer support agent.",
)

@agent.instructions
async def add_customer_context(ctx: RunContext[SupportDeps]) -> str:
    """Dynamic instructions based on the customer's support tier."""
    return (
        f"The customer has {ctx.deps.support_tier} support. "
        f"Customer ID: {ctx.deps.customer_id}."
    )

@agent.tool
async def lookup_order(ctx: RunContext[SupportDeps], order_id: str) -> str:
    """Look up an order by its ID for the current customer."""
    row = await ctx.deps.db_connection.fetchrow(
        "SELECT * FROM orders WHERE id = $1 AND customer_id = $2",
        order_id,
        ctx.deps.customer_id,
    )
    if not row:
        return "Order not found for this customer."
    return f"Order {order_id}: {row['status']}, placed {row['created_at']}"
At runtime, you pass an instance of your dependency class:
result = await agent.run(
    "Where is my order ORD-12345?",
    deps=SupportDeps(
        customer_id=42,
        db_connection=db_pool,
        support_tier="premium",
    ),
)
And here's where it gets really nice — for testing, you just swap in mock dependencies:
from unittest.mock import AsyncMock

mock_db = AsyncMock()
mock_db.fetchrow.return_value = {
    "status": "shipped",
    "created_at": "2026-02-20",
}

result = await agent.run(
    "Where is my order ORD-12345?",
    deps=SupportDeps(
        customer_id=42,
        db_connection=mock_db,
        support_tier="basic",
    ),
)
Your agent code becomes deterministically testable. No monkey-patching, no global state, no test pollution. If you've ever tried to write tests for LangChain agents, you'll appreciate how much cleaner this is.
Connecting MCP Servers for External Tools
The Model Context Protocol (MCP) lets your agent connect to external tool servers using a standardized interface. PydanticAI has native MCP support with three transport options:
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Stdio transport — runs the MCP server as a subprocess
filesystem_server = MCPServerStdio(
    "npx", args=["-y", "@modelcontextprotocol/server-filesystem", "/data"]
)

# Streamable HTTP transport — connects to a remote MCP server
search_server = MCPServerStreamableHTTP("https://mcp.example.com/search")

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You help users analyze files and search for information.",
    toolsets=[filesystem_server, search_server],
)

# Use an async context manager to open the server connections
async with agent:
    result = await agent.run("Find all CSV files in /data and summarize them")
    print(result.output)
Each MCP server registers as a toolset, and the agent discovers available tools at startup. Here's the cool part: this is the same protocol used by Claude Desktop and Cursor, so any MCP server built for those tools works with PydanticAI agents out of the box. No adapters, no glue code.
Dynamic System Prompts and Instructions
Production agents rarely use static prompts. Contexts change, users have different permissions, and timestamps matter. PydanticAI handles this with dynamic instructions that pull context from dependencies at runtime:
from datetime import datetime, timezone
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass

@dataclass
class AnalystDeps:
    user_timezone: str
    data_access_level: str

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=AnalystDeps,
    system_prompt="You are a data analyst assistant.",
)

@agent.instructions
def time_context(ctx: RunContext[AnalystDeps]) -> str:
    now = datetime.now(timezone.utc)  # datetime.utcnow() is deprecated
    return f"Current UTC time: {now.isoformat()}. User timezone: {ctx.deps.user_timezone}."

@agent.instructions
def access_context(ctx: RunContext[AnalystDeps]) -> str:
    if ctx.deps.data_access_level == "admin":
        return "The user has full access to all datasets including PII."
    return "The user has restricted access. Never include PII in responses."
Dynamic instructions are evaluated at the start of each run, so the agent always has current context. No more manually assembling prompt strings or forgetting to update the timestamp.
Streaming Structured Output
For user-facing applications, streaming gives immediate feedback while still maintaining validation. Nobody wants to stare at a blank screen waiting for a full response:
from pydantic import BaseModel
from pydantic_ai import Agent

class AnalysisResult(BaseModel):
    title: str
    findings: list[str]
    risk_level: str
    recommendation: str

agent = Agent(
    "openai:gpt-4o",
    output_type=AnalysisResult,
    system_prompt="Analyze the provided data and return structured findings.",
)

async def stream_analysis(user_query: str) -> AnalysisResult:
    async with agent.run_stream(user_query) as stream:
        async for partial in stream.stream_output(debounce_by=0.1):
            # partial is a partially validated AnalysisResult
            print(f"Title: {partial.title}")
            if partial.findings:
                print(f"Findings so far: {len(partial.findings)}")
        # After the stream completes, get the fully validated result
        return await stream.get_output()
The debounce_by parameter prevents flooding your UI with updates for every token — a nice touch. And the final result is always fully validated against your Pydantic model, so you get the best of both worlds: real-time feedback and data integrity.
Error Handling and Retries
Production agents must handle failures gracefully. Things break. APIs go down. Models return weird stuff. PydanticAI provides structured retry mechanisms at multiple levels:
from pydantic_ai import Agent, ModelRetry, RunContext

agent = Agent(
    "openai:gpt-4o",
    retries=3,  # Global retry limit for output validation
)

@agent.tool(retries=2)  # Tool-specific retry limit
async def query_database(ctx: RunContext, sql_query: str) -> str:
    """Execute a read-only SQL query against the analytics database."""
    if "DROP" in sql_query.upper() or "DELETE" in sql_query.upper():
        raise ModelRetry(
            "Only SELECT queries are allowed. "
            "Rewrite the query as a SELECT statement."
        )
    # Execute the query...
    return "query results here"
ModelRetry is particularly clever — it sends your error message back to the model as feedback. The model sees exactly what went wrong and tries again with a corrected approach. This isn't blind retry logic; the model actually learns from each failure within the conversation. I've seen this catch and self-correct issues that would have required manual intervention in other frameworks.
Observability with Pydantic Logfire
You can't run production AI agents without observability. You just can't. Pydantic Logfire is an OpenTelemetry-based platform that gives you unified tracing across your entire stack — LLM calls, agent tool use, database queries, and API requests all in one trace:
import logfire
from pydantic_ai import Agent

# Initialize Logfire and instrument PydanticAI once at startup
logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a helpful assistant.",
)

# Every agent.run() call now produces detailed traces
That's two lines of setup. With Logfire enabled, every agent run creates a trace showing messages exchanged with the model, tool calls and their results, token usage and estimated cost, latency per step, and validation errors and retries.
Logfire's free tier provides 10 million spans per month, which is more than enough for development and staging environments.
For production deployments, you'll want to export through an OpenTelemetry Collector for centralized collection and sampling:
import logfire

logfire.configure(
    send_to_logfire=True,
    service_name="my-agent-service",
    environment="production",
)
Durable Execution for Long-Running Agents
Agents that run multi-step workflows — think research tasks, data pipelines, approval chains — can't afford to lose progress when an API call fails or a server restarts. PydanticAI's durable execution integration with Temporal preserves agent state across failures:
from pydantic_ai import Agent
from pydantic_ai.durable_exec.temporal import TemporalAgent
from temporalio import workflow

agent = Agent(
    "openai:gpt-4o",
    name="pipeline_agent",  # TemporalAgent requires a named agent
    system_prompt="You orchestrate data pipeline tasks.",
)

# Wrap the agent so model and tool calls run as Temporal activities.
# If the worker crashes mid-execution, the workflow resumes from the last checkpoint.
pipeline_agent = TemporalAgent(agent)

@workflow.defn
class PipelineWorkflow:
    @workflow.run
    async def run(self, user_request: str) -> str:
        result = await pipeline_agent.run(user_request)
        return result.output
Durable execution also supports Prefect and DBOS as backends. The agent keeps full support for streaming and MCP while adding fault tolerance. So if your agent crashes halfway through a complex workflow, it picks up exactly where it left off. That's the kind of resilience you need for anything beyond a demo.
Putting It All Together: A Production Agent
So, let's bring everything together. Here's a complete example that combines structured output, tools, dependency injection, and observability into a production-ready code review agent:
import base64
from dataclasses import dataclass

import httpx
import logfire
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext

logfire.configure()

class ReviewResult(BaseModel):
    summary: str = Field(description="Brief summary of the code changes")
    issues: list[str] = Field(description="List of issues found")
    suggestions: list[str] = Field(description="Improvement suggestions")
    approve: bool = Field(description="Whether the changes should be approved")

@dataclass
class ReviewDeps:
    github_token: str
    http_client: httpx.AsyncClient
    repo: str

review_agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=ReviewDeps,
    output_type=ReviewResult,
    system_prompt=(
        "You are a senior code reviewer. Analyze pull request diffs "
        "for bugs, security issues, performance problems, and style. "
        "Be specific and actionable in your feedback."
    ),
    retries=2,
)

@review_agent.tool
async def get_pr_diff(ctx: RunContext[ReviewDeps], pr_number: int) -> str:
    """Fetch the diff for a pull request."""
    response = await ctx.deps.http_client.get(
        f"https://api.github.com/repos/{ctx.deps.repo}/pulls/{pr_number}",
        headers={
            "Authorization": f"token {ctx.deps.github_token}",
            "Accept": "application/vnd.github.v3.diff",
        },
    )
    response.raise_for_status()
    return response.text[:10000]  # Truncate large diffs

@review_agent.tool
async def get_file_content(
    ctx: RunContext[ReviewDeps], path: str, ref: str = "main"
) -> str:
    """Fetch the content of a file from the repository."""
    response = await ctx.deps.http_client.get(
        f"https://api.github.com/repos/{ctx.deps.repo}/contents/{path}",
        headers={"Authorization": f"token {ctx.deps.github_token}"},
        params={"ref": ref},
    )
    response.raise_for_status()
    content = base64.b64decode(response.json()["content"]).decode()
    return content[:5000]

async def review_pull_request(repo: str, pr_number: int) -> ReviewResult:
    async with httpx.AsyncClient() as client:
        result = await review_agent.run(
            f"Review pull request #{pr_number}",
            deps=ReviewDeps(
                github_token="ghp_...",
                http_client=client,
                repo=repo,
            ),
        )
        return result.output
This agent fetches real PR data, analyzes the diff, and returns a validated ReviewResult with specific issues and suggestions. The dependency injection makes it trivially testable — swap the HTTP client for a mock and you can test the agent's reasoning without ever hitting GitHub's API.
When to Choose PydanticAI Over Alternatives
PydanticAI is the right choice when you need validated, structured outputs that downstream code depends on, type-safe tool contracts where parameter correctness matters, testable agent code with injectable dependencies, production observability from day one, or model portability across providers without rewriting agent logic.
That said, it's not always the best fit.
Consider Instructor if you only need structured data extraction without agent loops or tools — it's lighter weight and focused purely on extraction. Consider LangGraph if your primary need is complex graph-based orchestration with an extensive integration ecosystem (LangChain's own docs now recommend LangGraph for agent work). And consider CrewAI if your use case centers on multi-agent role-based collaboration, though be aware of its higher failure rates in benchmarks.
Frequently Asked Questions
What Python version does PydanticAI require?
PydanticAI requires Python 3.9 or later and is built on Pydantic v2. It uses standard Python type hints extensively, so Python 3.10+ is recommended for the best developer experience with newer typing syntax like X | Y unions.
Can PydanticAI work with local models through Ollama?
Yes! PydanticAI supports Ollama as a model provider, so you can run agents against local models like Llama, Mistral, or Qwen. Install with pip install "pydantic-ai[openai]" and point the model string to your Ollama endpoint. Just keep in mind that structured output reliability varies by local model — larger models handle JSON Schema constraints better.
How does PydanticAI handle rate limits and API errors?
PydanticAI's retry mechanism handles both validation failures and transient API errors. For validation failures, the error gets sent back to the model for self-correction. For API rate limits, you'll want to combine PydanticAI with standard retry libraries like tenacity or use durable execution backends like Temporal that handle transient failures automatically.
Is PydanticAI stable enough for production use?
PydanticAI reached v1.0 in September 2025 with a commitment to API stability — no breaking changes until v2. The framework is now at v1.63.0 (February 2026) and is used in production by companies running 50,000+ AI workflows. The v1 stability guarantee means your code won't break from framework updates for at least six months after any future v2 release.
How does PydanticAI compare to using OpenAI's native Structured Outputs?
OpenAI's Structured Outputs is provider-specific — it only works with OpenAI models. PydanticAI gives you the same structured output guarantee across all providers through its tool-based output mode, plus adds validation, retries, dependency injection, and observability on top. You can even use OpenAI's native mode through PydanticAI's NativeOutput wrapper when targeting OpenAI specifically, while keeping your agent code portable to other providers.