Building Reliable PydanticAI Agents: Observability Patterns That Actually Work
PydanticAI's type-safe agent framework catches many bugs at definition time, but structured output validation failures, dependency injection bugs, and tool retry storms still slip through. Here's how distributed tracing surfaces each one, and how to instrument PydanticAI agents with Nexus in under 10 lines of Python.
PydanticAI is one of the most exciting developments in the Python agent ecosystem. It brings full type safety to agent construction: your tools have typed inputs and outputs, your structured responses are validated at runtime, and your dependency injection is explicit and testable. The framework catches entire categories of bugs that LangChain and CrewAI let slide into production.
But "type-safe" doesn't mean "failure-free." Structured output validation errors, dependency injection scope bugs, and tool retry storms all still happen — and when they do, they're harder to diagnose than in loosely-typed frameworks precisely because the code looks correct. The types check out. The model returned something. The validation failed anyway. Why?
Distributed tracing fills the gap. Here are the five most common PydanticAI failure patterns, what each looks like in a trace, and how to instrument your agent with Nexus in under 10 lines.
1. Structured output validation failures
PydanticAI validates model responses against your result_type Pydantic model. When validation fails, the agent retries with a correction message. This is usually invisible in logs — you see a successful final result, not the two failed validation attempts that preceded it.
In a trace, each attempt is a separate span. You can see the raw model output that failed validation, the correction prompt that was sent, and the response that finally passed. Three spans where you expected one is the signal.
from pydantic import BaseModel
from pydantic_ai import Agent
import nexus_sdk as nexus

class AnalysisResult(BaseModel):
    sentiment: str  # 'positive', 'negative', or 'neutral'
    confidence: float  # 0.0 to 1.0
    key_topics: list[str]

agent = Agent(
    'openai:gpt-4o',
    result_type=AnalysisResult,
    system_prompt="Analyze the sentiment and topics of the given text.",
)

async def analyze_with_tracing(text: str) -> AnalysisResult:
    trace = nexus.start_trace(agent_id="sentiment-analyzer", name="analyze")
    span = nexus.start_span(trace_id=trace.id, name="pydantic-ai-run")
    try:
        result = await agent.run(text)
        nexus.end_span(span.id, status="ok",
                       output={"sentiment": result.data.sentiment,
                               "confidence": result.data.confidence})
        nexus.end_trace(trace.id, status="success")
        return result.data
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error_type": type(e).__name__, "error": str(e)})
        nexus.end_trace(trace.id, status="error")
        raise
When you see a span with 3+ retry sub-spans, check the raw model outputs in each one. PydanticAI will tell you exactly which field failed validation and why — but only if you're looking at the span's input/output data, not just the final result.
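That field-level detail comes from Pydantic itself, so you can reproduce it standalone. A minimal sketch, reusing the AnalysisResult shape from above: ValidationError.errors() names each failing field and the reason, which is exactly the payload worth copying into the retry span's metadata.

```python
from pydantic import BaseModel, ValidationError

class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float
    key_topics: list[str]

# A raw model output that will fail validation: confidence is not a number
raw = {"sentiment": "positive", "confidence": "very sure", "key_topics": []}

try:
    AnalysisResult.model_validate(raw)
except ValidationError as e:
    # Each error dict carries 'loc' (field path) and 'msg' (reason);
    # attach these to the retry span instead of just the final result
    failed_fields = [".".join(str(p) for p in err["loc"]) for err in e.errors()]
    print(failed_fields)  # ['confidence']
```

Logging this list in the span metadata turns "validation failed, retrying" into "the model returned a string for confidence".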
2. Dependency injection debugging
PydanticAI's dependency injection system is one of its best features: you declare what your tools need as typed deps, and the framework injects them at runtime. The failure pattern is subtle: tests pass because you mock the deps correctly, production fails because a dep is None or the wrong type, and the resulting error message is often cryptic.
Adding dep metadata to your spans makes this transparent:
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
import nexus_sdk as nexus

@dataclass
class AgentDeps:
    db_pool: object  # database connection pool
    cache_client: object  # Redis or similar
    user_id: str

agent = Agent('openai:gpt-4o', deps_type=AgentDeps)

@agent.tool
async def get_user_history(ctx: RunContext[AgentDeps], limit: int = 10) -> list[dict]:
    span = nexus.start_span(
        trace_id=ctx.run_id,  # use run_id as trace context
        name="tool:get_user_history",
        metadata={
            "dep.db_pool": type(ctx.deps.db_pool).__name__,
            "dep.cache_available": ctx.deps.cache_client is not None,
            "dep.user_id_set": bool(ctx.deps.user_id),
        }
    )
    try:
        result = await ctx.deps.db_pool.fetch(
            "SELECT * FROM history WHERE user_id = $1 LIMIT $2",
            ctx.deps.user_id, limit
        )
        nexus.end_span(span.id, status="ok",
                       metadata={"row_count": len(result)})
        return [dict(r) for r in result]
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error": str(e)})
        raise
When a dep is wrong in production, the span metadata shows you immediately: dep.db_pool: NoneType instead of dep.db_pool: AsyncPool. You don't need to reproduce the failure — the trace tells you exactly what was injected.
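Rather than hand-writing the dep.* metadata dict in every tool, you can derive it from the deps dataclass. A minimal sketch (deps_metadata is a hypothetical helper name, not part of PydanticAI or the Nexus SDK): it records each dep's runtime type, never its value, so no secrets leak into traces.

```python
from dataclasses import dataclass, fields

def deps_metadata(deps) -> dict:
    """Summarize each dep's runtime type (never its value) for span metadata."""
    meta = {}
    for f in fields(deps):
        value = getattr(deps, f.name)
        meta[f"dep.{f.name}"] = type(value).__name__
    return meta

@dataclass
class AgentDeps:
    db_pool: object
    cache_client: object
    user_id: str

# A dep that was never wired up shows as NoneType at a glance
broken = AgentDeps(db_pool=None, cache_client=object(), user_id="u-42")
print(deps_metadata(broken))
# {'dep.db_pool': 'NoneType', 'dep.cache_client': 'object', 'dep.user_id': 'str'}
```

Because it iterates dataclasses.fields, the helper keeps working as you add deps, with no per-tool metadata dict to maintain.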
3. Tool retry storms
PydanticAI's retries setting applies to tool calls as well as structured output validation. When a tool is flaky (intermittent timeouts, rate limits, transient 503s), the agent retries automatically. Without tracing, you see the final success and assume everything was fine. With tracing, you see that your agent made 12 tool calls when it should have made 2.
import time
from pydantic_ai import Agent
import nexus_sdk as nexus

agent = Agent('anthropic:claude-sonnet-4-6', retries=3)

@agent.tool_plain
async def fetch_market_data(symbol: str) -> dict:
    # Track every attempt, not just the final result
    attempt_start = time.time()
    try:
        data = await call_market_api(symbol)
        duration_ms = int((time.time() - attempt_start) * 1000)
        # Log the attempt in the active trace via a context var or thread-local
        nexus.add_event(name="market_api_success",
                        metadata={"symbol": symbol, "duration_ms": duration_ms})
        return data
    except Exception as e:
        duration_ms = int((time.time() - attempt_start) * 1000)
        nexus.add_event(name="market_api_failure",
                        metadata={"symbol": symbol,
                                  "error": str(e),
                                  "duration_ms": duration_ms})
        raise  # PydanticAI will retry
A trace with 12 market_api_failure events before success tells you your retry budget is being consumed on every run. The fix might be a circuit breaker, a higher timeout, or a different API endpoint — but you can only make that call if you can see the pattern.
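Of those fixes, the circuit breaker is the one worth sketching. A minimal, framework-agnostic version (the CircuitBreaker class here is illustrative, not a library API): the tool checks allow() before calling the flaky API, skips the call while the circuit is open, and records the outcome afterwards.

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; refuse calls until `cooldown` passes."""
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown:
            # Half-open: let one probe call through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()

breaker = CircuitBreaker(threshold=3, cooldown=60.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.allow())  # False: circuit is open, skip the flaky API call
```

Inside fetch_market_data you would check breaker.allow() first and return a cached value (or raise immediately) when it is open, so retries stop burning the budget on an endpoint that is down anyway.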
4. Model fallback tracing
PydanticAI supports model fallbacks via FallbackModel. When your primary model is overloaded or returns an error, the agent switches to the fallback silently. This is good for reliability but bad for visibility: a different model, with different cost and quality, served the request, and nothing in your final result says so.
from pydantic_ai import Agent
from pydantic_ai.models.fallback import FallbackModel
import nexus_sdk as nexus

# Wrap FallbackModel to capture which model was actually used
class TracedFallbackModel(FallbackModel):
    async def request(self, messages, model_settings):
        for i, model in enumerate(self.models):
            try:
                result = await model.request(messages, model_settings)
                nexus.add_event(name="model_used",
                                metadata={"model": model.model_name,
                                          "fallback_index": i,
                                          "was_fallback": i > 0})
                return result
            except Exception as e:
                nexus.add_event(name="model_failed",
                                metadata={"model": model.model_name,
                                          "error": str(e)})
                if i == len(self.models) - 1:
                    raise
                # Otherwise continue to the next model in the list

agent = Agent(
    TracedFallbackModel(
        'openai:gpt-4o',
        'openai:gpt-4o-mini',  # fallback
    )
)
When you see was_fallback: true in a trace, you know a more expensive or lower-quality model was used. Aggregate these across a day and you can tell your team "we're falling back to gpt-4o-mini on 15% of requests — here's why."
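Once those events are exported, the aggregation is a few lines. A sketch (fallback_rate is a hypothetical helper, and the event dicts assume the metadata shape emitted by TracedFallbackModel above):

```python
from collections import Counter

def fallback_rate(events: list[dict]) -> dict:
    """Summarize which models served requests and how often a fallback ran."""
    used = [e["metadata"] for e in events if e["name"] == "model_used"]
    if not used:
        return {"total": 0, "fallback_pct": 0.0, "by_model": {}}
    fallbacks = sum(1 for m in used if m["was_fallback"])
    return {
        "total": len(used),
        "fallback_pct": round(100 * fallbacks / len(used), 1),
        "by_model": dict(Counter(m["model"] for m in used)),
    }

# Hypothetical day of events in the shape emitted above
events = (
    [{"name": "model_used", "metadata": {"model": "gpt-4o", "was_fallback": False}}] * 17
    + [{"name": "model_used", "metadata": {"model": "gpt-4o-mini", "was_fallback": True}}] * 3
)
print(fallback_rate(events))
# {'total': 20, 'fallback_pct': 15.0, 'by_model': {'gpt-4o': 17, 'gpt-4o-mini': 3}}
```

That single fallback_pct number is the one to put on a dashboard and alert on.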
5. Full Nexus integration walkthrough
Here's a complete PydanticAI agent with Nexus tracing. Install the SDK with pip install nexus-sdk and set NEXUS_API_KEY in your environment.
import os
from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext
import nexus_sdk as nexus

nexus.configure(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="pydantic-research-agent",
)

class ResearchOutput(BaseModel):
    summary: str
    sources: list[str]
    confidence: float

@dataclass
class Deps:
    search_client: object

agent = Agent(
    "openai:gpt-4o",
    result_type=ResearchOutput,
    deps_type=Deps,
    system_prompt="Research the given topic and return a structured summary.",
)

@agent.tool
async def web_search(ctx: RunContext[Deps], query: str) -> list[dict]:
    span = nexus.start_span(name="tool:web_search",
                            input={"query": query})
    try:
        results = await ctx.deps.search_client.search(query)
        nexus.end_span(span.id, status="ok",
                       output={"result_count": len(results)})
        return results
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error": str(e)})
        raise

async def research(topic: str, deps: Deps) -> ResearchOutput:
    trace = nexus.start_trace(name=f"research: {topic[:60]}")
    try:
        result = await agent.run(topic, deps=deps)
        nexus.end_trace(trace.id, status="success",
                        metadata={"confidence": result.data.confidence,
                                  "source_count": len(result.data.sources)})
        return result.data
    except Exception as e:
        nexus.end_trace(trace.id, status="error",
                        metadata={"error": str(e)})
        raise
With this in place, every research call produces a trace with: overall status, duration, and confidence score at the trace level; individual tool call durations and result counts at the span level. The Nexus dashboard shows you error rates per agent, slow traces, and token usage trends over time — without any additional instrumentation.
Why PydanticAI specifically benefits from tracing
PydanticAI's type safety catches bugs at definition time. Tracing catches bugs at runtime. They're complementary: the framework tells you the code is structurally correct; the trace tells you what actually happened when it ran. Validation retries, dep injection scope, and model fallback behavior are all runtime phenomena that static analysis can't see.
Adding Nexus to a PydanticAI agent takes under 10 lines. The visibility payoff — being able to open a trace and see exactly which validation failed, which dep was None, and which model actually ran — is immediate.
Add tracing to your PydanticAI agents
Nexus gives you distributed traces, structured output failure analysis, and token cost tracking for every PydanticAI run. Free to start — no credit card required.
Start free →