Building Reliable PydanticAI Agents: Observability Patterns That Actually Work
PydanticAI's type-safe agent framework catches many bugs at definition time, but structured output validation failures, dependency injection bugs, and tool retry storms still slip through. Here's how distributed tracing surfaces each one, and how to instrument PydanticAI agents with Nexus in under 10 lines of Python.
PydanticAI is one of the most exciting developments in the Python agent ecosystem. It brings full type safety to agent construction: your tools have typed inputs and outputs, your structured responses are validated at runtime, and your dependency injection is explicit and testable. The framework catches entire categories of bugs that LangChain and CrewAI let slide into production.
But "type-safe" doesn't mean "failure-free." Structured output validation errors, dependency injection scope bugs, and tool retry storms all still happen — and when they do, they're harder to diagnose than in loosely-typed frameworks precisely because the code looks correct. The types check out. The model returned something. The validation failed anyway. Why?
Distributed tracing fills the gap. Here are the five most common PydanticAI failure patterns, what each looks like in a trace, and how to instrument your agent with Nexus in under 10 lines.
1. Structured output validation failures
PydanticAI validates model responses against your result_type Pydantic model. When validation fails, the agent retries with a correction message. This is usually invisible in logs — you see a successful final result, not the two failed validation attempts that preceded it.
In a trace, each attempt is a separate span. You can see the raw model output that failed validation, the correction prompt that was sent, and the response that finally passed. Three spans where you expected one is the signal.
from pydantic import BaseModel
from pydantic_ai import Agent
import nexus_sdk as nexus

class AnalysisResult(BaseModel):
    sentiment: str  # 'positive', 'negative', or 'neutral'
    confidence: float  # 0.0 to 1.0
    key_topics: list[str]

agent = Agent(
    'openai:gpt-4o',
    result_type=AnalysisResult,
    system_prompt="Analyze the sentiment and topics of the given text.",
)

async def analyze_with_tracing(text: str) -> AnalysisResult:
    trace = nexus.start_trace(agent_id="sentiment-analyzer", name="analyze")
    span = nexus.start_span(trace_id=trace.id, name="pydantic-ai-run")
    try:
        result = await agent.run(text)
        nexus.end_span(span.id, status="ok",
                       output={"sentiment": result.data.sentiment,
                               "confidence": result.data.confidence})
        nexus.end_trace(trace.id, status="success")
        return result.data
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error_type": type(e).__name__, "error": str(e)})
        nexus.end_trace(trace.id, status="error")
        raise
When you see a span with 3+ retry sub-spans, check the raw model outputs in each one. PydanticAI will tell you exactly which field failed validation and why — but only if you're looking at the span's input/output data, not just the final result.
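That field-level detail comes from Pydantic itself, so you can reproduce it standalone. A minimal sketch, reusing the AnalysisResult shape from above: ValidationError.errors() names each failing field and the reason, which is exactly the payload worth copying into the retry span's metadata.

```python
from pydantic import BaseModel, ValidationError

class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float
    key_topics: list[str]

# A raw model output that will fail validation: confidence is not a number
raw = {"sentiment": "positive", "confidence": "very sure", "key_topics": []}

try:
    AnalysisResult.model_validate(raw)
except ValidationError as e:
    # Each error dict carries 'loc' (field path) and 'msg' (reason);
    # attach these to the retry span instead of just the final result
    failed_fields = [".".join(str(p) for p in err["loc"]) for err in e.errors()]
    print(failed_fields)  # ['confidence']
```

Logging this list in the span metadata turns "validation failed, retrying" into "the model returned a string for confidence".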
2. Dependency injection debugging
PydanticAI's dependency injection system is one of its best features: you declare what your tools need as typed deps, and the framework injects them at runtime. The failure pattern is subtle: tests pass because you mock the deps correctly, production fails because a dep is None or the wrong type, and the resulting error message is often cryptic.
Adding dep metadata to your spans makes this transparent:
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
import nexus_sdk as nexus

@dataclass
class AgentDeps:
    db_pool: object  # database connection pool
    cache_client: object  # Redis or similar
    user_id: str

agent = Agent('openai:gpt-4o', deps_type=AgentDeps)

@agent.tool
async def get_user_history(ctx: RunContext[AgentDeps], limit: int = 10) -> list[dict]:
    span = nexus.start_span(
        trace_id=ctx.run_id,  # use run_id as trace context
        name="tool:get_user_history",
        metadata={
            "dep.db_pool": type(ctx.deps.db_pool).__name__,
            "dep.cache_available": ctx.deps.cache_client is not None,
            "dep.user_id_set": bool(ctx.deps.user_id),
        }
    )
    try:
        result = await ctx.deps.db_pool.fetch(
            "SELECT * FROM history WHERE user_id = $1 LIMIT $2",
            ctx.deps.user_id, limit
        )
        nexus.end_span(span.id, status="ok",
                       metadata={"row_count": len(result)})
        return [dict(r) for r in result]
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error": str(e)})
        raise
When a dep is wrong in production, the span metadata shows you immediately: dep.db_pool: NoneType instead of dep.db_pool: AsyncPool. You don't need to reproduce the failure — the trace tells you exactly what was injected.
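Rather than hand-writing the dep.* metadata dict in every tool, you can derive it from the deps dataclass. A minimal sketch (deps_metadata is a hypothetical helper name, not part of PydanticAI or the Nexus SDK): it records each dep's runtime type, never its value, so no secrets leak into traces.

```python
from dataclasses import dataclass, fields

def deps_metadata(deps) -> dict:
    """Summarize each dep's runtime type (never its value) for span metadata."""
    meta = {}
    for f in fields(deps):
        value = getattr(deps, f.name)
        meta[f"dep.{f.name}"] = type(value).__name__
    return meta

@dataclass
class AgentDeps:
    db_pool: object
    cache_client: object
    user_id: str

# A dep that was never wired up shows as NoneType at a glance
broken = AgentDeps(db_pool=None, cache_client=object(), user_id="u-42")
print(deps_metadata(broken))
# {'dep.db_pool': 'NoneType', 'dep.cache_client': 'object', 'dep.user_id': 'str'}
```

Because it iterates dataclasses.fields, the helper keeps working as you add deps, with no per-tool metadata dict to maintain.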
3. Tool retry storms
PydanticAI's retries setting applies to tool calls as well as structured output validation. When a tool is flaky (intermittent timeouts, rate limits, transient 503s), the agent retries automatically. Without tracing, you see the final success and assume everything was fine. With tracing, you see that your agent made 12 tool calls when it should have made 2.
import time
from pydantic_ai import Agent
import nexus_sdk as nexus

agent = Agent('anthropic:claude-sonnet-4-6', retries=3)

@agent.tool_plain
async def fetch_market_data(symbol: str) -> dict:
    # Track every attempt, not just the final result
    attempt_start = time.time()
    try:
        data = await call_market_api(symbol)
        duration_ms = int((time.time() - attempt_start) * 1000)
        # Log the attempt in the active trace via a context var or thread-local
        nexus.add_event(name="market_api_success",
                        metadata={"symbol": symbol, "duration_ms": duration_ms})
        return data
    except Exception as e:
        duration_ms = int((time.time() - attempt_start) * 1000)
        nexus.add_event(name="market_api_failure",
                        metadata={"symbol": symbol,
                                  "error": str(e),
                                  "duration_ms": duration_ms})
        raise  # PydanticAI will retry
A trace with 12 market_api_failure events before success tells you your retry budget is being consumed on every run. The fix might be a circuit breaker, a higher timeout, or a different API endpoint — but you can only make that call if you can see the pattern.
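Of those fixes, the circuit breaker is the one worth sketching. A minimal, framework-agnostic version (the CircuitBreaker class here is illustrative, not a library API): the tool checks allow() before calling the flaky API, skips the call while the circuit is open, and records the outcome afterwards.

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; refuse calls until `cooldown` passes."""
    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown:
            # Half-open: let one probe call through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()

breaker = CircuitBreaker(threshold=3, cooldown=60.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.allow())  # False: circuit is open, skip the flaky API call
```

Inside fetch_market_data you would check breaker.allow() first and return a cached value (or raise immediately) when it is open, so retries stop burning the budget on an endpoint that is down anyway.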
4. Model fallback tracing
PydanticAI supports model fallbacks via FallbackModel. When your primary model is overloaded or returns an error, the agent switches to the fallback silently. This is good for reliability but bad for visibility: a different model, with different cost and quality, served the request, and nothing in your final result says so.
from pydantic_ai import Agent
from pydantic_ai.models.fallback import FallbackModel
import nexus_sdk as nexus

# Wrap FallbackModel to capture which model was actually used
class TracedFallbackModel(FallbackModel):
    async def request(self, messages, model_settings):
        for i, model in enumerate(self.models):
            try:
                result = await model.request(messages, model_settings)
                nexus.add_event(name="model_used",
                                metadata={"model": model.model_name,
                                          "fallback_index": i,
                                          "was_fallback": i > 0})
                return result
            except Exception as e:
                nexus.add_event(name="model_failed",
                                metadata={"model": model.model_name,
                                          "error": str(e)})
                if i == len(self.models) - 1:
                    raise
                # Otherwise continue to the next model in the list

agent = Agent(
    TracedFallbackModel(
        'openai:gpt-4o',
        'openai:gpt-4o-mini',  # fallback
    )
)
When you see was_fallback: true in a trace, you know a more expensive or lower-quality model was used. Aggregate these across a day and you can tell your team "we're falling back to gpt-4o-mini on 15% of requests — here's why."
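Once those events are exported, the aggregation is a few lines. A sketch (fallback_rate is a hypothetical helper, and the event dicts assume the metadata shape emitted by TracedFallbackModel above):

```python
from collections import Counter

def fallback_rate(events: list[dict]) -> dict:
    """Summarize which models served requests and how often a fallback ran."""
    used = [e["metadata"] for e in events if e["name"] == "model_used"]
    if not used:
        return {"total": 0, "fallback_pct": 0.0, "by_model": {}}
    fallbacks = sum(1 for m in used if m["was_fallback"])
    return {
        "total": len(used),
        "fallback_pct": round(100 * fallbacks / len(used), 1),
        "by_model": dict(Counter(m["model"] for m in used)),
    }

# Hypothetical day of events in the shape emitted above
events = (
    [{"name": "model_used", "metadata": {"model": "gpt-4o", "was_fallback": False}}] * 17
    + [{"name": "model_used", "metadata": {"model": "gpt-4o-mini", "was_fallback": True}}] * 3
)
print(fallback_rate(events))
# {'total': 20, 'fallback_pct': 15.0, 'by_model': {'gpt-4o': 17, 'gpt-4o-mini': 3}}
```

That single fallback_pct number is the one to put on a dashboard and alert on.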
5. Full Nexus integration walkthrough
Here's a complete PydanticAI agent with Nexus tracing. Install the SDK with pip install nexus-sdk and set NEXUS_API_KEY in your environment.
import os
from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext
import nexus_sdk as nexus

nexus.configure(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="pydantic-research-agent",
)

class ResearchOutput(BaseModel):
    summary: str
    sources: list[str]
    confidence: float

@dataclass
class Deps:
    search_client: object

agent = Agent(
    "openai:gpt-4o",
    result_type=ResearchOutput,
    deps_type=Deps,
    system_prompt="Research the given topic and return a structured summary.",
)

@agent.tool
async def web_search(ctx: RunContext[Deps], query: str) -> list[dict]:
    span = nexus.start_span(name="tool:web_search",
                            input={"query": query})
    try:
        results = await ctx.deps.search_client.search(query)
        nexus.end_span(span.id, status="ok",
                       output={"result_count": len(results)})
        return results
    except Exception as e:
        nexus.end_span(span.id, status="error",
                       metadata={"error": str(e)})
        raise

async def research(topic: str, deps: Deps) -> ResearchOutput:
    trace = nexus.start_trace(name=f"research: {topic[:60]}")
    try:
        result = await agent.run(topic, deps=deps)
        nexus.end_trace(trace.id, status="success",
                        metadata={"confidence": result.data.confidence,
                                  "source_count": len(result.data.sources)})
        return result.data
    except Exception as e:
        nexus.end_trace(trace.id, status="error",
                        metadata={"error": str(e)})
        raise
With this in place, every research call produces a trace with: overall status, duration, and confidence score at the trace level; individual tool call durations and result counts at the span level. The Nexus dashboard shows you error rates per agent, slow traces, and token usage trends over time — without any additional instrumentation.
Why PydanticAI specifically benefits from tracing
PydanticAI's type safety catches bugs at definition time. Tracing catches bugs at runtime. They're complementary: the framework tells you the code is structurally correct; the trace tells you what actually happened when it ran. Validation retries, dep injection scope, and model fallback behavior are all runtime phenomena that static analysis can't see.
Adding Nexus to a PydanticAI agent takes under 10 lines. The visibility payoff — being able to open a trace and see exactly which validation failed, which dep was None, and which model actually ran — is immediate.
Add tracing to your PydanticAI agents
Nexus gives you distributed traces, structured output failure analysis, and token cost tracking for every PydanticAI run. Free to start — no credit card required.
Start free →