2026-04-18 · 8 min read

Tracing OpenAI Agents SDK: Observability for Swarm-Style Agent Pipelines

OpenAI's Agents SDK (the successor to Swarm) makes it easy to build multi-agent pipelines with handoffs and function tools. It also makes it easy to build pipelines where handoff bugs, tool failures, and infinite delegation loops go unnoticed. Here's how to add trace observability with Nexus.

What the Agents SDK adds

OpenAI’s Agents SDK (the evolution of the Swarm prototype) gives you two key primitives: Agents (LLM instances with instructions and function tools) and handoffs (the ability for one agent to transfer control to another). A triage agent receives user input, decides which specialist to route to, and hands off the conversation. The specialist handles the task, then optionally hands back or terminates.

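This pattern is powerful, but it produces failure modes that don’t exist in single-agent setups: handoff loops, requests routed to the wrong agent, and tool failures that never surface as exceptions.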

Instrumenting Agents SDK runs

The Agents SDK ships its own tracing, but the most direct way to get runs into Nexus is to instrument at the runner level. Wrap Runner.run() in a Nexus trace and record the routing outcome as trace metadata:

import os
import asyncio
from agents import Agent, Runner, function_tool
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

# Define a function tool
@function_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny, 22°C in {city}"

# Define agents
weather_agent = Agent(
    name="weather_specialist",
    instructions="You are a weather expert. Use the get_weather tool to answer questions.",
    tools=[get_weather],
)

triage_agent = Agent(
    name="triage",
    instructions="Route user questions to the correct specialist.",
    handoffs=[weather_agent],
)

async def run_with_tracing(user_message: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "openai-agents-sdk-triage",
        "name": f"triage: {user_message[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "environment": os.environ.get("APP_ENV", "dev"),
            "entry_agent": "triage",
        },
    })
    trace_id = trace["trace_id"]

    try:
        result = await Runner.run(triage_agent, user_message)
        final_output = result.final_output

        # Record which agent finished the run and how it got there.
        # result.new_items holds the run's items (messages, tool calls, handoffs).
        nexus.end_trace(trace_id, {
            "status": "success",
            "metadata": {
                "final_agent": result.last_agent.name if result.last_agent else "unknown",
                "total_turns": len(result.raw_responses),  # one model response per turn
                "handoffs": len([i for i in result.new_items if i.type == "handoff_output_item"]),
            },
        })
        return final_output

    except Exception as e:
        nexus.end_trace(trace_id, {"status": "error", "metadata": {"error": str(e)}})
        raise

Tracing handoffs explicitly

The Agents SDK surfaces handoffs as items in the run result's new_items list. Extract them for per-handoff spans:

async def run_with_handoff_tracing(user_message: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "openai-agents-sdk-triage",
        "name": f"triage: {user_message[:60]}",
        "status": "running",
        "started_at": nexus.now(),
    })
    trace_id = trace["trace_id"]
    handoff_spans = []

    try:
        result = await Runner.run(triage_agent, user_message)

        # Record each completed handoff as a span.
        # Spans are created after the run, so they carry no timing of their own.
        for i, item in enumerate(result.new_items):
            if item.type == "handoff_output_item":
                source = item.source_agent.name
                target = item.target_agent.name
                span = nexus.start_span(trace_id, {
                    "name": f"handoff:{source}->{target}",
                    "type": "tool",
                    "metadata": {
                        "handoff_index": i,
                        "source_agent": source,
                        "target_agent": target,
                    },
                })
                nexus.end_span(span["id"], {"output": target})
                handoff_spans.append(span)

        nexus.end_trace(trace_id, {
            "status": "success",
            "metadata": {
                "final_agent": result.last_agent.name if result.last_agent else "triage",
                "total_items": len(result.new_items),
                "handoff_count": len(handoff_spans),
            },
        })
        return result.final_output

    except Exception as e:
        nexus.end_trace(trace_id, {"status": "error", "metadata": {"error": str(e)}})
        raise

What to look for in the trace

Detecting handoff loops: If handoff_count in trace metadata is above three or four for a simple query, you probably have a loop. Check whether the triage agent and specialist have contradictory instructions about when to hand off.
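
A raw count cannot distinguish a long but legitimate chain from two agents bouncing a request back and forth. One way to tell them apart is to look at the ordered (source, target) pairs. The sketch below assumes you have already pulled those pairs out of the run as agent names. The find_handoff_cycle helper and its max_repeats threshold are hypothetical, not part of the Agents SDK or Nexus.

```python
from collections import Counter


def find_handoff_cycle(handoffs, max_repeats=1):
    """Return the first (source, target) edge taken more than max_repeats times.

    `handoffs` is an ordered list of (source_agent, target_agent) name pairs.
    Returns None when no edge repeats beyond the threshold.
    """
    seen = Counter()
    for edge in handoffs:
        seen[edge] += 1
        if seen[edge] > max_repeats:
            return edge
    return None


# A triage <-> specialist ping-pong: the same edge shows up twice.
looping = [
    ("triage", "weather_specialist"),
    ("weather_specialist", "triage"),
    ("triage", "weather_specialist"),
]
# A straight-line route with no repeated edge.
healthy = [("triage", "weather_specialist")]

print(find_handoff_cycle(looping))  # ('triage', 'weather_specialist')
print(find_handoff_cycle(healthy))  # None
```

Runner.run also accepts a max_turns argument, which bounds how long a loop can run. A check like this one tells you which pair of agents caused it.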

Detecting wrong routing: Filter traces by final_agent metadata. If queries that should reach the weather agent are ending with final_agent: triage, the triage agent is failing to route them, usually because its handoff instructions are too narrow.
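
To make that filter repeatable, you can compare final_agent against the agent you expected for a labelled set of queries. The sketch below works on plain dicts and assumes a hypothetical export shape: a metadata dict carrying final_agent plus an expected_agent label that you attach yourself. It does not call any Nexus API.

```python
from collections import defaultdict


def misrouting_rates(traces):
    """Return {expected_agent: fraction of traces that ended on a different agent}."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for t in traces:
        meta = t["metadata"]
        expected = meta["expected_agent"]
        totals[expected] += 1
        if meta["final_agent"] != expected:
            misses[expected] += 1
    return {agent: misses[agent] / totals[agent] for agent in totals}


# Hypothetical exported traces; expected_agent is a label added for evaluation.
traces = [
    {"metadata": {"expected_agent": "weather_specialist", "final_agent": "weather_specialist"}},
    {"metadata": {"expected_agent": "weather_specialist", "final_agent": "triage"}},
    {"metadata": {"expected_agent": "triage", "final_agent": "triage"}},
    {"metadata": {"expected_agent": "weather_specialist", "final_agent": "triage"}},
]

rates = misrouting_rates(traces)
print(rates)
```

In this made-up sample, two of the three weather queries end on triage, which is the signature of routing instructions that are too narrow.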

Detecting tool failures: Tool errors in the Agents SDK typically surface as content in the run's items, not as exceptions, because by default the error text is handed back to the model. Check tool outputs that contain "error" and final answers that contain the word "sorry". Both often indicate a tool returned an error that was passed to the model rather than raised.
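
Because the failure arrives as ordinary text, a keyword scan over tool outputs is a reasonable first pass. The sketch below assumes you have collected tool outputs as (tool_name, output) string pairs. The marker list and the flag_suspect_tool_outputs helper are illustrative, and keyword matching will produce some false positives.

```python
ERROR_MARKERS = ("error", "exception", "traceback", "failed")


def flag_suspect_tool_outputs(outputs, markers=ERROR_MARKERS):
    """Return the (tool_name, output) pairs whose text contains an error marker."""
    flagged = []
    for tool_name, output in outputs:
        text = output.lower()
        if any(marker in text for marker in markers):
            flagged.append((tool_name, output))
    return flagged


# Illustrative tool outputs; the second one is an error returned as plain text.
outputs = [
    ("get_weather", "Sunny, 22°C in Lisbon"),
    ("get_weather", "An error occurred while running the tool: timeout"),
]

suspects = flag_suspect_tool_outputs(outputs)
print(suspects)
```

Recording the flagged pairs in trace metadata lets you filter for runs that looked successful but were answered from a failed tool call.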

A small amount of instrumentation at the runner level gives you visibility into every Agents SDK run: which agent produced the final output, how many handoffs occurred, and between which agents.

Trace your Agents SDK pipeline with Nexus

Nexus gives you per-handoff visibility and agent-level health monitoring. Free tier, no credit card required.
