Debugging Multi-Agent Orchestration: A Practical Guide
Multi-agent systems fail in ways that single-agent debugging can't handle. When an orchestrator spawns 5 sub-agents in parallel and the final result is wrong, you can't just look at one trace: you need all 6 traces, an understanding of how they relate, and a way to identify which sub-agent's output corrupted the result.
This guide covers the 4 most common multi-agent failure modes and the specific trace patterns that reveal each one.
Failure mode 1: Silent sub-agent failures
This is the most insidious multi-agent bug: a sub-agent fails, the orchestrator doesn't check the return value, and the bad output propagates silently. The final result looks plausible but is wrong.
Trace signature: The sub-agent trace shows status: error, but the orchestrator trace shows status: success. Look for orchestrator spans that don't validate sub-agent output before using it.
Fix: Always surface sub-agent failures to the orchestrator. Log the sub-agent's trace ID in the orchestrator's span metadata:
const subAgentResult = await runSubAgent(task)
await orchestratorSpan.end({
  status: subAgentResult.ok ? 'success' : 'error',
  metadata: {
    sub_agent_trace_id: subAgentResult.traceId,
    sub_agent_status: subAgentResult.status,
    sub_agent_error: subAgentResult.error ?? null,
  },
})
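Logging records the failure; the orchestrator still has to act on it. A minimal sketch of a guard that refuses to use a failed sub-agent's output, rather than passing it along silently. The `SubAgentResult` shape here is hypothetical; adapt it to whatever your sub-agent runner actually returns:

```typescript
// Hypothetical result shape — match this to your runner's real return type.
interface SubAgentResult<T> {
  ok: boolean
  value?: T
  error?: string
  traceId: string
}

// Throw instead of silently using a failed sub-agent's output.
// The error message carries the sub-agent's trace ID for correlation.
function unwrap<T>(result: SubAgentResult<T>): T {
  if (!result.ok || result.value === undefined) {
    throw new Error(
      `Sub-agent failed (trace ${result.traceId}): ${result.error ?? 'unknown error'}`,
    )
  }
  return result.value
}
```

With a guard like this, a silent failure becomes a loud one: the orchestrator's trace ends in an explicit error that names the sub-agent trace that caused it.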
Failure mode 2: Context pollution between agents
Agent A writes a result to shared state. Agent B reads it before Agent A finishes. Agent B operates on incomplete data, producing a confident but wrong answer.
Trace signature: Two agents have overlapping time windows in the waterfall view. Agent B's span starts before Agent A's span ends. The output of Agent B contains the partial data that was available at the time of the read.
Fix: Log read/write timestamps in span metadata. If Agent B reads from shared state, record what it read and when:
const data = await sharedState.read(key)
await span.end({
  status: 'success',
  metadata: {
    state_key: key,
    state_read_at: new Date().toISOString(),
    state_value_preview: JSON.stringify(data).slice(0, 200),
  },
})
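Read-side logging only helps if you can compare it against when the writer finished. One way to make that comparison possible is a shared-state wrapper that stamps every write and refuses reads of values the writer hasn't marked complete. A minimal in-memory sketch, where the `TimestampedState` class and its completeness flag are illustrative, not part of any SDK:

```typescript
// In-memory shared state that records write timestamps, so a reader's
// state_read_at can be compared against the writer's written_at in traces.
class TimestampedState {
  private store = new Map<
    string,
    { value: unknown; writtenAt: string; complete: boolean }
  >()

  // Writers set complete=false for partial results, then true when done.
  write(key: string, value: unknown, complete = true): void {
    this.store.set(key, { value, writtenAt: new Date().toISOString(), complete })
  }

  // Reading an incomplete value fails loudly instead of returning partial data.
  read(key: string): { value: unknown; writtenAt: string; complete: boolean } {
    const entry = this.store.get(key)
    if (!entry) throw new Error(`No value for key: ${key}`)
    if (!entry.complete) throw new Error(`Read of incomplete value for key: ${key}`)
    return entry
  }
}
```

The completeness flag turns the race from a confident-but-wrong answer into an explicit error you can see in the failing agent's trace.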
Failure mode 3: Delegation loop
Agent A delegates to Agent B. Agent B decides it can't handle the task and re-delegates back to Agent A (or to a new instance of itself). The system spins until it hits a token limit or timeout.
Trace signature: You see a growing chain of traces where each one spawns another. The Nexus agent list shows the same agent with dozens of traces in rapid succession, all ending in timeout or error.
Fix: Pass a delegation depth counter in your agent context. Fail hard if depth exceeds your limit:
async function runAgent(task: Task, depth = 0) {
  const trace = await nexus.startTrace({
    name: task.name,
    metadata: { delegation_depth: depth },
  })
  if (depth > 3) {
    await trace.end({ status: 'error', metadata: { error: 'Max delegation depth exceeded' } })
    throw new Error('Delegation loop detected')
  }
  // ... agent logic that may call runAgent(subTask, depth + 1)
}
Failure mode 4: Aggregation errors
5 sub-agents return results. The orchestrator aggregates them. One sub-agent returns data in an unexpected format, and the aggregator either silently drops it or crashes.
Trace signature: 5 sub-agent traces with status: success, but the orchestrator's aggregation span shows a schema validation error or unexpectedly produces a result with N-1 items.
Fix: Log each sub-agent's output format alongside the expected schema in the aggregator span:
const results = await Promise.allSettled(subAgentPromises)
const successful = results
  .filter((r): r is PromiseFulfilledResult<unknown> => r.status === 'fulfilled')
  .map(r => r.value)
const failed = results
  .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
await aggregatorSpan.end({
  status: failed.length === 0 ? 'success' : 'error',
  metadata: {
    total_sub_agents: results.length,
    successful_count: successful.length,
    failed_count: failed.length,
    failed_reasons: failed.map(f => String(f.reason)).slice(0, 3),
  },
})
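The metadata above counts outright failures, but the format-drift case is a sub-agent that fulfills its promise with the wrong shape. Catching that requires checking each fulfilled result against what the aggregator expects. A minimal sketch using a hand-rolled type guard, where the `ResearchResult` shape is a hypothetical example; in practice you might use a schema validation library instead:

```typescript
// Hypothetical shape the aggregator expects from each sub-agent.
interface ResearchResult {
  source: string
  summary: string
}

// Structural check: true only if the value matches the expected shape.
function isResearchResult(x: unknown): x is ResearchResult {
  return (
    typeof x === 'object' &&
    x !== null &&
    typeof (x as Record<string, unknown>).source === 'string' &&
    typeof (x as Record<string, unknown>).summary === 'string'
  )
}

// Split results into valid and invalid so malformed output is reported
// in span metadata instead of being silently dropped.
function partitionResults(raw: unknown[]): {
  valid: ResearchResult[]
  invalid: unknown[]
} {
  return {
    valid: raw.filter(isResearchResult),
    invalid: raw.filter(x => !isResearchResult(x)),
  }
}
```

Logging `invalid.length` (and a preview of the invalid values) in the aggregator span makes the "N-1 items" symptom self-explanatory in the trace.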
Setting up cross-agent trace correlation
The most powerful debugging tool for multi-agent systems is a shared trace ID that connects all agents in a single orchestration run. Pass an orchestration_id through all sub-agents:
const orchestrationId = crypto.randomUUID()

// Orchestrator trace
const orchTrace = await nexus.startTrace({
  name: 'orchestrator',
  metadata: { orchestration_id: orchestrationId },
})

// Sub-agent traces — all share the same orchestration_id
const subTrace = await nexus.startTrace({
  name: 'sub-agent-researcher',
  metadata: { orchestration_id: orchestrationId, parent_trace_id: orchTrace.id },
})
Now you can search your Nexus traces by orchestration_id and see every agent that participated in a single run — with timing, status, and error details for each.
Debug your multi-agent system with Nexus
See every sub-agent trace, correlate by orchestration ID, and get alerts when any agent in the chain fails.
Start free →