How to Instrument Claude Code Agents with Nexus Observability

Claude Code agents run long, multi-step tasks — and when they fail, you want to know exactly where. Here's how to wrap Claude Code tool executions in Nexus traces so every agent run is fully observable: what happened, how long each step took, and what failed.

Claude Code agents are uniquely opaque. A single user request might trigger dozens of tool calls — bash commands, file reads, web searches — spread across minutes or hours. When something goes wrong, you're left with a wall of terminal output and no clear way to see which step took 45 seconds or where the error actually originated.

Nexus fixes this by giving you a waterfall view of every agent session: each LLM call, each tool use, each span of work — timestamped, measured, and searchable. Here's exactly how to instrument a Claude Code agent.

The core pattern: one trace per agent session

The most useful mental model is: one Nexus trace = one user request to your agent. Within that trace, you create spans for each meaningful unit of work: LLM calls, tool executions, and the agentic loop itself.

Here's a TypeScript example wrapping an Anthropic SDK agent loop. The tool definitions and the executeTool dispatcher are placeholders for your own code:

import { NexusClient } from '@keylightdigital/nexus'
import Anthropic from '@anthropic-ai/sdk'

const nexus = new NexusClient({
  apiKey: process.env.NEXUS_API_KEY!,
  agentId: 'my-claude-code-agent',
})
const anthropic = new Anthropic()

async function runAgentTask(userRequest: string) {
  // One trace per agent session
  const trace = await nexus.startTrace({
    name: `agent: ${userRequest.slice(0, 60)}`,
    metadata: { user_request: userRequest },
  })

  try {
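    const tools: Anthropic.Tool[] = [/* your tool definitions */]
    // Seed the conversation with the user's request
    const conversationHistory: Anthropic.MessageParam[] = [
      { role: 'user', content: userRequest },
    ]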

    // Outer span for the agentic loop
    const loopSpan = await trace.addSpan({ name: 'agentic-loop' })
    let iterations = 0

    while (true) {
      iterations++
      // Span per LLM call
      const llmSpan = await trace.addSpan({
        name: `llm-call-${iterations}`,
        parentSpanId: loopSpan.id,
        input: { iteration: iterations },
      })

      const response = await anthropic.messages.create({
        model: 'claude-opus-4-6',
        max_tokens: 4096,
        tools,
        messages: conversationHistory,
      })

      await llmSpan.end({
        status: 'ok',
        output: {
          stop_reason: response.stop_reason,
          input_tokens: response.usage.input_tokens,
          output_tokens: response.usage.output_tokens,
        },
      })

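      // Record the assistant turn so the next request includes it
      conversationHistory.push({ role: 'assistant', content: response.content })

      // Stop on anything other than a tool request (end_turn, max_tokens, ...)
      if (response.stop_reason !== 'tool_use') break

      // Span per tool execution
      const toolResults: Anthropic.ToolResultBlockParam[] = []
      for (const block of response.content) {
        if (block.type !== 'tool_use') continue
        const toolSpan = await trace.addSpan({
          name: `tool:${block.name}`,
          parentSpanId: loopSpan.id,
          input: block.input,
        })
        // executeTool is your own dispatcher; it is not part of either SDK
        const result = await executeTool(block.name, block.input)
        await toolSpan.end({ status: 'ok', output: result })
        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: JSON.stringify(result),
        })
      }

      // Return the tool results to Claude as the next user turn
      conversationHistory.push({ role: 'user', content: toolResults })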
    }

    await loopSpan.end({ status: 'ok', output: { iterations } })
    await trace.end({ status: 'success' })
  } catch (err) {
    await trace.end({ status: 'error' })
    throw err
  }
}

This gives you the full picture in Nexus: the outer trace, an "agentic-loop" span that shows total duration, individual LLM call spans with token counts, and tool spans showing what was executed.

Instrumenting individual tools

If you want finer granularity, instrument at the tool level. Here's a Python version that wraps the bash tool (the workhorse of most Claude Code agents) so every command becomes a traceable span:

# In your CLAUDE.md or agent instructions, include:
# "Always log progress using the nexus trace context"

# Or instrument at the tool level:
import os
import subprocess

from nexus_client import NexusClient

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="my-claude-code-agent",
)

def bash_tool_with_tracing(trace, command: str) -> str:
    """Wrap the bash tool so every shell command is a span."""
    span = trace.add_span(
        name="bash",
        input={"command": command[:200]},
    )
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=30
        )
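        output = result.stdout or result.stderr
        # A non-zero exit code is a failed step; "error" is assumed to be
        # an accepted span status alongside "ok" and "timeout"
        span.end(
            status="ok" if result.returncode == 0 else "error",
            output={"exit_code": result.returncode, "output": output[:500]},
        )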
        return output
    except subprocess.TimeoutExpired:
        span.end(status="timeout", output={"command": command[:200]})
        raise
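
The wrapper above ends its span when the command completes or times out; any other exception would leave the span open. One way to guard against that is a small context manager. The sketch below is not part of the Nexus SDK: it assumes only the add_span(...) and span.end(...) calls used above, the "error" status value and the traced_span name are assumptions, and FakeTrace and FakeSpan are stand-ins so the example runs without an API key.

```python
from contextlib import contextmanager


@contextmanager
def traced_span(trace, name, input=None):
    """Open a span, yield a dict for output fields, and always end the span.

    `trace` is any object exposing add_span(name=..., input=...) that returns
    a span with end(status=..., output=...). The "error" status is assumed.
    """
    span = trace.add_span(name=name, input=input or {})
    result = {}  # the body of the with-block may store output fields here
    try:
        yield result
    except Exception as exc:
        span.end(status="error", output={"error": repr(exc)})
        raise
    else:
        span.end(status="ok", output=result)


# Stand-in objects (hypothetical) so the sketch runs without the SDK.
class FakeSpan:
    def __init__(self, name):
        self.name, self.status, self.output = name, None, None

    def end(self, status, output=None):
        self.status, self.output = status, output


class FakeTrace:
    def __init__(self):
        self.spans = []

    def add_span(self, name, input=None):
        span = FakeSpan(name)
        self.spans.append(span)
        return span


trace = FakeTrace()

# Successful step: the span ends with status "ok".
with traced_span(trace, "read-file", {"path": "README.md"}) as out:
    out["bytes"] = 1024

# Failing step: the span ends with status "error" and the exception propagates.
try:
    with traced_span(trace, "bash", {"command": "false"}):
        raise RuntimeError("command failed")
except RuntimeError:
    pass

print([(s.name, s.status) for s in trace.spans])
# → [('read-file', 'ok'), ('bash', 'error')]
```

With a helper like this, a tool body only has to do its work and fill in the output dict; closing the span is handled in one place.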

What you'll see in your Nexus dashboard

Once instrumented, every agent run shows up as a trace in your Nexus dashboard. The waterfall view lets you instantly see:

Total duration per session — which requests take 30 seconds vs. 5 minutes
LLM call latency — is Claude being slow, or is it your tools?
Tool execution time — which bash commands are bottlenecks
Token usage per call — track context growth across iterations
Error location — the failing span is red, not buried in terminal output
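
To make the token usage item concrete, here is a small sketch of how context growth could be computed from per-call token counts. The record shape mirrors the input_tokens and output_tokens fields logged on each LLM span in the TypeScript example; the context_growth function and the sample numbers are made up for illustration and are not part of Nexus.

```python
def context_growth(calls):
    """Return per-iteration input-token growth and a running token total."""
    rows = []
    total = 0
    previous_input = None
    for i, call in enumerate(calls, start=1):
        total += call["input_tokens"] + call["output_tokens"]
        # Growth is measured against the previous call's input size
        growth = 0 if previous_input is None else call["input_tokens"] - previous_input
        rows.append({"iteration": i, "input_growth": growth, "running_total": total})
        previous_input = call["input_tokens"]
    return rows


# Illustrative numbers only, not real measurements.
calls = [
    {"input_tokens": 1200, "output_tokens": 300},
    {"input_tokens": 1800, "output_tokens": 250},
    {"input_tokens": 2900, "output_tokens": 400},
]

for row in context_growth(calls):
    print(row)
# → {'iteration': 1, 'input_growth': 0, 'running_total': 1500}
# → {'iteration': 2, 'input_growth': 600, 'running_total': 3550}
# → {'iteration': 3, 'input_growth': 1100, 'running_total': 6850}
```

A steadily rising input_growth is usually a sign that tool outputs are piling up in the conversation history.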

Email alerts when agents fail

Pro users can configure email alerts so they're notified the moment an agent trace errors. When a Claude Code session crashes mid-task, you know immediately — with a link directly to the failing trace. No more checking terminal windows.

Getting started

Install the SDK (npm install @keylightdigital/nexus), create a free account at nexus.keylightdigital.dev, grab an API key, and you'll have traces flowing in under 5 minutes. The free tier covers 1,000 traces/month — more than enough to get started.