2026-04-25 · 10 min read

Monitoring Azure AI Agent Service: Tracing Threads, Runs, and Tool Call Steps

Azure AI Agent Service is Microsoft's managed agent runtime built on the same threads/runs/steps model as OpenAI Assistants. When a run fails silently, a code interpreter execution times out, or a function tool call returns an unexpected value, the Azure portal doesn't give you span-level visibility into what went wrong. Here's how to wrap Azure AI Agent runs in Nexus traces and get full observability.

The threads/runs/steps model

Azure AI Agent Service organizes agent execution into three levels:

- Thread: a long-lived conversation container that holds the message history.
- Run: a single execution of an agent against a thread, moving from queued through in_progress to a terminal state such as completed or failed.
- Step: an individual unit of work inside a run, either a message creation or a tool call (for example a function call or a code interpreter execution).

This model is nearly identical to the OpenAI Assistants API. If you have Assistants observability experience, the patterns carry over directly — the difference is that Azure AI Agent Service runs in your Azure tenant, enforces your Azure RBAC policies, and routes through Azure OpenAI model deployments rather than OpenAI’s API.
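
As a rough sketch of the hierarchy (the field names here are illustrative, not the SDK's actual types):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    # One unit of work inside a run: a message creation or a tool call
    id: str
    type: str  # "message_creation" or "tool_calls"
    status: str = "in_progress"

@dataclass
class Run:
    # One execution of an agent against a thread
    id: str
    agent_id: str
    status: str = "queued"
    steps: list[Step] = field(default_factory=list)

@dataclass
class Thread:
    # Long-lived conversation container holding messages and runs
    id: str
    runs: list[Run] = field(default_factory=list)
```

A trace maps naturally onto this shape: one parent trace per run, one child span per step.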

Why the Azure portal isn’t enough

Azure Monitor and Application Insights capture infrastructure metrics — request counts, latency percentiles, error rates at the HTTP level. What they don’t give you is span-level visibility into agent execution:

- Which step inside a run failed, and with what error?
- What arguments did a function tool call receive, and what did it return?
- What code did the code interpreter generate, and what did it produce?
- How many steps, and how many tokens, did a run consume before reaching a terminal state?

To answer these questions, you need to instrument at the run and step level, not at the HTTP level.

Wrapping agent runs in Nexus traces (Python)

The pattern is straightforward: open a Nexus trace before creating the run, poll for run completion while recording step spans as they appear, then close the trace with the final status and usage metrics.

import os
import time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
import requests

NEXUS_API_KEY = os.environ["NEXUS_API_KEY"]
NEXUS_BASE = "https://nexus.keylightdigital.dev"

client = AIProjectClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    project_name=os.environ["AZURE_PROJECT_NAME"],
)

def run_agent_with_trace(agent_id: str, thread_id: str, user_message: str) -> dict:
    t0 = time.time()

    # Open a Nexus trace for the full run
    trace_res = requests.post(
        f"{NEXUS_BASE}/api/traces",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": f"azure-agent:{agent_id}",
            "input": user_message,
            "metadata": {
                "agent_id": agent_id,
                "thread_id": thread_id,
                "platform": "azure-ai-agent-service",
            },
        },
    )
    trace_id = trace_res.json()["traceId"]

    # Add the user message to the thread
    client.agents.create_message(
        thread_id=thread_id,
        role="user",
        content=user_message,
    )

    # Create the run
    run = client.agents.create_run(agent_id=agent_id, thread_id=thread_id)

    seen_steps: set = set()
    step_span_ids: dict = {}

    # Poll until the run reaches a terminal state
    while run.status in ("queued", "in_progress", "requires_action"):
        time.sleep(0.5)
        run = client.agents.get_run(thread_id=thread_id, run_id=run.id)

        # Open a span for each new step as it appears
        steps = client.agents.list_run_steps(thread_id=thread_id, run_id=run.id)
        for step in steps.data:
            if step.id not in seen_steps:
                seen_steps.add(step.id)
                span_res = requests.post(
                    f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                    json={
                        "name": f"step:{step.type}",
                        "input": step.type,
                        "metadata": {"step_id": step.id, "step_type": step.type},
                    },
                )
                step_span_ids[step.id] = span_res.json().get("spanId", "")

        # Submit function tool outputs when required
        if run.status == "requires_action":
            tool_outputs = handle_required_actions(run, trace_id)
            run = client.agents.submit_tool_outputs_to_run(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs,
            )

    # Close all step spans with final status. Listing the steps once more
    # replaces N get_run_step round trips and captures each step's terminal state.
    final_steps = client.agents.list_run_steps(thread_id=thread_id, run_id=run.id)
    for step in final_steps.data:
        span_id = step_span_ids.get(step.id)
        if not span_id:
            continue
        requests.post(
            f"{NEXUS_BASE}/api/spans/{span_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": step.status,
                "metadata": {
                    "last_error": str(step.last_error) if step.last_error else None,
                },
            },
        )

    # Extract final answer from thread messages
    messages = client.agents.list_messages(thread_id=thread_id)
    final_answer = ""
    for msg in messages.data:
        if msg.role == "assistant":
            for block in msg.content or []:
                if hasattr(block, "text"):
                    final_answer = block.text.value
                    break
            break

    # Close the parent trace
    requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/end",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "status": "success" if run.status == "completed" else "error",
            "output": final_answer[:500] if final_answer else "",
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {
                "run_id": run.id,
                "run_status": run.status,
                "step_count": len(seen_steps),
                "error_code": run.last_error.code if run.last_error else None,
                "error_message": run.last_error.message if run.last_error else None,
                "usage_prompt_tokens": run.usage.prompt_tokens if run.usage else None,
                "usage_completion_tokens": run.usage.completion_tokens if run.usage else None,
            },
        },
    )

    return {"answer": final_answer, "trace_id": trace_id, "run_status": run.status}
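
The fixed 0.5-second poll interval is fine for short runs, but long code interpreter executions can generate hundreds of polls. One option is to back off the sleep exponentially; the initial and cap values below are arbitrary starting points, not tuned recommendations:

```python
def backoff_delays(initial: float = 0.5, cap: float = 5.0, factor: float = 2.0):
    """Yield an infinite sequence of poll delays that doubles up to a cap:
    0.5, 1.0, 2.0, 4.0, 5.0, 5.0, ..."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

To use it, create `delays = backoff_delays()` before the polling loop and replace `time.sleep(0.5)` with `time.sleep(next(delays))`.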

Tracing function tool call steps

Function tool calls are where most Azure AI Agent failures originate. When a run reaches requires_action status, you need to submit outputs for one or more tool_call entries. Recording each call as a child span gives you the diagnostic granularity to see exactly which function failed and what it returned.

def handle_required_actions(run, trace_id: str) -> list:
    """Process requires_action — record each function call as a Nexus span."""
    tool_outputs = []

    if not run.required_action or not run.required_action.submit_tool_outputs:
        return tool_outputs

    for tool_call in run.required_action.submit_tool_outputs.tool_calls:
        t0 = time.time()
        func_name = tool_call.function.name
        func_args = tool_call.function.arguments

        span_res = requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "name": f"function_call:{func_name}",
                "input": func_args,
                "metadata": {
                    "tool_call_id": tool_call.id,
                    "function_name": func_name,
                    "tool_type": "function",
                },
            },
        )
        span_id = span_res.json().get("spanId", "")

        try:
            output = dispatch_function_call(func_name, func_args)
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": str(output)[:500],
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            tool_outputs.append({"tool_call_id": tool_call.id, "output": str(output)})
        except Exception as err:
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "latency_ms": int((time.time() - t0) * 1000),
                    "error": str(err),
                },
            )
            tool_outputs.append({"tool_call_id": tool_call.id, "output": f"error: {err}"})

    return tool_outputs

Tracing code interpreter executions

Code interpreter steps are different from function calls — the agent generates and executes Python code inside a sandboxed environment. You don’t control the execution, but you can record the code the agent generated and the outputs it received from the step details object.

def open_code_interpreter_span(step, trace_id: str) -> str:
    """Open a Nexus span for a code interpreter step and return the span ID."""
    code_input = ""
    code_outputs = []

    if step.step_details and hasattr(step.step_details, "tool_calls"):
        for tc in step.step_details.tool_calls or []:
            if tc.type == "code_interpreter":
                code_input = getattr(tc.code_interpreter, "input", "") or ""
                raw_outputs = getattr(tc.code_interpreter, "outputs", []) or []
                code_outputs = [
                    o.logs if hasattr(o, "logs") else str(o)
                    for o in raw_outputs
                ]

    span_res = requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": "code_interpreter",
            "input": code_input[:1000],
            "metadata": {
                "step_id": step.id,
                "tool_type": "code_interpreter",
                "output_count": len(code_outputs),
            },
        },
    )
    span_id = span_res.json().get("spanId", "")

    if step.status in ("completed", "failed", "cancelled"):
        requests.post(
            f"{NEXUS_BASE}/api/spans/{span_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": step.status,
                "output": "; ".join(code_outputs)[:500] if code_outputs else "",
                "metadata": {
                    "last_error": str(step.last_error) if step.last_error else None,
                },
            },
        )

    return span_id

TypeScript wrapper for Next.js apps

If your application layer is TypeScript, the same pattern works using the Azure AI SDK for JavaScript:

import { AIProjectsClient } from '@azure/ai-projects'
import { DefaultAzureCredential } from '@azure/identity'

const client = new AIProjectsClient(
  process.env.AZURE_AI_ENDPOINT!,
  new DefaultAzureCredential(),
)

const NEXUS_API_KEY = process.env.NEXUS_API_KEY!
const NEXUS_BASE = 'https://nexus.keylightdigital.dev'

async function runAzureAgent(
  agentId: string,
  threadId: string,
  userMessage: string,
): Promise<{ answer: string; traceId: string }> {
  const t0 = Date.now()

  const traceRes = await fetch(`${NEXUS_BASE}/api/traces`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: `azure-agent:${agentId}`,
      input: userMessage,
      metadata: { agent_id: agentId, thread_id: threadId, platform: 'azure-ai-agent-service' },
    }),
  })
  const { traceId } = await traceRes.json()

  await client.agents.createMessage(threadId, { role: 'user', content: userMessage })
  let run = await client.agents.createRun(threadId, { assistantId: agentId })
  const seenSteps = new Set<string>()

  while (['queued', 'in_progress', 'requires_action'].includes(run.status)) {
    await new Promise(r => setTimeout(r, 500))
    run = await client.agents.getRun(threadId, run.id)

    const stepsPage = await client.agents.listRunSteps(threadId, run.id)
    for (const step of stepsPage.data ?? []) {
      if (!seenSteps.has(step.id)) {
        seenSteps.add(step.id)
        await fetch(`${NEXUS_BASE}/api/traces/${traceId}/spans`, {
          method: 'POST',
          headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
          body: JSON.stringify({
            name: `step:${step.type}`,
            input: step.type,
            metadata: { step_id: step.id, step_type: step.type },
          }),
        })
      }
    }
  }

  const messages = await client.agents.listMessages(threadId)
  let answer = ''
  for (const msg of messages.data ?? []) {
    if (msg.role === 'assistant') {
      for (const block of msg.content ?? []) {
        if ('text' in block) { answer = (block as any).text.value; break }
      }
      break
    }
  }

  await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      status: run.status === 'completed' ? 'success' : 'error',
      output: answer.slice(0, 500),
      latency_ms: Date.now() - t0,
      metadata: {
        run_id: run.id,
        run_status: run.status,
        step_count: seenSteps.size,
        error_code: (run as any).lastError?.code ?? null,
      },
    }),
  })

  return { answer, traceId }
}

Azure AI Agents vs raw Azure OpenAI: when each makes sense

If you’re deciding between Azure AI Agent Service and direct Azure OpenAI calls, trace data helps you answer the question empirically:

- If most traced runs contain a single message-creation step and no tool calls, a direct Azure OpenAI chat completion call is simpler and skips the run-polling overhead.
- If runs routinely involve function tools, code interpreter executions, or multi-step orchestration, the managed service is doing real work you would otherwise build yourself.
- Comparing per-run latency and token usage across both approaches shows what the abstraction actually costs for your workload.

What to monitor in production

Once traces are flowing, four metrics give you the most actionable signal for Azure AI Agent workloads:

1. Run failure rate, broken down by last_error code, to catch regressions in agent or tool configuration.
2. Step count per run, to spot agents that loop or take unexpectedly long tool-call paths.
3. Function tool call latency and error rate per function name, since tool calls are where most failures originate.
4. Token usage per run (prompt and completion), to track cost as thread histories grow.
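
The run-level metadata the Python wrapper records is enough to compute most of these directly (per-function latency comes from the function-call spans instead). As an illustration, here is a small aggregator over trace metadata dicts in the shape the wrapper emits; the input records below are synthetic:

```python
def summarize_runs(records: list[dict]) -> dict:
    """Aggregate run-level trace metadata into headline metrics:
    failure rate, average steps per run, distinct error codes, and
    average total tokens per run."""
    total = len(records)
    failures = [r for r in records if r.get("run_status") != "completed"]
    steps = [r.get("step_count", 0) for r in records]
    tokens = [
        (r.get("usage_prompt_tokens") or 0) + (r.get("usage_completion_tokens") or 0)
        for r in records
    ]
    return {
        "failure_rate": len(failures) / total if total else 0.0,
        "avg_steps_per_run": sum(steps) / total if total else 0.0,
        "error_codes": sorted({r.get("error_code") for r in failures if r.get("error_code")}),
        "avg_tokens_per_run": sum(tokens) / total if total else 0.0,
    }
```

Running this on a nightly export of trace metadata gives you a baseline to alert against.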

Next steps

Azure AI Agent Service handles thread management, code interpreter sandboxing, and orchestration — but that abstraction makes failures harder to diagnose when they happen. Wrapping runs in Nexus traces, recording step spans as they appear, and instrumenting function tool calls gives you the full timeline from user request to final answer. Sign up for a free Nexus account to start capturing traces from your Azure AI Agents today.

Add observability to Azure AI Agents

Free tier, no credit card required. Full trace visibility in under 5 minutes.