Monitoring Azure AI Agent Service: Tracing Threads, Runs, and Tool Call Steps

Azure AI Agent Service is Microsoft's managed agent runtime built on the same threads/runs/steps model as OpenAI Assistants. When a run fails silently, a code interpreter execution times out, or a function tool call returns an unexpected value, the Azure portal doesn't give you span-level visibility into what went wrong. Here's how to wrap Azure AI Agent runs in Nexus traces and get full observability.

The threads/runs/steps model

Azure AI Agent Service organizes agent execution into three levels:

Threads persist conversation context. A thread holds the message history for a user session and can be resumed across multiple runs.
Runs execute the agent against a thread. Each run invokes the underlying model with the current thread context and a set of tools. A run has a status field (queued, in_progress, completed, failed, cancelled, expired) and a last_error field when it fails.
Steps are the individual actions within a run: tool_calls (function calls, code interpreter, file search, Azure AI Search) and message_creation (final response generation). Steps carry their own status and error fields.

This model is nearly identical to the OpenAI Assistants API. If you have Assistants observability experience, the patterns carry over directly — the difference is that Azure AI Agent Service runs in your Azure tenant, enforces your Azure RBAC policies, and routes through Azure OpenAI model deployments rather than OpenAI’s API.

Why the Azure portal isn’t enough

Azure Monitor and Application Insights capture infrastructure metrics — request counts, latency percentiles, error rates at the HTTP level. What they don’t give you is span-level visibility into agent execution:

Which step in a multi-step run caused the failure?
Did the code interpreter time out, or did it complete but return an unexpected output format?
Which function tool call returned an error the agent silently swallowed?
How many runs completed successfully versus failed with rate_limit_exceeded this hour?

To answer these questions, you need to instrument at the run and step level, not at the HTTP level.

Wrapping agent runs in Nexus traces (Python)

The pattern is straightforward: open a Nexus trace before creating the run, poll for run completion while recording step spans as they appear, then close the trace with the final status and usage metrics.

import os
import time
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
import requests

NEXUS_API_KEY = os.environ["NEXUS_API_KEY"]
NEXUS_BASE = "https://nexus.keylightdigital.dev"

client = AIProjectClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"],
    resource_group_name=os.environ["AZURE_RESOURCE_GROUP"],
    project_name=os.environ["AZURE_PROJECT_NAME"],
)

def run_agent_with_trace(agent_id: str, thread_id: str, user_message: str) -> dict:
    t0 = time.time()

    # Open a Nexus trace for the full run
    trace_res = requests.post(
        f"{NEXUS_BASE}/api/traces",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": f"azure-agent:{agent_id}",
            "input": user_message,
            "metadata": {
                "agent_id": agent_id,
                "thread_id": thread_id,
                "platform": "azure-ai-agent-service",
            },
        },
    )
    trace_id = trace_res.json()["traceId"]

    # Add the user message to the thread
    client.agents.create_message(
        thread_id=thread_id,
        role="user",
        content=user_message,
    )

    # Create the run
    run = client.agents.create_run(agent_id=agent_id, thread_id=thread_id)

    seen_steps: set = set()
    step_span_ids: dict = {}

    # Poll until the run reaches a terminal state
    while run.status in ("queued", "in_progress", "requires_action"):
        time.sleep(0.5)
        run = client.agents.get_run(thread_id=thread_id, run_id=run.id)

        # Open a span for each new step as it appears
        steps = client.agents.list_run_steps(thread_id=thread_id, run_id=run.id)
        for step in steps.data:
            if step.id not in seen_steps:
                seen_steps.add(step.id)
                span_res = requests.post(
                    f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                    json={
                        "name": f"step:{step.type}",
                        "input": step.type,
                        "metadata": {"step_id": step.id, "step_type": step.type},
                    },
                )
                step_span_ids[step.id] = span_res.json().get("spanId", "")

        # Submit function tool outputs when required
        if run.status == "requires_action":
            tool_outputs = handle_required_actions(run, trace_id)
            run = client.agents.submit_tool_outputs_to_run(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs,
            )

    # Close all step spans with final status
    for step_id, span_id in step_span_ids.items():
        step = client.agents.get_run_step(
            thread_id=thread_id, run_id=run.id, step_id=step_id
        )
        requests.post(
            f"{NEXUS_BASE}/api/spans/{span_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": step.status,
                "metadata": {
                    "last_error": str(step.last_error) if step.last_error else None,
                },
            },
        )

    # Extract final answer from thread messages
    messages = client.agents.list_messages(thread_id=thread_id)
    final_answer = ""
    for msg in messages.data:
        if msg.role == "assistant":
            for block in msg.content or []:
                if hasattr(block, "text"):
                    final_answer = block.text.value
                    break
            break

    # Close the parent trace
    requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/end",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "status": "success" if run.status == "completed" else "error",
            "output": final_answer[:500] if final_answer else "",
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {
                "run_id": run.id,
                "run_status": run.status,
                "step_count": len(seen_steps),
                "error_code": run.last_error.code if run.last_error else None,
                "error_message": run.last_error.message if run.last_error else None,
                "usage_prompt_tokens": run.usage.prompt_tokens if run.usage else None,
                "usage_completion_tokens": run.usage.completion_tokens if run.usage else None,
            },
        },
    )

    return {"answer": final_answer, "trace_id": trace_id, "run_status": run.status}

Tracing function tool call steps

Function tool calls are where most Azure AI Agent failures originate. When a run reaches requires_action status, you need to submit outputs for one or more tool_call entries. Recording each call as a child span gives you the diagnostic granularity to see exactly which function failed and what it returned.

def handle_required_actions(run, trace_id: str) -> list:
    """Process requires_action — record each function call as a Nexus span."""
    tool_outputs = []

    if not run.required_action or not run.required_action.submit_tool_outputs:
        return tool_outputs

    for tool_call in run.required_action.submit_tool_outputs.tool_calls:
        t0 = time.time()
        func_name = tool_call.function.name
        func_args = tool_call.function.arguments

        span_res = requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "name": f"function_call:{func_name}",
                "input": func_args,
                "metadata": {
                    "tool_call_id": tool_call.id,
                    "function_name": func_name,
                    "tool_type": "function",
                },
            },
        )
        span_id = span_res.json().get("spanId", "")

        try:
            output = dispatch_function_call(func_name, func_args)
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": str(output)[:500],
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            tool_outputs.append({"tool_call_id": tool_call.id, "output": str(output)})
        except Exception as err:
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "latency_ms": int((time.time() - t0) * 1000),
                    "error": str(err),
                },
            )
            tool_outputs.append({"tool_call_id": tool_call.id, "output": f"error: {err}"})

    return tool_outputs

Tracing code interpreter executions

Code interpreter steps are different from function calls — the agent generates and executes Python code inside a sandboxed environment. You don’t control the execution, but you can record the code the agent generated and the outputs it received from the step details object.

def open_code_interpreter_span(step, trace_id: str) -> str:
    """Open a Nexus span for a code interpreter step and return the span ID."""
    code_input = ""
    code_outputs = []

    if step.step_details and hasattr(step.step_details, "tool_calls"):
        for tc in step.step_details.tool_calls or []:
            if tc.type == "code_interpreter":
                code_input = getattr(tc.code_interpreter, "input", "") or ""
                raw_outputs = getattr(tc.code_interpreter, "outputs", []) or []
                code_outputs = [
                    o.logs if hasattr(o, "logs") else str(o)
                    for o in raw_outputs
                ]

    span_res = requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": "code_interpreter",
            "input": code_input[:1000],
            "metadata": {
                "step_id": step.id,
                "tool_type": "code_interpreter",
                "output_count": len(code_outputs),
            },
        },
    )
    span_id = span_res.json().get("spanId", "")

    if step.status in ("completed", "failed", "cancelled"):
        requests.post(
            f"{NEXUS_BASE}/api/spans/{span_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": step.status,
                "output": "; ".join(code_outputs)[:500] if code_outputs else "",
                "metadata": {
                    "last_error": str(step.last_error) if step.last_error else None,
                },
            },
        )

    return span_id

TypeScript wrapper for Next.js apps

If your application layer is TypeScript, the same pattern works using the Azure AI SDK for JavaScript:

import { AIProjectsClient } from '@azure/ai-projects'
import { DefaultAzureCredential } from '@azure/identity'

const client = new AIProjectsClient(
  process.env.AZURE_AI_ENDPOINT!,
  new DefaultAzureCredential(),
)

const NEXUS_API_KEY = process.env.NEXUS_API_KEY!
const NEXUS_BASE = 'https://nexus.keylightdigital.dev'

async function runAzureAgent(
  agentId: string,
  threadId: string,
  userMessage: string,
): Promise<{ answer: string; traceId: string }> {
  const t0 = Date.now()

  const traceRes = await fetch(`${NEXUS_BASE}/api/traces`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: `azure-agent:${agentId}`,
      input: userMessage,
      metadata: { agent_id: agentId, thread_id: threadId, platform: 'azure-ai-agent-service' },
    }),
  })
  const { traceId } = await traceRes.json()

  await client.agents.createMessage(threadId, { role: 'user', content: userMessage })
  let run = await client.agents.createRun(threadId, { assistantId: agentId })
  const seenSteps = new Set<string>()

  while (['queued', 'in_progress', 'requires_action'].includes(run.status)) {
    await new Promise(r => setTimeout(r, 500))
    run = await client.agents.getRun(threadId, run.id)

    const stepsPage = await client.agents.listRunSteps(threadId, run.id)
    for (const step of stepsPage.data ?? []) {
      if (!seenSteps.has(step.id)) {
        seenSteps.add(step.id)
        await fetch(`${NEXUS_BASE}/api/traces/${traceId}/spans`, {
          method: 'POST',
          headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
          body: JSON.stringify({
            name: `step:${step.type}`,
            input: step.type,
            metadata: { step_id: step.id, step_type: step.type },
          }),
        })
      }
    }
  }

  const messages = await client.agents.listMessages(threadId)
  let answer = ''
  for (const msg of messages.data ?? []) {
    if (msg.role === 'assistant') {
      for (const block of msg.content ?? []) {
        if ('text' in block) { answer = (block as any).text.value; break }
      }
      break
    }
  }

  await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      status: run.status === 'completed' ? 'success' : 'error',
      output: answer.slice(0, 500),
      latency_ms: Date.now() - t0,
      metadata: {
        run_id: run.id,
        run_status: run.status,
        step_count: seenSteps.size,
        error_code: (run as any).lastError?.code ?? null,
      },
    }),
  })

  return { answer, traceId }
}

Azure AI Agents vs raw Azure OpenAI: when each makes sense

If you’re deciding between Azure AI Agent Service and direct Azure OpenAI calls, trace data helps you answer the question empirically:

Use Azure AI Agent Service when you need persistent thread context across sessions, built-in file search over uploaded documents, code interpreter for data analysis, or when you want Microsoft to manage the orchestration loop. The managed runtime reduces code you maintain but adds latency per step.
Use raw Azure OpenAI when your agent logic is simple (single-turn or stateless), you need sub-500ms response times, or you want full control over prompt construction and tool dispatch. You write more orchestration code but eliminate the runs/steps round-trip overhead.
The signal in your traces: If step_count is consistently 1 (one message_creation step, no tool calls), you’re paying the Agent Service overhead without using its capabilities — that’s a strong signal to simplify to direct API calls.

What to monitor in production

Once traces are flowing, four metrics give you the most actionable signal for Azure AI Agent workloads:

Run failure rate by error code: Break down status: error traces by error_code. Rate limit failures call for quota increases or retry logic; context length failures call for thread pruning or summarization.
Mean step count per run: A healthy single-function-call agent should average 2 steps (one tool call + one message creation). If mean step count climbs above 4, the agent is struggling to complete tasks in one loop — usually a function output format issue or schema mismatch.
Function tool call error rate: Filter spans where name starts with function_call: and status: error. High error rates on a specific function name point directly to a broken handler.
P95 run latency: Azure AI Agent Service adds 1–3s of managed infrastructure overhead per run. A P95 above 15s for a simple agent usually indicates a cold start on your function backend, a code interpreter timeout, or a thread that’s grown too large for the context window.

Next steps

Azure AI Agent Service handles thread management, code interpreter sandboxing, and orchestration — but that abstraction makes failures harder to diagnose when they happen. Wrapping runs in Nexus traces, recording step spans as they appear, and instrumenting function tool calls gives you the full timeline from user request to final answer. Sign up for a free Nexus account to start capturing traces from your Azure AI Agents today.