How to Add Observability to HuggingFace Smolagents

HuggingFace's Smolagents framework is compact by design — a minimal API for tool-calling and code-executing agents. That minimalism extends to debugging: when a Smolagents run fails, there's almost no built-in visibility. Here's how to add full distributed tracing to CodeAgent and ToolCallingAgent runs with Nexus in under 15 lines of Python.

Smolagents is intentionally small: the project describes its core agent logic as fitting in roughly a thousand lines of code. A CodeAgent writes and executes Python to solve tasks; a ToolCallingAgent emits structured tool calls. Both run a multi-step loop until the task is complete or the step budget (max_steps) is exhausted.

That simplicity is the appeal. The downside: when a Smolagents run fails — and they do fail, in interesting ways — you have almost no built-in visibility. Did the code execution fail on step 3 or step 7? Was the tool called with the right arguments? Did the agent retry unnecessarily? You won't know from logs alone.

Distributed tracing fills the gap. Each agent run becomes a trace. Each step, tool call, and code execution becomes a span. You can see exactly what happened, in order, with timing.
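
To make the trace and span model concrete, here is a minimal sketch in plain Python. The Span and Trace classes are hypothetical stand-ins written for illustration; they are not part of Nexus or Smolagents, and a real backend would store more than this.

```python
from dataclasses import dataclass, field
from typing import Optional
import time


@dataclass
class Span:
    """One unit of work inside a run: a step, a tool call, or a code execution."""
    name: str
    started_at: float = field(default_factory=time.monotonic)
    ended_at: Optional[float] = None
    status: str = "running"
    error: Optional[str] = None

    def end(self, status: str = "success", error: Optional[str] = None) -> None:
        self.ended_at = time.monotonic()
        self.status = status
        self.error = error

    @property
    def duration(self) -> Optional[float]:
        # None while the span is still open.
        if self.ended_at is None:
            return None
        return self.ended_at - self.started_at


@dataclass
class Trace:
    """One agent run, holding its spans in the order they started."""
    name: str
    spans: list = field(default_factory=list)

    def add_span(self, name: str) -> Span:
        span = Span(name)
        self.spans.append(span)
        return span

    def first_failure(self) -> Optional[Span]:
        """Return the earliest span that ended with an error, if any."""
        return next((s for s in self.spans if s.status == "error"), None)


trace = Trace("agent: weather lookup")
trace.add_span("step-1").end()
trace.add_span("tool:get_weather").end(status="error", error="timeout")
trace.add_span("step-2").end()

failed = trace.first_failure()
print(failed.name, failed.error)
```

Because spans are kept in order with start and end times, "which step failed, and how long did it take?" becomes a lookup rather than a log search.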

Install

pip install smolagents nexus-agent

Basic ToolCallingAgent tracing

The simplest integration wraps the entire agent run in a single Nexus trace. You get start time, end time, final status, and any error — enough to see which runs succeed and which fail at a glance.

from smolagents import ToolCallingAgent, HfApiModel, tool
from nexus_agent import NexusClient
import os

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-assistant",
)

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location.

    Args:
        location: Name of the city to look up.
    """
    # Your weather API call here
    return f"Sunny, 22C in {location}"

# Note: newer smolagents releases rename HfApiModel to InferenceClientModel.
model = HfApiModel(model_id="Qwen/Qwen2.5-72B-Instruct")
agent = ToolCallingAgent(tools=[get_weather], model=model)

task = "What is the weather in Paris and Tokyo?"

trace = nexus.start_trace(
    name=f"agent: {task[:60]}",
    metadata={"task": task, "model_id": "Qwen/Qwen2.5-72B-Instruct"},
)
try:
    result = agent.run(task)
    nexus.end_trace(trace["id"], status="success", output={"result": result})
except Exception as e:
    nexus.end_trace(trace["id"], status="error", error=str(e))
    raise
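
If you wrap many runs, the start/try/except/end pattern above can be folded into a context manager. The sketch below is one possible shape, not a Nexus feature: traced_run is a hypothetical helper, and InMemoryClient is a stand-in that only mimics the start_trace and end_trace calls used in this article so the example runs without an API key.

```python
from contextlib import contextmanager


class InMemoryClient:
    """Stand-in that mimics the start_trace/end_trace calls used in this article."""

    def __init__(self):
        self.traces = {}

    def start_trace(self, name, metadata=None):
        trace_id = f"trace-{len(self.traces) + 1}"
        self.traces[trace_id] = {
            "id": trace_id,
            "name": name,
            "metadata": metadata or {},
            "status": "running",
        }
        return self.traces[trace_id]

    def end_trace(self, trace_id, status, output=None, error=None):
        self.traces[trace_id].update(status=status, output=output, error=error)


@contextmanager
def traced_run(client, name, metadata=None):
    """Open a trace, yield it, and close it as success or error."""
    trace = client.start_trace(name=name, metadata=metadata)
    outcome = {}  # the caller stores the result here so it can be recorded
    try:
        yield trace, outcome
    except Exception as exc:
        client.end_trace(trace["id"], status="error", error=str(exc))
        raise
    else:
        client.end_trace(trace["id"], status="success", output=outcome)


client = InMemoryClient()

with traced_run(client, "agent: ok") as (trace, outcome):
    outcome["result"] = "Sunny in Paris"  # in practice: agent.run(task)

try:
    with traced_run(client, "agent: boom"):
        raise RuntimeError("tool timed out")
except RuntimeError:
    pass

print([t["status"] for t in client.traces.values()])
```

With the real client, you would pass the nexus object in place of InMemoryClient, assuming its methods accept the same arguments as shown in the example above.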

Step-level spans for CodeAgent

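For CodeAgent, the interesting unit is each reasoning step: the code the agent writes and executes. You can add per-step spans by subclassing CodeAgent and overriding step(). The step() signature, and whether the run loop calls it directly, has varied between Smolagents releases, so check the behavior of the version you have installed.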

from smolagents import CodeAgent, HfApiModel
from nexus_agent import NexusClient
import os

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-code-agent",
)

class TracedCodeAgent(CodeAgent):
    def __init__(self, *args, trace_id=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._trace_id = trace_id
        self._step_count = 0

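    # smolagents passes the step's memory object (an ActionStep in recent
    # releases) as the only argument.
    def step(self, memory):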
        self._step_count += 1
        step_num = self._step_count
        span = nexus.add_span(
            trace_id=self._trace_id,
            name=f"step-{step_num}",
            metadata={"step": step_num},
        ) if self._trace_id else None

        try:
            result = super().step(memory)
            if span:
                nexus.end_span(
                    span["id"],
                    status="success",
                    output={"step": step_num, "completed": True},
                )
            return result
        except Exception as e:
            if span:
                nexus.end_span(span["id"], status="error", error=str(e))
            raise

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
task = "Write a Python function to calculate the nth Fibonacci number iteratively."

trace = nexus.start_trace(
    name=f"code-agent: {task[:60]}",
    metadata={"task": task},
)
agent = TracedCodeAgent(
    tools=[],
    model=model,
    trace_id=trace["id"],
)

try:
    result = agent.run(task)
    nexus.end_trace(trace["id"], status="success", output={"result": result})
except Exception as e:
    nexus.end_trace(trace["id"], status="error", error=str(e))
    raise

Tracing tool calls with ToolCallingAgent

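For ToolCallingAgent, the high-value spans are individual tool invocations. You can capture each one by decorating the tool's __call__ or by wrapping the tool function itself. The decorator below takes the second approach. It captures a trace ID when the tool is defined, so the trace has to be started before the decorated tool is created, and a tool decorated this way reports to that one trace.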

from smolagents import tool
from nexus_agent import NexusClient
import functools
import os

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-tool-agent",
)

def traced_tool(trace_id):
    """Decorator that wraps a smolagents tool with a Nexus span."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = nexus.add_span(
                trace_id=trace_id,
                name=f"tool:{fn.__name__}",
                input={"args": args, "kwargs": kwargs},
                metadata={"tool_name": fn.__name__},
            )
            try:
                result = fn(*args, **kwargs)
                nexus.end_span(span["id"], status="success", output={"result": result})
                return result
            except Exception as e:
                nexus.end_span(span["id"], status="error", error=str(e))
                raise
        return wrapper
    return decorator

# Usage:
trace = nexus.start_trace(name="weather-lookup", metadata={"task": "weather query"})

@tool
@traced_tool(trace["id"])
def get_weather(location: str) -> str:
    """Get current weather for a location.

    Args:
        location: Name of the city to look up.
    """
    return f"Sunny, 22C in {location}"
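
Passing the trace ID to the decorator ties each tool to a single trace. One alternative is to look up the active trace at call time using Python's standard contextvars module, so a tool can be defined once and reused across runs. This is a sketch under assumptions: traced, run_with_trace, and recorded_spans are hypothetical names, and the list stands in for calls to nexus.add_span and nexus.end_span.

```python
import contextvars
import functools

# Holds the trace ID for the run currently in progress (None outside a run).
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

recorded_spans = []  # stand-in for calls to nexus.add_span / nexus.end_span


def traced(fn):
    """Record a span for each call, using whatever trace is active at call time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {
            "trace_id": current_trace_id.get(),
            "name": f"tool:{fn.__name__}",
            "input": {"args": list(args), "kwargs": kwargs},
        }
        recorded_spans.append(span)
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            span.update(status="error", error=str(exc))
            raise
        span.update(status="success", output=result)
        return result
    return wrapper


@traced
def get_weather(location: str) -> str:
    return f"Sunny, 22C in {location}"


def run_with_trace(trace_id, fn, *args):
    """Set the active trace for the duration of one call, then restore it."""
    token = current_trace_id.set(trace_id)
    try:
        return fn(*args)
    finally:
        current_trace_id.reset(token)


run_with_trace("trace-a", get_weather, "Paris")
run_with_trace("trace-b", get_weather, "Tokyo")
print([(s["trace_id"], s["status"]) for s in recorded_spans])
```

One caveat: a context variable set in one thread is not automatically visible in threads started later, so if your agent executes tools in worker threads you may need to copy the context explicitly.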

What you see in Nexus

Once traces are flowing, the Nexus dashboard shows:

Trace list — every agent run, with status, duration, and start time
Step waterfall — each step or tool call as a span, with timing and nesting
Error highlighting — failed steps turn red in the waterfall; click to see the exception
Agent health cards — 7-day trace volume and error rate per agent on the overview page

Useful metadata fields for Smolagents

Metadata makes traces searchable and comparable across runs:

nexus.start_trace(
    name=f"agent: {task[:60]}",
    metadata={
        "task": task,
        "model_id": "Qwen/Qwen2.5-72B-Instruct",
        "agent_type": "ToolCallingAgent",   # or CodeAgent
        "step_budget": 10,                  # max_steps passed to agent
        "tool_count": len(tools),
    },
)

With agent_type and model_id in metadata, you can filter traces in the Nexus dashboard to compare CodeAgent vs. ToolCallingAgent runs, or Qwen2.5-72B vs. Llama3.1-70B on the same tasks.
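
The same comparison can be done offline if you have trace records on hand. The sketch below assumes each record is a plain dict with "status" and "metadata" keys; that shape, the error_rate_by helper, and the sample records are hypothetical and only illustrate why consistent metadata keys matter.

```python
from collections import defaultdict


def error_rate_by(traces, key):
    """Group trace records by a metadata key and return the error rate per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for trace in traces:
        group = trace["metadata"].get(key, "unknown")
        totals[group] += 1
        if trace["status"] == "error":
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical records, using the same metadata keys as the example above.
traces = [
    {"status": "success", "metadata": {"agent_type": "CodeAgent"}},
    {"status": "error", "metadata": {"agent_type": "CodeAgent"}},
    {"status": "success", "metadata": {"agent_type": "ToolCallingAgent"}},
    {"status": "success", "metadata": {"agent_type": "ToolCallingAgent"}},
]

print(error_rate_by(traces, "agent_type"))
```

Records that lack the key fall into an "unknown" group, which is a quick way to spot runs where the metadata was not set.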

Common Smolagents failure patterns in traces

Step budget exhaustion: The agent hits max_steps without finishing. In the trace, you see N step spans all succeeding but no final answer span. This usually means the task is too broad — break it up.

Code execution errors: For CodeAgent, a step span shows status: error with the Python exception. The agent will often retry — which you'll see as the next step span attempting different code.

Tool input hallucination: A tool span shows an input that doesn't match the schema — the LLM misread the tool description. The tool returns an error or empty result, and the agent's subsequent steps try to work around it. Visible as a tool span with unexpected input followed by additional LLM reasoning steps.
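
The three patterns above can be checked mechanically once spans are available as data. The following is a rough sketch, assuming spans are dicts named "step-N" and "tool:<name>" as in the earlier examples. The classify_failure helper, the "final_answer" span name, and the tool_schemas mapping are assumptions made for illustration.

```python
def classify_failure(spans, max_steps, tool_schemas):
    """Return a list of failure labels found in one run's spans.

    spans: list of dicts with "name", "status", and optional "input" keys.
    tool_schemas: maps tool name to the set of argument names it accepts.
    """
    labels = []
    steps = [s for s in spans if s["name"].startswith("step-")]
    tools = [s for s in spans if s["name"].startswith("tool:")]

    # Code execution errors: at least one step span ended with an error.
    if any(s["status"] == "error" for s in steps):
        labels.append("code_execution_error")

    # Tool input hallucination: a tool was called with argument names
    # that its schema does not declare.
    for span in tools:
        tool_name = span["name"][len("tool:"):]
        expected = tool_schemas.get(tool_name, set())
        if set(span.get("input", {})) - expected:
            labels.append("tool_input_mismatch")
            break

    # Step budget exhaustion: every allowed step ran and none produced
    # a final answer span (span name assumed to be "final_answer").
    has_final = any(s["name"] == "final_answer" for s in spans)
    if len(steps) >= max_steps and not has_final:
        labels.append("step_budget_exhausted")

    return labels


spans = [
    {"name": "step-1", "status": "success"},
    {"name": "tool:get_weather", "status": "success", "input": {"city": "Paris"}},
    {"name": "step-2", "status": "success"},
]
print(classify_failure(spans, max_steps=2, tool_schemas={"get_weather": {"location"}}))
```

A run can match more than one pattern, which is why the function returns a list rather than a single label.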

Smolagents' minimalism is a feature. Adding Nexus tracing keeps your debugging just as minimal: install the SDK, wrap the run in a trace, add spans for steps and tool calls. You get full execution visibility without modifying the framework itself.