2026-04-18 · 8 min read

How to Add Observability to HuggingFace Smolagents

HuggingFace's Smolagents framework is compact by design — a minimal API for tool-calling and code-executing agents. That minimalism extends to debugging: when a Smolagents run fails, there's almost no built-in visibility. Here's how to add full distributed tracing to CodeAgent and ToolCallingAgent runs with Nexus in under 15 lines of Python.

HuggingFace's Smolagents is intentionally minimal: its core agent logic fits in roughly a thousand lines of code. A CodeAgent writes and executes Python to solve tasks; a ToolCallingAgent calls structured tools. Both run multi-step loops until the task is complete or the step budget is exhausted.

That simplicity is the appeal. The downside: when a Smolagents run fails — and they do fail, in interesting ways — you have almost no built-in visibility. Did the code execution fail on step 3 or step 7? Was the tool called with the right arguments? Did the agent retry unnecessarily? You won't know from logs alone.

Distributed tracing fills the gap. Each agent run becomes a trace. Each step, tool call, and code execution becomes a span. You can see exactly what happened, in order, with timing.
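To make the trace and span model concrete, here is a minimal in-memory sketch of what a tracer records. It is not the Nexus SDK. InMemoryTracer is a hypothetical name, and its methods only mirror the call shapes this post uses (start_trace, add_span, end_span, end_trace, each returning a dict with an "id"). Because the shapes match, it can also stand in for a real client while you experiment offline.

```python
import itertools
import time


class InMemoryTracer:
    """Hypothetical local stand-in for a tracing client (not the Nexus SDK).

    One trace per agent run; one span per step, tool call, or code execution.
    Method names mirror the Nexus calls used in this post.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.traces = {}  # trace ID -> trace record
        self.spans = {}   # span ID -> span record

    def start_trace(self, name, metadata=None):
        trace = {
            "id": f"trace-{next(self._ids)}",
            "name": name,
            "metadata": metadata or {},
            "start": time.monotonic(),
            "end": None,
            "status": None,
            "span_ids": [],  # spans in the order they started
        }
        self.traces[trace["id"]] = trace
        return trace

    def add_span(self, trace_id, name, input=None, metadata=None):
        span = {
            "id": f"span-{next(self._ids)}",
            "trace_id": trace_id,
            "name": name,
            "input": input,
            "metadata": metadata or {},
            "start": time.monotonic(),
            "end": None,
            "status": None,
        }
        self.spans[span["id"]] = span
        self.traces[trace_id]["span_ids"].append(span["id"])
        return span

    def end_span(self, span_id, status, output=None, error=None):
        self.spans[span_id].update(
            end=time.monotonic(), status=status, output=output, error=error
        )

    def end_trace(self, trace_id, status, output=None, error=None):
        self.traces[trace_id].update(
            end=time.monotonic(), status=status, output=output, error=error
        )

    def timeline(self, trace_id):
        """List (span name, status, duration in seconds or None if open), in start order."""
        rows = []
        for span_id in self.traces[trace_id]["span_ids"]:
            span = self.spans[span_id]
            duration = None if span["end"] is None else span["end"] - span["start"]
            rows.append((span["name"], span["status"], duration))
        return rows
```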

Install

pip install smolagents nexus-agent
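Smolagents' API has moved between releases (for instance, HfApiModel was renamed InferenceClientModel), so it is worth confirming which versions pip actually installed before copying the examples. A small stdlib-only check; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("smolagents", "nexus-agent"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```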

Basic ToolCallingAgent tracing

The simplest integration wraps the entire agent run in a single Nexus trace. You get start time, end time, final status, and any error — enough to see which runs succeed and which fail at a glance.

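from smolagents import ToolCallingAgent, InferenceClientModel, tool
from nexus_agent import NexusClient
import os

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-assistant",
)

# @tool builds the tool schema from the type hints and the docstring's Args section,
# so every argument needs a description there.
@tool
def get_weather(location: str) -> str:
    """Get current weather for a location.

    Args:
        location: Name of the city to look up, e.g. "Paris".
    """
    # Your weather API call here
    return f"Sunny, 22C in {location}"

# InferenceClientModel was called HfApiModel in older Smolagents releases
model = InferenceClientModel(model_id="Qwen/Qwen2.5-72B-Instruct")
agent = ToolCallingAgent(tools=[get_weather], model=model)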

task = "What is the weather in Paris and Tokyo?"

trace = nexus.start_trace(
    name=f"agent: {task[:60]}",
    metadata={"task": task, "model": "Qwen2.5-72B-Instruct"},
)
try:
    result = agent.run(task)
    nexus.end_trace(trace["id"], status="success", output={"result": result})
except Exception as e:
    nexus.end_trace(trace["id"], status="error", error=str(e))
    raise
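If you trace more than one entry point, the start/try/end pattern above is worth factoring out. A minimal sketch, assuming only that client has the start_trace/end_trace calls shown above and agent has a run(task) method; run_traced is a hypothetical name. Two small deviations from the inline version are deliberate: the success end_trace call sits outside the try, so an error raised by end_trace itself cannot be reported as an agent failure and trigger a second end_trace; and it catches BaseException, so a Ctrl-C still closes the trace before propagating.

```python
def run_traced(client, agent, task, **metadata):
    """Run `agent` on `task` inside one trace, closing the trace whatever happens.

    `client`: anything with start_trace/end_trace as used in this post.
    `agent`: anything with a .run(task) method, e.g. a ToolCallingAgent.
    """
    trace = client.start_trace(
        name=f"agent: {task[:60]}",
        metadata={"task": task, **metadata},
    )
    try:
        result = agent.run(task)
    except BaseException as e:
        # KeyboardInterrupt has an empty str(), so fall back to the type name.
        client.end_trace(trace["id"], status="error", error=str(e) or type(e).__name__)
        raise
    # Outside the try: a failure in end_trace is not misreported as an agent error.
    client.end_trace(trace["id"], status="success", output={"result": result})
    return result
```

For example: result = run_traced(nexus, agent, task, model="Qwen2.5-72B-Instruct").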

Step-level spans for CodeAgent

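For CodeAgent, the interesting unit is each reasoning step: the code the agent writes and executes. You can add per-step spans by subclassing CodeAgent and overriding step(). This relies on run() calling step() once per iteration, which holds in older Smolagents releases; newer releases drive the loop through an internal method instead, so confirm that your step spans actually appear, or use the public step_callbacks constructor argument.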

from smolagents import CodeAgent, InferenceClientModel  # formerly HfApiModel
from nexus_agent import NexusClient
import os

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-code-agent",
)

class TracedCodeAgent(CodeAgent):
    """CodeAgent that records each reasoning step as a Nexus span."""

    def __init__(self, *args, trace_id=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._trace_id = trace_id
        self._step_count = 0

    def step(self, memory_step):
        # Only takes effect in releases where run() calls step() once per iteration
        self._step_count += 1
        step_num = self._step_count
        span = nexus.add_span(
            trace_id=self._trace_id,
            name=f"step-{step_num}",
            metadata={"step": step_num},
        ) if self._trace_id else None

        try:
            result = super().step(memory_step)
            if span:
                nexus.end_span(
                    span["id"],
                    status="success",
                    output={"step": step_num, "completed": True},
                )
            return result
        except Exception as e:
            # Errors raised by the step (e.g. failed code execution) are recorded,
            # then re-raised so Smolagents can handle them as usual
            if span:
                nexus.end_span(span["id"], status="error", error=str(e))
            raise

model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
task = "Write a Python function to calculate the nth Fibonacci number iteratively."

trace = nexus.start_trace(
    name=f"code-agent: {task[:60]}",
    metadata={"task": task},
)
agent = TracedCodeAgent(
    tools=[],
    model=model,
    trace_id=trace["id"],
)

try:
    result = agent.run(task)
    nexus.end_trace(trace["id"], status="success", output={"result": result})
except Exception as e:
    nexus.end_trace(trace["id"], status="error", error=str(e))
    raise
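Subclassing works, but it couples your tracing to an internal method. Smolagents agents also accept a step_callbacks list in their constructor, and each callback is called with the memory step object once the step finishes. Below is a sketch of the same per-step spans built that way. Assumptions: make_step_callback is a hypothetical helper; the client has the add_span/end_span calls used above; and the attribute names read from the step (step_number, error, observations) match your Smolagents version, which is why they are read with getattr defaults. Newer releases may also pass the agent as a keyword argument, which the callback accepts and ignores. One trade-off: because the callback runs after the step, the span's own timing marks when the step was recorded, not how long it ran.

```python
def make_step_callback(client, trace_id):
    """Build a step callback that records each finished step as a span.

    Hypothetical helper. Assumes `client` has add_span/end_span as used above,
    and that step objects expose step_number, error, and observations; getattr
    defaults keep it working if an attribute is missing in your version.
    """

    def on_step(memory_step, agent=None):
        # `agent` is accepted because some releases pass it as a keyword; unused here.
        step_num = getattr(memory_step, "step_number", None)
        span = client.add_span(
            trace_id=trace_id,
            name=f"step-{step_num}",
            metadata={"step": step_num, "step_type": type(memory_step).__name__},
        )
        error = getattr(memory_step, "error", None)
        if error is not None:
            client.end_span(span["id"], status="error", error=str(error))
        else:
            observations = getattr(memory_step, "observations", None)
            client.end_span(
                span["id"],
                status="success",
                # Truncate long execution logs so span payloads stay small.
                output={
                    "step": step_num,
                    "observations": str(observations)[:500] if observations else None,
                },
            )

    return on_step
```

To use it, pass step_callbacks=[make_step_callback(nexus, trace["id"])] when constructing a plain CodeAgent, and keep the start_trace/end_trace wrapping around agent.run(task).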

Tracing tool calls with ToolCallingAgent

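For ToolCallingAgent, the high-value spans are individual tool invocations. You can wrap each tool call by overriding a tool's __call__, or, more simply, by wrapping the tool function itself before @tool sees it. functools.wraps copies the name, docstring, and type hints that @tool reads to build the tool's schema.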

from smolagents import tool
from nexus_agent import NexusClient
import os, functools

nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="smolagents-tool-agent",
)

def traced_tool(trace_id):
    """Decorator that wraps a smolagents tool with a Nexus span."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = nexus.add_span(
                trace_id=trace_id,
                name=f"tool:{fn.__name__}",
                input={"args": args, "kwargs": kwargs},
                metadata={"tool_name": fn.__name__},
            )
            try:
                result = fn(*args, **kwargs)
                nexus.end_span(span["id"], status="success", output={"result": result})
                return result
            except Exception as e:
                nexus.end_span(span["id"], status="error", error=str(e))
                raise
        return wrapper
    return decorator

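# Usage: the trace must exist before the tool is defined, because the decorator
# binds every call of this tool to this one trace ID.
trace = nexus.start_trace(name="weather-lookup", metadata={"task": "weather query"})

@tool
@traced_tool(trace["id"])
def get_weather(location: str) -> str:
    """Get current weather for a location.

    Args:
        location: Name of the city to look up, e.g. "Paris".
    """
    return f"Sunny, 22C in {location}"

# Pass get_weather to a ToolCallingAgent, run it, then close the trace with
# nexus.end_trace(...) exactly as in the basic example.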
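The decorator above fixes the trace ID when the tool is defined, so one tool definition can only ever report into one trace. A variant that looks the trace ID up at call time lets the same tool serve many runs. This sketch uses Python's contextvars; current_trace_id and traced_tool_ctx are hypothetical names, and client is assumed to be the Nexus client from above. Outside a traced run the tool simply runs untraced. One caveat: if your Smolagents version executes parallel tool calls in worker threads, those threads do not inherit context variables by default, so confirm that tool spans actually appear.

```python
import contextvars
import functools

# Hypothetical: holds the trace ID of the agent run currently in progress.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def traced_tool_ctx(client):
    """Wrap a tool function so each call becomes a span in the *current* trace."""
    def decorator(fn):
        @functools.wraps(fn)  # keep name, docstring, and type hints for @tool
        def wrapper(*args, **kwargs):
            trace_id = current_trace_id.get()
            if trace_id is None:
                # No active trace: call the tool without recording a span.
                return fn(*args, **kwargs)
            span = client.add_span(
                trace_id=trace_id,
                name=f"tool:{fn.__name__}",
                input={"args": args, "kwargs": kwargs},
                metadata={"tool_name": fn.__name__},
            )
            try:
                result = fn(*args, **kwargs)
            except Exception as e:
                client.end_span(span["id"], status="error", error=str(e))
                raise
            client.end_span(span["id"], status="success", output={"result": result})
            return result
        return wrapper
    return decorator
```

Stack @tool on top as before (keeping the Args section in the docstring), then set the variable around each run: token = current_trace_id.set(trace["id"]) before agent.run(task), and current_trace_id.reset(token) in a finally block.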

What you see in Nexus

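Once traces are flowing, the Nexus dashboard shows each agent run as a trace, with its step and tool-call spans laid out in order with timing and any errors highlighted.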

Useful metadata fields for Smolagents

Metadata makes traces searchable and comparable across runs:

trace = nexus.start_trace(
    name=f"agent: {task[:60]}",
    metadata={
        "task": task,
        "model_id": "Qwen/Qwen2.5-72B-Instruct",
        "agent_type": "ToolCallingAgent",   # or CodeAgent
        "step_budget": 10,                  # must match max_steps passed to the agent
        "tool_count": len(tools),           # tools: the list passed to the agent
    },
)

With agent_type and model_id in metadata, you can filter traces in the Nexus dashboard to compare CodeAgent vs. ToolCallingAgent runs, or Qwen2.5-72B vs. Llama3.1-70B on the same tasks.
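The snippet above hard-codes step_budget, which can drift from the max_steps the agent really gets. A small hypothetical helper, build_run_metadata, derives every field from the same values you pass to the agent. Reading the tool name from a name attribute matches Smolagents Tool objects, with a fallback to __name__ for plain functions; tool_names is an extra field beyond the original list, added so you can filter traces by tool.

```python
def build_run_metadata(task, model_id, agent_type, max_steps, tools):
    """Build trace metadata from the same values used to construct the agent.

    Hypothetical helper: pass max_steps and tools here *and* to the agent so
    step_budget and tool_count always describe the run that actually happened.
    """
    tool_names = sorted(
        getattr(t, "name", None) or getattr(t, "__name__", repr(t)) for t in tools
    )
    return {
        "task": task,
        "model_id": model_id,
        "agent_type": agent_type,
        "step_budget": max_steps,
        "tool_count": len(tool_names),
        "tool_names": tool_names,
    }
```

For example, with max_steps = 10, pass max_steps=max_steps to ToolCallingAgent and metadata=build_run_metadata(task, "Qwen/Qwen2.5-72B-Instruct", "ToolCallingAgent", max_steps, tools) to nexus.start_trace.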

Common Smolagents failure patterns in traces

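Step budget exhaustion: The agent hits max_steps without finishing. In the trace, you see max_steps step spans, none of which produced the final answer. Smolagents then asks the model for a best-effort answer based on what it has gathered so far, so the trace can still end with status success; check the step count, not just the status. This usually means the task is too broad: break it up.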

Code execution errors: For CodeAgent, a step span shows status: error with the Python exception. The agent will often retry — which you'll see as the next step span attempting different code.

Tool input hallucination: A tool span shows arguments the tool cannot use (a made-up location, a wrong format) because the LLM misread the tool description. The tool returns an error or an empty result, and the agent's subsequent steps try to work around it. In the trace, this looks like a tool span with unexpected input followed by additional LLM reasoning steps.
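Some of these patterns can be checked mechanically once you have a trace's spans in hand. This sketch assumes you have fetched them (the export format is an assumption, not a documented Nexus shape) as a list of dicts with "name" and "status" keys, named the way this post's examples name them ("step-N", "tool:NAME"). diagnose is a hypothetical helper. It flags budget exhaustion and failing steps directly and lists failing tools as a starting point for spotting bad tool input. Counting step spans against max_steps is a heuristic: a run that finishes on its very last allowed step looks the same as one that ran out.

```python
from collections import Counter


def diagnose(spans, max_steps):
    """Summarize failure signals in one trace's spans (hypothetical helper).

    `spans`: list of dicts with "name" and "status" keys, in start order.
    """
    step_spans = [s for s in spans if s["name"].startswith("step-")]
    tool_spans = [s for s in spans if s["name"].startswith("tool:")]
    failed_steps = [s["name"] for s in step_spans if s["status"] == "error"]
    failed_tools = Counter(s["name"] for s in tool_spans if s["status"] == "error")
    return {
        # Heuristic: reaching the budget is suspicious, not proof of exhaustion.
        "budget_exhausted": len(step_spans) >= max_steps,
        "failed_steps": failed_steps,
        "failed_tools": dict(failed_tools),
    }
```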

Smolagents' minimalism is a feature. Adding Nexus tracing keeps your debugging just as minimal: install the SDK, wrap the run in a trace, add spans for steps and tool calls. You get full execution visibility without modifying the framework itself.

Add tracing to your Smolagents in minutes

Nexus gives you distributed traces for every agent run — step waterfall, tool call spans, error highlighting. Free tier, no credit card required.
