2026-04-18 · 8 min read

Debugging LangGraph Agents: Tracing Node Execution and State Transitions

LangGraph makes it easy to build stateful, cyclic agent workflows — and equally easy to build ones that infinite-loop, route incorrectly, or corrupt state silently. Here's how distributed tracing surfaces each failure mode and how to instrument LangGraph StateGraph nodes with Nexus spans.

What LangGraph adds — and where it breaks

LangGraph extends LangChain with a graph-based execution model. Instead of a linear chain, you define a StateGraph where nodes are Python functions and edges are routing rules. This enables cycles — agents that loop until a condition is met — which are essential for tool-use and self-reflection patterns.
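
The cycle mechanics are easy to picture as a plain-Python loop. Here's a toy runner — not LangGraph's actual implementation, just a sketch of the execution model — that calls the current node, asks a router where to go next, and stops at END or after a step budget:

```python
def run_graph(nodes, routers, state, entry, max_steps=25):
    """Toy graph runner: call the current node, then ask its router
    where to go next, until a router returns "END"."""
    current = entry
    for _ in range(max_steps):
        state = nodes[current](state)
        next_node = routers[current](state)
        if next_node == "END":
            return state
        current = next_node
    raise RuntimeError(f"no END after {max_steps} steps; likely an infinite loop")

# A self-reflection loop: "draft" keeps cycling until 3 revisions exist.
nodes = {"draft": lambda s: {**s, "revisions": s["revisions"] + 1}}
routers = {"draft": lambda s: "END" if s["revisions"] >= 3 else "draft"}

final = run_graph(nodes, routers, {"revisions": 0}, "draft")
print(final["revisions"])  # → 3
```

The max_steps guard is exactly the kind of safety net you want around cycles — without it, a router that never returns END spins forever.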

That power comes with failure modes that don't exist in linear chains:

- Infinite loops: a conditional edge keeps routing back to the same node and the graph never reaches END.
- Wrong routing: the router LLM emits an action the edge map doesn't expect, or picks the wrong branch for the input.
- Silent state corruption: a node overwrites a state key (like messages) instead of appending to it.
- Silent tool failures: a tool node returns empty output without raising, and downstream nodes run on nothing.

All of these are invisible without per-node spans. Here's how to add them.

Instrumenting a LangGraph StateGraph

The key insight is that every LangGraph node is just a Python function that takes state and returns updated state. Wrap each node in a Nexus span:

import os
from typing import TypedDict
from langgraph.graph import StateGraph, END
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

class AgentState(TypedDict):
    messages: list[dict]
    next_action: str
    tool_result: str | None

def node_span(trace_id: str, node_name: str):
    """Decorator that wraps a LangGraph node in a Nexus span."""
    def decorator(fn):
        def wrapper(state: AgentState) -> AgentState:
            span = nexus.start_span(trace_id, {
                "name": f"node:{node_name}",
                "type": "llm" if "llm" in node_name else "tool",
                "metadata": {
                    "node": node_name,
                    "input_message_count": len(state["messages"]),
                    "next_action_before": state.get("next_action"),
                },
            })
            try:
                result = fn(state)
                nexus.end_span(span["id"], {
                    "output": str(result.get("next_action", "")),
                    "metadata": {
                        "node": node_name,
                        "next_action_after": result.get("next_action"),
                        "output_message_count": len(result.get("messages", state["messages"])),
                    },
                })
                return result
            except Exception as e:
                nexus.end_span(span["id"], {"error": str(e)})
                raise
        return wrapper
    return decorator

Building a traced StateGraph

from openai import OpenAI
from langgraph.graph import StateGraph, END

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def run_agent_with_tracing(user_query: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "langgraph-research-agent",
        "name": f"query: {user_query[:60]}",
        "status": "running",
        "started_at": nexus.now(),
    })
    trace_id = trace["trace_id"]

    try:
        # Wrap nodes with tracing
        @node_span(trace_id, "llm_router")
        def router_node(state: AgentState) -> AgentState:
            response = openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "Decide: 'search', 'answer', or 'clarify'"},
                    *state["messages"],
                ],
            )
            # message.content can be None in the SDK's typing; guard before strip()
            action = (response.choices[0].message.content or "").strip().lower()
            return {**state, "next_action": action}

        @node_span(trace_id, "tool_search")
        def search_node(state: AgentState) -> AgentState:
            # Simulate a search tool
            result = f"Search results for: {state['messages'][-1]['content']}"
            return {
                **state,
                "tool_result": result,
                "messages": [*state["messages"], {"role": "tool", "content": result}],
            }

        @node_span(trace_id, "llm_answer")
        def answer_node(state: AgentState) -> AgentState:
            response = openai_client.chat.completions.create(
                model="gpt-4o",
                messages=state["messages"],
            )
            answer = response.choices[0].message.content or ""
            return {
                **state,
                "messages": [*state["messages"], {"role": "assistant", "content": answer}],
            }

        def route_from_router(state: AgentState) -> str:
            action = state.get("next_action", "answer")
            if action == "search":
                return "search"
            return "answer"

        # Build graph
        graph = StateGraph(AgentState)
        graph.add_node("router", router_node)
        graph.add_node("search", search_node)
        graph.add_node("answer", answer_node)

        graph.set_entry_point("router")
        graph.add_conditional_edges("router", route_from_router, {
            "search": "search",
            "answer": "answer",
        })
        graph.add_edge("search", "answer")
        graph.add_edge("answer", END)

        app = graph.compile()
        initial_state: AgentState = {
            "messages": [{"role": "user", "content": user_query}],
            "next_action": "",
            "tool_result": None,
        }
        final_state = app.invoke(initial_state)
        final_answer = final_state["messages"][-1]["content"]

        nexus.end_trace(trace_id, {"status": "success"})
        return final_answer

    except Exception as e:
        nexus.end_trace(trace_id, {"status": "error", "metadata": {"error": str(e)}})
        raise

What to look for in the trace

Detecting infinite loops: If the same node name appears 10+ times in the span waterfall, you have a cycle. Check the router node's next_action_after metadata — it'll show the same value repeating, which means the conditional edge is stuck.
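
That check is easy to automate offline. Assuming you export a run's spans as a list of dicts with the name field the decorator above sets (the helper and threshold are illustrative, not a Nexus API), a few lines flag any node repeated past a threshold:

```python
from collections import Counter

def find_loops(spans: list[dict], threshold: int = 10) -> dict[str, int]:
    """Return {span_name: count} for span names appearing >= threshold times."""
    counts = Counter(span["name"] for span in spans)
    return {name: n for name, n in counts.items() if n >= threshold}

spans = [{"name": "node:llm_router"}] * 12 + [{"name": "node:tool_search"}]
print(find_loops(spans))  # → {'node:llm_router': 12}
```

As a runtime guard, compiled LangGraph apps also accept a recursion_limit in the invoke config and raise once a run exceeds it, so a stuck cycle fails loudly instead of burning tokens.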

Detecting wrong routing: Compare next_action_after values across router spans for successful vs failed runs. If the router outputs "search" when the input clearly needs "answer", the system prompt or the output parser has a bug.
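
One way to run that comparison, assuming each run is exported as a dict with its final status and span list (the shape is an assumption about your export, not a fixed Nexus format):

```python
from collections import Counter

def router_actions_by_status(runs: list[dict]) -> dict[str, Counter]:
    """Tally next_action_after from router spans, grouped by run status."""
    tallies: dict[str, Counter] = {}
    for run in runs:
        counter = tallies.setdefault(run["status"], Counter())
        for span in run["spans"]:
            if span["name"] == "node:llm_router":
                counter[span["metadata"].get("next_action_after")] += 1
    return tallies

runs = [
    {"status": "success", "spans": [{"name": "node:llm_router",
                                     "metadata": {"next_action_after": "answer"}}]},
    {"status": "error", "spans": [{"name": "node:llm_router",
                                   "metadata": {"next_action_after": "clarify"}}]},
]
print(router_actions_by_status(runs))
```

If an unexpected action (say, 'clarify') dominates the failed runs while successful runs only ever show 'search' and 'answer', the router prompt or parser is your culprit.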

Detecting state corruption: Log input_message_count and output_message_count on each node. If a node shows fewer messages out than in, it's overwriting instead of appending — a common TypedDict merge mistake.
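
A quick offline check for that pattern, assuming your exporter merges each span's start and end metadata into one dict (matching the input_message_count and output_message_count fields the decorator above records):

```python
def find_message_loss(spans: list[dict]) -> list[str]:
    """Names of nodes whose output message count dropped below the input count."""
    suspects = []
    for span in spans:
        meta = span.get("metadata", {})
        before = meta.get("input_message_count")
        after = meta.get("output_message_count")
        if before is not None and after is not None and after < before:
            suspects.append(meta.get("node", span.get("name", "?")))
    return suspects

spans = [
    {"name": "node:tool_search",
     "metadata": {"node": "tool_search",
                  "input_message_count": 3, "output_message_count": 1}},
    {"name": "node:llm_answer",
     "metadata": {"node": "llm_answer",
                  "input_message_count": 3, "output_message_count": 4}},
]
print(find_message_loss(spans))  # → ['tool_search']
```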

Detecting tool failures: Tool node spans that end with an error field are tool failures. If the span shows no error but the tool_result metadata is empty or None, the tool silently returned nothing — add a validation check inside the tool node.
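
That validation can be a one-line guard shared by every tool node. A sketch — require_tool_result is a hypothetical helper, not part of LangGraph or Nexus:

```python
def require_tool_result(result, tool_name: str) -> str:
    """Raise instead of letting an empty tool result flow silently into state."""
    if not isinstance(result, str) or not result.strip():
        raise ValueError(f"tool '{tool_name}' returned an empty result")
    return result

# Inside a tool node: wrap the raw result before writing it into state.
result = require_tool_result("Search results for: langgraph tracing", "search")
```

With the guard in place, an empty result raises inside the node, so the span ends with an error field instead of passing None downstream.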

Adding a few metadata fields per node — the node name, the message counts going in and out, and the routing action before and after — gives you a complete picture of how your LangGraph runs succeed and fail. Five minutes of instrumentation saves hours of print-statement debugging.

Debug LangGraph agents with Nexus

Nexus stores span metadata alongside traces, giving you per-node visibility into your LangGraph execution. Free tier, no credit card required.

Start free →