LlamaIndex Observability with Nexus
Add full observability to your LlamaIndex RAG pipelines and agents. Track every retrieval, LLM call, re-ranker, and synthesizer step in a span waterfall — beyond a one-time callback handler setup, standard pipelines need no code changes.
Why use Nexus with LlamaIndex?
- ✓ Callback-based auto-tracing — plug in one handler to trace the entire pipeline
- ✓ RAG span visibility — see retrieval, embedding, re-ranking, and synthesis as separate spans
- ✓ Agent step tracing — track every tool call and LLM turn in ReAct and OpenAI agents
- ✓ LlamaIndex monitoring — latency, errors, and token usage in one dashboard
- ✓ No vendor lock-in — works alongside any LLM provider LlamaIndex supports
Step 1 — Install dependencies
LlamaIndex is primarily a Python framework. Install both packages:
pip install keylightdigital-nexus llama-index
Requires Python 3.9+ and llama-index ≥ 0.10.0
Step 2 — Create an API key
Go to /dashboard/keys and create a new API key. Add it to your environment:
export NEXUS_API_KEY="nxs_your_api_key_here"
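The examples below read the key from the environment, so it helps to fail fast at startup if it is missing. A minimal sketch of such a check, assuming keys use the nxs_ prefix shown above:

```python
import os

def require_api_key(name: str = "NEXUS_API_KEY") -> str:
    """Return the Nexus API key from the environment, failing fast if unset."""
    key = os.environ.get(name, "")
    if not key.startswith("nxs_"):  # keys use the nxs_ prefix shown above
        raise RuntimeError(f"{name} is missing or malformed; export it before starting the app")
    return key
```

Calling this once before constructing NexusClient turns a silent misconfiguration into an immediate, clearly worded error.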
Step 3 — Configure the callback handler
Nexus integrates with LlamaIndex via the built-in CallbackManager. Configure it once via Settings, and all subsequent query engines and agents will emit traces automatically.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
import os
# Initialize Nexus client
nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="llamaindex-rag-app",
)
# Create the callback handler
nexus_handler = NexusCallbackHandler(nexus_client=nexus)
# Register globally — all LlamaIndex pipelines will use this
Settings.callback_manager = CallbackManager([nexus_handler])
The handler automatically captures LLMStartEvent, LLMEndEvent, RetrieveStartEvent, and RetrieveEndEvent from LlamaIndex's event system.
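Conceptually, a handler like this pairs each start event with its matching end event and turns the pair into a timed span. A rough stdlib-only sketch of that pairing logic — an illustration of the idea, not the actual handler:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    name: str               # e.g. "retrieve" or "llm"
    start: float
    end: Optional[float] = None

    def latency_ms(self) -> float:
        """Elapsed time in milliseconds, or 0 if the span is still open."""
        return (self.end - self.start) * 1000.0 if self.end is not None else 0.0

class SpanRecorder:
    """Pairs *StartEvent / *EndEvent callbacks into timed spans by event id."""

    def __init__(self):
        self.open = {}       # event_id -> Span awaiting its end event
        self.finished = []   # completed spans, in end order

    def on_start(self, event_id: str, name: str) -> None:
        self.open[event_id] = Span(name=name, start=time.monotonic())

    def on_end(self, event_id: str) -> None:
        span = self.open.pop(event_id)
        span.end = time.monotonic()
        self.finished.append(span)
```

Matching on the event id is what lets concurrent spans (e.g. an LLM call starting while a retrieve is still open) close out independently.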
Step 4 — Trace a query engine
With the callback handler configured, build your index and query engine as normal. Every query automatically produces a trace with retrieval and LLM spans.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
import os
# --- Setup (run once at startup) ---
nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="rag-pipeline")
Settings.callback_manager = CallbackManager([NexusCallbackHandler(nexus_client=nexus)])
# --- Index your documents ---
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# --- Query: each call creates a trace automatically ---
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the capital of France?")
print(response)
# Nexus dashboard will show:
# trace: query
# +-- retrieve (3 nodes fetched, latency, similarity scores)
# +-- llm-call (prompt tokens, completion tokens, model)
# +-- synthesize (output text)
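The per-span token counts in the waterfall above can be rolled up per trace. A small stdlib sketch of that aggregation — the span dictionaries here are an assumed shape for illustration, not the Nexus API:

```python
def total_tokens(spans):
    """Sum prompt and completion tokens across a trace's LLM spans."""
    totals = {"prompt": 0, "completion": 0}
    for span in spans:
        if span.get("type") == "llm-call":
            totals["prompt"] += span.get("prompt_tokens", 0)
            totals["completion"] += span.get("completion_tokens", 0)
    return totals

# Spans mirroring the waterfall shown above
spans = [
    {"type": "retrieve", "nodes": 3},
    {"type": "llm-call", "prompt_tokens": 512, "completion_tokens": 48},
    {"type": "synthesize"},
]
print(total_tokens(spans))  # {'prompt': 512, 'completion': 48}
```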
Step 5 — Trace a LlamaIndex agent
For agents (ReAct, OpenAI function-calling), wrap the agent run in a manual trace to capture the full multi-step loop alongside the automatic callback spans.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent
import os
nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="llamaindex-agent")
Settings.callback_manager = CallbackManager([NexusCallbackHandler(nexus_client=nexus)])
# --- Define tools ---
def search_docs(query: str) -> str:
    """Search internal documentation for relevant content."""
    # Stub result for illustration — replace with a real search call
    return "Found relevant content for: " + query

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely."""
    # Stub result for illustration — replace with a real evaluator
    return "42"
search_tool = FunctionTool.from_defaults(fn=search_docs)
calc_tool = FunctionTool.from_defaults(fn=calculate)
agent = OpenAIAgent.from_tools([search_tool, calc_tool], verbose=True)
# --- Run with a manual trace for the outer agent loop ---
def run_agent(user_query: str) -> str:
    trace = nexus.start_trace(
        name="agent: " + user_query[:60],
        metadata={"agent_type": "openai", "tools": ["search_docs", "calculate"]},
    )
    try:
        result = agent.chat(user_query)
        nexus.end_trace(trace_id=trace["id"], status="success")
        return str(result)
    except Exception:
        nexus.end_trace(trace_id=trace["id"], status="error")
        raise
answer = run_agent("How many tokens did our last 10 queries use in total?")
print(answer)
# Nexus dashboard will show:
# agent: How many tokens... (outer trace)
# +-- llm-call (initial planning)
# +-- tool-search_docs
# +-- llm-call (reasoning)
# +-- tool-calculate
# +-- llm-call (final answer)
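If you wrap several agents this way, the try/except pattern can be factored into a context manager. A sketch assuming only the start_trace/end_trace calls used in this guide, demonstrated with a stand-in client so it runs without Nexus installed:

```python
from contextlib import contextmanager

@contextmanager
def traced(client, name: str, **metadata):
    """Open a trace, ending it with success or error based on the outcome."""
    trace = client.start_trace(name=name, metadata=metadata)
    try:
        yield trace
        client.end_trace(trace_id=trace["id"], status="success")
    except Exception:
        client.end_trace(trace_id=trace["id"], status="error")
        raise

# Stand-in client with the same method shapes, for illustration only
class FakeClient:
    def __init__(self):
        self.events = []

    def start_trace(self, name, metadata=None):
        self.events.append(("start", name))
        return {"id": "t1"}

    def end_trace(self, trace_id, status):
        self.events.append(("end", trace_id, status))

client = FakeClient()
with traced(client, "agent: demo", agent_type="openai"):
    pass  # agent.chat(...) would go here
print(client.events)  # [('start', 'agent: demo'), ('end', 't1', 'success')]
```

With a real NexusClient in place of the stand-in, run_agent reduces to a with-block around agent.chat, and the error path is handled in one place.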
Step 6 — View traces in Nexus
Run your pipeline and navigate to /dashboard/traces. Each query or agent run appears as a trace. The span waterfall shows retrieval, LLM, and tool calls in sequence — with latency and token counts at a glance.
View demo with sample RAG traces →
Start monitoring your LlamaIndex pipelines
Free plan: 1,000 traces/month. No credit card needed. Callback handler setup in under 5 minutes.