LlamaIndex Observability with Nexus
Add full observability to your LlamaIndex RAG pipelines and agents. Track every retrieval, LLM call, re-ranker, and synthesizer step in a span waterfall — beyond a one-time callback handler setup, standard pipelines need no code changes.
Why use Nexus with LlamaIndex?
- ✓ Callback-based auto-tracing — plug in one handler to trace the entire pipeline
- ✓ RAG span visibility — see retrieval, embedding, re-ranking, and synthesis as separate spans
- ✓ Agent step tracing — track every tool call and LLM turn in ReAct and OpenAI agents
- ✓ LlamaIndex monitoring — latency, errors, and token usage in one dashboard
- ✓ No vendor lock-in — works alongside any LLM provider LlamaIndex supports
Step 1 — Install dependencies
LlamaIndex is primarily a Python framework. Install both packages:
pip install keylightdigital-nexus llama-index
Requires Python 3.9+ and llama-index ≥ 0.10.0
Step 2 — Create an API key
Go to /dashboard/keys and create a new API key. Add it to your environment:
export NEXUS_API_KEY="nxs_your_api_key_here"
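The examples below read the key from the environment, so it helps to fail fast at startup if it is missing. A minimal sketch of such a check, assuming keys use the nxs_ prefix shown above:

```python
import os

def require_api_key(name: str = "NEXUS_API_KEY") -> str:
    """Return the Nexus API key from the environment, failing fast if unset."""
    key = os.environ.get(name, "")
    if not key.startswith("nxs_"):  # keys use the nxs_ prefix shown above
        raise RuntimeError(f"{name} is missing or malformed; export it before starting the app")
    return key
```

Calling this once before constructing NexusClient turns a silent misconfiguration into an immediate, clearly worded error.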
Step 3 — Configure the callback handler
Nexus integrates with LlamaIndex via the built-in CallbackManager. Configure it once via Settings, and all subsequent query engines and agents will emit traces automatically.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
import os
# Initialize Nexus client
nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="llamaindex-rag-app",
)
# Create the callback handler
nexus_handler = NexusCallbackHandler(nexus_client=nexus)
# Register globally — all LlamaIndex pipelines will use this
Settings.callback_manager = CallbackManager([nexus_handler])
The handler automatically captures LLMStartEvent, LLMEndEvent, RetrieveStartEvent, and RetrieveEndEvent from LlamaIndex's event system.
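Conceptually, a handler like this pairs each start event with its matching end event and turns the pair into a timed span. A rough stdlib-only sketch of that pairing logic — an illustration of the idea, not the actual handler:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    name: str               # e.g. "retrieve" or "llm"
    start: float
    end: Optional[float] = None

    def latency_ms(self) -> float:
        """Elapsed time in milliseconds, or 0 if the span is still open."""
        return (self.end - self.start) * 1000.0 if self.end is not None else 0.0

class SpanRecorder:
    """Pairs *StartEvent / *EndEvent callbacks into timed spans by event id."""

    def __init__(self):
        self.open = {}       # event_id -> Span awaiting its end event
        self.finished = []   # completed spans, in end order

    def on_start(self, event_id: str, name: str) -> None:
        self.open[event_id] = Span(name=name, start=time.monotonic())

    def on_end(self, event_id: str) -> None:
        span = self.open.pop(event_id)
        span.end = time.monotonic()
        self.finished.append(span)
```

Matching on the event id is what lets concurrent spans (e.g. an LLM call starting while a retrieve is still open) close out independently.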
Step 4 — Trace a query engine
With the callback handler configured, build your index and query engine as normal. Every query automatically produces a trace with retrieval and LLM spans.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.callbacks import CallbackManager
import os
# --- Setup (run once at startup) ---
nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="rag-pipeline")
Settings.callback_manager = CallbackManager([NexusCallbackHandler(nexus_client=nexus)])
# --- Index your documents ---
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# --- Query: each call creates a trace automatically ---
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the capital of France?")
print(response)
# Nexus dashboard will show:
# trace: query
# +-- retrieve (3 nodes fetched, latency, similarity scores)
# +-- llm-call (prompt tokens, completion tokens, model)
# +-- synthesize (output text)
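The per-span token counts in the waterfall above can be rolled up per trace. A small stdlib sketch of that aggregation — the span dictionaries here are an assumed shape for illustration, not the Nexus API:

```python
def total_tokens(spans):
    """Sum prompt and completion tokens across a trace's LLM spans."""
    totals = {"prompt": 0, "completion": 0}
    for span in spans:
        if span.get("type") == "llm-call":
            totals["prompt"] += span.get("prompt_tokens", 0)
            totals["completion"] += span.get("completion_tokens", 0)
    return totals

# Spans mirroring the waterfall shown above
spans = [
    {"type": "retrieve", "nodes": 3},
    {"type": "llm-call", "prompt_tokens": 512, "completion_tokens": 48},
    {"type": "synthesize"},
]
print(total_tokens(spans))  # {'prompt': 512, 'completion': 48}
```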
Step 5 — Trace a LlamaIndex agent
For agents (ReAct, OpenAI function-calling), wrap the agent run in a manual trace to capture the full multi-step loop alongside the automatic callback spans.
from nexus_client import NexusClient, NexusCallbackHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent
import os
nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="llamaindex-agent")
Settings.callback_manager = CallbackManager([NexusCallbackHandler(nexus_client=nexus)])
# --- Define tools ---
def search_docs(query: str) -> str:
    """Search internal documentation for relevant content."""
    # Stub result for illustration — replace with a real search call
    return "Found relevant content for: " + query

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely."""
    # Stub result for illustration — replace with a real evaluator
    return "42"
search_tool = FunctionTool.from_defaults(fn=search_docs)
calc_tool = FunctionTool.from_defaults(fn=calculate)
agent = OpenAIAgent.from_tools([search_tool, calc_tool], verbose=True)
# --- Run with a manual trace for the outer agent loop ---
def run_agent(user_query: str) -> str:
    trace = nexus.start_trace(
        name="agent: " + user_query[:60],
        metadata={"agent_type": "openai", "tools": ["search_docs", "calculate"]},
    )
    try:
        result = agent.chat(user_query)
        nexus.end_trace(trace_id=trace["id"], status="success")
        return str(result)
    except Exception:
        nexus.end_trace(trace_id=trace["id"], status="error")
        raise
answer = run_agent("How many tokens did our last 10 queries use in total?")
print(answer)
# Nexus dashboard will show:
# agent: How many tokens... (outer trace)
# +-- llm-call (initial planning)
# +-- tool-search_docs
# +-- llm-call (reasoning)
# +-- tool-calculate
# +-- llm-call (final answer)
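If you wrap several agents this way, the try/except pattern can be factored into a context manager. A sketch assuming only the start_trace/end_trace calls used in this guide, demonstrated with a stand-in client so it runs without Nexus installed:

```python
from contextlib import contextmanager

@contextmanager
def traced(client, name: str, **metadata):
    """Open a trace, ending it with success or error based on the outcome."""
    trace = client.start_trace(name=name, metadata=metadata)
    try:
        yield trace
        client.end_trace(trace_id=trace["id"], status="success")
    except Exception:
        client.end_trace(trace_id=trace["id"], status="error")
        raise

# Stand-in client with the same method shapes, for illustration only
class FakeClient:
    def __init__(self):
        self.events = []

    def start_trace(self, name, metadata=None):
        self.events.append(("start", name))
        return {"id": "t1"}

    def end_trace(self, trace_id, status):
        self.events.append(("end", trace_id, status))

client = FakeClient()
with traced(client, "agent: demo", agent_type="openai"):
    pass  # agent.chat(...) would go here
print(client.events)  # [('start', 'agent: demo'), ('end', 't1', 'success')]
```

With a real NexusClient in place of the stand-in, run_agent reduces to a with-block around agent.chat, and the error path is handled in one place.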
Step 6 — View traces in Nexus
Run your pipeline and navigate to /dashboard/traces. Each query or agent run appears as a trace. The span waterfall shows retrieval, LLM, and tool calls in sequence — with latency and token counts at a glance.
View demo with sample RAG traces →
Start monitoring your LlamaIndex pipelines
Free plan: 1,000 traces/month. No credit card needed. Callback handler setup in under 5 minutes.