DSPy Observability with Nexus
Add production observability to DSPy programs. Trace every module forward pass, monitor optimizer compilations, and track evaluation runs — all in a single dashboard.
Why use Nexus with DSPy?
- ✓ Module tracing — capture every `forward()` call with input, output, and latency
- ✓ Optimizer observability — monitor BootstrapFewShot and MIPRO compilation time and outcomes
- ✓ Evaluation monitoring — track metric scores, per-example pass/fail, and regression across runs
- ✓ Zero DSPy internals — wraps the public API, no monkey-patching or private hooks needed
- ✓ $9/mo — no enterprise contracts, free plan for prototyping
Step 1 — Install dependencies
DSPy is a Python framework. Install the Nexus Python SDK alongside DSPy:
```shell
pip install keylightdigital-nexus dspy-ai
```
Requires Python 3.9+ and dspy-ai ≥ 2.4.0
Step 2 — Create an API key
Go to /dashboard/keys and create a new API key. Add it to your environment:
```shell
export NEXUS_API_KEY="nxs_your_api_key_here"
```
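Because the Nexus client fails silently at runtime (see Step 3), a missing key can go unnoticed. A small startup check fails fast instead — a sketch, where `require_env` is a hypothetical helper of this guide, not part of the Nexus SDK:

```python
import os

def require_env(name: str) -> str:
    """Fail fast with a clear message if a required env var is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value
```

Call `require_env("NEXUS_API_KEY")` once at startup, before constructing the client.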
Step 3 — Configure DSPy and Nexus
Configure your language model and Nexus client at startup. The Nexus client is lightweight and safe to initialize globally — all methods fail silently if the API is unreachable.
```python
import os

import dspy
from nexus_client import NexusClient

# Configure your LM as usual
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)

# Initialize Nexus client
nexus = NexusClient(
    api_key=os.environ["NEXUS_API_KEY"],
    agent_id="dspy-app",
)
```
Step 4 — Trace DSPy modules
Wrap your `dspy.Module.forward()` method in a Nexus trace, adding a child span for each distinct step (retrieval, prediction, tool call). This gives you a full waterfall view for every module run.
```python
import os

import dspy
from nexus_client import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="dspy-rag")

# Define your DSPy signature
class GenerateAnswer(dspy.Signature):
    """Answer a question given supporting context."""
    context: str = dspy.InputField(desc="Relevant passages from the knowledge base")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Concise answer based on the context")

class RAGModule(dspy.Module):
    def __init__(self):
        super().__init__()  # required so DSPy can track sub-modules
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question: str) -> dspy.Prediction:
        # Create a trace for the full module run
        trace = nexus.start_trace(
            name="rag: " + question[:60],
            metadata={"module": "RAGModule", "retrieve_k": 3},
        )
        try:
            # Span for the retrieval step
            retrieve_span = nexus.add_span(
                trace_id=trace["id"],
                name="retrieve",
                input={"question": question, "k": 3},
            )
            passages = self.retrieve(question)
            nexus.end_span(
                trace_id=trace["id"],
                span_id=retrieve_span["span_id"],
                output={"num_passages": len(passages.passages)},
                status="ok",
            )

            # Span for the generation step
            gen_span = nexus.add_span(
                trace_id=trace["id"],
                name="chain-of-thought",
                input={"context_length": sum(len(p) for p in passages.passages)},
            )
            prediction = self.generate(
                context=passages.passages,
                question=question,
            )
            nexus.end_span(
                trace_id=trace["id"],
                span_id=gen_span["span_id"],
                output={"answer_length": len(prediction.answer)},
                status="ok",
            )

            nexus.end_trace(trace_id=trace["id"], status="success")
            return prediction
        except Exception:
            nexus.end_trace(trace_id=trace["id"], status="error")
            raise

rag = RAGModule()
result = rag("What is DSPy and how does it differ from LangChain?")
print(result.answer)

# Nexus dashboard shows:
# trace: rag: What is DSPy...
#   +-- retrieve (3 passages fetched)
#   +-- chain-of-thought (answer generated)
```
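The start/end bookkeeping in `forward()` repeats for every module you instrument. If you trace several modules, a small context manager can factor it out. A sketch assuming only the `start_trace`/`end_trace` methods shown above — `traced` is a hypothetical helper of this guide, not part of the Nexus SDK:

```python
from contextlib import contextmanager

@contextmanager
def traced(client, name, metadata=None):
    """Open a Nexus trace, yield it, and close it with the right status.

    `client` is assumed to expose start_trace(name=..., metadata=...) and
    end_trace(trace_id=..., status=...), as in the examples above.
    """
    trace = client.start_trace(name=name, metadata=metadata or {})
    try:
        yield trace
        client.end_trace(trace_id=trace["id"], status="success")
    except Exception:
        client.end_trace(trace_id=trace["id"], status="error")
        raise
```

With this helper, a `forward()` body shrinks to `with traced(nexus, "rag: " + question[:60]) as trace:` around the span calls, and the error path is handled in one place.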
Step 5 — Trace optimizer runs
DSPy optimizers (teleprompters) like BootstrapFewShot and MIPRO can run for minutes or hours. Wrapping them in a trace lets you compare compilation cost across different configurations and track regressions.
```python
import os

import dspy
from dspy.teleprompt import BootstrapFewShot
from nexus_client import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="dspy-optimizer")

# Define the metric
def answer_exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# Training data
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
    # ... more examples
]

# Trace the entire optimization run
opt_trace = nexus.start_trace(
    name="bootstrap-few-shot optimization",
    metadata={"metric": "exact_match", "trainset_size": len(trainset), "max_bootstrapped_demos": 3},
)
try:
    teleprompter = BootstrapFewShot(metric=answer_exact_match, max_bootstrapped_demos=3)
    optimized_program = teleprompter.compile(RAGModule(), trainset=trainset)

    # Log a compilation-complete span
    nexus.add_span(
        trace_id=opt_trace["id"],
        name="compilation-complete",
        output={"status": "compiled", "demos_generated": 3},
        status="ok",
    )
    nexus.end_trace(trace_id=opt_trace["id"], status="success")
except Exception:
    nexus.end_trace(trace_id=opt_trace["id"], status="error")
    raise

# Save and reuse the optimized program
optimized_program.save("optimized_rag.json")

# Nexus dashboard shows:
# trace: bootstrap-few-shot optimization
#   +-- compilation-complete
```
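Since long compilations are exactly what you want to compare across configurations, it is worth recording wall-clock time in the span output as well. A minimal, pure-Python sketch (`timed` is a hypothetical helper, not a Nexus or DSPy API):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) for span metadata."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `optimized_program, compile_secs = timed(teleprompter.compile, RAGModule(), trainset=trainset)`, then include `"compile_seconds": round(compile_secs, 1)` in the compilation-complete span's output.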
Step 6 — Monitor evaluations
Trace `dspy.Evaluate` calls to track metric scores over time. Compare pre-optimization vs post-optimization performance in the Nexus dashboard without digging through terminal output.
```python
import os

import dspy
from nexus_client import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"], agent_id="dspy-evaluator")

# Evaluation dataset
devset = [
    dspy.Example(question="What is RAG?", answer="Retrieval-Augmented Generation").with_inputs("question"),
    dspy.Example(question="Define gradient descent", answer="An optimization algorithm").with_inputs("question"),
    # ... more examples
]

# Token-level F1 between predicted and gold answers
def answer_f1(example, pred, trace=None):
    pred_tokens = set(pred.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)
    return (2 * precision * recall / (precision + recall)) if (precision + recall) > 0 else 0.0

# Trace the evaluation run
eval_trace = nexus.start_trace(
    name="evaluation run",
    metadata={"metric": "f1", "devset_size": len(devset), "program": "RAGModule"},
)

evaluate = dspy.Evaluate(devset=devset, metric=answer_f1, num_threads=4)
score = evaluate(optimized_program)  # optimized_program from Step 5

nexus.add_span(
    trace_id=eval_trace["id"],
    name="evaluation-result",
    output={"f1_score": score, "devset_size": len(devset)},
    status="ok",
)
nexus.end_trace(trace_id=eval_trace["id"], status="success")

print(f"Evaluation score: {score:.2f}")

# Nexus dashboard shows:
# trace: evaluation run
#   +-- evaluation-result (f1_score: 0.78)
```
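Before kicking off a long evaluation run, it can help to sanity-check the metric in isolation. The sketch below is the same token-level F1 computation as `answer_f1` above, extracted as a standalone function so it can be exercised on a concrete pair without any DSPy objects:

```python
def token_f1(pred: str, gold: str) -> float:
    """Token-level F1, mirroring the answer_f1 metric above."""
    pred_tokens = set(pred.lower().split())
    gold_tokens = set(gold.lower().split())
    if not pred_tokens or not gold_tokens:
        return 0.0
    overlap = len(pred_tokens & gold_tokens)
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return (2 * precision * recall / (precision + recall)) if (precision + recall) > 0 else 0.0

# 5 predicted tokens, 3 gold tokens, 3 overlapping:
# precision = 3/5, recall = 3/3, F1 = 0.75
print(token_f1("an optimization algorithm for training", "An optimization algorithm"))
```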
Step 7 — View traces in Nexus
Navigate to /dashboard/traces to see every module run, optimizer compilation, and evaluation as a trace with span waterfall, latency breakdown, and status indicators.
Start monitoring your DSPy programs
Free plan: 1,000 traces/month. No credit card needed. Add tracing in under 10 minutes.