Using Metadata to Make AI Agent Traces Searchable and Debuggable
Most teams record traces but never add metadata. That's a missed opportunity: metadata fields like model version, user ID, environment, and feature flag turn a trace from a raw log into a queryable record. Here's what to capture, how to name it, and how to use it to debug production incidents.
What to capture as metadata
The most useful metadata falls into six fields. Not every trace needs all of them — pick the ones relevant to your debugging workflow:
| Field | Example value | Why it matters |
|---|---|---|
| model | gpt-4o-mini | Filter by model to compare costs and latency |
| model_version | 2025-04-01 | Detect regressions across API snapshot versions |
| user_id | usr_8f3k9 | Find all traces for a user reporting a bug |
| environment | production | Filter out staging noise when debugging prod incidents |
| feature_flag | new-prompt-v2 | Compare error rates between A/B variants |
| session_id | sess_abc123 | Group traces from the same multi-turn conversation |
Naming conventions that hold up long-term
Metadata key names become search terms. Bad names become permanent technical debt. Three rules:
Use snake_case consistently. Don’t mix userId and user_id across agents — you'll lose the ability to filter across them.
Prefix custom fields. Use app_ or your service name as a prefix for application-specific fields. This prevents collisions with standard fields (model, status) and makes it obvious which fields are yours.
Keep values short and categorical. environment: "production" is useful. environment: "Production server — us-east-1 — ECS task 8f3a" is a log entry that can't be filtered. Save verbose data for the span input/output fields.
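These three rules can be enforced in code rather than by convention alone. Here's a minimal sketch of a normalizer you could run before every trace start — the helper name, the standard-key set, and the length cap are all assumptions for illustration, not part of any SDK:

```python
import re

# Fields treated as "standard" here are taken from the table above;
# adjust to match whatever your tracing backend reserves.
STANDARD_KEYS = {"model", "model_version", "user_id", "environment",
                 "feature_flag", "session_id", "status"}
MAX_VALUE_LEN = 64  # arbitrary cap; verbose values belong in span input/output


def normalize_metadata(raw: dict[str, str], prefix: str = "app") -> dict[str, str]:
    """Apply the three naming rules: snake_case keys, prefixed custom
    fields, and short categorical values."""
    out = {}
    for key, value in raw.items():
        # camelCase -> snake_case, e.g. "userId" -> "user_id"
        snake = re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()
        # Prefix anything that isn't a standard field
        if snake not in STANDARD_KEYS and not snake.startswith(f"{prefix}_"):
            snake = f"{prefix}_{snake}"
        # Fail loudly instead of silently storing an unfilterable blob
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"metadata value for {snake!r} is too long to filter on")
        out[snake] = value
    return out
```

Running this at the one choke point where traces start means a stray userId or a pasted log line never reaches your metadata store in the first place.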
Recording metadata in Python
```python
import os

from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])


def run_agent(user_id: str, query: str, feature_flag: str = "control") -> str:
    trace = nexus.start_trace({
        "agent_id": "support-agent",
        "name": f"support: {query[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "environment": os.environ.get("APP_ENV", "development"),
            "feature_flag": feature_flag,
            "model": "gpt-4o",
            "model_version": "2025-04-01",
            "session_id": f"sess_{user_id[:8]}",
        },
    })
    # ... agent logic ...
    nexus.end_trace(trace["trace_id"], {"status": "success"})
    return "Agent response here"
```
Recording metadata in TypeScript
```typescript
import { NexusClient } from '@nexus/sdk'

const nexus = new NexusClient({ apiKey: process.env.NEXUS_API_KEY! })

async function runAgent(userId: string, query: string, featureFlag = 'control'): Promise<string> {
  const trace = await nexus.startTrace({
    agentId: 'support-agent',
    name: `support: ${query.slice(0, 60)}`,
    status: 'running',
    startedAt: new Date().toISOString(),
    metadata: {
      user_id: userId,
      environment: process.env.APP_ENV ?? 'development',
      feature_flag: featureFlag,
      model: 'gpt-4o',
      model_version: '2025-04-01',
      session_id: `sess_${userId.slice(0, 8)}`,
    },
  })
  // ... agent logic ...
  await nexus.endTrace(trace.traceId, { status: 'success' })
  return 'Agent response here'
}
```
Incident debugging workflow with metadata
Here’s the practical workflow when a production incident lands:
Step 1: Filter by environment. Search environment:production to exclude staging traces from the error count. If the incident started at 14:30 UTC, narrow the time range to start there.
Step 2: Check if it's model-specific. Filter by model:gpt-4o vs model:gpt-4o-mini. If only one model is erroring, the issue is the model endpoint, not your code.
Step 3: Check if it's feature-flag-specific. If you recently launched a prompt variant, filter by feature_flag:new-prompt-v2. A spike in errors only on the new variant means the prompt regression is real — roll it back.
Step 4: Identify affected users. Filter by the error status and check the user_id metadata on failing traces. If 90% of errors come from a handful of user IDs, the issue is user-specific data, not a platform-wide bug.
Each of these steps takes 30 seconds with metadata filtering. Without it, you're manually reading through hundreds of raw traces looking for patterns — a process that takes hours and still misses edge cases.
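The dashboard handles these filters for you, but the logic itself is simple enough to sketch in plain Python over exported trace records — the trace shape below is assumed from the examples earlier, and filter_traces is a hypothetical helper, not an SDK call:

```python
from collections import Counter


def filter_traces(traces, **conditions):
    """Keep traces whose metadata matches every key=value condition."""
    return [
        t for t in traces
        if all(t.get("metadata", {}).get(k) == v for k, v in conditions.items())
    ]


# Steps 1 and 4 combined: production errors, grouped by user
traces = [
    {"status": "error", "metadata": {"environment": "production", "user_id": "usr_a"}},
    {"status": "error", "metadata": {"environment": "staging", "user_id": "usr_b"}},
    {"status": "error", "metadata": {"environment": "production", "user_id": "usr_a"}},
]
prod_errors = [t for t in filter_traces(traces, environment="production")
               if t["status"] == "error"]
by_user = Counter(t["metadata"]["user_id"] for t in prod_errors)
# If one user_id dominates the count, suspect user-specific data,
# not a platform-wide bug.
```

The same pattern covers steps 2 and 3: swap the condition for model="gpt-4o" or feature_flag="new-prompt-v2" and compare the resulting counts.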
The upfront cost is a few lines of code when you start a trace. The payoff is every future debugging session you'll ever do on that agent.
Filter traces by metadata in Nexus
Nexus stores metadata alongside traces and lets you filter by key/value in the dashboard. Record model, environment, and user_id on every trace — debug incidents in minutes, not hours.
Start free →