Observability for AWS Bedrock Agents: Tracing InvokeAgent, Action Groups, and Knowledge Bases
AWS Bedrock Agents orchestrate multi-step tasks using action groups (Lambda functions) and knowledge bases (RAG retrieval). When an action group Lambda throws silently, a knowledge base returns zero chunks, or the agent loops unexpectedly, Bedrock's built-in logs don't tell you which step failed or why. Here's how to add full trace observability to Bedrock Agents using Nexus.
What AWS Bedrock Agents are
AWS Bedrock Agents are managed orchestrators that break a user request into sub-tasks, call action groups (Lambda functions) to execute those tasks, query knowledge bases (OpenSearch-backed RAG retrieval), and synthesize a final response — all without you writing orchestration logic.
The architecture has three moving parts:
- Orchestration layer: The Bedrock managed agent model (Claude or Titan) that decides which action group to call next and when retrieval is needed.
- Action groups: Lambda functions that implement specific capabilities — booking a flight, looking up an order, calling an external API. The agent passes parameters; the Lambda returns a structured response.
- Knowledge bases: Embeddings stored in OpenSearch that the agent queries for factual grounding. A knowledge base lookup returns one or more chunks that the orchestrator folds into its reasoning.
The problem: none of these layers expose unified trace data. CloudWatch logs show Lambda execution separately from Bedrock invocation logs. There is no built-in way to correlate a single user request through orchestration, action group calls, and knowledge base lookups into one timeline.
The three silent failure modes
Before adding observability, it helps to know what you are actually trying to catch. Bedrock Agents fail in three ways that are invisible without tracing:
- Action group errors the agent hides: A Lambda throws an exception or returns a malformed response. The agent reinterprets this as a negative result and generates a plausible-sounding "I wasn’t able to complete that" instead of surfacing the error. From the user’s perspective, the agent just didn’t work.
- Zero-chunk knowledge base lookups: The agent queries a knowledge base and gets back zero chunks. It then answers from training data, which may be stale or incorrect. The response looks confident; there is no indication that retrieval failed.
- Orchestration loops: The agent makes the same action group call multiple times with the same parameters before giving up. This inflates cost and latency without producing a better answer. It is only visible in the step trace, which is buried in the InvokeAgent response stream.
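Once step traces are collected, detecting that third failure mode reduces to counting duplicate (action group, parameters) pairs. A minimal sketch in plain Python, assuming a hypothetical steps list shaped like the pairs you would pull out of each orchestration trace event:

```python
from collections import Counter

def find_repeated_calls(steps, threshold=2):
    """Return action group calls repeated `threshold`+ times with
    identical parameters (the signature of an orchestration loop)."""
    counts = Counter(steps)
    return [call for call, n in counts.items() if n >= threshold]

# Hypothetical step list extracted from a trace stream
steps = [
    ('lookup-order', '{"order_id": "A1"}'),
    ('lookup-order', '{"order_id": "A1"}'),  # identical retry: a loop
    ('send-email', '{"to": "ops@example.com"}'),
]
loops = find_repeated_calls(steps)  # [('lookup-order', '{"order_id": "A1"}')]
```

The same count works at alert time: any trace whose duplicate count exceeds one or two is worth a closer look.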
Wrapping InvokeAgent in a Nexus trace
The foundation of Bedrock Agent observability is a thin wrapper around invoke_agent that creates a parent trace, streams the response, collects step traces, and records the outcome.
import boto3
import requests
import time
import os
bedrock_agent = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
NEXUS_API_KEY = os.environ['NEXUS_API_KEY']
NEXUS_BASE = 'https://nexus.keylightdigital.dev'
def invoke_bedrock_agent(agent_id: str, alias_id: str, session_id: str, user_input: str) -> dict:
    t0 = time.time()
    trace_id = requests.post(
        f'{NEXUS_BASE}/api/traces',
        json={
            'name': f'bedrock-agent:{agent_id}',
            'input': user_input,
            'metadata': {
                'agent_id': agent_id,
                'alias_id': alias_id,
                'session_id': session_id,
            },
        },
        headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
    ).json()['traceId']
    try:
        response = bedrock_agent.invoke_agent(
            agentId=agent_id,
            agentAliasId=alias_id,
            sessionId=session_id,
            inputText=user_input,
            enableTrace=True,
        )
        final_answer = ''
        action_group_calls = 0
        kb_lookups = 0
        orchestration_steps = 0
        for event in response['completion']:
            if 'chunk' in event:
                final_answer += event['chunk']['bytes'].decode('utf-8')
            if 'trace' in event:
                trace_data = event['trace'].get('trace', {})
                _record_trace_step(trace_id, trace_data)
                if 'orchestrationTrace' in trace_data:
                    orchestration_steps += 1
                    orch = trace_data['orchestrationTrace']
                    if 'invocationInput' in orch:
                        inv = orch['invocationInput']
                        if inv.get('invocationType') == 'ACTION_GROUP':
                            action_group_calls += 1
                        elif inv.get('invocationType') == 'KNOWLEDGE_BASE':
                            kb_lookups += 1
        requests.post(
            f'{NEXUS_BASE}/api/traces/{trace_id}/end',
            json={
                'status': 'success',
                'output': final_answer[:500],
                'latency_ms': int((time.time() - t0) * 1000),
                'metadata': {
                    'action_group_calls': action_group_calls,
                    'kb_lookups': kb_lookups,
                    'orchestration_steps': orchestration_steps,
                    'answer_length': len(final_answer),
                },
            },
            headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
        )
        return {'answer': final_answer, 'trace_id': trace_id}
    except Exception as e:
        requests.post(
            f'{NEXUS_BASE}/api/traces/{trace_id}/end',
            json={
                'status': 'error',
                'latency_ms': int((time.time() - t0) * 1000),
                'error': str(e),
            },
            headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
        )
        raise
The key parameter is enableTrace=True. Without it, the response stream contains only final chunks and no step data. With it, each event in the completion stream may include a trace field that describes what the orchestrator just did.
Recording action group spans
Action group invocations are the highest-signal events in a Bedrock trace. Each invocation has an input (the parameters the orchestrator chose to pass) and an observation (the Lambda’s response). These should be separate spans so you can see latency and errors at the action group level.
def _record_trace_step(parent_trace_id: str, trace_data: dict):
    """Parse a single Bedrock trace event and record it as a Nexus span."""
    if 'orchestrationTrace' not in trace_data:
        return
    orch = trace_data['orchestrationTrace']
    # Action group invocation input — a Lambda is about to be called
    if 'invocationInput' in orch:
        inv = orch['invocationInput']
        if inv.get('invocationType') == 'ACTION_GROUP':
            ag = inv.get('actionGroupInvocationInput', {})
            requests.post(
                f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans',
                json={
                    'name': f"action-group:{ag.get('actionGroupName', 'unknown')}",
                    'input': str(ag.get('parameters', {})),
                    'metadata': {
                        'action_group': ag.get('actionGroupName'),
                        'api_path': ag.get('apiPath'),
                        'http_method': ag.get('verb'),
                    },
                },
                headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
            )
    # Action group observation — the Lambda responded
    if 'observation' in orch:
        obs = orch['observation']
        if obs.get('type') == 'ACTION_GROUP':
            ag_result = obs.get('actionGroupInvocationOutput', {})
            raw_text = ag_result.get('text', '')
            # Heuristic: Bedrock exposes Lambda failures only as observation
            # text, so scan for error keywords rather than a status code
            is_error = any(kw in raw_text.lower() for kw in ['error', 'exception', 'failed', 'not found'])
            requests.post(
                f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans',
                json={
                    'name': 'action-group:response',
                    'output': raw_text[:300],
                    'metadata': {
                        'status': 'error' if is_error else 'success',
                        'response_length': len(raw_text),
                    },
                },
                headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
            )
The error detection heuristic in the observation handler is important. Bedrock Agents do not propagate Lambda exceptions to the caller — they convert them to string observations. Scanning the observation text for error keywords is the only way to catch silent action group failures from outside the Lambda.
Tracing knowledge base lookups
Knowledge base retrievals appear in the trace stream similarly to action group calls. The invocation contains the query the orchestrator chose; the observation contains the retrieved chunks and their source locations.
# Knowledge base lookup input
if 'invocationInput' in orch:
    inv = orch['invocationInput']
    if inv.get('invocationType') == 'KNOWLEDGE_BASE':
        kb = inv.get('knowledgeBaseLookupInput', {})
        requests.post(
            f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans',
            json={
                'name': f"kb-lookup:{kb.get('knowledgeBaseId', 'unknown')}",
                'input': kb.get('text', ''),
                'metadata': {
                    'knowledge_base_id': kb.get('knowledgeBaseId'),
                },
            },
            headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
        )
# Knowledge base observation — chunks returned
if 'observation' in orch:
    obs = orch['observation']
    if obs.get('type') == 'KNOWLEDGE_BASE':
        kb_output = obs.get('knowledgeBaseLookupOutput', {})
        retrieved = kb_output.get('retrievedReferences', [])
        chunk_count = len(retrieved)
        sources = list({ref.get('location', {}).get('s3Location', {}).get('uri', '')
                        for ref in retrieved if ref.get('location')})
        requests.post(
            f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans',
            json={
                'name': 'kb-lookup:response',
                'output': f'{chunk_count} chunks retrieved',
                'metadata': {
                    'chunk_count': chunk_count,
                    'zero_retrieval': chunk_count == 0,
                    'source_uris': sources[:5],
                },
            },
            headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
        )
The zero_retrieval: true flag is the metric to alert on. When a knowledge base returns zero chunks, the agent answers from its training data — and that answer is often wrong or stale. A zero-retrieval rate above 5% is a signal your embedding index needs refreshing or your chunk size is too large.
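Computing that alert metric from collected traces is straightforward. A minimal sketch, assuming hypothetical trace summaries that carry the kb_lookups counter and the per-lookup chunk_count values recorded above:

```python
def zero_retrieval_rate(traces):
    """Fraction of traces that used the knowledge base but got zero
    chunks from every lookup (the agent answered from training data)."""
    with_lookups = [t for t in traces if t['kb_lookups'] > 0]
    if not with_lookups:
        return 0.0
    all_zero = [t for t in with_lookups
                if all(c == 0 for c in t['chunk_counts'])]
    return len(all_zero) / len(with_lookups)

# Hypothetical trace summaries pulled back from the trace store
traces = [
    {'kb_lookups': 1, 'chunk_counts': [0]},     # retrieval came back empty
    {'kb_lookups': 2, 'chunk_counts': [3, 1]},  # healthy retrieval
    {'kb_lookups': 0, 'chunk_counts': []},      # no KB use, excluded
]
rate = zero_retrieval_rate(traces)  # 0.5: one of the two KB-using traces
```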
Tracing action group Lambda invocations from inside the Lambda
The wrapper above gives you visibility at the Bedrock side. For Lambda-level visibility — what the function actually did, what external APIs it called, how long each step took — you need instrumentation inside the Lambda itself.
The trick is passing the parent trace_id through the session attributes that Bedrock forwards to every action group Lambda.
# When invoking the agent, include trace_id in sessionState
response = bedrock_agent.invoke_agent(
    agentId=agent_id,
    agentAliasId=alias_id,
    sessionId=session_id,
    inputText=user_input,
    enableTrace=True,
    sessionState={
        'sessionAttributes': {
            'nexus_trace_id': trace_id,  # passed to every Lambda
        }
    },
)
# Inside the action group Lambda (Python)
import boto3
import requests
import time
import os

NEXUS_API_KEY = os.environ['NEXUS_API_KEY']
NEXUS_BASE = 'https://nexus.keylightdigital.dev'

def _bedrock_response(action_group, api_path, http_method, status_code, body):
    # Bedrock expects the Lambda result wrapped in messageVersion/response
    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': action_group,
            'apiPath': api_path,
            'httpMethod': http_method,
            'httpStatusCode': status_code,
            'responseBody': {'application/json': {'body': body}},
        },
    }

def lambda_handler(event, context):
    # Bedrock passes sessionAttributes on every invocation
    session_attrs = event.get('sessionAttributes', {})
    parent_trace_id = session_attrs.get('nexus_trace_id')
    action_group = event.get('actionGroup', 'unknown')
    api_path = event.get('apiPath', 'unknown')
    http_method = event.get('httpMethod', 'GET')
    params = event.get('parameters', [])
    param_map = {p['name']: p['value'] for p in params}
    t0 = time.time()
    span_id = None
    if parent_trace_id:
        span_id = requests.post(
            f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans',
            json={
                'name': f'lambda:{action_group}:{api_path}',
                'input': str(param_map),
                'metadata': {
                    'action_group': action_group,
                    'api_path': api_path,
                    'lambda_request_id': context.aws_request_id,
                },
            },
            headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
        ).json().get('spanId')
    try:
        # Your action group logic here
        result = execute_action(api_path, param_map)
        if parent_trace_id and span_id:
            requests.patch(
                f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans/{span_id}',
                json={
                    'status': 'success',
                    'output': str(result)[:300],
                    'latency_ms': int((time.time() - t0) * 1000),
                },
                headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
            )
        return _bedrock_response(action_group, api_path, http_method, 200, str(result))
    except Exception as e:
        if parent_trace_id and span_id:
            requests.patch(
                f'{NEXUS_BASE}/api/traces/{parent_trace_id}/spans/{span_id}',
                json={
                    'status': 'error',
                    'latency_ms': int((time.time() - t0) * 1000),
                    'error': str(e),
                },
                headers={'Authorization': f'Bearer {NEXUS_API_KEY}'},
            )
        # Return error in Bedrock format — agent will see this as an observation
        return _bedrock_response(action_group, api_path, http_method, 500, f'error: {str(e)}')
This pattern gives you correlated traces. The outer wrapper creates the parent trace. The Lambda creates a child span under the same trace_id using the session attributes channel. In Nexus you can see the full timeline: orchestration step → Lambda execution → external API call → response → orchestrator synthesis.
Debugging silent agent failures with span metadata
The metadata fields you record are what make debugging tractable. Here are the four patterns most useful for diagnosing Bedrock Agent failures:
Pattern 1: Zero-chunk retrieval with confident answer. Filter traces where kb_lookups > 0 AND all knowledge base spans have chunk_count: 0 but status: success. These are the answers the agent invented because retrieval returned nothing.
Pattern 2: Action group error the agent swallowed. Filter traces where an action group span has status: error but the parent trace has status: success. These are the failures the agent masked with a graceful response.
Pattern 3: Orchestration loops. Filter traces where orchestration_steps > 5. High step counts with action_group_calls == 0 mean the orchestrator couldn’t decide which action to call — usually a prompt or schema issue. High step counts with many repeated action group calls mean the Lambda response isn’t satisfying the orchestrator’s requirements.
Pattern 4: Lambda latency spikes. Filter Lambda spans where latency_ms > 5000. Cross-reference with lambda_request_id to find the CloudWatch log stream for that specific cold start or timeout.
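Pattern 2 is the one worth automating first, since those failures reach users silently. A minimal sketch of the filter, assuming hypothetical trace summaries with a top-level status and a spans list like those recorded by the wrapper:

```python
def hidden_failures(traces):
    """Trace IDs where an action group span errored but the trace as a
    whole still ended 'success' (the agent masked the failure)."""
    return [
        t['trace_id'] for t in traces
        if t['status'] == 'success'
        and any(s.get('status') == 'error' for s in t['spans'])
    ]

# Hypothetical trace summaries
traces = [
    {'trace_id': 't1', 'status': 'success',
     'spans': [{'name': 'action-group:response', 'status': 'error'}]},
    {'trace_id': 't2', 'status': 'success',
     'spans': [{'name': 'action-group:response', 'status': 'success'}]},
]
masked = hidden_failures(traces)  # ['t1']
```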
TypeScript wrapper for Next.js and Node.js callers
If your application layer is TypeScript, the same pattern works via the AWS SDK v3 and the Nexus REST API:
import { BedrockAgentRuntimeClient, InvokeAgentCommand } from '@aws-sdk/client-bedrock-agent-runtime'
const client = new BedrockAgentRuntimeClient({ region: 'us-east-1' })
const NEXUS_API_KEY = process.env.NEXUS_API_KEY!
const NEXUS_BASE = 'https://nexus.keylightdigital.dev'
async function invokeBedrockAgent(
  agentId: string,
  aliasId: string,
  sessionId: string,
  userInput: string,
): Promise<{ answer: string; traceId: string }> {
  const t0 = Date.now()
  const traceRes = await fetch(`${NEXUS_BASE}/api/traces`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: `bedrock-agent:${agentId}`,
      input: userInput,
      metadata: { agent_id: agentId, alias_id: aliasId, session_id: sessionId },
    }),
  })
  const { traceId } = await traceRes.json()
  try {
    const command = new InvokeAgentCommand({
      agentId,
      agentAliasId: aliasId,
      sessionId,
      inputText: userInput,
      enableTrace: true,
      sessionState: { sessionAttributes: { nexus_trace_id: traceId } },
    })
    const response = await client.send(command)
    let answer = ''
    let actionGroupCalls = 0
    let kbLookups = 0
    for await (const event of response.completion ?? []) {
      if (event.chunk?.bytes) {
        answer += new TextDecoder().decode(event.chunk.bytes)
      }
      if (event.trace?.trace?.orchestrationTrace) {
        const orch = event.trace.trace.orchestrationTrace
        const invType = orch.invocationInput?.invocationType
        if (invType === 'ACTION_GROUP') actionGroupCalls++
        if (invType === 'KNOWLEDGE_BASE') kbLookups++
      }
    }
    await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        status: 'success',
        output: answer.slice(0, 500),
        latency_ms: Date.now() - t0,
        metadata: { action_group_calls: actionGroupCalls, kb_lookups: kbLookups },
      }),
    })
    return { answer, traceId }
  } catch (err) {
    await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ status: 'error', latency_ms: Date.now() - t0, error: String(err) }),
    })
    throw err
  }
}
What to monitor in production
Once traces are flowing from your Bedrock Agent integration, four metrics give you the most actionable signal:
- Zero-retrieval rate: The percentage of traces where kb_lookups > 0 but all knowledge base spans have chunk_count: 0. Alert when this exceeds 5% — it means your embedding index is stale or your queries aren’t matching your document structure.
- Hidden action group error rate: The percentage of traces where an action group span has status: error but the parent trace has status: success. Alert at 2% — these are broken integrations the agent is actively hiding from your users.
- Mean orchestration steps per trace: A baseline of 2–4 steps is normal for a simple action + synthesis. If mean steps creep above 6, the orchestrator is struggling — usually a schema mismatch between your OpenAPI spec and the Lambda’s actual parameters.
- P95 end-to-end latency: Bedrock Agents are inherently multi-step, so P95 latency of 5–15s is common. A spike above 30s usually indicates a Lambda cold start or a knowledge base index that’s too large for the configured chunk size.
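The P95 itself can be computed with a nearest-rank percentile over the recorded latency_ms values; no stats library is needed. A minimal sketch:

```python
import math

def p95_ms(latencies_ms):
    """Nearest-rank P95 over a list of end-to-end latency samples."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds
latencies = [4200, 5100, 6800, 31000, 5900]
p95_ms(latencies)  # surfaces the 31s outlier worth investigating
```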
Next steps
AWS Bedrock Agents handle the orchestration complexity — but that same managed abstraction makes silent failures harder to diagnose. Wrapping invoke_agent with a Nexus trace, recording action group spans from inside the Lambda via session attributes, and instrumenting knowledge base observations gives you the full timeline from user request to final answer. Sign up for a free Nexus account to start capturing traces from your Bedrock Agents today.
Add observability to AWS Bedrock Agents
Free tier, no credit card required. Full trace visibility in under 5 minutes.