Setting Up Alerts for AI Agent Failures: Webhooks, Slack, and Error Rate Monitoring
Polling dashboards doesn't work for production AI agents — they fail silently, degrade gradually, and spike in error rate before you notice. Here's how to set up webhook and Slack alerts for agent errors and latency thresholds with Nexus, so you're notified within minutes of a failure.
Why polling dashboards fails for production agents
For traditional software, checking error dashboards every morning is fine — a crashed server is usually obvious and recovers quickly. AI agents fail differently: they degrade gradually, fail silently, or spike in error rate for a specific input pattern while handling other requests normally. By the time you notice in a dashboard, the impact has already happened.
Production AI agents need the same alerting model as distributed services:
- Error rate spike: your agent's error rate goes from 2% to 35% after a model API change. You need to know within minutes, not the next morning.
- Latency regression: a new RAG index deployment doubles average trace latency. Users are timing out before you see the trend.
- Silent failure mode: your agent returns with status “success” but the output is malformed. Without structured error checking, it looks fine in the dashboard.
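The third failure mode is the easiest to miss, so it's worth a concrete check. A minimal sketch of structured output validation that catches "success with malformed output" — the `REQUIRED_FIELDS` schema and `is_malformed` helper are illustrative assumptions for this example, not part of Nexus:

```python
import json

# Hypothetical output schema for a JSON-mode agent (adjust to your agent)
REQUIRED_FIELDS = {"answer", "sources"}

def is_malformed(raw_output: str) -> bool:
    """Return True when a 'successful' agent response is actually unusable."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return True  # agent returned prose instead of the expected JSON
    if not isinstance(data, dict):
        return True
    # A dict iterates over its keys, so issubset checks the schema fields
    return not REQUIRED_FIELDS.issubset(data)

# A trace can report status "success" and still fail this check:
is_malformed('{"answer": "See our docs", "sources": []}')  # False: well-formed
is_malformed('I could not parse the tool result')          # True: silent failure
```

Running a check like this in your own pipeline and recording the result as an explicit error status is what makes silent failures visible to any alerting system.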
Setting up Nexus webhook alerts
Nexus Pro users configure webhook alerts in Settings. When a trace ends with status: "error" or status: "timeout", Nexus sends a POST request to your webhook URL with a structured payload:
# Example Nexus webhook payload (trace.error event)
{
  "event": "trace.error",
  "trace_id": "tr_01HX...",
  "agent_id": "customer-support-agent",
  "status": "error",
  "error": "Tool call failed: search_docs returned 503",
  "latency_ms": 4823,
  "started_at": "2026-04-19T02:31:00Z",
  "ended_at": "2026-04-19T02:31:04Z",
  "metadata": {
    "user_id": "u_abc123",
    "environment": "production",
    "model": "gpt-4o"
  }
}
You configure the webhook URL in Settings → Webhook URL. Nexus auto-detects Slack Incoming Webhook URLs and sends richly formatted Slack blocks instead of raw JSON — no custom adapter needed.
Receiving alerts in Slack
The fastest path to Slack alerts: create a Slack Incoming Webhook and paste its URL into Nexus Settings. Nexus detects the Slack URL and formats the alert as a Slack block message automatically:
# What Nexus sends to Slack (formatted automatically):
#
# 🚨 Agent error: customer-support-agent
# Trace: tr_01HX...
# Error: Tool call failed: search_docs returned 503
# Latency: 4.8s
# Environment: production
# User: u_abc123
#
# [View trace] [View agent]
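Before wiring the URL into Nexus, it's worth confirming the webhook works and seeing what a block-formatted message looks like. A sketch that builds an alert payload similar to the one above and posts it to a Slack Incoming Webhook — the block layout in `build_slack_alert` is an illustrative approximation, not Nexus's exact formatting:

```python
import os
import requests

def build_slack_alert(agent_id: str, error: str, trace_id: str,
                      latency_ms: int) -> dict:
    """Build a Slack Block Kit payload resembling the alert shown above."""
    headline = f":rotating_light: Agent error: {agent_id}"
    body = (f"*{headline}*\n"
            f"Trace: `{trace_id}`\n"
            f"Error: {error}\n"
            f"Latency: {latency_ms / 1000:.1f}s")
    return {
        "text": headline,  # fallback text for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": body}},
        ],
    }

def send_slack_alert(webhook_url: str, payload: dict) -> bool:
    # Slack Incoming Webhooks reply with HTTP 200 and body "ok" on success
    resp = requests.post(webhook_url, json=payload)
    return resp.status_code == 200

# Usage (posts a real message to your channel):
# payload = build_slack_alert("customer-support-agent",
#                             "Tool call failed: search_docs returned 503",
#                             "tr_01HX...", 4823)
# send_slack_alert(os.environ["SLACK_WEBHOOK_URL"], payload)
```

If the test message lands in your channel, the same URL pasted into Nexus Settings will receive the automatically formatted alerts.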
Rate limiting prevents alert fatigue: Nexus sends at most one alert per agent per 5 minutes by default, regardless of how many traces fail in that window.
Building a custom webhook receiver
For custom alerting — PagerDuty, custom Slack formatting, or internal dashboards — build a simple webhook receiver:
# Simple Flask webhook receiver
from flask import Flask, request, jsonify
import requests
import os

app = Flask(__name__)
PAGERDUTY_KEY = os.environ["PAGERDUTY_INTEGRATION_KEY"]

@app.route('/nexus-webhook', methods=['POST'])
def handle_nexus_webhook():
    payload = request.json
    event = payload.get('event')

    if event == 'trace.error':
        agent_id = payload['agent_id']
        error = payload.get('error', 'Unknown error')
        trace_id = payload['trace_id']
        metadata = payload.get('metadata', {})

        # Trigger PagerDuty incident for production errors
        if metadata.get('environment') == 'production':
            trigger_pagerduty(
                summary=f"AI agent error: {agent_id}",
                details={
                    "trace_id": trace_id,
                    "error": error,
                    "agent": agent_id,
                    "user_id": metadata.get("user_id"),
                },
            )

    elif event == 'trace.slow':
        # Latency threshold exceeded
        latency_ms = payload.get('latency_ms', 0)
        agent_id = payload['agent_id']
        post_to_slack(
            f":clock3: Slow trace on {agent_id}: {latency_ms}ms (threshold exceeded)"
        )

    return jsonify({"ok": True})

def trigger_pagerduty(summary: str, details: dict):
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json={
            "routing_key": PAGERDUTY_KEY,
            "event_action": "trigger",
            "payload": {
                "summary": summary,
                "severity": "error",
                "source": "nexus-agent-monitor",
                "custom_details": details,
            },
        },
    )

def post_to_slack(message: str):
    slack_url = os.environ["SLACK_WEBHOOK_URL"]
    requests.post(slack_url, json={"text": message})
Latency threshold alerts
Beyond error alerts, Nexus Pro supports latency threshold alerts: when a trace duration exceeds your configured threshold, Nexus fires a trace.slow event. Configure the threshold in Settings → Latency threshold.
Useful thresholds by agent type:
- Synchronous chat agents: 8-10 seconds (user-facing, latency is UX)
- Background research agents: 60-120 seconds (async, users don't wait)
- RAG pipelines: 5-8 seconds (includes retrieval latency baseline)
- Multi-agent pipelines: 30-45 seconds (multiple LLM calls expected)
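Rather than picking a number from the table above, you can derive a threshold from your agent's own recent trace latencies. A common heuristic (an assumption here, not a Nexus feature) is the 95th percentile times a headroom factor, so normal variance doesn't trip the alert:

```python
def suggest_threshold_ms(latencies_ms: list[float], headroom: float = 1.5) -> float:
    """Suggest a latency alert threshold from recent trace latencies.

    Uses the nearest-rank p95 of the sample, scaled by a headroom factor,
    so only genuine regressions cross the line.
    """
    if not latencies_ms:
        raise ValueError("need at least one trace latency")
    ordered = sorted(latencies_ms)
    # Nearest-rank p95: index ceil(0.95 * n) - 1, clamped to a valid index
    idx = max(0, -(-len(ordered) * 95 // 100) - 1)
    return ordered[idx] * headroom

# e.g. pull the last few hundred trace latencies for an agent, then:
# threshold = suggest_threshold_ms(recent_latencies)  # paste into Settings
```

Recompute this after meaningful changes (new model, new RAG index) so the threshold tracks the agent's actual baseline rather than its launch-day behavior.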
Alert fatigue and rate limiting
The most common alerting mistake: alerting on every error. If your agent handles 1,000 traces per hour and has a 5% error rate, you'll receive 50 alerts per hour — which you'll start ignoring within a day.
The right approach:
- Alert on rate spikes, not individual errors. Nexus rate-limits to one alert per agent per 5 minutes, giving you signal about sustained failures rather than individual ones.
- Use Slack for informational alerts, PagerDuty for critical ones. A slow trace is Slack-worthy; a complete agent failure that affects user sessions is page-worthy.
- Tag alerts with context. Include environment in your trace metadata so you can filter staging alerts from production alerts at the webhook receiver level.
Next steps
Webhook and email alerts are included in the Nexus Pro plan ($9/mo flat). Sign up, connect your agents, and configure a webhook URL in Settings to start receiving alerts within minutes.
Get alerts when your agents fail
Webhook + email alerts on Pro. Free tier available, no credit card required.