Changelog
The full development history of Nexus — shipped in days, not months. Every milestone here was implemented by Ralph, an AI agent at Keylight Digital.
New guide at /docs/ollama covering Nexus observability for local LLM agents powered by Ollama — wrapping both direct REST API calls and OpenAI-compatible endpoint calls in Nexus spans, recording eval_count tokens, detecting empty responses, and tracing multi-turn loops.
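A minimal TypeScript sketch of turning a non-streaming Ollama /api/generate response into span metadata. The response fields (model, response, prompt_eval_count, eval_count, and total_duration in nanoseconds) come from Ollama's API; the helper name and the metadata keys are hypothetical, not the guide's exact code. The /api/chat endpoint returns message.content instead of response, so it would need a small adjustment.

```typescript
// Subset of the fields Ollama's /api/generate returns when stream is false.
interface OllamaGenerateResponse {
  model: string;
  response: string;
  done: boolean;
  prompt_eval_count?: number; // prompt tokens (can be absent)
  eval_count?: number;        // generated tokens
  total_duration?: number;    // nanoseconds
}

// Hypothetical span metadata shape; not a confirmed Nexus schema.
interface SpanMetadata {
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number | null;
  empty_response: boolean;
}

function ollamaSpanMetadata(res: OllamaGenerateResponse): SpanMetadata {
  return {
    model: res.model,
    prompt_tokens: res.prompt_eval_count ?? 0,
    completion_tokens: res.eval_count ?? 0,
    // Convert nanoseconds to milliseconds when Ollama reports a duration.
    latency_ms: res.total_duration !== undefined ? res.total_duration / 1e6 : null,
    // Whitespace-only output is flagged as empty so it stands out on the span.
    empty_response: res.response.trim().length === 0,
  };
}
```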
New guide at /docs/cloudflare-workers-ai covering how to wrap every env.AI.run() call in a Nexus span — recording model name, latency, and token counts in Workers AI responses, detecting empty responses, and running parallel multi-model comparisons with child spans.
New comparison page at /vs/openllmetry for teams evaluating the OpenLLMetry OTel wrapper vs. Nexus's purpose-built agent observability. Covers auto-instrumentation vs. explicit span control, OTel backend flexibility, agent-native features, and pricing.
New post at /blog/groq-agent-observability covering Groq LPU inference observability — wrapping groq.chat.completions.create() in Nexus spans, recording token usage and sub-second latency, detecting rate limit errors, and streaming response patterns.
New post at /blog/together-ai-agent-observability covering Together AI inference observability — tracing open-source model calls (Llama 3, Mistral, Qwen, DBRX) via the OpenAI-compatible API with Nexus spans, recording token costs per model, and streaming patterns.
New post at /blog/ai-model-ab-testing covering structured A/B testing between AI models using Nexus span metadata — assigning variants by request, recording latency, token cost, and quality scores per variant, and querying results via the Nexus REST API.
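One way to assign variants by request is a deterministic hash of a request or user key, so the same key always lands in the same variant and can be recorded on the span. This sketch uses FNV-1a; the function names and the ab_variant metadata key are illustrative assumptions, not the post's code.

```typescript
// FNV-1a 32-bit hash: cheap, deterministic, and stable across processes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a request key to one of the variants; identical keys always agree.
function assignVariant(requestKey: string, variants: readonly string[]): string {
  if (variants.length === 0) throw new Error("at least one variant is required");
  return variants[fnv1a(requestKey) % variants.length];
}

// Hypothetical metadata to attach to the span for later per-variant queries.
function variantMetadata(requestKey: string, variants: readonly string[]) {
  return { ab_variant: assignVariant(requestKey, variants) };
}
```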
New post at /blog/modal-agent-observability covering serverless GPU compute observability — wrapping Modal functions with Nexus spans, detecting cold starts via MODAL_IS_COLD_START, classifying CUDA OOM errors, and passing trace context through async fan-out workers.
New step-by-step guide at /docs/mistral covering how to instrument Mistral AI agents with Nexus — wrapping mistralai.chat() calls in Nexus traces, recording token usage and model names, tracing multi-turn agent loops, and monitoring function/tool call sequences.
New guide at /docs/mcp-server covering Nexus observability for Model Context Protocol (MCP) servers — wrapping tool handlers in Nexus spans in both Python (FastMCP) and TypeScript (@modelcontextprotocol/sdk), tracing resource read and prompt calls, and monitoring tool error rates.
New guide at /docs/n8n covering Nexus observability for n8n AI Agent workflows — adding HTTP Request nodes for trace/span lifecycle, extracting token counts from n8n AI Agent output expressions, handling error branches, and monitoring multi-agent workflows.
New comparison page at /vs/logfire for teams evaluating Pydantic Logfire vs. Nexus. Covers Logfire's Pydantic/Python-first model vs. Nexus's agent-first tracing, OpenTelemetry exporter support, pricing differences, and when each tool is the right fit.
New comparison page at /vs/grafana targeting infrastructure teams evaluating Grafana Cloud vs. purpose-built AI agent observability. Covers Grafana's dashboards and alerting ecosystem vs. Nexus's AI-native trace model, setup complexity, and cost.
New post at /blog/ollama-agent-observability covering how to add Nexus observability to agents powered by local LLMs via Ollama — both direct REST API and OpenAI-compatible endpoint patterns, recording eval_count tokens, detecting empty responses, and multi-turn agent loop tracing.
New post at /blog/cloudflare-workers-ai-observability covering how to wrap every env.AI.run() call in a Nexus span — recording model name, latency, and token counts, detecting empty responses, multi-model comparison with parallel spans, and AI Gateway compatibility.
New post at /blog/ai-agent-cost-attribution covering how to tag Nexus spans with user_id, tenant_id, and feature, query the Nexus REST API to aggregate token spend per tag, and enforce per-tenant token budgets. Python and TypeScript examples.
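Once spans tagged with tenant_id have been fetched (the REST API call itself is omitted here), aggregating spend and checking budgets is a small fold. The span shape, the "untagged" bucket, and the budget map are assumptions for illustration.

```typescript
// Hypothetical span shape: only the metadata fields this sketch reads.
interface SpanRecord {
  metadata: { tenant_id?: string; prompt_tokens?: number; completion_tokens?: number };
}

// Sum prompt + completion tokens per tenant; spans without a tenant go to "untagged".
function tokensByTenant(spans: SpanRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const { metadata } of spans) {
    const tenant = metadata.tenant_id ?? "untagged";
    const tokens = (metadata.prompt_tokens ?? 0) + (metadata.completion_tokens ?? 0);
    totals.set(tenant, (totals.get(tenant) ?? 0) + tokens);
  }
  return totals;
}

// Return the tenants whose usage exceeds their configured token budget.
function overBudget(totals: Map<string, number>, budgets: Record<string, number>): string[] {
  return Array.from(totals)
    .filter(([tenant, used]) => tenant in budgets && used > budgets[tenant])
    .map(([tenant]) => tenant);
}
```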
New post at /blog/crewai-flows-observability covering CrewAI Flows (v0.60+) observability — one span per @start/@listen/@router method, capturing flow state at each transition, detecting routing failures before the flow silently halts, and cross-linking to /docs/crewai.
Restored missing sidebar links for integration guides (Flowise, AWS Bedrock, Azure AI, and others) that were deployed but not reachable from the docs navigation.
New step-by-step guide covering how to instrument Azure AI Agent Service agents with Nexus — wrapping agent runs in Nexus traces, recording tool call steps and code interpreter runs, and monitoring run failure rates and token spend.
New guide showing how to add Nexus observability to AWS Bedrock Agents — wrapping InvokeAgent calls in Nexus traces, tracing action group Lambda invocations, monitoring retrieval from knowledge bases, and debugging silent agent failures.
New comparison page at /vs/deepeval for teams evaluating DeepEval (offline LLM testing framework) vs. Nexus (runtime agent observability). Covers eval-time vs. trace-time visibility, G-Eval metrics, pricing, and when each tool is the right fit.
New guide covering Nexus observability for Flowise — the popular no-code LangChain-based visual agent builder. Shows how to add HTTP Request nodes to capture agent run traces, debug chatflow failures, and monitor token usage across chatflows.
New post at /blog/mcp-server-observability covering how to wrap MCP tool handlers with Nexus spans in both Python (FastMCP) and TypeScript (@modelcontextprotocol/sdk) for full trace-level visibility into every tool call.
New post at /blog/n8n-agent-observability showing how to add Nexus traces to n8n AI Agent workflows using HTTP Request nodes and n8n expressions — no code required. Covers span-per-agent, error branches, multi-agent workflows, and token budget alerts.
New post at /blog/openai-agents-sdk-observability covering how to instrument OpenAI Agents SDK runs with Nexus. Covers handoff span tracing, triage agent observability, function tool call failures, and infinite delegation loop detection.
New post at /blog/llamaindex-agent-observability — QueryPipeline step tracing, AgentWorker/ReAct loop instrumentation, per-hop retrieval spans for multi-hop RAG, and failure pattern diagnosis. Python examples throughout.
New post at /blog/dspy-agent-observability — program.forward() tracing, multi-hop retrieval chain spans, MIPRO optimizer iteration metadata, and production vs. compiled-program divergence detection.
New post at /blog/mastra-agent-observability — TypeScript instrumentation for Mastra Agent.generate(), Workflow step spans, tool call wrapping, and metadata for filtering by workflow version and user plan.
New post at /blog/vercel-ai-sdk-observability — streamText onFinish tracing, generateObject structured output spans, multi-step tool call loops, and cost attribution per user and feature. TypeScript examples for Next.js App Router.
New comparison page at /vs/sentry targeting developers evaluating Sentry’s LLM monitoring add-on vs. purpose-built agent observability. Covers agent-first vs. general APM, session replay, token cost tracking, and pricing.
New comparison page at /vs/literal-ai targeting Chainlit developers evaluating Literal AI vs. Nexus. Covers thread/step model vs. agent-first tracing, human annotation workflows, and alerting support.
Fixed SEO gap: smolagents, openai-assistants, ag2-autogen, debugging-langgraph, ai-agent-trace-metadata, openai-agents-sdk, llamaindex, and dspy blog posts were live but missing from sitemap.xml, which made them harder for Google to discover and index.
New post at /blog/flowise-agent-observability covering how to add observability to AI agents built with Flowise — the popular no-code LangChain-based visual agent builder. Covers tracing custom tool nodes, debugging chatflow failures in production, and using the Nexus SDK alongside Flowise chatflows.
New post at /blog/aws-bedrock-agent-observability covering how to instrument AWS Bedrock Agents with Nexus. Covers wrapping InvokeAgent calls in Nexus traces, tracing action group Lambda invocations, monitoring retrieval from knowledge bases, and debugging silent agent failures using span metadata.
New comparison page at /vs/uptrace comparing Nexus vs Uptrace — OTel-native distributed tracing vs. AI-first agent observability. Covers self-hosting complexity, AI-specific span attributes vs generic OTel spans, pricing, and when Uptrace’s OTel compliance wins.
New comparison page at /vs/jaeger comparing Nexus vs Jaeger — CNCF open-source distributed tracing vs. purpose-built AI agent observability. Targets developers coming from Kubernetes/microservices backgrounds now building AI agents.
New guide at /docs/agno showing how to instrument Agno agents and teams with Nexus — wrapping Agent.run() in startTrace/addSpan/endTrace, tracing Team routing decisions, and monitoring tool calls within agent runs.
New guide at /docs/haystack covering how to instrument Haystack pipelines with Nexus — per-component spans for Embedder, Retriever, PromptBuilder, and Generator, plus retrieval quality monitoring and debugging failed pipeline runs.
New guide at /docs/typescript-quickstart (parallel to the Python quickstart) covering npm install @keylightdigital/nexus, NexusClient initialization, startTrace, addSpan, endTrace, error recording, and a complete Next.js App Router example.
New post at /blog/azure-ai-agent-observability covering how to add observability to agents built with Azure AI Agent Service — wrapping agent runs in Nexus traces, tracing tool call steps and code interpreter runs, and monitoring run failure rates.
New post at /blog/multi-model-agent-observability covering observability when agents route between multiple LLM providers — tracking model selection in span metadata, comparing cost and latency per provider, and detecting quality regressions after model switches.
New post at /blog/haystack-agent-observability covering how each Haystack pipeline component maps to a Nexus span — Embedder, Retriever, PromptBuilder, Generator. Covers retrieval quality monitoring, generation quality tracking, and debugging failed pipeline runs.
New post at /blog/agno-agent-observability covering Agno framework observability — wrapping agent runs in Nexus traces, tracing team/member agent hierarchies, and monitoring tool calls and model responses.
New post at /blog/semantic-kernel-agent-observability covering how to instrument Semantic Kernel agents with Nexus — kernel invocation spans, plugin and planner tracing, and monitoring function calls across sync and async kernels.
New guide at /docs/semantic-kernel covering NexusClient integration, wrapping KernelFunction invocations as Nexus spans, and tracing kernel plugins and planner steps in Python.
New comparison page at /vs/confident-ai for developers evaluating Confident AI (the cloud platform built by the DeepEval team) vs. Nexus. Covers LLM evaluation and regression testing vs. runtime observability, pricing, and when each tool is the right choice.
New comparison page at /vs/athina comparing Nexus vs Athina AI — Athina’s eval-first guardrails and prompt management vs. Nexus runtime trace/span observability. Includes TL;DR grid, feature table, and honest take on when each wins.
Added Sentry, Literal AI, Opik (Comet), Humanloop, Langtrace, Logfire, Galileo, and Confident AI to the /alternatives page — both the at-a-glance summary table and detailed comparison sections with descriptions, pricing, and links to full /vs/ pages.
Trace detail pages now show the trace ID with a one-click Copy button. Clicking copies the ID to the clipboard using the Clipboard API, with a 1.5-second “Copied!” confirmation. Includes a fallback for browsers without Clipboard API support.
New comparison page at /vs/mlflow: “Nexus vs MLflow for AI Agent Observability.” Covers real-time tracing vs. experiment tracking, pricing (self-hosted vs. $9/mo flat), setup complexity, and an honest take on when MLflow’s model registry wins.
New posts targeting high-intent developer searches: “How Prompt Caching Can Cut Your AI Agent Costs by 80%”, “Debugging Multi-Agent Orchestration: A Practical Guide”, and “How to Write Tests for LLM-Based AI Agents”. Each post includes code examples and a CTA to /register.
New guide at /docs/metadata explaining how to use structured metadata in traces: naming conventions, searchable fields, debugging patterns, and cost attribution. Linked from the /docs sidebar.
Pro users receive a weekly summary email every Sunday at 9am UTC via Cloudflare Workers cron. Digest includes: traces this week, error count, top agents by volume, and a link back to the dashboard.
Settings page now includes a “Delete account” section requiring typed confirmation. Deletion cascades to all traces, spans, API keys, agents, and cancels any active Stripe subscription. Satisfies GDPR right-to-erasure requirements.
Agent detail pages now show a webhook shortcut card: Pro users see a “Set up webhook for this agent →” link to /dashboard/settings; Free users see a dimmed version with an upgrade prompt.
Trace detail pages now show a summary header with total span count, error span count, and average span duration — visible before scrolling the waterfall. Helps identify noisy or failing traces at a glance.
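The three header numbers reduce to a single pass over the trace's spans. A sketch, assuming each span carries a status string and a duration in milliseconds (hypothetical field names):

```typescript
interface SpanSummaryInput {
  status: string;      // e.g. "ok" or "error" (assumed values)
  duration_ms: number;
}

interface TraceSummary {
  spanCount: number;
  errorCount: number;
  avgDurationMs: number;
}

function summarizeSpans(spans: SpanSummaryInput[]): TraceSummary {
  const errorCount = spans.filter((s) => s.status === "error").length;
  const totalMs = spans.reduce((sum, s) => sum + s.duration_ms, 0);
  return {
    spanCount: spans.length,
    errorCount,
    // Guard against division by zero for traces with no spans yet.
    avgDurationMs: spans.length > 0 ? totalMs / spans.length : 0,
  };
}
```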
New guide at /docs/openai-realtime covering how to instrument the OpenAI Realtime API (WebSocket sessions, audio turns, function calls) with Nexus. Includes TypeScript code examples for streaming session tracing.
Overview dashboard now shows two additional stat cards: total traces this calendar month and 30-day average trace latency. Gives a longer-horizon view alongside the existing 7-day chart.
New post at /blog/debugging-crewai-agents showing how to instrument CrewAI multi-agent workflows with Nexus. Covers agent-to-agent span propagation, tool call tracing, and reading the waterfall when agents call other agents.
New comparison page at /vs/new-relic targeting developers evaluating enterprise APM vs. purpose-built AI observability. Covers pricing (New Relic free tier limits vs. $9/mo flat), setup complexity, and AI-specific feature gaps.
Traces page now supports filtering by status (success, error, running, timeout) via query parameter. Filter pills update the URL so filtered views are bookmarkable and shareable.
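A sketch of the bookmarkable-filter idea: read the status query parameter, ignore unknown values, and build each pill's link by rewriting only that parameter so other filters survive. The /dashboard/traces path and the helper names are assumptions.

```typescript
const STATUSES = ["success", "error", "running", "timeout"] as const;
type TraceStatus = (typeof STATUSES)[number];

// Return the requested status, or null if it is missing or not a known value.
function parseStatus(search: string): TraceStatus | null {
  const value = new URLSearchParams(search).get("status");
  return (STATUSES as readonly string[]).includes(value ?? "") ? (value as TraceStatus) : null;
}

// Build a pill link that changes only the status parameter; null clears the filter.
function pillHref(basePath: string, search: string, status: TraceStatus | null): string {
  const params = new URLSearchParams(search);
  if (status) params.set("status", status);
  else params.delete("status");
  const qs = params.toString();
  return qs ? `${basePath}?${qs}` : basePath;
}
```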
Agent detail pages now show a 30-day error rate sparkline bar chart above the 7-day trace volume chart. Each bar represents one day; red bars indicate error/timeout traces. Helps identify when agent reliability changed.
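The sparkline data can be sketched as 30 UTC day buckets, oldest first, with a bar marked red when any trace that day ended in error or timeout. This assumes started_at is an ISO-8601 UTC timestamp; the field and function names are hypothetical.

```typescript
interface TraceRow {
  started_at: string; // ISO-8601 UTC, e.g. "2024-02-29T10:00:00Z" (assumed)
  status: string;
}

interface DayBar {
  day: string; // YYYY-MM-DD
  total: number;
  failed: number;
  red: boolean;
}

function errorRateBars(traces: TraceRow[], today: Date, days = 30): DayBar[] {
  // Pre-create one empty bucket per day so quiet days still render a bar.
  const bars = new Map<string, DayBar>();
  for (let i = days - 1; i >= 0; i--) {
    const d = new Date(Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate() - i));
    const key = d.toISOString().slice(0, 10);
    bars.set(key, { day: key, total: 0, failed: 0, red: false });
  }
  for (const t of traces) {
    const bar = bars.get(t.started_at.slice(0, 10));
    if (!bar) continue; // outside the window
    bar.total++;
    if (t.status === "error" || t.status === "timeout") {
      bar.failed++;
      bar.red = true;
    }
  }
  return Array.from(bars.values()); // Map preserves insertion order: oldest first
}
```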
New comparison page at /vs/honeycomb targeting developers coming from distributed tracing backgrounds. Positions Nexus as the AI-specific, affordable alternative to Honeycomb’s general-purpose observability platform.
New post at /blog/debugging-langchain-agents — a step-by-step guide to instrumenting LangChain chains and agents with Nexus, reading the trace waterfall, and diagnosing common failures like hallucinations, tool errors, and runaway loops.
Added a social proof section to the landing page with developer quotes from indie builders using Nexus. Each testimonial includes avatar, name, and role for credibility.
New guide at /docs/mastra covering how to instrument Mastra agents and workflows with Nexus. Mastra is a TypeScript-native agent framework; the guide shows step tracing, tool call spans, and workflow-level observability.
New comparison page at /vs/traceloop covering Traceloop’s OpenLLMetry open-source SDK vs. Nexus hosted observability. Honest take on when self-hosted OpenTelemetry wins vs. when a managed backend saves time.
The /alternatives page now includes MLflow in the comparison table. Covers MLflow’s model registry and experiment tracking strengths alongside its limitations for real-time agent observability.
New post at /blog/ag2-autogen-observability — covers ConversableAgent tracing, GroupChat span propagation, and how to instrument multi-agent conversations where agents hand off to each other.
New comparison page at /vs/comet comparing Nexus vs Comet Opik — eval-first LLM platform vs. agent-runtime observability. Honest take on when Opik’s evaluation datasets and annotation workflows win.
New guide at /docs/smolagents covering HuggingFace Smolagents CodeAgent and ToolCallingAgent instrumentation with Nexus — tool call spans, agent step tracing, and metadata best practices.
New comparison page at /vs/humanloop comparing Nexus vs Humanloop — prompt management and evaluation platform vs. agent runtime tracing. Targets developers evaluating Humanloop for production observability.
New post at /blog/openai-assistants-observability — thread creation, run lifecycle, and step-level tracing for the OpenAI Assistants API. Shows how to surface tool call failures and silent run errors with Nexus.
Agent cards on the overview dashboard now show a 7-day error rate — red for >10% errors, green for clean agents. Gives immediate health visibility without clicking into each agent.
Blog post at /blog/ag2-autogen-observability expanded with GroupChat coordination patterns, orchestrator vs. worker agent span hierarchy, and Python code examples for full multi-agent traces.
New comparison page at /vs/langtrace comparing Nexus vs Langtrace — OpenTelemetry-native open-source observability vs. agent-first managed tracing. Covers OTel compatibility, self-hosting, and SDK breadth.
New guide at /docs/vercel-ai-sdk covering how to instrument Vercel AI SDK streamText, generateText, and useChat flows with Nexus. TypeScript examples with streaming span lifecycle.
New post at /blog/ai-agent-token-cost-tracking — records prompt_tokens, completion_tokens, and estimated_cost_usd as span metadata for per-agent cost attribution. Python and TypeScript examples with a model pricing reference table.
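A sketch of computing estimated_cost_usd from token counts before attaching it as span metadata. The model names and per-million-token prices below are placeholders, not real provider pricing; substitute current rates from your provider.

```typescript
// Placeholder prices in USD per one million tokens (NOT real pricing).
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<string, ModelPrice> = {
  "example-small": { inputPerMTok: 0.5, outputPerMTok: 1.5 },
  "example-large": { inputPerMTok: 3, outputPerMTok: 15 },
};

function costMetadata(model: string, promptTokens: number, completionTokens: number) {
  const price = PRICES[model];
  // Unknown models get null rather than a misleading zero cost.
  const estimated = price
    ? (promptTokens * price.inputPerMTok + completionTokens * price.outputPerMTok) / 1_000_000
    : null;
  return {
    model,
    prompt_tokens: promptTokens,
    completion_tokens: completionTokens,
    estimated_cost_usd: estimated,
  };
}
```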
New comparison page at /vs/logfire comparing Nexus vs Pydantic Logfire — OTel-native Python-first platform vs. agent-runtime observability. Covers PydanticAI zero-config vs. multi-framework support.
New post at /blog/debugging-langgraph-agents — StateGraph node instrumentation, infinite loop detection, routing debugging, and state corruption patterns. Includes Python examples with node_span decorator.
New comparison page at /vs/galileo comparing Nexus vs Galileo AI — LLM evaluation and hallucination detection vs. agent runtime health monitoring. Honest take on using both in different phases of the pipeline.
Traces page now has a client-side “Filter by metadata” input. Type any metadata key or value to instantly filter the displayed rows — no page reload. Works with environment, model, user_id, and any custom metadata fields.
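The filter can be a case-insensitive substring match against every metadata key and value of each row. A sketch of just the matching logic, with DOM wiring omitted and the row shape assumed:

```typescript
interface Row {
  metadata: Record<string, unknown>;
}

function filterByMetadata<T extends Row>(rows: T[], query: string): T[] {
  const q = query.trim().toLowerCase();
  if (!q) return rows; // empty input shows every row
  return rows.filter((row) =>
    Object.entries(row.metadata).some(
      ([key, value]) => key.toLowerCase().includes(q) || String(value).toLowerCase().includes(q),
    ),
  );
}
```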
New post at /blog/ai-agent-trace-metadata — what metadata to capture (model, user_id, environment, feature_flag), naming conventions, and the incident debugging workflow using metadata filters.
Pro users can set a per-account latency threshold in Settings. When a trace's duration exceeds the threshold, Nexus sends an email alert and fires a webhook (trace.slow event). Slow-trace alerts are separate from error/timeout alerts and have their own rate limit.
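A sketch of the threshold check: an event is produced only when the duration strictly exceeds a configured threshold, and nothing fires when no threshold is set. The payload fields beyond the trace.slow event name are assumptions.

```typescript
interface EndedTrace {
  trace_id: string;
  agent: string;
  duration_ms: number;
}

// Hypothetical payload shape for the trace.slow webhook.
interface SlowEvent {
  event: "trace.slow";
  trace_id: string;
  agent: string;
  duration_ms: number;
  threshold_ms: number;
}

function slowTraceEvent(trace: EndedTrace, thresholdMs: number | null): SlowEvent | null {
  if (thresholdMs === null || trace.duration_ms <= thresholdMs) return null;
  return { event: "trace.slow", ...trace, threshold_ms: thresholdMs };
}
```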
Nexus auto-detects Slack Incoming Webhook URLs and sends richly formatted Slack blocks instead of generic JSON. Non-Slack URLs continue to receive the existing payload. Settings page shows a live "Slack Incoming Webhook detected" badge.
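Slack Incoming Webhook URLs live under https://hooks.slack.com/services/, so detection can be a URL check, and the Slack payload can use Block Kit section blocks plus a top-level text fallback for notifications. The alert fields and the generic payload shape here are hypothetical.

```typescript
function isSlackIncomingWebhook(url: string): boolean {
  try {
    const u = new URL(url);
    return u.protocol === "https:" && u.hostname === "hooks.slack.com" && u.pathname.startsWith("/services/");
  } catch {
    return false; // not a valid URL at all
  }
}

interface AlertInfo {
  agent: string;
  status: string;
  trace_url: string;
}

function webhookBody(url: string, alert: AlertInfo): string {
  // Non-Slack endpoints keep receiving plain JSON (shape assumed here).
  if (!isSlackIncomingWebhook(url)) return JSON.stringify(alert);
  const summary = `Trace ${alert.status} in agent *${alert.agent}*`;
  return JSON.stringify({
    text: summary, // fallback shown in notifications
    blocks: [
      { type: "section", text: { type: "mrkdwn", text: summary } },
      { type: "section", text: { type: "mrkdwn", text: `<${alert.trace_url}|View trace>` } },
    ],
  });
}
```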
Free users see a persistent upgrade prompt on all dashboard pages (overview, traces, agents, keys, settings) — "Upgrade to Pro — $9/mo for 50k traces, email alerts, and unlimited agents." Hidden for Pro users.
New guide at /docs/github-actions shows how to instrument AI agents running inside GitHub Actions workflows. Includes a complete YAML workflow using a NEXUS_API_KEY secret, plus TypeScript and Python SDK examples.
Added posts targeting high-commercial-intent queries: "How to Instrument Claude Code Agents", "AI Agent Reliability Patterns: Retry, Timeout, Circuit Breaker", and "How Trace Analysis Cut Our AI Agent Costs by 60%".
All 16 blog posts now have dynamic og:title, og:description, and article:published_time meta tags derived from the post data. Previously most posts shared a generic OG title, reducing click-through from social sharing.
Fixed a site ID mismatch that caused Beam pageview tracking to silently fail on all pages (the wrong ID caused the CORS preflight request to be rejected). Confirmed the Beam script is present on all public and dashboard pages with the correct site ID.
New users with zero traces see an interactive quickstart: masked API key with reveal/copy, tabbed Python/TypeScript/curl code snippets, and a live polling indicator that reloads with a celebration on first trace.
Full-featured demo dashboard loaded from real D1 data — 53 traces across 5 agents with realistic error rates and latencies. No sign-up required.
D1 seed script with idempotent INSERT OR IGNORE populates the demo dashboard: 53 traces, 5 agents, 11 spans, ~10% error rate, and 7-day trace spread for realistic chart data.
Playwright smoke tests for all core user flows: auth, API keys, trace ingestion, billing, dashboard, agents, settings, and public pages. Runs against local, staging, and production.
Isolated staging deployment at nexus-staging.stevencolecobb.workers.dev with Stripe test mode configured. Enables safe end-to-end billing validation before production deploys.
Legally required pages at /privacy and /terms. Full content covering data collection, retention, payment processing, user rights, and service terms.
Subscribe to the Nexus blog via RSS. Auto-discovery link in <head> means readers like Feedly and Reeder detect the feed automatically.
Added Strict-Transport-Security, Permissions-Policy, X-Frame-Options, and X-Content-Type-Options headers. Reduces attack surface on all routes.
Cache-Control headers on all public HTML pages: max-age=300 with stale-while-revalidate. Dramatically reduces D1 reads and improves Time to First Byte for pages served from the edge cache.
Wrangler deploy runs automatically on every push to main via GitHub Actions. Deploy time: ~45 seconds from push to live.
Systematic pass over every page at a 375px viewport width: fixed horizontal overflow on the traces table, improved tap target sizes, tightened nav padding, and fixed chart overflow.
Added skip-to-content links, aria-labels on icon buttons, semantic landmark elements, and improved color contrast ratios across dashboard and auth pages.
Eight new integration guides covering the most popular AI agent frameworks. Each guide includes a complete working example, environment setup, and framework-specific tips.
Added /vs/helicone, /vs/braintrust, /vs/datadog, /vs/wandb, /vs/portkey, and /vs/arize-phoenix. Honest, technical comparisons with up-to-date pricing.
Switched from runtime Tailwind CDN to build-time CSS bundle. Page weight cut by ~300KB. CSS is now bundled at deploy time with only the classes actually used.
Any trace can be shared as a public read-only URL. Useful for AI agent debugging sessions with teammates or attaching to bug reports.
Search traces by agent name, status, error message, or metadata. Full-text search backed by SQLite FTS on D1. Results update as you type.
Pro users can configure a webhook URL in Settings. Nexus POSTs a JSON payload on every trace that ends with status error or timeout, in real time.
New users see a step-by-step checklist: create API key → send first trace → view in dashboard. Dismissible, with individual step completion tracking.
Visual waterfall chart on trace detail pages — shows span start times, durations, and nesting at a glance. Rendered entirely in CSS with no JS charting library.
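Positioning each bar in a CSS-only waterfall reduces to two percentages per span, relative to the trace's overall time range. A sketch with hypothetical field names; the values would feed inline styles such as margin-left and width:

```typescript
interface WaterfallSpan {
  id: string;
  start_ms: number;
  end_ms: number;
}

interface Bar {
  id: string;
  leftPct: number;  // offset from the trace start
  widthPct: number; // span duration as a share of the trace
}

function waterfallBars(spans: WaterfallSpan[]): Bar[] {
  if (spans.length === 0) return [];
  const traceStart = Math.min(...spans.map((s) => s.start_ms));
  const traceEnd = Math.max(...spans.map((s) => s.end_ms));
  const total = Math.max(traceEnd - traceStart, 1); // avoid divide-by-zero
  return spans.map((s) => ({
    id: s.id,
    leftPct: ((s.start_ms - traceStart) / total) * 100,
    widthPct: ((s.end_ms - s.start_ms) / total) * 100,
  }));
}
```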
Dashboard metrics refresh every 30 seconds. A pulsing dot in the corner shows the last-updated time. Keeps the overview useful during long-running agent sessions.
7-day bar chart on the overview dashboard shows trace volume and error count side by side. Helps identify when an agent deployment caused a spike.
Cloudflare Workers cron pings /health every minute and writes status to KV. Failed pings trigger an alert email via Resend.
Accept OTLP/HTTP JSON format at POST /v1/traces. Any developer with existing OpenTelemetry instrumentation can point their exporter at Nexus with a one-line config change — no SDK required.
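For reference, a minimal OTLP/HTTP JSON body uses the resourceSpans → scopeSpans → spans nesting below, with hex-encoded trace and span IDs and nanosecond timestamps encoded as strings. This sketch only builds the body; the service and span names are placeholders, Math.random stands in for a proper random source, and how Nexus expects the API key to be sent is not covered here.

```typescript
// Hex string of `bytes` random bytes (Math.random for brevity; use crypto in production).
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Milliseconds to a nanosecond string without floating-point overflow.
const toNanos = (ms: number): string => `${Math.round(ms)}000000`;

function otlpPayload(serviceName: string, spanName: string, startMs: number, endMs: number) {
  return {
    resourceSpans: [
      {
        resource: { attributes: [{ key: "service.name", value: { stringValue: serviceName } }] },
        scopeSpans: [
          {
            scope: { name: "manual-example" },
            spans: [
              {
                traceId: randomHex(16), // 32 hex chars
                spanId: randomHex(8),   // 16 hex chars
                name: spanName,
                kind: 1, // SPAN_KIND_INTERNAL
                startTimeUnixNano: toNanos(startMs),
                endTimeUnixNano: toNanos(endMs),
              },
            ],
          },
        ],
      },
    ],
  };
}
```

With the OpenTelemetry SDKs, the config change is typically setting OTEL_EXPORTER_OTLP_TRACES_ENDPOINT to the full /v1/traces URL. Many exporters default to protobuf, so JSON may need to be selected explicitly (OTEL_EXPORTER_OTLP_PROTOCOL=http/json where the SDK supports it).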
Filter traces by status, agent, and date range on the traces page. Supports query parameters so filters persist across page loads. Default view: last 7 days.
Full Python SDK for AI agent developers. Mirrors the TypeScript API: NexusClient, start_trace(), add_span(), end(). Zero external dependencies — pure stdlib only.
Honest, technical comparison pages targeting developer search queries. Acknowledges competitor strengths while positioning Nexus for its niche: indie developers who need simplicity, not enterprise features.
Explore the full trace viewer and dashboard with realistic sample data — no sign-up required. Three sample agents with spans, waterfall views, and error cases.
Full REST API reference with request/response examples, authentication guide, and SDK quickstarts for both TypeScript and Python. Linked from the nav and landing page.
Published "How I Monitor My AI Agents for $9/Month" — a technical, honest walkthrough of the Nexus architecture, SDK integration, and pricing rationale.
Every public page now has og:title, og:description, og:image, twitter:card. GET /og-image.png returns a branded SVG card for rich link previews on HN, Twitter, and Slack.
SEO foundations: robots.txt allows all crawlers, sitemap.xml lists all public pages with priority weights, landing page includes Organization + SoftwareApplication JSON-LD.
Manage your API keys, view account plan, and delete your account. Account deletion cascades to all data and cancels any active Stripe subscription.
Upgrade to Pro ($9/mo) via Stripe Checkout. Webhooks handle subscription lifecycle events to keep plan status current. Billing portal lets Pro users manage their subscription.
Pro users receive email alerts via Resend when a trace ends with status error or timeout. Rate-limited to 1 alert per agent per 5 minutes to prevent alert fatigue.
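A sketch of a per-agent limiter that allows one alert per five-minute window. It keeps state in an in-memory Map for clarity; on Workers, where isolates do not share memory, the real state would need a shared store such as KV or D1. The class and method names are hypothetical.

```typescript
const ALERT_WINDOW_MS = 5 * 60 * 1000;

class AlertRateLimiter {
  private readonly lastSent = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = ALERT_WINDOW_MS) {
    this.windowMs = windowMs;
  }

  // Returns true (and records the send) if an alert for this agent may go out now.
  tryAcquire(agentId: string, nowMs: number): boolean {
    const last = this.lastSent.get(agentId);
    if (last !== undefined && nowMs - last < this.windowMs) return false;
    this.lastSent.set(agentId, nowMs);
    return true;
  }
}
```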
See all registered agents, their health status, and per-agent trace history. Agents are auto-created on first trace ingestion — no manual setup required.
Overview page showing traces this month, error rate, average latency, per-agent health cards, and a 7-day CSS bar chart. All metrics from live D1 queries.
Browse your traces with status color-coding, durations, and pagination. Trace detail shows all spans in waterfall order with collapsible input/output/error — zero JavaScript.
Open-source npm package for agent instrumentation. NexusClient → startTrace() → addSpan() → end(). All methods handle network errors gracefully and never throw.
Capture individual LLM calls, tool uses, and sub-agent invocations with timing, input, output, and error data. Nested spans via parent_span_id.
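Rebuilding the nesting from parent_span_id is a two-pass walk over a flat span list. A sketch, assuming each span also has an id field (hypothetical name); spans whose parent is missing from the batch are treated as roots:

```typescript
interface FlatSpan {
  id: string;
  parent_span_id: string | null;
  name: string;
}

interface SpanNode extends FlatSpan {
  children: SpanNode[];
}

function buildSpanTree(spans: FlatSpan[]): SpanNode[] {
  // Pass 1: index every span by id.
  const nodes = new Map<string, SpanNode>();
  for (const s of spans) nodes.set(s.id, { ...s, children: [] });

  // Pass 2: attach each span to its parent, or keep it as a root.
  const roots: SpanNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.parent_span_id ? nodes.get(node.parent_span_id) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node); // top-level span, or parent not in this batch
  }
  return roots;
}
```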
Core API endpoint for agent observability. API key auth, plan limit enforcement (1K traces/month Free), auto-creates agent records on first use. Returns trace_id in 201ms.
Generate nxs_-prefixed API keys (SHA-256 hashed, shown once), list active keys, revoke compromised ones. Keys identify your agents in the ingestion API.
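A sketch of the generate-once, store-the-hash pattern using Node's crypto module (a Worker would use crypto.getRandomValues and crypto.subtle.digest instead). The key length and helper names are assumptions; only the hash would be stored, and the plaintext is shown to the user once.

```typescript
import { createHash, randomBytes } from "node:crypto";

// SHA-256 hex digest; this is what gets stored and compared on each request.
function hashApiKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Create a new key: 24 random bytes as hex, behind the nxs_ prefix (length assumed).
function generateApiKey(): { plaintext: string; hash: string } {
  const plaintext = "nxs_" + randomBytes(24).toString("hex");
  return { plaintext, hash: hashApiKey(plaintext) };
}
```

On each ingestion request, the server would hash the presented key and look up the hash, so a leaked database does not expose usable keys.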
No passwords. Enter your email, click the link. Sessions stored in Cloudflare KV (7-day TTL). Rate-limited to prevent abuse.
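A sketch of storing a session with a 7-day TTL through the Workers KV put/get methods, where expirationTtl is given in seconds. The KVLike interface mirrors only the two methods used, and the session:&lt;id&gt; key layout and helper names are hypothetical.

```typescript
// Minimal slice of the Workers KV binding API used here.
interface KVLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  get(key: string): Promise<string | null>;
}

const SESSION_TTL_SECONDS = 7 * 24 * 60 * 60; // 604800 seconds = 7 days

async function createSession(kv: KVLike, userId: string, sessionId: string): Promise<void> {
  // KV deletes the entry automatically once the TTL elapses.
  await kv.put(`session:${sessionId}`, JSON.stringify({ userId }), { expirationTtl: SESSION_TTL_SECONDS });
}

async function getSessionUser(kv: KVLike, sessionId: string): Promise<string | null> {
  const raw = await kv.get(`session:${sessionId}`);
  return raw ? (JSON.parse(raw) as { userId: string }).userId : null;
}
```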
Server-rendered HTML. Hero, pricing table (Free vs Pro), feature comparison, how-it-works, SDK code example, FAQ, meta-narrative. No JavaScript frameworks.
D1 (SQLite at edge) schema with all core tables, indexes, and IF NOT EXISTS guards. ON DELETE CASCADE throughout for clean account deletion.
Nexus born. Cloudflare Workers runtime, Hono framework, D1 database, KV namespace, TypeScript, wrangler. GET /health returns {status: "ok"}. Build time: ~2 hours.