2026-04-29 · 9 min read

Observability for Model Context Protocol (MCP) Servers: Tracing Tool Calls with Nexus

The Model Context Protocol (MCP) lets AI hosts like Claude Desktop and Cursor call your server's tools over a standard JSON-RPC transport — but when a tool call returns the wrong result, takes 10 seconds, or throws a silent exception, the host LLM has no way to surface which tool failed or why. Here's how to wrap MCP tool handlers with Nexus spans in both Python (FastMCP) and TypeScript (@modelcontextprotocol/sdk) to get full trace-level visibility into every tool call your server handles.

What the Model Context Protocol is

The Model Context Protocol (MCP) is an open standard introduced by Anthropic that lets AI hosts — applications like Claude Desktop, Cursor, Cline, and any other LLM-powered client — call tools exposed by external servers over a standardized JSON-RPC transport. You write an MCP server that advertises a list of tools (functions with typed input schemas), and any compatible host can discover and invoke them.

A typical MCP server exposes tools like these:

- search_docs(query): full-text search over your documentation
- get_weather(city): fetch current conditions from a weather API
- run_sql(query): execute a read-only query against your database

The host LLM decides when to call each tool, passes typed arguments, and integrates the tool’s response into its next generation. MCP decouples tool implementation from the AI host — the same server can be used by Claude Desktop, Cursor, and your own custom agent with no changes.
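Concretely, discovery and invocation are plain JSON-RPC messages. Here is a sketch of the message shapes, written as Python dicts (the method names and field names follow the MCP specification; the tool entry itself is a hypothetical example):

```python
# Shape of the JSON-RPC messages MCP uses for tool discovery and
# invocation. The "search_docs" tool is an illustrative example.

# The host asks the server what tools it offers:
tools_list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# The server advertises each tool with a typed JSON Schema for its inputs:
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_docs",
                "description": "Search documentation for a query.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}

# When the LLM decides to call a tool, the host sends typed arguments:
tools_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "search_docs", "arguments": {"query": "rate limits"}},
}
```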

Why MCP servers need observability

From the host LLM’s perspective, a tool call is atomic: it sends arguments and receives a result. If the tool takes five seconds, returns garbage, or throws a non-descriptive error, the host sees only the failure — not which internal step caused it. Three failure modes are invisible without external instrumentation:

- Silent latency: a tool that takes ten seconds looks identical to one that takes ten milliseconds once the result finally arrives; nothing tells you where the time went.
- Wrong results: the tool returns syntactically valid output that is semantically wrong, and the LLM integrates it into its response anyway.
- Swallowed exceptions: an internal error is caught and converted into an empty or generic response, so neither the host nor the user ever sees the real cause.

MCP servers are the interface between your infrastructure and an AI host you don’t control. Instrumenting them at the tool-call level gives you the only complete picture of what’s happening inside that interface.

Install the SDK

Python (FastMCP):

pip install nexus-agent mcp

TypeScript (@modelcontextprotocol/sdk):

npm install @keylightdigital/nexus @modelcontextprotocol/sdk zod

Basic setup: Python with FastMCP

FastMCP is the high-level server API included in the official MCP Python SDK. Initialize the Nexus client once at module level alongside your FastMCP app:

import nexus_agent
from mcp.server.fastmcp import FastMCP

nexus = nexus_agent.Nexus(api_key="YOUR_API_KEY", agent_id="my-mcp-server")
app = FastMCP("my-server")

Then wrap each tool handler with a start_trace / start_span pair. Record the tool name and input in metadata:

@app.tool()
def search_docs(query: str) -> str:
    """Search documentation for a query."""
    trace = nexus.start_trace(name="mcp_tool_call")
    span = nexus.start_span(
        trace_id=trace["trace_id"],
        name="search_docs",
        metadata={"tool": "search_docs", "input": query},
    )
    try:
        result = _do_search(query)
        nexus.end_span(
            span_id=span["id"],
            status="success",
            metadata={"output_length": len(result)},
        )
        nexus.end_trace(trace_id=trace["trace_id"], status="success")
        return result
    except Exception as e:
        nexus.end_span(
            span_id=span["id"],
            status="error",
            metadata={"error": str(e)},
        )
        nexus.end_trace(trace_id=trace["trace_id"], status="error")
        raise

Every tool call now appears in Nexus as a trace with a single span. The tool field in metadata lets you filter the Nexus dashboard by tool name to see per-tool latency and error distributions.

Scaling to multiple tools: a Python context manager

Repeating the start_trace / end_trace block in every tool handler adds noise. A small context manager centralizes the span lifecycle and adds latency tracking automatically:

import contextlib
import time


@contextlib.contextmanager
def traced_tool(tool_name: str, inputs: dict):
    """Context manager that wraps any MCP tool call with a Nexus span."""
    trace = nexus.start_trace(name="mcp_tool_call")
    span = nexus.start_span(
        trace_id=trace["trace_id"],
        name=tool_name,
        metadata={"tool": tool_name, **inputs},
    )
    started = time.monotonic()
    try:
        yield trace, span
        latency_ms = (time.monotonic() - started) * 1000
        nexus.end_span(
            span_id=span["id"],
            status="success",
            metadata={"latency_ms": round(latency_ms, 1)},
        )
        nexus.end_trace(trace_id=trace["trace_id"], status="success")
    except Exception as exc:
        latency_ms = (time.monotonic() - started) * 1000
        nexus.end_span(
            span_id=span["id"],
            status="error",
            metadata={"error": str(exc), "latency_ms": round(latency_ms, 1)},
        )
        nexus.end_trace(trace_id=trace["trace_id"], status="error")
        raise


@app.tool()
def get_weather(city: str) -> str:
    with traced_tool("get_weather", {"city": city}):
        return _fetch_weather(city)


@app.tool()
def run_sql(query: str) -> list:
    with traced_tool("run_sql", {"query": query}):
        return _execute_query(query)

The latency_ms field recorded on every span lets you build a P95 latency view per tool in Nexus without any additional instrumentation. If run_sql spikes to 4× normal latency after a schema migration, you’ll see it immediately.
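If you export spans for offline analysis, the P95 itself is easy to compute. A minimal sketch using the nearest-rank method (the latency list here is whatever you pull from your span records; the helper name is our own):

```python
import math

def p95_latency(latencies_ms: list[float]) -> float:
    """Return the 95th-percentile latency using the nearest-rank method."""
    ranked = sorted(latencies_ms)
    # Nearest-rank: the value at position ceil(0.95 * n), 1-indexed.
    index = min(len(ranked) - 1, math.ceil(0.95 * len(ranked)) - 1)
    return ranked[index]
```

With 99 fast calls and one slow outlier, the P95 stays at the fast value, which is exactly why it is more robust than an average for spotting regressions.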

TypeScript: basic setup with @modelcontextprotocol/sdk

The official TypeScript SDK uses McpServer with Zod schema definitions. Initialize Nexus alongside your server:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { z } from 'zod'
import Nexus from '@keylightdigital/nexus'

const nexus = new Nexus({ apiKey: 'YOUR_API_KEY', agentId: 'my-mcp-server' })
const server = new McpServer({ name: 'my-server', version: '1.0.0' })

Then instrument each tool registration with startTrace and startSpan:

server.tool(
  'search_docs',
  { query: z.string() },
  async ({ query }) => {
    const trace = await nexus.startTrace({ name: 'mcp_tool_call' })
    const span = await nexus.startSpan({
      traceId: trace.traceId,
      name: 'search_docs',
      metadata: { tool: 'search_docs', input: query },
    })
    try {
      const result = await doSearch(query)
      await nexus.endSpan({
        spanId: span.id,
        status: 'success',
        metadata: { outputLength: result.length },
      })
      await nexus.endTrace({ traceId: trace.traceId, status: 'success' })
      return { content: [{ type: 'text', text: result }] }
    } catch (err) {
      await nexus.endSpan({
        spanId: span.id,
        status: 'error',
        metadata: { error: String(err) },
      })
      await nexus.endTrace({ traceId: trace.traceId, status: 'error' })
      throw err
    }
  }
)

TypeScript: a reusable tool wrapper

Just as with Python, a wrapper function eliminates the repeated span boilerplate and ensures every tool gets latency tracking:

function tracedTool<TArgs extends Record<string, unknown>>(
  toolName: string,
  handler: (args: TArgs) => Promise<{ content: Array<{ type: string; text: string }> }>
) {
  return async (args: TArgs) => {
    const started = Date.now()
    const trace = await nexus.startTrace({ name: 'mcp_tool_call' })
    const span = await nexus.startSpan({
      traceId: trace.traceId,
      name: toolName,
      metadata: { tool: toolName, ...args },
    })
    try {
      const result = await handler(args)
      await nexus.endSpan({
        spanId: span.id,
        status: 'success',
        metadata: { latencyMs: Date.now() - started },
      })
      await nexus.endTrace({ traceId: trace.traceId, status: 'success' })
      return result
    } catch (err) {
      await nexus.endSpan({
        spanId: span.id,
        status: 'error',
        metadata: { error: String(err), latencyMs: Date.now() - started },
      })
      await nexus.endTrace({ traceId: trace.traceId, status: 'error' })
      throw err
    }
  }
}

server.tool('get_weather', { city: z.string() }, tracedTool('get_weather', async ({ city }) => {
  const data = await fetchWeather(city)
  return { content: [{ type: 'text', text: JSON.stringify(data) }] }
}))

server.tool('run_sql', { query: z.string() }, tracedTool('run_sql', async ({ query }) => {
  const rows = await executeQuery(query)
  return { content: [{ type: 'text', text: JSON.stringify(rows) }] }
}))

The tracedTool wrapper accepts any tool handler and returns a span-wrapped version with the same type signature. Add it to your server registration once and every tool call lands in Nexus automatically.

What to record in metadata

The most useful fields to capture at the span level for MCP tools:

- tool: the tool name, for filtering and per-tool aggregation
- input: the arguments the LLM passed (truncate large values)
- latency_ms: wall-clock duration of the handler
- output_length or a short summary of the result, rather than the full payload
- error: the exception message when the call fails
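One way to keep these fields consistent across tools is a small helper that builds the metadata dict in one place. This is a sketch; the 500-character truncation limit and the helper name are illustrative choices, not a Nexus requirement:

```python
def tool_metadata(tool: str, args: dict, max_input_chars: int = 500) -> dict:
    """Build span metadata with the tool name and truncated input values."""
    meta = {"tool": tool}
    for key, value in args.items():
        # Truncate each argument so huge inputs don't bloat span payloads.
        meta[key] = str(value)[:max_input_chars]
    return meta
```

Pass the result as the metadata argument to start_span so every tool records the same field set.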

Tracking which tools get called most often

After a few days of production traffic, the Nexus trace list gives you a natural frequency distribution by tool name. Filter by metadata.tool in the Nexus dashboard to see call volume per tool. High-frequency tools are your optimization priority — a 50ms latency improvement on a tool called 500 times per day saves more than the same improvement on a tool called 5 times.
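The same frequency count is easy to reproduce from exported span records, assuming each span carries the metadata.tool field recorded earlier (the sample data in the test is hypothetical):

```python
from collections import Counter

def call_volume_by_tool(spans: list[dict]) -> Counter:
    """Count calls per tool from span records that carry metadata.tool."""
    return Counter(
        s["metadata"]["tool"]
        for s in spans
        if "tool" in s.get("metadata", {})
    )
```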

Low-frequency tools that the LLM consistently selects with wrong arguments indicate a tool description problem. If run_sql is called with malformed queries, update the tool’s JSON schema description and example to steer the LLM toward valid inputs.
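As an illustration, here is what a more prescriptive description for run_sql might look like. The wording is an example of the kind of steering that helps; register the function with @app.tool() as in the earlier examples (the body is stubbed out here, since only the description matters):

```python
def run_sql(query: str) -> list:
    """Run a read-only SQL SELECT against the analytics database.

    Only SELECT statements are accepted; always include a LIMIT clause.
    Example: SELECT id, name FROM users WHERE active = true LIMIT 50
    """
    # Stub: the real handler wraps _execute_query with traced_tool.
    raise NotImplementedError
```

FastMCP exposes the docstring as the tool description the host LLM sees, so a concrete constraint plus a worked example lands directly in the model's context.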

Alerting when a tool returns an error

Use the Nexus API to check error rates per tool and send an alert before users notice:

# In your MCP server startup or a separate monitoring process
import requests

NEXUS_BASE = "https://api.nexus.keylightdigital.dev"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def check_tool_error_rate(tool_name: str, limit: int = 200) -> float:
    """Return the error rate for an MCP tool over its most recent spans."""
    resp = requests.get(
        f"{NEXUS_BASE}/v1/spans",
        headers=HEADERS,
        params={"name": tool_name, "limit": limit},
    )
    spans = resp.json().get("spans", [])
    if not spans:
        return 0.0
    errors = sum(1 for s in spans if s.get("status") == "error")
    return errors / len(spans)


# Alert if any tool exceeds 10% error rate
TOOLS = ["search_docs", "get_weather", "run_sql"]
for tool in TOOLS:
    rate = check_tool_error_rate(tool)
    if rate > 0.10:
        print(f"ALERT: {tool} error rate is {rate:.0%} — check Nexus traces")

Run this as a cron job every five minutes. When any tool exceeds a 10% error rate, you get an alert immediately — not when a user complains that the AI assistant “isn’t working.”
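To push that alert into Slack from the same polling script, a minimal sketch using an incoming webhook (SLACK_WEBHOOK_URL is a placeholder you create in your Slack workspace; check_tool_error_rate is the function defined above):

```python
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder

def build_alert_payload(tool: str, rate: float) -> dict:
    """Build the Slack incoming-webhook message body for a tool alert."""
    return {
        "text": f":rotating_light: MCP tool `{tool}` error rate is "
                f"{rate:.0%} (check Nexus traces)"
    }

def post_alert(tool: str, rate: float) -> None:
    # requests is imported lazily so the payload builder stays dependency-free.
    import requests
    requests.post(SLACK_WEBHOOK_URL, json=build_alert_payload(tool, rate), timeout=5)
```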

You can also use Nexus webhook alerts to post directly to Slack whenever an error rate threshold is crossed, without any polling infrastructure.

What to monitor in production

- Per-tool error rate: alert when any tool exceeds your threshold (10% in the example above)
- P95 latency per tool: catches slow regressions that averages hide
- Call volume per tool: shifts in which tools the LLM selects often signal a prompt or tool-description change

Next steps

MCP servers are the edge of your AI infrastructure — the boundary between your code and an AI host you don’t control. Wrapping each tool handler with a Nexus span gives you the input, output, latency, and error data you need to debug failures, optimize slow tools, and catch regressions before your users do. Sign up for a free Nexus account to start tracing your MCP server tool calls today, or read the Flowise integration guide if you’re building on top of a no-code AI workflow platform.

Trace every MCP tool call

Free tier, no credit card required. Full span-level visibility in under 5 minutes.