Monitoring Mistral AI Agents: Tracing Function Calls, Token Costs, and Rate Limits
Mistral AI's function-calling API lets you build agents that route between tools using mistral-large or mistral-small. When a tool schema validation fails silently, a rate limit error gets swallowed, or your agent burns through tokens on a loop, the Mistral API response gives you an error code but no execution timeline. Here's how to wrap Mistral chat completions in Nexus traces and get full span-level observability.
How Mistral function-calling agents work
Mistral AI’s function-calling API follows the same basic pattern as OpenAI’s tool use, but with its own conventions around tool_choice and how tool results are fed back into the conversation. An agent loop looks like this:
- Send a chat completion request with tools defined and tool_choice="auto"
- If finish_reason == "tool_calls", extract the tool_calls from the response
- Execute each function and collect the outputs
- Append the assistant message and tool result messages to the conversation
- Repeat until finish_reason == "stop"
Failures can happen at any step: the model may generate a tool call with arguments that don’t match your schema, a function handler may throw, the rate limiter may reject a request mid-loop, or the agent may loop more times than expected before reaching stop. Without instrumentation, all you see is the final error or an unexpected response — not where the breakdown happened.
Wrapping Mistral chat completions in Nexus traces
The pattern: open a Nexus trace at the start of the agent loop, record each completion call as a span (including token counts), record each function execution as a child span, then close the trace when the loop terminates.
import os
import time
import json

import requests
from mistralai import Mistral

MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"]
NEXUS_API_KEY = os.environ["NEXUS_API_KEY"]
NEXUS_BASE = "https://nexus.keylightdigital.dev"

client = Mistral(api_key=MISTRAL_API_KEY)

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]
def run_agent_with_trace(user_message: str) -> dict:
    t0 = time.time()
    messages = [{"role": "user", "content": user_message}]

    # Open a Nexus trace for the full agent loop
    trace_res = requests.post(
        f"{NEXUS_BASE}/api/traces",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": "mistral-agent:get_weather",
            "input": user_message,
            "metadata": {
                "model": "mistral-large-latest",
                "platform": "mistral-ai",
            },
        },
    )
    trace_id = trace_res.json()["traceId"]

    total_prompt_tokens = 0
    total_completion_tokens = 0
    loop_count = 0
    final_answer = ""

    try:
        while loop_count < 10:
            loop_count += 1
            call_t0 = time.time()
            response = client.chat.complete(
                model="mistral-large-latest",
                messages=messages,
                tools=TOOLS,
                tool_choice="auto",
            )
            usage = response.usage
            total_prompt_tokens += usage.prompt_tokens
            total_completion_tokens += usage.completion_tokens

            # Record the completion call as a span
            span_res = requests.post(
                f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "name": f"chat.complete:loop_{loop_count}",
                    "input": messages[-1]["content"] if messages else "",
                    "metadata": {
                        "model": "mistral-large-latest",
                        "loop": loop_count,
                        "prompt_tokens": usage.prompt_tokens,
                        "completion_tokens": usage.completion_tokens,
                        "finish_reason": response.choices[0].finish_reason,
                    },
                },
            )
            span_id = span_res.json().get("spanId", "")

            choice = response.choices[0]
            assistant_msg = choice.message

            if choice.finish_reason == "stop":
                final_answer = assistant_msg.content or ""
                requests.post(
                    f"{NEXUS_BASE}/api/spans/{span_id}/end",
                    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                    json={
                        "status": "success",
                        "output": final_answer[:500],
                        "latency_ms": int((time.time() - call_t0) * 1000),
                    },
                )
                break

            # Process tool calls
            messages.append({
                "role": "assistant",
                "content": assistant_msg.content,
                "tool_calls": [
                    {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
                    for tc in (assistant_msg.tool_calls or [])
                ],
            })
            tool_results = execute_tool_calls(assistant_msg.tool_calls or [], trace_id)
            for result in tool_results:
                messages.append({
                    "role": "tool",
                    "tool_call_id": result["tool_call_id"],
                    "content": result["output"],
                })
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": f"tool_calls:{len(assistant_msg.tool_calls or [])}",
                    "latency_ms": int((time.time() - call_t0) * 1000),
                },
            )
    except Exception as err:
        requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": "error",
                "error": str(err),
                "latency_ms": int((time.time() - t0) * 1000),
                "metadata": {
                    "total_prompt_tokens": total_prompt_tokens,
                    "total_completion_tokens": total_completion_tokens,
                    "loop_count": loop_count,
                },
            },
        )
        raise

    # Close the parent trace with cumulative token usage
    requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/end",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "status": "success",
            "output": final_answer[:500],
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {
                "total_prompt_tokens": total_prompt_tokens,
                "total_completion_tokens": total_completion_tokens,
                "loop_count": loop_count,
                "model": "mistral-large-latest",
            },
        },
    )
    return {"answer": final_answer, "trace_id": trace_id}
Recording tool call spans
Each function the model invokes should get its own span with the arguments it received and the output it returned. This gives you the granularity to debug schema validation errors and see exactly which function call caused a failure.
def execute_tool_calls(tool_calls: list, trace_id: str) -> list:
    """Execute each tool call and record a Nexus span per function."""
    results = []
    for tc in tool_calls:
        func_name = tc.function.name
        t0 = time.time()

        # Parse arguments — may fail if the model generates invalid JSON
        try:
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError as e:
            span_res = requests.post(
                f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "name": f"function_call:{func_name}",
                    "input": tc.function.arguments,
                    "metadata": {"tool_call_id": tc.id, "function_name": func_name},
                },
            )
            span_id = span_res.json().get("spanId", "")
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "error": f"JSONDecodeError: {e}",
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": f"error: invalid JSON arguments: {e}"})
            continue

        span_res = requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "name": f"function_call:{func_name}",
                "input": json.dumps(args),
                "metadata": {
                    "tool_call_id": tc.id,
                    "function_name": func_name,
                    "arg_keys": list(args.keys()),
                },
            },
        )
        span_id = span_res.json().get("spanId", "")

        try:
            output = dispatch_function(func_name, args)
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": str(output)[:500],
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": str(output)})
        except Exception as err:
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "error": str(err),
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": f"error: {err}"})
    return results
Monitoring token costs: prompt vs completion tokens
Mistral AI pricing charges per million tokens, with prompt tokens and completion tokens billed at different rates depending on the model. For function-calling agents, prompt tokens grow with each loop iteration as the conversation history accumulates — making cost awareness critical for long-running agents.
Recording cumulative token usage on the parent trace lets you identify expensive traces at a glance and set budget thresholds:
# Token cost estimates per million tokens (as of early 2026)
MISTRAL_COSTS = {
    "mistral-large-latest": {"prompt": 2.00, "completion": 6.00},
    "mistral-small-latest": {"prompt": 0.20, "completion": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return estimated USD cost for a Mistral API call."""
    rates = MISTRAL_COSTS.get(model, {"prompt": 0.0, "completion": 0.0})
    return (prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]) / 1_000_000

# In your trace close call, include the estimated cost:
estimated_cost_usd = estimate_cost(
    "mistral-large-latest",
    total_prompt_tokens,
    total_completion_tokens,
)

requests.post(
    f"{NEXUS_BASE}/api/traces/{trace_id}/end",
    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
    json={
        "status": "success",
        "output": final_answer[:500],
        "latency_ms": int((time.time() - t0) * 1000),
        "metadata": {
            "total_prompt_tokens": total_prompt_tokens,
            "total_completion_tokens": total_completion_tokens,
            "estimated_cost_usd": round(estimated_cost_usd, 6),
            "loop_count": loop_count,
        },
    },
)
With estimated_cost_usd in your trace metadata, you can sort traces by cost in the Nexus dashboard and immediately spot the outliers: agents looping more than expected, long conversation histories inflating prompt tokens, or model size mismatches (using mistral-large for tasks that mistral-small handles equally well).
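One way to act on that metadata programmatically is a per-trace budget check over exported traces. A sketch, assuming traces arrive as dicts with the metadata shape used above (the 0.05 USD budget is an arbitrary example, not a recommendation):

```python
def flag_expensive_traces(traces: list[dict], budget_usd: float = 0.05) -> list[dict]:
    """Return traces whose estimated cost exceeds the budget, most expensive first."""
    over = [
        t for t in traces
        if t.get("metadata", {}).get("estimated_cost_usd", 0.0) > budget_usd
    ]
    return sorted(over, key=lambda t: t["metadata"]["estimated_cost_usd"], reverse=True)
```

Run this on a daily export and the top of the list is your triage queue: the same traces you would find by sorting in the dashboard, but available to alerting.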
Common failure patterns and how to spot them
Tool schema validation errors
Mistral sometimes generates tool call arguments that don’t match your JSON schema — missing required fields, wrong types, or extra properties. These show up as JSONDecodeError or validation errors in your function handler, not as an API-level error.
In your traces, these look like: a chat.complete span with finish_reason: tool_calls followed immediately by a function_call span with status: error. If you see this pattern on a specific function across many traces, your schema description likely needs clarification or stricter typing.
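You can catch many of these before dispatching the function by checking the parsed arguments against the parameters schema you already declared in TOOLS. A stdlib-only sketch (a dedicated validator like the jsonschema library would be more thorough; this covers only required fields, primitive types, and enums):

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Check parsed tool arguments against a JSON-schema-style parameters dict.

    Returns a list of human-readable error strings; an empty list means valid.
    Simplified: booleans pass the "integer" check, and nested objects are not
    descended into.
    """
    errors = []
    props = schema.get("properties", {})
    type_map = {
        "string": str, "number": (int, float), "integer": int,
        "boolean": bool, "object": dict, "array": list,
    }
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            errors.append(f"unexpected property: {key}")
            continue
        expected = type_map.get(spec.get("type", ""))
        if expected and not isinstance(value, expected):
            errors.append(f"wrong type for {key}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid enum value for {key}: {value}")
    return errors
```

Record the returned errors in the function_call span's error field and feed them back to the model as the tool output — the model can often self-correct on the next loop iteration.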
Rate limit errors
Mistral’s rate limits apply per API key at the request and token level. A rate limit error mid-loop surfaces as an exception in your client.chat.complete call. Handle it with exponential backoff and record the retry in your span metadata:
import time
from mistralai.models.sdkerror import SDKError

def chat_with_retry(messages: list, tools: list, model: str, max_retries: int = 3):
    """Call Mistral chat complete with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return client.chat.complete(
                model=model,
                messages=messages,
                tools=tools,
                tool_choice="auto",
            )
        except SDKError as e:
            if e.status_code == 429 and attempt < max_retries - 1:
                wait = 2 ** attempt  # 1s, 2s, 4s, ...
                time.sleep(wait)
                continue
            raise  # re-raise on the final attempt or non-429 errors
Record the retry count and wait time in your span metadata so you can correlate rate limit events with time-of-day or request volume patterns.
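Because the waits follow base-2 exponential growth, the schedule is easy to precompute and attach to the span. A sketch (the retries and backoff_waits_s metadata keys are our own naming, not a Nexus requirement):

```python
def backoff_schedule(max_retries: int = 3, base_s: float = 1.0) -> list[float]:
    """Waits between attempts: base_s * 2**attempt, one per possible retry."""
    return [base_s * (2 ** attempt) for attempt in range(max_retries - 1)]

def retry_span_metadata(attempts_used: int, max_retries: int = 3) -> dict:
    """Metadata to merge into the chat.complete span after a retried call."""
    waits = backoff_schedule(max_retries)[:attempts_used]
    return {
        "retries": attempts_used,
        "backoff_waits_s": waits,
        "total_backoff_s": sum(waits),
    }
```

Merging retry_span_metadata(attempt) into the span's metadata dict before closing it is enough to make rate limit pressure queryable per span.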
Agent loops that don’t terminate
An agent that loops more than 4–5 times without reaching finish_reason: stop is almost always stuck in one of two failure modes: the function is returning an output the model can’t interpret, or the tool schema description is ambiguous and the model keeps generating subtly different arguments hoping for a different result.
Your loop_count metadata field makes this visible immediately: sort traces by loop_count descending and inspect the span timeline for the high-count outliers.
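You can also catch the thrashing mode at runtime rather than after the fact: if the last few tool calls all hit the same function, the loop is probably stuck. A heuristic sketch (the window of 3 is an arbitrary choice; tune it to your agent):

```python
def is_stuck(tool_call_history: list[tuple[str, str]], window: int = 3) -> bool:
    """True if the last `window` tool calls all targeted the same function.

    `tool_call_history` holds (function_name, arguments_json) pairs appended
    on each loop iteration; repeated names with subtly varying arguments match
    the thrashing pattern described above.
    """
    if len(tool_call_history) < window:
        return False
    recent = {name for name, _args in tool_call_history[-window:]}
    return len(recent) == 1
```

When it fires, break out of the loop, close the trace with status "error" and a reason like "stuck_loop", and you get a clean signal instead of ten wasted completions.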
TypeScript wrapper for Node.js apps
If your backend is TypeScript, the same pattern works with the official Mistral TypeScript SDK:
import { Mistral } from '@mistralai/mistralai'

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY! })
const NEXUS_API_KEY = process.env.NEXUS_API_KEY!
const NEXUS_BASE = 'https://nexus.keylightdigital.dev'

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['city'],
      },
    },
  },
]

async function runMistralAgent(userMessage: string): Promise<{ answer: string; traceId: string }> {
  const t0 = Date.now()
  const messages: any[] = [{ role: 'user', content: userMessage }]

  const traceRes = await fetch(`${NEXUS_BASE}/api/traces`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'mistral-agent:get_weather',
      input: userMessage,
      metadata: { model: 'mistral-large-latest', platform: 'mistral-ai' },
    }),
  })
  const { traceId } = await traceRes.json()

  let totalPromptTokens = 0
  let totalCompletionTokens = 0
  let loopCount = 0
  let finalAnswer = ''

  while (loopCount < 10) {
    loopCount++
    const callT0 = Date.now()
    const response = await client.chat.complete({
      model: 'mistral-large-latest',
      messages,
      tools,
      toolChoice: 'auto',
    })
    const usage = response.usage!
    totalPromptTokens += usage.promptTokens
    totalCompletionTokens += usage.completionTokens

    const choice = response.choices![0]
    const assistantMsg = choice.message

    const spanRes = await fetch(`${NEXUS_BASE}/api/traces/${traceId}/spans`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        name: `chat.complete:loop_${loopCount}`,
        input: messages[messages.length - 1]?.content ?? '',
        metadata: {
          loop: loopCount,
          promptTokens: usage.promptTokens,
          completionTokens: usage.completionTokens,
          finishReason: choice.finishReason,
        },
      }),
    })
    const { spanId } = await spanRes.json()

    if (choice.finishReason === 'stop') {
      finalAnswer = typeof assistantMsg.content === 'string' ? assistantMsg.content : ''
      await fetch(`${NEXUS_BASE}/api/spans/${spanId}/end`, {
        method: 'POST',
        headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
        body: JSON.stringify({ status: 'success', output: finalAnswer.slice(0, 500), latency_ms: Date.now() - callT0 }),
      })
      break
    }

    const toolCalls = assistantMsg.toolCalls ?? []
    messages.push({ role: 'assistant', content: assistantMsg.content, toolCalls })

    for (const tc of toolCalls) {
      const funcName = tc.function.name
      const args = JSON.parse(tc.function.arguments as string)
      const output = await dispatchFunction(funcName, args)
      messages.push({ role: 'tool', toolCallId: tc.id, content: JSON.stringify(output) })
    }

    await fetch(`${NEXUS_BASE}/api/spans/${spanId}/end`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ status: 'success', output: `tool_calls:${toolCalls.length}`, latency_ms: Date.now() - callT0 }),
    })
  }

  await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      status: 'success',
      output: finalAnswer.slice(0, 500),
      latency_ms: Date.now() - t0,
      metadata: { totalPromptTokens, totalCompletionTokens, loopCount, model: 'mistral-large-latest' },
    }),
  })

  return { answer: finalAnswer, traceId }
}
Choosing between mistral-large and mistral-small
One of the most actionable decisions trace data supports is model rightsizing. mistral-large costs 10× more per token than mistral-small, but for agents with simple, well-defined tools, the smaller model often reaches stop in the same number of loops with comparable output quality.
A/B test across models by logging the model name in each trace’s metadata:
- Compare loop_count across model variants — if both reach stop in the same number of turns, the smaller model is equivalent for this task
- Compare function_call error rates — if the smaller model generates more schema mismatches, your tool descriptions may need more precision
- Compare estimated_cost_usd — the delta is the cost of the quality difference, if any
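That comparison can be sketched over per-model aggregates. The stats shape and field names here are our own — compute them from your trace metadata however you export it:

```python
def compare_models(stats: dict[str, dict]) -> dict:
    """Compare two model variants on loop count, error rate, and cost.

    `stats` maps model name -> {"mean_loop_count", "error_rate", "mean_cost_usd"}.
    The cheaper model counts as "equivalent" if it matches or beats the
    pricier one on both loop count and error rate.
    """
    (cheap, cs), (pricey, ps) = sorted(stats.items(), key=lambda kv: kv[1]["mean_cost_usd"])
    return {
        "cheaper_model": cheap,
        "equivalent": cs["mean_loop_count"] <= ps["mean_loop_count"]
                      and cs["error_rate"] <= ps["error_rate"],
        "cost_delta_usd": round(ps["mean_cost_usd"] - cs["mean_cost_usd"], 6),
    }
```

If equivalent comes back true, the cost_delta_usd is pure savings; if false, it is the price you are paying for the quality gap.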
What to monitor in production
Once traces are flowing from your Mistral agents, these four metrics give you the most actionable signal:
- Function call error rate by function name: Filter function_call:* spans with status: error. High error rates on a specific function usually mean a schema description issue or a broken handler.
- Mean loop count per trace: A healthy single-tool agent should average 2 loops (one tool call loop + one stop loop). Loop counts above 4 indicate the agent is struggling — check function outputs and tool descriptions.
- Prompt token growth across loops: In multi-turn agents, prompt tokens grow with each loop. If a P95 trace is spending 80% of its tokens on conversation history, add a summarization step before the context window fills.
- Rate limit retry frequency: Track spans with retries > 0 in metadata. Consistent rate limiting at specific hours points to a quota upgrade or request-batching opportunity.
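Three of these four signals can be computed directly from exported trace and span dicts. A sketch that assumes the span names and metadata keys used throughout this article (prompt-token growth is better inspected per-trace than aggregated, so it is omitted):

```python
from statistics import mean

def production_metrics(traces: list[dict], spans: list[dict]) -> dict:
    """Aggregate function error rate, mean loop count, and retry share."""
    func_spans = [s for s in spans if s.get("name", "").startswith("function_call:")]
    func_errors = [s for s in func_spans if s.get("status") == "error"]
    retried = [s for s in spans if s.get("metadata", {}).get("retries", 0) > 0]
    return {
        "function_error_rate": len(func_errors) / len(func_spans) if func_spans else 0.0,
        "mean_loop_count": mean(t["metadata"]["loop_count"] for t in traces) if traces else 0.0,
        "retry_span_share": len(retried) / len(spans) if spans else 0.0,
    }
```

Running this on a daily export and alerting on thresholds (say, error rate above 5% or mean loop count above 4) turns the trace data into an early-warning system.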
Next steps
Mistral AI’s function-calling API is a lightweight alternative to OpenAI for developers who want competitive reasoning capability at lower cost. Instrumenting each completion call, each tool span, and cumulative token usage gives you the data to debug failures, optimize costs, and choose the right model tier for your workload. Sign up for a free Nexus account to start capturing traces from your Mistral AI agents today.
Add observability to Mistral AI agents
Free tier, no credit card required. Full trace visibility in under 5 minutes.