Monitoring Mistral AI Agents: Tracing Function Calls, Token Costs, and Rate Limits
Mistral AI's function-calling API lets you build agents that route between tools using mistral-large or mistral-small. When a tool schema validation fails silently, a rate limit error gets swallowed, or your agent burns through tokens on a loop, the Mistral API response gives you an error code but no execution timeline. Here's how to wrap Mistral chat completions in Nexus traces and get full span-level observability.
How Mistral function-calling agents work
Mistral AI’s function-calling API follows the same basic pattern as OpenAI’s tool use, but with its own conventions around tool_choice and how tool results are fed back into the conversation. An agent loop looks like this:
- Send a chat completion request with tools defined and tool_choice="auto"
- If finish_reason == "tool_calls", extract the tool_calls from the response
- Execute each function and collect the outputs
- Append the assistant message and tool result messages to the conversation
- Repeat until finish_reason == "stop"
Failures can happen at any step: the model may generate a tool call with arguments that don’t match your schema, a function handler may throw, the rate limiter may reject a request mid-loop, or the agent may loop more times than expected before reaching stop. Without instrumentation, all you see is the final error or an unexpected response — not where the breakdown happened.
Wrapping Mistral chat completions in Nexus traces
The pattern: open a Nexus trace at the start of the agent loop, record each completion call as a span (including token counts), record each function execution as a child span, then close the trace when the loop terminates.
import os
import time
import json

import requests
from mistralai import Mistral

MISTRAL_API_KEY = os.environ["MISTRAL_API_KEY"]
NEXUS_API_KEY = os.environ["NEXUS_API_KEY"]
NEXUS_BASE = "https://nexus.keylightdigital.dev"

client = Mistral(api_key=MISTRAL_API_KEY)

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]
def run_agent_with_trace(user_message: str) -> dict:
    t0 = time.time()
    messages = [{"role": "user", "content": user_message}]

    # Open a Nexus trace for the full agent loop
    trace_res = requests.post(
        f"{NEXUS_BASE}/api/traces",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "name": "mistral-agent:get_weather",
            "input": user_message,
            "metadata": {
                "model": "mistral-large-latest",
                "platform": "mistral-ai",
            },
        },
    )
    trace_id = trace_res.json()["traceId"]

    total_prompt_tokens = 0
    total_completion_tokens = 0
    loop_count = 0
    final_answer = ""

    try:
        while loop_count < 10:
            loop_count += 1
            call_t0 = time.time()
            response = client.chat.complete(
                model="mistral-large-latest",
                messages=messages,
                tools=TOOLS,
                tool_choice="auto",
            )
            usage = response.usage
            total_prompt_tokens += usage.prompt_tokens
            total_completion_tokens += usage.completion_tokens

            # Record the completion call as a span
            span_res = requests.post(
                f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "name": f"chat.complete:loop_{loop_count}",
                    "input": messages[-1]["content"] if messages else "",
                    "metadata": {
                        "model": "mistral-large-latest",
                        "loop": loop_count,
                        "prompt_tokens": usage.prompt_tokens,
                        "completion_tokens": usage.completion_tokens,
                        "finish_reason": response.choices[0].finish_reason,
                    },
                },
            )
            span_id = span_res.json().get("spanId", "")

            choice = response.choices[0]
            assistant_msg = choice.message

            if choice.finish_reason == "stop":
                final_answer = assistant_msg.content or ""
                requests.post(
                    f"{NEXUS_BASE}/api/spans/{span_id}/end",
                    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                    json={
                        "status": "success",
                        "output": final_answer[:500],
                        "latency_ms": int((time.time() - call_t0) * 1000),
                    },
                )
                break

            # Process tool calls
            messages.append({
                "role": "assistant",
                "content": assistant_msg.content,
                "tool_calls": [
                    {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
                    for tc in (assistant_msg.tool_calls or [])
                ],
            })
            tool_results = execute_tool_calls(assistant_msg.tool_calls or [], trace_id)
            for result in tool_results:
                messages.append({
                    "role": "tool",
                    "tool_call_id": result["tool_call_id"],
                    "content": result["output"],
                })
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": f"tool_calls:{len(assistant_msg.tool_calls or [])}",
                    "latency_ms": int((time.time() - call_t0) * 1000),
                },
            )
    except Exception as err:
        requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/end",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "status": "error",
                "error": str(err),
                "latency_ms": int((time.time() - t0) * 1000),
                "metadata": {
                    "total_prompt_tokens": total_prompt_tokens,
                    "total_completion_tokens": total_completion_tokens,
                    "loop_count": loop_count,
                },
            },
        )
        raise

    # Close the parent trace with cumulative token usage
    requests.post(
        f"{NEXUS_BASE}/api/traces/{trace_id}/end",
        headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
        json={
            "status": "success",
            "output": final_answer[:500],
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {
                "total_prompt_tokens": total_prompt_tokens,
                "total_completion_tokens": total_completion_tokens,
                "loop_count": loop_count,
                "model": "mistral-large-latest",
            },
        },
    )
    return {"answer": final_answer, "trace_id": trace_id}
Recording tool call spans
Each function the model invokes should get its own span with the arguments it received and the output it returned. This gives you the granularity to debug schema validation errors and see exactly which function call caused a failure.
def execute_tool_calls(tool_calls: list, trace_id: str) -> list:
    """Execute each tool call and record a Nexus span per function."""
    results = []
    for tc in tool_calls:
        func_name = tc.function.name
        t0 = time.time()

        # Parse arguments — may fail if the model generates invalid JSON
        try:
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError as e:
            span_res = requests.post(
                f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "name": f"function_call:{func_name}",
                    "input": tc.function.arguments,
                    "metadata": {"tool_call_id": tc.id, "function_name": func_name},
                },
            )
            span_id = span_res.json().get("spanId", "")
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "error": f"JSONDecodeError: {e}",
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": f"error: invalid JSON arguments: {e}"})
            continue

        span_res = requests.post(
            f"{NEXUS_BASE}/api/traces/{trace_id}/spans",
            headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
            json={
                "name": f"function_call:{func_name}",
                "input": json.dumps(args),
                "metadata": {
                    "tool_call_id": tc.id,
                    "function_name": func_name,
                    "arg_keys": list(args.keys()),
                },
            },
        )
        span_id = span_res.json().get("spanId", "")

        try:
            output = dispatch_function(func_name, args)
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "success",
                    "output": str(output)[:500],
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": str(output)})
        except Exception as err:
            requests.post(
                f"{NEXUS_BASE}/api/spans/{span_id}/end",
                headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
                json={
                    "status": "error",
                    "error": str(err),
                    "latency_ms": int((time.time() - t0) * 1000),
                },
            )
            results.append({"tool_call_id": tc.id, "output": f"error: {err}"})
    return results
Monitoring token costs: prompt vs completion tokens
Mistral AI pricing charges per million tokens, with prompt tokens and completion tokens billed at different rates depending on the model. For function-calling agents, prompt tokens grow with each loop iteration as the conversation history accumulates — making cost awareness critical for long-running agents.
Recording cumulative token usage on the parent trace lets you identify expensive traces at a glance and set budget thresholds:
# Token cost estimates per million tokens (as of early 2026)
MISTRAL_COSTS = {
    "mistral-large-latest": {"prompt": 2.00, "completion": 6.00},
    "mistral-small-latest": {"prompt": 0.20, "completion": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return estimated USD cost for a Mistral API call."""
    rates = MISTRAL_COSTS.get(model, {"prompt": 0.0, "completion": 0.0})
    return (prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]) / 1_000_000

# In your trace close call, include the estimated cost:
estimated_cost_usd = estimate_cost(
    "mistral-large-latest",
    total_prompt_tokens,
    total_completion_tokens,
)

requests.post(
    f"{NEXUS_BASE}/api/traces/{trace_id}/end",
    headers={"Authorization": f"Bearer {NEXUS_API_KEY}"},
    json={
        "status": "success",
        "output": final_answer[:500],
        "latency_ms": int((time.time() - t0) * 1000),
        "metadata": {
            "total_prompt_tokens": total_prompt_tokens,
            "total_completion_tokens": total_completion_tokens,
            "estimated_cost_usd": round(estimated_cost_usd, 6),
            "loop_count": loop_count,
        },
    },
)
With estimated_cost_usd in your trace metadata, you can sort traces by cost in the Nexus dashboard and immediately spot the outliers: agents looping more than expected, long conversation histories inflating prompt tokens, or model size mismatches (using mistral-large for tasks that mistral-small handles equally well).
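One way to act on that metadata programmatically is a per-trace budget check over exported traces. A sketch, assuming traces arrive as dicts with the metadata shape used above (the 0.05 USD budget is an arbitrary example, not a recommendation):

```python
def flag_expensive_traces(traces: list[dict], budget_usd: float = 0.05) -> list[dict]:
    """Return traces whose estimated cost exceeds the budget, most expensive first."""
    over = [
        t for t in traces
        if t.get("metadata", {}).get("estimated_cost_usd", 0.0) > budget_usd
    ]
    return sorted(over, key=lambda t: t["metadata"]["estimated_cost_usd"], reverse=True)
```

Run this on a daily export and the top of the list is your triage queue: the same traces you would find by sorting in the dashboard, but available to alerting.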
Common failure patterns and how to spot them
Tool schema validation errors
Mistral sometimes generates tool call arguments that don’t match your JSON schema — missing required fields, wrong types, or extra properties. These show up as JSONDecodeError or validation errors in your function handler, not as an API-level error.
In your traces, these look like: a chat.complete span with finish_reason: tool_calls followed immediately by a function_call span with status: error. If you see this pattern on a specific function across many traces, your schema description likely needs clarification or stricter typing.
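You can catch many of these before dispatching the function by checking the parsed arguments against the parameters schema you already declared in TOOLS. A stdlib-only sketch (a dedicated validator like the jsonschema library would be more thorough; this covers only required fields, primitive types, and enums):

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Check parsed tool arguments against a JSON-schema-style parameters dict.

    Returns a list of human-readable error strings; an empty list means valid.
    Simplified: booleans pass the "integer" check, and nested objects are not
    descended into.
    """
    errors = []
    props = schema.get("properties", {})
    type_map = {
        "string": str, "number": (int, float), "integer": int,
        "boolean": bool, "object": dict, "array": list,
    }
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required field: {field}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            errors.append(f"unexpected property: {key}")
            continue
        expected = type_map.get(spec.get("type", ""))
        if expected and not isinstance(value, expected):
            errors.append(f"wrong type for {key}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid enum value for {key}: {value}")
    return errors
```

Record the returned errors in the function_call span's error field and feed them back to the model as the tool output — the model can often self-correct on the next loop iteration.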
Rate limit errors
Mistral’s rate limits apply per API key at the request and token level. A rate limit error mid-loop surfaces as an exception in your client.chat.complete call. Handle it with exponential backoff and record the retry in your span metadata:
import time
from mistralai.models.sdkerror import SDKError

def chat_with_retry(messages: list, tools: list, model: str, max_retries: int = 3):
    """Call Mistral chat complete with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return client.chat.complete(
                model=model,
                messages=messages,
                tools=tools,
                tool_choice="auto",
            )
        except SDKError as e:
            if e.status_code == 429 and attempt < max_retries - 1:
                wait = 2 ** attempt  # 1s, 2s, 4s, ...
                time.sleep(wait)
                continue
            raise  # re-raise on the final attempt or non-429 errors
Record the retry count and wait time in your span metadata so you can correlate rate limit events with time-of-day or request volume patterns.
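Because the waits follow base-2 exponential growth, the schedule is easy to precompute and attach to the span. A sketch (the retries and backoff_waits_s metadata keys are our own naming, not a Nexus requirement):

```python
def backoff_schedule(max_retries: int = 3, base_s: float = 1.0) -> list[float]:
    """Waits between attempts: base_s * 2**attempt, one per possible retry."""
    return [base_s * (2 ** attempt) for attempt in range(max_retries - 1)]

def retry_span_metadata(attempts_used: int, max_retries: int = 3) -> dict:
    """Metadata to merge into the chat.complete span after a retried call."""
    waits = backoff_schedule(max_retries)[:attempts_used]
    return {
        "retries": attempts_used,
        "backoff_waits_s": waits,
        "total_backoff_s": sum(waits),
    }
```

Merging retry_span_metadata(attempt) into the span's metadata dict before closing it is enough to make rate limit pressure queryable per span.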
Agent loops that don’t terminate
An agent that loops more than 4–5 times without reaching finish_reason: stop is almost always stuck in one of two failure modes: the function is returning an output the model can’t interpret, or the tool schema description is ambiguous and the model keeps generating subtly different arguments hoping for a different result.
Your loop_count metadata field makes this visible immediately: sort traces by loop_count descending and inspect the span timeline for the high-count outliers.
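You can also catch the thrashing mode at runtime rather than after the fact: if the last few tool calls all hit the same function, the loop is probably stuck. A heuristic sketch (the window of 3 is an arbitrary choice; tune it to your agent):

```python
def is_stuck(tool_call_history: list[tuple[str, str]], window: int = 3) -> bool:
    """True if the last `window` tool calls all targeted the same function.

    `tool_call_history` holds (function_name, arguments_json) pairs appended
    on each loop iteration; repeated names with subtly varying arguments match
    the thrashing pattern described above.
    """
    if len(tool_call_history) < window:
        return False
    recent = {name for name, _args in tool_call_history[-window:]}
    return len(recent) == 1
```

When it fires, break out of the loop, close the trace with status "error" and a reason like "stuck_loop", and you get a clean signal instead of ten wasted completions.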
TypeScript wrapper for Node.js apps
If your backend is TypeScript, the same pattern works with the official Mistral TypeScript SDK:
import { Mistral } from '@mistralai/mistralai'

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY! })
const NEXUS_API_KEY = process.env.NEXUS_API_KEY!
const NEXUS_BASE = 'https://nexus.keylightdigital.dev'

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get current weather for a city',
      parameters: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['city'],
      },
    },
  },
]

async function runMistralAgent(userMessage: string): Promise<{ answer: string; traceId: string }> {
  const t0 = Date.now()
  const messages: any[] = [{ role: 'user', content: userMessage }]

  const traceRes = await fetch(`${NEXUS_BASE}/api/traces`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'mistral-agent:get_weather',
      input: userMessage,
      metadata: { model: 'mistral-large-latest', platform: 'mistral-ai' },
    }),
  })
  const { traceId } = await traceRes.json()

  let totalPromptTokens = 0
  let totalCompletionTokens = 0
  let loopCount = 0
  let finalAnswer = ''

  while (loopCount < 10) {
    loopCount++
    const callT0 = Date.now()
    const response = await client.chat.complete({
      model: 'mistral-large-latest',
      messages,
      tools,
      toolChoice: 'auto',
    })
    const usage = response.usage!
    totalPromptTokens += usage.promptTokens
    totalCompletionTokens += usage.completionTokens

    const choice = response.choices![0]
    const assistantMsg = choice.message

    const spanRes = await fetch(`${NEXUS_BASE}/api/traces/${traceId}/spans`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        name: `chat.complete:loop_${loopCount}`,
        input: messages[messages.length - 1]?.content ?? '',
        metadata: {
          loop: loopCount,
          promptTokens: usage.promptTokens,
          completionTokens: usage.completionTokens,
          finishReason: choice.finishReason,
        },
      }),
    })
    const { spanId } = await spanRes.json()

    if (choice.finishReason === 'stop') {
      finalAnswer = typeof assistantMsg.content === 'string' ? assistantMsg.content : ''
      await fetch(`${NEXUS_BASE}/api/spans/${spanId}/end`, {
        method: 'POST',
        headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
        body: JSON.stringify({ status: 'success', output: finalAnswer.slice(0, 500), latency_ms: Date.now() - callT0 }),
      })
      break
    }

    const toolCalls = assistantMsg.toolCalls ?? []
    messages.push({ role: 'assistant', content: assistantMsg.content, toolCalls })

    for (const tc of toolCalls) {
      const funcName = tc.function.name
      const args = JSON.parse(tc.function.arguments as string)
      const output = await dispatchFunction(funcName, args)
      messages.push({ role: 'tool', toolCallId: tc.id, content: JSON.stringify(output) })
    }

    await fetch(`${NEXUS_BASE}/api/spans/${spanId}/end`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ status: 'success', output: `tool_calls:${toolCalls.length}`, latency_ms: Date.now() - callT0 }),
    })
  }

  await fetch(`${NEXUS_BASE}/api/traces/${traceId}/end`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${NEXUS_API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      status: 'success',
      output: finalAnswer.slice(0, 500),
      latency_ms: Date.now() - t0,
      metadata: { totalPromptTokens, totalCompletionTokens, loopCount, model: 'mistral-large-latest' },
    }),
  })

  return { answer: finalAnswer, traceId }
}
Choosing between mistral-large and mistral-small
One of the most actionable decisions trace data supports is model rightsizing. mistral-large costs 10× more per token than mistral-small, but for agents with simple, well-defined tools, the smaller model often reaches stop in the same number of loops with comparable output quality.
A/B test across models by logging the model name in each trace’s metadata:
- Compare loop_count across model variants — if both reach stop in the same number of turns, the smaller model is equivalent for this task
- Compare function_call error rates — if the smaller model generates more schema mismatches, your tool descriptions may need more precision
- Compare estimated_cost_usd — the delta is the cost of the quality difference, if any
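That comparison can be sketched over per-model aggregates. The stats shape and field names here are our own — compute them from your trace metadata however you export it:

```python
def compare_models(stats: dict[str, dict]) -> dict:
    """Compare two model variants on loop count, error rate, and cost.

    `stats` maps model name -> {"mean_loop_count", "error_rate", "mean_cost_usd"}.
    The cheaper model counts as "equivalent" if it matches or beats the
    pricier one on both loop count and error rate.
    """
    (cheap, cs), (pricey, ps) = sorted(stats.items(), key=lambda kv: kv[1]["mean_cost_usd"])
    return {
        "cheaper_model": cheap,
        "equivalent": cs["mean_loop_count"] <= ps["mean_loop_count"]
                      and cs["error_rate"] <= ps["error_rate"],
        "cost_delta_usd": round(ps["mean_cost_usd"] - cs["mean_cost_usd"], 6),
    }
```

If equivalent comes back true, the cost_delta_usd is pure savings; if false, it is the price you are paying for the quality gap.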
What to monitor in production
Once traces are flowing from your Mistral agents, these four metrics give you the most actionable signal:
- Function call error rate by function name: Filter function_call:* spans with status: error. High error rates on a specific function usually mean a schema description issue or a broken handler.
- Mean loop count per trace: A healthy single-tool agent should average 2 loops (one tool call loop + one stop loop). Loop counts above 4 indicate the agent is struggling — check function outputs and tool descriptions.
- Prompt token growth across loops: In multi-turn agents, prompt tokens grow with each loop. If a P95 trace is spending 80% of its tokens on conversation history, add a summarization step before the context window fills.
- Rate limit retry frequency: Track spans with retries > 0 in metadata. Consistent rate limiting at specific hours points to a quota upgrade or request-batching opportunity.
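Three of these four signals can be computed directly from exported trace and span dicts. A sketch that assumes the span names and metadata keys used throughout this article (prompt-token growth is better inspected per-trace than aggregated, so it is omitted):

```python
from statistics import mean

def production_metrics(traces: list[dict], spans: list[dict]) -> dict:
    """Aggregate function error rate, mean loop count, and retry share."""
    func_spans = [s for s in spans if s.get("name", "").startswith("function_call:")]
    func_errors = [s for s in func_spans if s.get("status") == "error"]
    retried = [s for s in spans if s.get("metadata", {}).get("retries", 0) > 0]
    return {
        "function_error_rate": len(func_errors) / len(func_spans) if func_spans else 0.0,
        "mean_loop_count": mean(t["metadata"]["loop_count"] for t in traces) if traces else 0.0,
        "retry_span_share": len(retried) / len(spans) if spans else 0.0,
    }
```

Running this on a daily export and alerting on thresholds (say, error rate above 5% or mean loop count above 4) turns the trace data into an early-warning system.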
Next steps
Mistral AI’s function-calling API is a lightweight alternative to OpenAI for developers who want competitive reasoning capability at lower cost. Instrumenting each completion call, each tool span, and cumulative token usage gives you the data to debug failures, optimize costs, and choose the right model tier for your workload. Sign up for a free Nexus account to start capturing traces from your Mistral AI agents today.
Add observability to Mistral AI agents
Free tier, no credit card required. Full trace visibility in under 5 minutes.