Tracing Flowise Chatflows: Observability for No-Code AI Agent Workflows
Flowise lets you build AI chatflows visually by connecting LangChain nodes in a drag-and-drop UI — but when a chatflow returns a wrong answer, a custom tool node throws silently, or a production chatflow starts hallucinating, Flowise's built-in logs don't tell you which node failed or why. Here's how to add full trace observability to Flowise chatflows using Nexus.
What Flowise is
Flowise is an open-source drag-and-drop UI for building LangChain-powered AI chatbots and agent workflows. Instead of writing Python or TypeScript directly, you connect nodes in a visual canvas — LLM nodes, memory nodes, retriever nodes, tool nodes — and Flowise handles the LangChain wiring underneath. The result is a chatflow: a reusable AI workflow you can call via a simple REST API.
A typical Flowise chatflow for a RAG-backed customer support bot looks like this:
- Chat Model node — configured with your OpenAI or Anthropic API key and model
- Retriever node — pulls relevant documents from a connected vector store
- Memory node — maintains conversation history across turns
- Tool nodes — custom JavaScript or Python functions the LLM can call
- Conversational Retrieval QA Chain — wires it all together into a chat interface
Flowise is popular for internal tools, customer support bots, and rapid prototyping because you can stand up a working chatflow in minutes without touching LangChain’s API directly. That speed comes with a tradeoff: Flowise’s built-in logging is minimal, and production failures are hard to diagnose.
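Before adding any instrumentation, it helps to see what a bare chatflow call looks like. Here is a minimal sketch, assuming a Flowise instance at a placeholder URL and a placeholder chatflow ID from the admin UI; the endpoint path and the sessionId override follow Flowise's prediction API:

```python
FLOWISE_URL = "http://localhost:3000"  # placeholder: your Flowise instance
CHATFLOW_ID = "your-chatflow-id"       # placeholder: from the Flowise admin UI

def build_prediction_payload(question: str, session_id: str) -> dict:
    """Build the JSON body the prediction endpoint expects; sessionId keeps
    memory nodes scoped to a single conversation."""
    return {"question": question, "overrideConfig": {"sessionId": session_id}}

def ask(question: str, session_id: str) -> str:
    import requests  # deferred import so the payload helper stays dependency-free
    resp = requests.post(
        f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
        json=build_prediction_payload(question, session_id),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("text", "")  # Flowise returns the answer in "text"
```

Everything in the rest of this guide wraps this one POST with trace instrumentation.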
Observability blind spots in Flowise chatflows
Flowise shows you a request log in its admin UI, but three failure modes are invisible without external instrumentation:
- Silent tool failures: Custom tool nodes that throw a JavaScript exception return an empty string to the LLM instead of surfacing the error. The LLM then invents an answer rather than admitting the tool failed. Without a span recording tool.status: error, these failures are invisible.
- Retrieval quality drift: When your vector store embeddings age or your document corpus changes, retrieval quality drops silently. The chatflow keeps running and returning answers — they're just wrong. You won't catch this from Flowise's logs alone.
- Latency attribution: Flowise returns total request latency but doesn't tell you whether the bottleneck is your retriever, your LLM call, or a slow custom tool. Without per-step timing, you can't optimize the right thing.
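The latency-attribution gap is straightforward to close once per-step spans exist. As a sketch of the analysis side (the span shapes here are illustrative, mirroring the latency_ms metadata used throughout this guide):

```python
def slowest_step(spans: list[dict]) -> dict:
    """Return the span with the highest latency_ms, i.e. the step worth optimizing first."""
    return max(spans, key=lambda s: s.get("latency_ms", 0))

# Illustrative spans for one chatflow call
spans = [
    {"name": "step:retrieval", "latency_ms": 420},
    {"name": "step:llm_call", "latency_ms": 2310},
    {"name": "tool:fetch_order_status", "latency_ms": 180},
]
```

Running slowest_step over these spans would point at the LLM call, not the retriever, as the thing to optimize.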
Tracing Flowise chatflow API calls
The cleanest Flowise instrumentation pattern is a client-side wrapper: instead of calling the Flowise /api/v1/prediction/{chatflowId} endpoint directly from your application, you wrap the call with a Nexus trace. This gives you latency, error rate, and metadata for every chatflow invocation without modifying the Flowise server.
Install the dependencies:
pip install requests nexus-sdk
Here is a complete Python wrapper:
import os
import time

import requests
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

FLOWISE_URL = os.environ["FLOWISE_URL"]  # e.g. http://localhost:3000
CHATFLOW_ID = os.environ["CHATFLOW_ID"]  # from Flowise admin UI


def ask_chatflow(question: str, session_id: str) -> str:
    """Call a Flowise chatflow with full Nexus trace instrumentation."""
    trace = nexus.start_trace({
        "agent_id": f"flowise-{CHATFLOW_ID}",
        "name": f"chatflow: {question[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "session_id": session_id,
            "question": question[:300],
            "chatflow_id": CHATFLOW_ID,
        },
    })
    trace_id = trace["trace_id"]
    t0 = time.time()
    try:
        response = requests.post(
            f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
            json={"question": question, "overrideConfig": {"sessionId": session_id}},
            timeout=30,
        )
        response.raise_for_status()
        elapsed_ms = int((time.time() - t0) * 1000)
        result = response.json()
        answer = result.get("text", "")
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "answer_length": len(answer),
                "source_documents": len(result.get("sourceDocuments", [])),
            },
        })
        return answer
    except requests.HTTPError as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}",
        })
        raise
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
Every chatflow call now produces a Nexus trace with the question, latency, answer length, and number of source documents retrieved. You can filter by source_documents: 0 to find queries where retrieval returned nothing — a leading indicator of hallucination.
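If you export finished traces for analysis (for example from a Nexus list-traces endpoint; the exact response shape here is an assumption), the zero-retrieval filter is a one-liner over the metadata written above:

```python
def zero_retrieval_traces(traces: list[dict]) -> list[dict]:
    """Keep only traces whose retrieval step returned no source documents."""
    return [
        t for t in traces
        if t.get("metadata", {}).get("source_documents") == 0
    ]

# Illustrative trace records shaped like the metadata written by ask_chatflow
traces = [
    {"trace_id": "t1", "metadata": {"source_documents": 3}},
    {"trace_id": "t2", "metadata": {"source_documents": 0}},
]
```

Running this over a day's traces gives you the zero-retrieval set to spot-check for hallucinated answers.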
Adding spans for individual steps
If your application calls multiple Flowise chatflows in sequence — for example, a router chatflow that classifies intent followed by a specialist chatflow — you can model each step as a span within a parent trace:
# Chatflow IDs from the Flowise admin UI (env var names are examples — use your own)
CLASSIFIER_FLOW_ID = os.environ["CLASSIFIER_FLOW_ID"]
DEFAULT_FLOW_ID = os.environ["DEFAULT_FLOW_ID"]
SPECIALIST_FLOWS = {
    "billing": os.environ["BILLING_FLOW_ID"],
    "technical": os.environ["TECHNICAL_FLOW_ID"],
}


def handle_user_message(message: str, session_id: str) -> str:
    """Route to the right chatflow and record the full pipeline as one trace."""
    trace = nexus.start_trace({
        "agent_id": "flowise-router",
        "name": f"pipeline: {message[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {"session_id": session_id},
    })
    trace_id = trace["trace_id"]
    t0 = time.time()
    try:
        # Step 1: classify intent
        t_classify = time.time()
        classification = call_chatflow(CLASSIFIER_FLOW_ID, message)
        intent = classification.get("text", "general").strip().lower()
        nexus.add_span(trace_id, {
            "name": "step:intent_classification",
            "started_at": nexus.now(),
            "status": "success",
            "latency_ms": int((time.time() - t_classify) * 1000),
            "metadata": {"intent": intent, "chatflow_id": CLASSIFIER_FLOW_ID},
        })

        # Step 2: route to specialist chatflow
        t_specialist = time.time()
        specialist_id = SPECIALIST_FLOWS.get(intent, DEFAULT_FLOW_ID)
        result = call_chatflow(specialist_id, message, session_id)
        answer = result.get("text", "")
        nexus.add_span(trace_id, {
            "name": "step:specialist_response",
            "started_at": nexus.now(),
            "status": "success",
            "latency_ms": int((time.time() - t_specialist) * 1000),
            "metadata": {
                "intent": intent,
                "chatflow_id": specialist_id,
                "answer_length": len(answer),
                "source_documents": len(result.get("sourceDocuments", [])),
            },
        })

        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {"intent": intent, "answer_length": len(answer)},
        })
        return answer
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise


def call_chatflow(chatflow_id: str, question: str, session_id: str = "") -> dict:
    response = requests.post(
        f"{FLOWISE_URL}/api/v1/prediction/{chatflow_id}",
        json={"question": question, "overrideConfig": {"sessionId": session_id}},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
Tracing custom tool nodes
Flowise lets you define custom tools as JavaScript functions that the LLM can call. These run inside the Flowise server process — not your application — so you can’t wrap them with the Python SDK. Instead, use the Nexus REST API directly from the tool function to add a span to the active trace.
The pattern requires passing the Nexus traceId into Flowise via the overrideConfig field, then reading it inside the tool:
// Flowise Custom Tool: fetch_order_status
// Add this JavaScript in the Flowise "Custom Tool" node
const NEXUS_API_KEY = $env.NEXUS_API_KEY;
const NEXUS_BASE_URL = "https://nexus.keylightdigital.dev";

async function fetchOrderStatus(orderId) {
  const traceId = $vars.nexusTraceId; // passed via overrideConfig.vars
  const t0 = Date.now();
  try {
    // Your actual tool logic
    const response = await fetch(`https://api.yourstore.com/orders/${orderId}`, {
      headers: { Authorization: `Bearer ${$env.STORE_API_KEY}` },
    });
    if (!response.ok) {
      throw new Error(`Order API returned ${response.status}`);
    }
    const order = await response.json();
    const latencyMs = Date.now() - t0;

    // Record the tool call as a Nexus span
    if (traceId) {
      await fetch(`${NEXUS_BASE_URL}/v1/traces/${traceId}/spans`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": `Bearer ${NEXUS_API_KEY}`,
        },
        body: JSON.stringify({
          name: "tool:fetch_order_status",
          started_at: new Date(t0).toISOString(),
          status: "success",
          latency_ms: latencyMs,
          metadata: {
            order_id: orderId,
            order_status: order.status,
            tool: "fetch_order_status",
          },
        }),
      });
    }

    return JSON.stringify({ status: order.status, updated_at: order.updatedAt });
  } catch (err) {
    const latencyMs = Date.now() - t0;
    if (traceId) {
      await fetch(`${NEXUS_BASE_URL}/v1/traces/${traceId}/spans`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Authorization": `Bearer ${NEXUS_API_KEY}`,
        },
        body: JSON.stringify({
          name: "tool:fetch_order_status",
          started_at: new Date(t0).toISOString(),
          status: "error",
          latency_ms: latencyMs,
          error: err.message,
          metadata: { order_id: orderId, tool: "fetch_order_status" },
        }),
      });
    }
    return JSON.stringify({ error: "Could not retrieve order status." });
  }
}

return await fetchOrderStatus($input.orderId);
Pass the trace ID from your application when starting the chatflow call:
def ask_chatflow_with_tool_tracing(question: str, session_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": f"flowise-{CHATFLOW_ID}",
        "name": f"chatflow: {question[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {"session_id": session_id},
    })
    trace_id = trace["trace_id"]
    t0 = time.time()
    try:
        response = requests.post(
            f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
            json={
                "question": question,
                "overrideConfig": {
                    "sessionId": session_id,
                    "vars": {"nexusTraceId": trace_id},  # passed to tool nodes
                },
            },
            timeout=30,
        )
        response.raise_for_status()
        result = response.json()
        answer = result.get("text", "")
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {"answer_length": len(answer)},
        })
        return answer
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
With this pattern, a Nexus trace for a chatflow call shows both the top-level latency and individual spans for every tool the LLM invoked — including the tool status (success or error) and tool-specific metadata like order IDs or search queries.
TypeScript equivalent
If your application is TypeScript-based (Next.js, Express, Hono), the same wrapper pattern applies using the Nexus TypeScript SDK:
import { NexusClient } from 'nexus-sdk'

const nexus = new NexusClient({ apiKey: process.env.NEXUS_API_KEY! })
const FLOWISE_URL = process.env.FLOWISE_URL!
const CHATFLOW_ID = process.env.CHATFLOW_ID!

export async function askChatflow(question: string, sessionId: string): Promise<string> {
  const trace = await nexus.startTrace({
    agentId: `flowise-${CHATFLOW_ID}`,
    name: `chatflow: ${question.slice(0, 60)}`,
    status: 'running',
    startedAt: nexus.now(),
    metadata: { sessionId, question: question.slice(0, 300), chatflowId: CHATFLOW_ID },
  })
  const traceId = trace.traceId
  const t0 = Date.now()
  try {
    const res = await fetch(`${FLOWISE_URL}/api/v1/prediction/${CHATFLOW_ID}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        question,
        overrideConfig: {
          sessionId,
          vars: { nexusTraceId: traceId },
        },
      }),
    })
    if (!res.ok) {
      const text = await res.text()
      throw new Error(`HTTP ${res.status}: ${text.slice(0, 200)}`)
    }
    const result = await res.json()
    const answer: string = result.text ?? ''
    const latencyMs = Date.now() - t0
    await nexus.endTrace(traceId, {
      status: 'success',
      latencyMs,
      metadata: {
        answerLength: answer.length,
        sourceDocuments: (result.sourceDocuments ?? []).length,
      },
    })
    return answer
  } catch (err) {
    await nexus.endTrace(traceId, {
      status: 'error',
      latencyMs: Date.now() - t0,
      error: err instanceof Error ? err.message : String(err),
    })
    throw err
  }
}
Debugging chatflow failures in production
Three failure patterns show up most often in production Flowise chatflows, and each has a distinct trace signature:
- Retrieval returning zero documents: The chatflow answers from the LLM's parametric knowledge rather than your vector store. Look for traces where source_documents: 0. If these correlate with low-quality answers, your retriever query isn't matching your corpus — check embedding model consistency and document chunking strategy.
- Tool timeouts: A custom tool that calls an external API can hang if that API is slow. Flowise doesn't surface tool-level timeouts in its UI. With Nexus spans from inside the tool, you can see tool:fetch_order_status latency spikes that explain why the overall chatflow response was slow.
- Rate limit cascades: When your chatflow hits an OpenAI rate limit, the entire request fails with a 429. The trace status will be error with the rate limit message in the error field. A burst of rate limit errors at the same timestamp usually indicates a traffic spike — add retry logic with exponential backoff on the Flowise call.
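For the rate-limit case, a retry wrapper with exponential backoff and jitter can sit around the Flowise call. A minimal sketch (the helper name is mine; the 429 detection assumes exceptions shaped like requests.HTTPError, i.e. carrying a response with a status_code):

```python
import random
import time

def call_with_backoff(do_request, max_retries: int = 4, base_delay: float = 1.0):
    """Retry do_request on HTTP 429, doubling the delay each attempt plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return do_request()
        except Exception as exc:
            status = getattr(getattr(exc, "response", None), "status_code", None)
            if status != 429 or attempt == max_retries:
                raise  # not a rate limit, or out of retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))
```

Wrap the requests.post call in a zero-argument lambda (or functools.partial) and pass it as do_request; non-429 errors still propagate immediately so the trace records them.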
# Detect zero-retrieval traces for quality monitoring
def ask_chatflow_with_quality_check(question: str, session_id: str) -> dict:
    trace = nexus.start_trace({
        "agent_id": f"flowise-{CHATFLOW_ID}",
        "name": f"chatflow: {question[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {"session_id": session_id, "question": question[:300]},
    })
    trace_id = trace["trace_id"]
    t0 = time.time()
    try:
        response = requests.post(
            f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
            json={"question": question, "overrideConfig": {"sessionId": session_id}},
            timeout=30,
        )
        response.raise_for_status()
        result = response.json()
        answer = result.get("text", "")
        source_docs = result.get("sourceDocuments", [])
        grounded = len(source_docs) > 0
        quality_warning = not grounded or len(answer.strip()) < 20
        nexus.end_trace(trace_id, {
            "status": "warning" if quality_warning else "success",
            "latency_ms": int((time.time() - t0) * 1000),
            "metadata": {
                "answer_length": len(answer),
                "source_documents": len(source_docs),
                "grounded": grounded,
                "quality_warning": quality_warning,
                "zero_retrieval": not grounded,
            },
        })
        return {"answer": answer, "grounded": grounded}
    except requests.Timeout:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": "chatflow request timed out after 30s",
        })
        raise
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
What to monitor in production
Once traces are flowing from your Flowise integration, three metrics are most actionable:
- Zero-retrieval rate: The percentage of chatflow calls where source_documents: 0. A zero-retrieval rate above 10% usually means your embedding index is stale or your chunk size is wrong. Alert on a spike in this metric before users start complaining.
- Tool error rate: How often custom tool spans come back with status: error. Tool errors that the LLM hides (by returning a graceful fallback answer) are dangerous because they mask broken integrations. Set a webhook alert when tool error rate exceeds 5% over a 1-hour window.
- P95 chatflow latency: Flowise chatflows involve at least two round trips (retrieval + LLM call), so latency is typically 1–5s. If P95 creeps above 8s, users experience noticeable delays. Trace the bottleneck by looking at which step has the longest latency_ms in your spans.
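The P95 figure itself is easy to compute from exported latency_ms values if your dashboard doesn't provide it. A nearest-rank sketch:

```python
import math

def p95_latency(latencies_ms: list[int]) -> int:
    """Nearest-rank 95th percentile: the value 95% of requests come in at or below."""
    ordered = sorted(latencies_ms)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]
```

Feed it a window of latencies (say, the last hour of traces) and alert when the result crosses your 8s threshold.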
Next steps
Flowise’s visual UI is excellent for building chatflows quickly — but production observability requires instrumentation at the API boundary and inside custom tool nodes. The client-side wrapper gives you end-to-end latency and error rate with a few lines of code. Adding the Nexus REST API call inside tool nodes gives you the tool-level visibility you need to debug silent failures. Sign up for a free Nexus account to start capturing traces from your Flowise chatflows today.
Add observability to Flowise chatflows
Free tier, no credit card required. Full trace visibility in under 5 minutes.