Observability for Microsoft Semantic Kernel Agents in Python
Microsoft Semantic Kernel gives you a structured way to build AI agents in Python with plugins, planners, and multi-model support. When a planner selects the wrong function, a plugin fails silently, or a kernel invocation's latency spikes, you need trace visibility to diagnose it. Here's how to integrate Nexus into Semantic Kernel agents.
What Semantic Kernel adds
Microsoft Semantic Kernel is a production-ready SDK for building AI agents in Python (and C#/.NET). Its key abstractions are Kernel (the central orchestrator), Plugins (collections of functions exposed to the LLM), and Planners (which reason about which plugin functions to call to satisfy a goal).
This architecture introduces observability challenges that standard logging can’t solve:
- Planner decisions are opaque: When the planner selects a sequence of plugin calls, there’s no built-in record of why — what prompt it generated, which plan it chose, or why it rejected alternatives.
- Plugin errors are swallowed: A plugin function that raises an exception may cause the kernel to retry with a different plan rather than surfacing the error. Without spans, you can’t tell which plugin call failed.
- Multi-step kernel invocations: A single `kernel.invoke()` call can trigger a chain of plugin executions and LLM calls. Latency attribution requires a span per step.
- Multi-model configurations: SK supports routing different tasks to different models (e.g., GPT-4o for planning, GPT-3.5 for summarization). Without per-call model tracking, you can’t attribute cost or latency to specific model choices.
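Concretely, the instrumentation below produces one trace per invocation with a span per step. A sketch of the resulting shape (span names and fields are illustrative, not a fixed Nexus or SK schema):

```python
# Hypothetical shape of one traced kernel invocation.
trace = {
    "name": "kernel: summarize the Q3 report",
    "spans": [
        {"name": "planner:auto_function_calling", "latency_ms": 820},
        {"name": "plugin:SearchPlugin.search_web", "latency_ms": 340},
        {"name": "llm:gpt4o", "latency_ms": 1200},
    ],
}

# Per-step latency makes attribution trivial: blame the slowest span.
slowest = max(trace["spans"], key=lambda s: s["latency_ms"])
print(slowest["name"])  # → llm:gpt4o
```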
Tracing a basic kernel invocation
Install the SDK and wrap your kernel calls with Nexus traces:
```bash
pip install semantic-kernel nexus-sdk
```
```python
import os
import time

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

kernel = Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)


async def invoke_with_tracing(prompt: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "semantic-kernel-agent",
        "name": f"kernel: {prompt[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "prompt_length": len(prompt),
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
```
Tracing plugin function calls
Plugins are the core building block in Semantic Kernel — each plugin is a Python class decorated with @kernel_function. Wrap plugin methods to emit a span per function execution:
```python
import time
from typing import Annotated

from semantic_kernel.functions import kernel_function


class SearchPlugin:
    def __init__(self, nexus_client, trace_id: str):
        self._nexus = nexus_client
        self._trace_id = trace_id

    @kernel_function(name="search_web", description="Search the web for information")
    def search_web(
        self,
        query: Annotated[str, "The search query"],
    ) -> Annotated[str, "Search results"]:
        t0 = time.time()
        try:
            # your search implementation
            results = _do_search(query)
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "success",
                "latency_ms": int((time.time() - t0) * 1000),
                "metadata": {
                    "query": query[:120],
                    "result_count": len(results),
                },
            })
            return results
        except Exception as e:
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "error",
                "latency_ms": int((time.time() - t0) * 1000),
                "error": str(e),
            })
            raise
```
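If you have more than a couple of plugin methods, repeating that try/except block gets tedious. A decorator sketch that factors it out, assuming the same `add_span(trace_id, payload)` API used above (the decorator itself is not part of SK or Nexus):

```python
import functools
import time


def traced_plugin(nexus_client, trace_id: str):
    """Wrap a plugin method so it emits one span per call (success or error)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.time()
            try:
                result = fn(*args, **kwargs)
                nexus_client.add_span(trace_id, {
                    "name": f"plugin:{fn.__qualname__}",
                    "status": "success",
                    "latency_ms": int((time.time() - t0) * 1000),
                })
                return result
            except Exception as e:
                nexus_client.add_span(trace_id, {
                    "name": f"plugin:{fn.__qualname__}",
                    "status": "error",
                    "latency_ms": int((time.time() - t0) * 1000),
                    "error": str(e),
                })
                raise
        return wrapper
    return decorator
```

Because `functools.wraps` preserves the function's signature and metadata, this should compose with `@kernel_function` (apply `@kernel_function` on top so SK still sees the annotated signature), though verify against your SK version.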
Tracing planner decisions
Semantic Kernel’s planners (like FunctionChoiceBehavior with auto function calling) determine which plugin functions to invoke for a given goal. Wrapping the planning step in a span captures what plan was generated and how long planning took:
```python
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings


async def invoke_with_auto_planning(kernel, goal: str, trace_id: str) -> str:
    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t_plan = time.time()
    try:
        result = await kernel.invoke_prompt(
            goal,
            settings=settings,
        )
        plan_ms = int((time.time() - t_plan) * 1000)
        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "success",
            "latency_ms": plan_ms,
            "metadata": {
                "goal": goal[:120],
                "output": str(result)[:300],
            },
        })
        return str(result)
    except Exception as e:
        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "error",
            "latency_ms": int((time.time() - t_plan) * 1000),
            "error": str(e),
        })
        raise
```
Tracking multi-model routing
One of Semantic Kernel’s strengths is routing different tasks to different models. Record the service ID and model name as span metadata so you can attribute latency and cost per model:
```python
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt35",
        ai_model_id="gpt-3.5-turbo",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)


async def invoke_with_model_tracking(
    kernel, prompt: str, service_id: str, trace_id: str
) -> str:
    settings = OpenAIChatPromptExecutionSettings(service_id=service_id)
    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "service_id": service_id,
                "prompt_length": len(prompt),
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
            "metadata": {"service_id": service_id},
        })
        raise
```
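With `service_id` stamped on every LLM span, per-model attribution is a simple aggregation over exported span data. A sketch, assuming spans are available as dicts in the shape emitted above:

```python
from collections import defaultdict


def latency_by_service(spans):
    """Sum latency_ms per service_id found in span metadata."""
    totals = defaultdict(int)
    for span in spans:
        service = span.get("metadata", {}).get("service_id")
        if service:
            totals[service] += span["latency_ms"]
    return dict(totals)


spans = [
    {"name": "llm:gpt4o", "latency_ms": 1400, "metadata": {"service_id": "gpt4o"}},
    {"name": "llm:gpt35", "latency_ms": 300, "metadata": {"service_id": "gpt35"}},
    {"name": "llm:gpt4o", "latency_ms": 900, "metadata": {"service_id": "gpt4o"}},
]
print(latency_by_service(spans))  # → {'gpt4o': 2300, 'gpt35': 300}
```

The same aggregation works for cost if you also record token counts per span and multiply by per-model pricing.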
Full agent trace with plugins and planning
Putting it together: one trace per agent invocation, with spans for planning, each plugin call, and the final response:
```python
import asyncio
import os
import time

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])


async def run_sk_agent(goal: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "sk-research-agent",
        "name": f"agent: {goal[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "goal": goal[:200],
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    kernel = Kernel()
    kernel.add_service(
        OpenAIChatCompletion(
            service_id="gpt4o",
            ai_model_id="gpt-4o",
            api_key=os.environ["OPENAI_API_KEY"],
        )
    )

    # Register plugin with trace context injected
    search_plugin = SearchPlugin(nexus, trace_id)
    kernel.add_plugin(search_plugin, plugin_name="SearchPlugin")

    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(goal, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise


# Entry point, e.g.: asyncio.run(run_sk_agent("Research recent LLM evals", "user-42"))
```
What to watch for in production
Once traces are flowing, three failure patterns show up repeatedly in Semantic Kernel agents:
- Planner over-selection: The auto function calling behavior calls more plugins than necessary, burning tokens on unnecessary tool calls. Look for traces where 4+ plugin spans fire for a simple query.
- Plugin retry storms: A plugin that raises an exception on transient errors (network timeouts, rate limits) may be called multiple times before the kernel gives up. Span counts per plugin name reveal repeated calls.
- Silent plan failures: The kernel may return an empty or malformed result without raising an exception when the planner can’t satisfy the goal with available plugins. Track `output_length == 0` as a soft failure signal in trace metadata.
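All three patterns can be flagged mechanically from exported trace data. A sketch, assuming traces are dicts in the shape emitted by the code above; the thresholds (4 plugin spans, 3 repeats) are illustrative starting points, not recommendations:

```python
from collections import Counter


def flag_trace(trace):
    """Return which of the three failure patterns a trace exhibits."""
    flags = []
    plugin_spans = [s for s in trace["spans"] if s["name"].startswith("plugin:")]

    # Planner over-selection: many plugin calls for one invocation.
    if len(plugin_spans) >= 4:
        flags.append("planner_over_selection")

    # Retry storm: the same plugin span name repeating.
    counts = Counter(s["name"] for s in plugin_spans)
    if any(c >= 3 for c in counts.values()):
        flags.append("plugin_retry_storm")

    # Silent plan failure: empty output without an error status.
    if trace.get("metadata", {}).get("output_length", 0) == 0:
        flags.append("silent_plan_failure")

    return flags


trace = {
    "spans": [
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "llm:gpt4o"},
    ],
    "metadata": {"output_length": 0},
}
print(flag_trace(trace))  # → ['plugin_retry_storm', 'silent_plan_failure']
```

A check like this can run as a scheduled job over recent traces to alert before these patterns show up as user-facing failures.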
Next steps
Semantic Kernel is growing fast — Microsoft is actively adding new connectors, memory backends, and multi-agent patterns. The instrumentation approach here works regardless of which connector or planner you use, since it wraps the kernel invocation boundary rather than any specific internal API. Sign up for a free Nexus account to start capturing traces from your Semantic Kernel agents today.
Add observability to Semantic Kernel
Free tier, no credit card required. Full trace visibility in under 5 minutes.