2026-04-20 · 8 min read

Observability for Microsoft Semantic Kernel Agents in Python

Microsoft Semantic Kernel gives you a structured way to build AI agents in Python with plugins, planners, and multi-model support. When the planner selects the wrong function, a plugin fails silently, or a kernel invocation's latency spikes, you need trace-level visibility to diagnose it. Here's how to integrate Nexus into Semantic Kernel agents.

What Semantic Kernel adds

Microsoft Semantic Kernel is a production-ready SDK for building AI agents in Python (and C#/.NET). Its key abstractions are Kernel (the central orchestrator), Plugins (collections of functions exposed to the LLM), and Planners (which reason about which plugin functions to call to satisfy a goal).

This architecture introduces observability challenges that standard logging can't solve:

- Planner opacity: with automatic function calling, the model decides which plugin functions to invoke, and a wrong choice leaves no obvious log line.
- Silent plugin failures: a plugin whose exception is caught and swallowed can degrade answers without ever surfacing an error.
- Multi-model attribution: when different services back different tasks, latency and cost must be attributed per model, not per process.

Tracing a basic kernel invocation

Install the SDK and wrap your kernel calls with Nexus traces:

pip install semantic-kernel nexus-sdk

import os
import time
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

kernel = Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)

async def invoke_with_tracing(prompt: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "semantic-kernel-agent",
        "name": f"kernel: {prompt[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "prompt_length": len(prompt),
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt)
        elapsed_ms = int((time.time() - t0) * 1000)

        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)

    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
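One pattern already repeats: int((time.time() - t0) * 1000) appears in every success and error branch. If you'd rather not duplicate it, a tiny helper keeps the wrappers shorter (the name elapsed_ms is ours, not part of either SDK):

```python
import time

def elapsed_ms(t0: float) -> int:
    """Milliseconds elapsed since the time.time() timestamp t0."""
    return int((time.time() - t0) * 1000)
```

Each latency_ms field in the wrappers then becomes elapsed_ms(t0).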

Tracing plugin function calls

Plugins are the core building block in Semantic Kernel — each plugin is a Python class whose methods are decorated with @kernel_function. Wrap plugin methods to emit a span per function execution:

from semantic_kernel.functions import kernel_function
from typing import Annotated

class SearchPlugin:
    def __init__(self, nexus_client, trace_id: str):
        self._nexus = nexus_client
        self._trace_id = trace_id

    @kernel_function(name="search_web", description="Search the web for information")
    def search_web(
        self,
        query: Annotated[str, "The search query"],
    ) -> Annotated[str, "Search results"]:
        t0 = time.time()
        try:
            # your search implementation
            results = _do_search(query)
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "success",
                "latency_ms": int((time.time() - t0) * 1000),
                "metadata": {
                    "query": query[:120],
                    "result_count": len(results),
                },
            })
            return results
        except Exception as e:
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "error",
                "latency_ms": int((time.time() - t0) * 1000),
                "error": str(e),
            })
            raise
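The try/except pattern above gets repetitive once you have more than one plugin method. One way to factor it out is a small decorator; this is a sketch, not a Nexus API, and the client argument only needs an add_span(trace_id, payload) method matching the calls used in this article. Note that functools.wraps copies function attributes onto the wrapper, which should preserve the metadata @kernel_function attaches, but verify this against your Semantic Kernel version.

```python
import functools
import time

def traced_span(nexus_client, trace_id: str, span_name: str):
    """Emit one Nexus-style span per call of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.time()
            try:
                result = fn(*args, **kwargs)
                # Success path: record latency and return the result unchanged.
                nexus_client.add_span(trace_id, {
                    "name": span_name,
                    "status": "success",
                    "latency_ms": int((time.time() - t0) * 1000),
                })
                return result
            except Exception as e:
                # Error path: record the failure, then re-raise.
                nexus_client.add_span(trace_id, {
                    "name": span_name,
                    "status": "error",
                    "latency_ms": int((time.time() - t0) * 1000),
                    "error": str(e),
                })
                raise
        return wrapper
    return decorator
```

Because trace_id is only known per invocation, you would apply this in the plugin's __init__ (e.g. self.search_web = traced_span(nexus, trace_id, "plugin:SearchPlugin.search_web")(self.search_web)) rather than at class-definition time.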

Tracing planner decisions

Semantic Kernel's planning layer (in current Python versions, automatic function calling configured via FunctionChoiceBehavior.Auto()) determines which plugin functions to invoke for a given goal. Wrapping the invocation in a span captures the goal, the final output, and how long the full planning-and-execution loop took:

from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings

async def invoke_with_auto_planning(kernel, goal: str, trace_id: str) -> str:
    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t_plan = time.time()
    try:
        result = await kernel.invoke_prompt(
            goal,
            settings=settings,
        )
        plan_ms = int((time.time() - t_plan) * 1000)

        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "success",
            "latency_ms": plan_ms,
            "metadata": {
                "goal": goal[:120],
                "output": str(result)[:300],
            },
        })
        return str(result)

    except Exception as e:
        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "error",
            "latency_ms": int((time.time() - t_plan) * 1000),
            "error": str(e),
        })
        raise

Tracking multi-model routing

One of Semantic Kernel’s strengths is routing different tasks to different models. Record the service ID and model name as span metadata so you can attribute latency and cost per model:

kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt35",
        ai_model_id="gpt-3.5-turbo",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)

async def invoke_with_model_tracking(
    kernel, prompt: str, service_id: str, trace_id: str
) -> str:
    settings = OpenAIChatPromptExecutionSettings(service_id=service_id)

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)

        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "service_id": service_id,
                "prompt_length": len(prompt),
                "output_length": len(str(result)),
            },
        })
        return str(result)

    except Exception as e:
        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
            "metadata": {"service_id": service_id},
        })
        raise
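How prompts map to services is application policy, not a Semantic Kernel API. As one illustration, you might route by rough prompt length, sending short prompts to the cheaper model; the function name and threshold here are placeholders, and the service IDs match the two registered above:

```python
def choose_service(prompt: str, max_cheap_len: int = 500) -> str:
    """Route short prompts to gpt-3.5-turbo, longer ones to gpt-4o.

    The length threshold is illustrative, not a recommendation; in
    practice you might route by task type or user tier instead.
    """
    return "gpt35" if len(prompt) <= max_cheap_len else "gpt4o"
```

The result feeds straight into invoke_with_model_tracking as service_id, so every routing decision shows up in the trace.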

Full agent trace with plugins and planning

Putting it together: one trace per agent invocation, with spans for planning, each plugin call, and the final response:

import asyncio
import os
import time
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

async def run_sk_agent(goal: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "sk-research-agent",
        "name": f"agent: {goal[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "goal": goal[:200],
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    kernel = Kernel()
    kernel.add_service(
        OpenAIChatCompletion(
            service_id="gpt4o",
            ai_model_id="gpt-4o",
            api_key=os.environ["OPENAI_API_KEY"],
        )
    )

    # Register plugin with trace context injected
    search_plugin = SearchPlugin(nexus, trace_id)
    kernel.add_plugin(search_plugin, plugin_name="SearchPlugin")

    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(goal, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)

        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)

    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise

What to watch for in production

Once traces are flowing, three failure patterns show up repeatedly in Semantic Kernel agents:

- Planner misrouting: automatic function calling invokes the wrong plugin function, or none at all. Compare the planner span's goal against the plugin spans that follow it.
- Silent plugin errors: plugin spans with "error" status inside a trace that still ends in success, usually because a caller swallowed the exception.
- Per-model drift: the llm:{service_id} spans reveal when one model's latency or output length shifts relative to another, which matters once you route tasks across models.
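The spans in this article are flat dicts with name and status fields, so a first-pass aggregation to spot these patterns needs nothing beyond the standard library. A sketch, assuming you have exported span payloads in that shape (the export mechanism is up to you):

```python
from collections import defaultdict

def error_rates_by_span(spans):
    """Compute the error rate per span name from exported span dicts.

    Each span is assumed to carry "name" and "status" keys, matching
    the payloads passed to add_span throughout this article.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for span in spans:
        totals[span["name"]] += 1
        if span["status"] == "error":
            errors[span["name"]] += 1
    return {name: errors[name] / totals[name] for name in totals}
```

A plugin span with a high error rate under a mostly-successful agent is the classic silent-failure signature.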

Next steps

Semantic Kernel is growing fast — Microsoft is actively adding new connectors, memory backends, and multi-agent patterns. The instrumentation approach here works regardless of which connector or planner you use, since it wraps the kernel invocation boundary rather than any specific internal API. Sign up for a free Nexus account to start capturing traces from your Semantic Kernel agents today.

Add observability to Semantic Kernel

Free tier, no credit card required. Full trace visibility in under 5 minutes.