Observability for Microsoft Semantic Kernel Agents in Python
Microsoft Semantic Kernel gives you a structured way to build AI agents in Python with plugins, planners, and multi-model support. When a planner selects the wrong function, a plugin fails silently, or a kernel invocation's latency spikes, you need trace visibility to diagnose it. Here's how to integrate Nexus into Semantic Kernel agents.
What Semantic Kernel adds
Microsoft Semantic Kernel is a production-ready SDK for building AI agents in Python (and C#/.NET). Its key abstractions are Kernel (the central orchestrator), Plugins (collections of functions exposed to the LLM), and Planners (which reason about which plugin functions to call to satisfy a goal).
This architecture introduces observability challenges that standard logging can’t solve:
- Planner decisions are opaque: When the planner selects a sequence of plugin calls, there’s no built-in record of why — what prompt it generated, which plan it chose, or why it rejected alternatives.
- Plugin errors are swallowed: A plugin function that raises an exception may cause the kernel to retry with a different plan rather than surfacing the error. Without spans, you can’t tell which plugin call failed.
- Multi-step kernel invocations: A single `kernel.invoke()` call can trigger a chain of plugin executions and LLM calls. Latency attribution requires a span per step.
- Multi-model configurations: SK supports routing different tasks to different models (e.g., GPT-4o for planning, GPT-3.5 for summarization). Without per-call model tracking, you can’t attribute cost or latency to specific model choices.
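Concretely, the instrumentation below produces one trace per invocation with a span per step. A sketch of the resulting shape (span names and fields are illustrative, not a fixed Nexus or SK schema):

```python
# Hypothetical shape of one traced kernel invocation.
trace = {
    "name": "kernel: summarize the Q3 report",
    "spans": [
        {"name": "planner:auto_function_calling", "latency_ms": 820},
        {"name": "plugin:SearchPlugin.search_web", "latency_ms": 340},
        {"name": "llm:gpt4o", "latency_ms": 1200},
    ],
}

# Per-step latency makes attribution trivial: blame the slowest span.
slowest = max(trace["spans"], key=lambda s: s["latency_ms"])
print(slowest["name"])  # → llm:gpt4o
```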
Tracing a basic kernel invocation
Install the SDK and wrap your kernel calls with Nexus traces:
```bash
pip install semantic-kernel nexus-sdk
```
```python
import os
import time

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])

kernel = Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)


async def invoke_with_tracing(prompt: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "semantic-kernel-agent",
        "name": f"kernel: {prompt[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "prompt_length": len(prompt),
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise
```
Tracing plugin function calls
Plugins are the core building block in Semantic Kernel — each plugin is a Python class decorated with @kernel_function. Wrap plugin methods to emit a span per function execution:
```python
import time
from typing import Annotated

from semantic_kernel.functions import kernel_function


class SearchPlugin:
    def __init__(self, nexus_client, trace_id: str):
        self._nexus = nexus_client
        self._trace_id = trace_id

    @kernel_function(name="search_web", description="Search the web for information")
    def search_web(
        self,
        query: Annotated[str, "The search query"],
    ) -> Annotated[str, "Search results"]:
        t0 = time.time()
        try:
            # your search implementation
            results = _do_search(query)
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "success",
                "latency_ms": int((time.time() - t0) * 1000),
                "metadata": {
                    "query": query[:120],
                    "result_count": len(results),
                },
            })
            return results
        except Exception as e:
            self._nexus.add_span(self._trace_id, {
                "name": "plugin:SearchPlugin.search_web",
                "status": "error",
                "latency_ms": int((time.time() - t0) * 1000),
                "error": str(e),
            })
            raise
```
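If you have more than a couple of plugin methods, repeating that try/except block gets tedious. A decorator sketch that factors it out, assuming the same `add_span(trace_id, payload)` API used above (the decorator itself is not part of SK or Nexus):

```python
import functools
import time


def traced_plugin(nexus_client, trace_id: str):
    """Wrap a plugin method so it emits one span per call (success or error)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            t0 = time.time()
            try:
                result = fn(*args, **kwargs)
                nexus_client.add_span(trace_id, {
                    "name": f"plugin:{fn.__qualname__}",
                    "status": "success",
                    "latency_ms": int((time.time() - t0) * 1000),
                })
                return result
            except Exception as e:
                nexus_client.add_span(trace_id, {
                    "name": f"plugin:{fn.__qualname__}",
                    "status": "error",
                    "latency_ms": int((time.time() - t0) * 1000),
                    "error": str(e),
                })
                raise
        return wrapper
    return decorator
```

Because `functools.wraps` preserves the function's signature and metadata, this should compose with `@kernel_function` (apply `@kernel_function` on top so SK still sees the annotated signature), though verify against your SK version.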
Tracing planner decisions
Semantic Kernel’s planners (like FunctionChoiceBehavior with auto function calling) determine which plugin functions to invoke for a given goal. Wrapping the planning step in a span captures what plan was generated and how long planning took:
```python
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import OpenAIChatPromptExecutionSettings


async def invoke_with_auto_planning(kernel, goal: str, trace_id: str) -> str:
    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t_plan = time.time()
    try:
        result = await kernel.invoke_prompt(
            goal,
            settings=settings,
        )
        plan_ms = int((time.time() - t_plan) * 1000)
        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "success",
            "latency_ms": plan_ms,
            "metadata": {
                "goal": goal[:120],
                "output": str(result)[:300],
            },
        })
        return str(result)
    except Exception as e:
        nexus.add_span(trace_id, {
            "name": "planner:auto_function_calling",
            "status": "error",
            "latency_ms": int((time.time() - t_plan) * 1000),
            "error": str(e),
        })
        raise
```
Tracking multi-model routing
One of Semantic Kernel’s strengths is routing different tasks to different models. Record the service ID and model name as span metadata so you can attribute latency and cost per model:
```python
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt4o",
        ai_model_id="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)
kernel.add_service(
    OpenAIChatCompletion(
        service_id="gpt35",
        ai_model_id="gpt-3.5-turbo",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)


async def invoke_with_model_tracking(
    kernel, prompt: str, service_id: str, trace_id: str
) -> str:
    settings = OpenAIChatPromptExecutionSettings(service_id=service_id)
    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(prompt, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "service_id": service_id,
                "prompt_length": len(prompt),
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.add_span(trace_id, {
            "name": f"llm:{service_id}",
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
            "metadata": {"service_id": service_id},
        })
        raise
```
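With `service_id` stamped on every LLM span, per-model attribution is a simple aggregation over exported span data. A sketch, assuming spans are available as dicts in the shape emitted above:

```python
from collections import defaultdict


def latency_by_service(spans):
    """Sum latency_ms per service_id found in span metadata."""
    totals = defaultdict(int)
    for span in spans:
        service = span.get("metadata", {}).get("service_id")
        if service:
            totals[service] += span["latency_ms"]
    return dict(totals)


spans = [
    {"name": "llm:gpt4o", "latency_ms": 1400, "metadata": {"service_id": "gpt4o"}},
    {"name": "llm:gpt35", "latency_ms": 300, "metadata": {"service_id": "gpt35"}},
    {"name": "llm:gpt4o", "latency_ms": 900, "metadata": {"service_id": "gpt4o"}},
]
print(latency_by_service(spans))  # → {'gpt4o': 2300, 'gpt35': 300}
```

The same aggregation works for cost if you also record token counts per span and multiply by per-model pricing.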
Full agent trace with plugins and planning
Putting it together: one trace per agent invocation, with spans for planning, each plugin call, and the final response:
```python
import asyncio
import os
import time

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from nexus_sdk import NexusClient

nexus = NexusClient(api_key=os.environ["NEXUS_API_KEY"])


async def run_sk_agent(goal: str, user_id: str) -> str:
    trace = nexus.start_trace({
        "agent_id": "sk-research-agent",
        "name": f"agent: {goal[:60]}",
        "status": "running",
        "started_at": nexus.now(),
        "metadata": {
            "user_id": user_id,
            "goal": goal[:200],
            "environment": os.environ.get("APP_ENV", "dev"),
        },
    })
    trace_id = trace["trace_id"]

    kernel = Kernel()
    kernel.add_service(
        OpenAIChatCompletion(
            service_id="gpt4o",
            ai_model_id="gpt-4o",
            api_key=os.environ["OPENAI_API_KEY"],
        )
    )

    # Register plugin with trace context injected
    search_plugin = SearchPlugin(nexus, trace_id)
    kernel.add_plugin(search_plugin, plugin_name="SearchPlugin")

    settings = OpenAIChatPromptExecutionSettings(
        service_id="gpt4o",
        function_choice_behavior=FunctionChoiceBehavior.Auto(),
    )

    t0 = time.time()
    try:
        result = await kernel.invoke_prompt(goal, settings=settings)
        elapsed_ms = int((time.time() - t0) * 1000)
        nexus.end_trace(trace_id, {
            "status": "success",
            "latency_ms": elapsed_ms,
            "metadata": {
                "output_length": len(str(result)),
            },
        })
        return str(result)
    except Exception as e:
        nexus.end_trace(trace_id, {
            "status": "error",
            "latency_ms": int((time.time() - t0) * 1000),
            "error": str(e),
        })
        raise


# Entry point, e.g.: asyncio.run(run_sk_agent("Research recent LLM evals", "user-42"))
```
What to watch for in production
Once traces are flowing, three failure patterns show up repeatedly in Semantic Kernel agents:
- Planner over-selection: The auto function calling behavior calls more plugins than necessary, burning tokens on unnecessary tool calls. Look for traces where 4+ plugin spans fire for a simple query.
- Plugin retry storms: A plugin that raises an exception on transient errors (network timeouts, rate limits) may be called multiple times before the kernel gives up. Span counts per plugin name reveal repeated calls.
- Silent plan failures: The kernel may return an empty or malformed result without raising an exception when the planner can’t satisfy the goal with available plugins. Track `output_length == 0` as a soft failure signal in trace metadata.
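All three patterns can be flagged mechanically from exported trace data. A sketch, assuming traces are dicts in the shape emitted by the code above; the thresholds (4 plugin spans, 3 repeats) are illustrative starting points, not recommendations:

```python
from collections import Counter


def flag_trace(trace):
    """Return which of the three failure patterns a trace exhibits."""
    flags = []
    plugin_spans = [s for s in trace["spans"] if s["name"].startswith("plugin:")]

    # Planner over-selection: many plugin calls for one invocation.
    if len(plugin_spans) >= 4:
        flags.append("planner_over_selection")

    # Retry storm: the same plugin span name repeating.
    counts = Counter(s["name"] for s in plugin_spans)
    if any(c >= 3 for c in counts.values()):
        flags.append("plugin_retry_storm")

    # Silent plan failure: empty output without an error status.
    if trace.get("metadata", {}).get("output_length", 0) == 0:
        flags.append("silent_plan_failure")

    return flags


trace = {
    "spans": [
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "plugin:SearchPlugin.search_web"},
        {"name": "llm:gpt4o"},
    ],
    "metadata": {"output_length": 0},
}
print(flag_trace(trace))  # → ['plugin_retry_storm', 'silent_plan_failure']
```

A check like this can run as a scheduled job over recent traces to alert before these patterns show up as user-facing failures.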
Next steps
Semantic Kernel is growing fast — Microsoft is actively adding new connectors, memory backends, and multi-agent patterns. The instrumentation approach here works regardless of which connector or planner you use, since it wraps the kernel invocation boundary rather than any specific internal API. Sign up for a free Nexus account to start capturing traces from your Semantic Kernel agents today.
Add observability to Semantic Kernel
Free tier, no credit card required. Full trace visibility in under 5 minutes.