Mind the Gap (In your Agent Observability) — Amy Boyd & Nitya Narasimhan, Microsoft
Agents drift. Models change, prompts get tweaked, edge cases accumulate, and the gap between what your agent does and what you need it to do widens without you noticing. Amy and Nitya walk through Microsoft Foundry's observability stack: tracing built on OpenTelemetry; built-in evaluators for quality, safety, and agentic metrics like intent resolution and task adherence; and red teaming, where a second AI attacks your agent with adversarial prompts to find vulnerabilities before your users do.
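To get a feel for what OpenTelemetry-based agent tracing looks like in practice, here is a minimal sketch using the generic OTel Python API. The span names, attributes, model, and tool are illustrative choices, not Foundry's actual instrumentation:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to the console; a real setup would use an OTLP exporter pointed
# at a collector (or wherever your tracing backend ingests spans).
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-observability-demo")

def run_agent_turn(user_message: str) -> str:
    # One parent span per agent turn, with child spans for the model call and the
    # tool call, so the trace shows where latency and failures actually sit.
    with tracer.start_as_current_span("agent.turn") as turn:
        turn.set_attribute("agent.input", user_message)

        with tracer.start_as_current_span("agent.llm_call") as llm:
            llm.set_attribute("llm.model", "some-model")      # placeholder name
            plan = "call get_weather('Seattle')"              # stand-in for a completion
            llm.set_attribute("llm.output", plan)

        with tracer.start_as_current_span("agent.tool_call") as tool:
            tool.set_attribute("tool.name", "get_weather")    # hypothetical tool
            result = "72F and sunny"                          # stand-in for tool output

        turn.set_attribute("agent.output", result)
        return result

print(run_agent_turn("What's the weather in Seattle?"))
```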
The part worth watching for is the observe skill demo. You point it at an agent with no eval dataset, no baselines, nothing. It generates the dataset, runs batch evaluations, optimizes the prompt, compares versions, and rolls back to the best one, all from a single prompt to a coding agent. The skill shows its reasoning at each step, and that is where the real value is: it surfaces failures you didn't know to look for.
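To make that loop concrete, here is a rough sketch of the compare-and-roll-back step. Every helper and the toy scoring heuristic are placeholders invented for illustration; none of this is a Foundry or observe-skill API:

```python
from statistics import mean

def run_batch_eval(prompt: str, dataset: list[dict]) -> float:
    # Toy stand-in for a batch evaluation: a real run would execute the agent on
    # every case and average evaluator scores (quality, intent resolution,
    # task adherence, ...). Here we just check the prompt for an expected keyword.
    return mean(1.0 if case["expected_keyword"] in prompt.lower() else 0.0
                for case in dataset)

def pick_best_version(prompt_versions: list[str], dataset: list[dict]) -> str:
    # Score every prompt version against the same generated dataset and keep the
    # winner; that is effectively a rollback whenever the latest edit scores worse.
    scores = {p: run_batch_eval(p, dataset) for p in prompt_versions}
    return max(scores, key=scores.get)

# The dataset would be generated by the skill; these cases are made up.
dataset = [{"query": "cancel order 1234", "expected_keyword": "order id"},
           {"query": "where is my refund?", "expected_keyword": "refund"}]
versions = [
    "You are a helpful support agent.",
    "You are a support agent. Always confirm the order ID before acting, "
    "and explain refund timelines when asked.",
]
print(pick_best_version(versions, dataset))
```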
Speaker info:
- https://x.com/NityaNarasimhan
- https://www.linkedin.com/in/nityan/
- https://x.com/AmyKateNicho
- https://www.linkedin.com/in/amykatenicho/