Patronus AI with Anand Kannappan - Weaviate Podcast #122!

Weaviate vector database · Advanced · 🤖 AI Agents & Automation · 1y ago
AI agents are getting more complex and harder to debug. How do you know what's happening when your agent makes 20+ function calls? What if you have a multi-agent system orchestrating several agents? Anand Kannappan, co-founder of Patronus AI, explains how their tool Percival transforms agent debugging and evaluation: it instantly analyzes complex agent traces, pinpoints failures across 60 different failure modes, and automatically suggests prompt fixes to improve performance.

Anand unpacks several of these common failure modes, including the critical challenge of "context explosion," where agents process millions of tokens, domain adaptation for specific use cases, and the complexity of multi-agent orchestration. The paradigm of AI evals is shifting from static evaluation to dynamic oversight! You'll also learn how Percival's memory architecture leverages both episodic and semantic knowledge with Weaviate.

The conversation explores powerful concepts like process vs. outcome rewards and LLM-as-judge approaches, and Anand shares his vision for "agentic supervision," where equally capable AI systems provide oversight for complex agent workflows. Whether you're building AI agents, evaluating LLM systems, or interested in how debugging autonomous systems will evolve, this episode delivers concrete techniques, philosophical insights on evaluation, and a roadmap for how evaluation must transform to keep pace with increasingly autonomous AI systems.

Links:
Percival Launch: https://www.patronus.ai/percival
Docs: https://docs.patronus.ai/docs/percival/
Paper: https://arxiv.org/abs/2505.08638
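To make the LLM-as-judge idea discussed in the episode concrete, here is a minimal sketch of the pattern: a second model grades an agent's answer against a rubric and returns a verdict. All names here are hypothetical, and the judge call is stubbed with a fake model; a real system would call an actual LLM API (and Percival's own approach is more sophisticated than this).

```python
# Minimal LLM-as-judge sketch (hypothetical names; judge model is stubbed).

def judge_response(question: str, answer: str, rubric: str, call_llm) -> dict:
    """Ask a judge model to grade an answer; expects a 'PASS'/'FAIL' reply."""
    prompt = (
        f"Rubric: {rubric}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with PASS or FAIL, a colon, and a one-line reason."
    )
    raw = call_llm(prompt)
    verdict, _, reason = raw.partition(":")
    return {"pass": verdict.strip().upper() == "PASS", "reason": reason.strip()}

# Stub standing in for a real judge-model call.
def fake_llm(prompt: str) -> str:
    if "context" in prompt:
        return "PASS: answer is grounded in the retrieved context"
    return "FAIL: answer makes an unsupported claim"

result = judge_response(
    question="What backs the agent's memory?",
    answer="A vector database holding episodic traces",
    rubric="Answer must be grounded in retrieved context",
    call_llm=fake_llm,
)
print(result["pass"])  # True
```

The key design point the episode touches on is that the judge needs to be roughly as capable as the agent it oversees ("agentic supervision"); a weak judge rubber-stamps failures it cannot recognize.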
Watch on YouTube ↗


Chapters (11)

0:00 Welcome Anand!
1:15 Percival!
17:20 Online and Offline Agent Tracing
20:40 Complex Agent Traces
23:05 Quick Insights and Deep Research
24:47 Automated Agent Tuning
31:19 LLM-as-Judge and Scalable Oversight
42:24 Agent Inbox for Evals
45:49 Causal Inference and AI
51:24 Percival and Weaviate
56:04 Exciting Directions for AI