Evaluating AI Search: A Practical Framework for Augmented AI Systems — Quotient AI + Tavily
AI search is becoming the front door to information, whether through Retrieval-Augmented Generation (RAG), Search-Augmented Generation (SAG), or custom agents that synthesize answers on top of indexed content. As users rely more heavily on these systems, evaluating their quality becomes mission-critical. But traditional metrics like precision and recall don’t capture the full picture.
In this talk, we introduce a practical evaluation framework for AI-powered search, across three dimensions:
- Are the retrieved sources relevant to the query?
- Are the sources faithfully used in the generated answer?
- And is the final answer complete?
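To make the three dimensions concrete, here is a minimal Python sketch of how a single query/answer pair might be scored on each one. It is not the framework presented in the talk: the names `SearchEval`, `overlap`, and `evaluate` are hypothetical, the `expected_points` input (a list of facts a complete answer should mention) is an assumption, and plain token overlap stands in for whatever judges or models a real evaluation would use.

```python
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(part: str, whole: str) -> float:
    """Fraction of the tokens in `part` that also occur in `whole`."""
    part_tokens = tokens(part)
    if not part_tokens:
        return 0.0
    return len(part_tokens & tokens(whole)) / len(part_tokens)


@dataclass
class SearchEval:
    relevance: float     # are the retrieved sources relevant to the query?
    faithfulness: float  # is the answer supported by the sources?
    completeness: float  # does the answer cover the expected points?


def evaluate(query: str, sources: list[str], answer: str,
             expected_points: list[str]) -> SearchEval:
    """Score one example with crude lexical proxies (illustrative only)."""
    # Relevance: average share of query tokens found in each source.
    relevance = (sum(overlap(query, s) for s in sources) / len(sources)
                 if sources else 0.0)
    # Faithfulness: share of answer tokens that appear somewhere in the sources.
    faithfulness = overlap(answer, " ".join(sources))
    # Completeness: average share of each expected point found in the answer.
    completeness = (sum(overlap(p, answer) for p in expected_points)
                    / len(expected_points) if expected_points else 0.0)
    return SearchEval(relevance, faithfulness, completeness)


if __name__ == "__main__":
    result = evaluate(
        query="capital of France",
        sources=["Paris is the capital of France", "Bananas are yellow"],
        answer="Paris is the capital of France",
        expected_points=["Paris", "capital of France"],
    )
    print(result)
```

Keeping the three scores separate, rather than collapsing them into one number, matches the abstract's point that systems can excel on one dimension while breaking down on another.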
We’ll share lessons from working with search companies and present early findings from a new benchmark evaluating popular augmented AI systems across these dimensions. Rather than ranking winners and losers, we explore where different systems excel or break down, and how these tradeoffs inform product decisions.
This talk is for AI engineers and product teams who want to build trusted, high-quality AI search experiences and need a way to measure whether those experiences are actually working.
About Julia Neagu
Julia is the co-founder and CEO of Quotient AI, which provides intelligent observability for AI apps by automatically detecting failures, uncovering root causes, and recommending improvements. Before Quotient, she was the Director of Data for Copilot, GitHub's AI pair programmer, where her team built the systems evaluating the large language models behind Copilot. Previously, she was the Director of Analytics at Tamr and led end-to-end quantitative modeling at Aon's Intellectual Property Solutions group. Julia has a PhD and MA in Physics from Harvard and an AB in Physics from Princeton.
About Deanna Emery
Deanna is the Founding AI Researcher at Quotient AI, where she is leading research on evaluation of Large Language Models in real-world products and applications. Before Quotient, Deanna was a Principal Data Scientist at Aon, where she led the team