Stop testing your prompts equally (there's a provably better way)

Efficient NLP · Beginner · 🧠 Large Language Models · 1w ago
Most people evaluate prompts the wrong way. In this video, I show why uniform prompt testing wastes your LLM eval budget, and how multi-armed bandit algorithms, specifically best arm identification (BAI), offer a provably better alternative. We break down Successive Rejects and Sequential Halving, explain why they focus compute on the hardest-to-distinguish prompts, and connect this to real systems like TRIPLE (NeurIPS 2024) that improve prompt selection in pipelines like APE and APO.
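To make the idea concrete, here is a minimal sketch of Sequential Halving for prompt selection, assuming a hypothetical `evaluate(prompt)` callback that returns a noisy 0/1 score for one eval example; the function names and the `budget` parameter are illustrative, not the video's or TRIPLE's actual API:

```python
import math
import random

def sequential_halving(prompts, evaluate, budget):
    """Sequential Halving: split the eval budget across ~log2(n) rounds,
    sample every surviving prompt equally within a round, then keep the
    better-scoring half. Compute concentrates on the final, hardest-to-
    distinguish prompts instead of being spread uniformly."""
    arms = list(prompts)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    scores = {p: [] for p in arms}
    for _ in range(rounds):
        # per-arm pulls this round: equal share of the round's budget
        pulls = max(1, budget // (len(arms) * rounds))
        for p in arms:
            for _ in range(pulls):
                scores[p].append(evaluate(p))
        # rank by empirical mean and keep the top half
        arms.sort(key=lambda p: sum(scores[p]) / len(scores[p]), reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]

# Toy usage with simulated Bernoulli prompt accuracies
random.seed(0)
means = {"prompt_a": 0.2, "prompt_b": 0.5, "prompt_c": 0.9, "prompt_d": 0.4}
best = sequential_halving(
    list(means),
    lambda p: 1.0 if random.random() < means[p] else 0.0,
    budget=4000,
)
```

Note how weak prompts are eliminated early with few samples, so later rounds can spend many more samples per arm separating the close contenders.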

Chapters (6)

0:00 Introduction
0:50 Best arm identification (BAI)
2:38 Successive Rejects (SR)
3:32 Sequential Halving (SH)
4:39 Error analysis
6:22 Prompt selection using bandits (TRIPLE)