Benchmark & Optimize LLM App Performance
Benchmark & Optimize LLM App Performance is a hands-on journey from “it works” to “it flies.” You’ll start by treating speed and cost as product features: defining a baseline with the right metrics (p50/p95 latency, tokens/sec, throughput, determinism, cost per task) and building a lightweight benchmarking harness you can rerun on every change. Next, you’ll learn to hunt bottlenecks across the stack (network, model, prompt, and post-processing) using practical patterns that cut tokens without cutting quality, plus caching strategies for embeddings, RAG, and tool calls. Then you’ll run A/B/C exper…
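To give a flavor of the baseline step, here is a minimal sketch of such a benchmarking harness in Python. The `call_model` function is a hypothetical stand-in for a real LLM call; only the timing and percentile logic illustrate the idea of tracking p50/p95 latency across reruns.

```python
import statistics
import time

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; sleeps to simulate latency.
    time.sleep(0.01)
    return "ok"

def benchmark(fn, prompts, runs=10):
    """Time fn over repeated runs and report p50/p95 latency in milliseconds."""
    latencies = []
    for _ in range(runs):
        for p in prompts:
            start = time.perf_counter()
            fn(p)
            latencies.append((time.perf_counter() - start) * 1000)
    # statistics.quantiles with n=100 yields 99 cut points; index 49 is the
    # 50th percentile, index 94 the 95th.
    cuts = statistics.quantiles(latencies, n=100)
    return {"p50_ms": round(cuts[49], 2),
            "p95_ms": round(cuts[94], 2),
            "samples": len(latencies)}

print(benchmark(call_model, ["hello"], runs=10))
```

Rerunning this harness after every prompt or caching change gives the before/after numbers the course builds on.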
Watch on Coursera ↗