RAG Evaluation
Measure and improve RAG quality — faithfulness, relevance, context precision.
After this skill you can…
- Run RAGAS evaluation on a RAG pipeline
- Interpret faithfulness and answer relevance scores
- A/B test chunking strategies
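As a rough illustration of what the faithfulness and context precision scores in the objectives above measure, here is a minimal sketch in plain Python. It does not call the RAGAS library. RAGAS relies on an LLM judge to extract and verify claims, whereas the token-overlap check `is_supported`, its `threshold`, the sample claims, and the relevance flags for the two chunking strategies are all assumptions made up for this example.

```python
import re


def tokens(text):
    """Lowercased word tokens; a crude stand-in for an LLM judge (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim, contexts, threshold=0.8):
    """Hypothetical check: a claim counts as supported if at least
    `threshold` of its tokens appear in a single retrieved chunk."""
    claim_toks = tokens(claim)
    if not claim_toks:
        return False
    return any(
        len(claim_toks & tokens(chunk)) / len(claim_toks) >= threshold
        for chunk in contexts
    )


def faithfulness(answer_claims, contexts):
    """Fraction of answer claims that the retrieved contexts support."""
    if not answer_claims:
        return 0.0
    supported = sum(is_supported(claim, contexts) for claim in answer_claims)
    return supported / len(answer_claims)


def context_precision(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.
    `relevance` is a list of 0/1 flags in retrieval order."""
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_context_precision(runs):
    """Average context precision over several queries."""
    return sum(context_precision(r) for r in runs) / len(runs)


# Made-up sample data: one retrieved chunk and three claims from an answer.
contexts = ["The Eiffel Tower is in Paris and was completed in 1889."]
claims = [
    "The Eiffel Tower is in Paris",
    "The Eiffel Tower was completed in 1889",
    "The Eiffel Tower is 500 metres tall",  # not backed by the context
]
faith = faithfulness(claims, contexts)

# Made-up relevance flags for two chunking strategies over the same two queries.
strategy_a = [[1, 0, 1], [1, 1, 0]]
strategy_b = [[0, 1, 1], [0, 0, 1]]
score_a = mean_context_precision(strategy_a)
score_b = mean_context_precision(strategy_b)

print(f"faithfulness: {faith:.3f}")            # 0.667
print(f"context precision A: {score_a:.3f}")   # 0.917
print(f"context precision B: {score_b:.3f}")   # 0.458
```

Read this way, a faithfulness of 0.667 says one of the three claims has no support in the retrieved text, and the higher mean context precision for strategy A says its relevant chunks tend to rank nearer the top. A real A/B test would use many more queries and an actual judge instead of token overlap.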
Prerequisites
Watch (10 videos)
[Evals Workshop] Mastering AI Evaluation: From Playground to Production
→ Build evaluation frameworks for AI applications
→ Implement offline and online evaluation strategies
[Full Workshop] Building Metrics that actually work — David Karam, Pi Labs (fmr Google Search)
→ Develop reliable evaluation metrics
→ Optimize LLM performance
Build a RAG Evaluation Tool and Python Library
→ Build a RAG evaluation tool
→ Implement evaluation metrics
→ Create a Python library
GenAI Interview Questions: LLM Evaluation Pipeline in Production #generativeai
→ Deploy LLM evaluation pipelines to production
→ Implement evaluation metrics for GenAI applications
Your mental model for AI testing: evals, LLM judges, and test layering
→ Evaluate AI models using LLM judges and test layering
→ Optimize AI app development with AI testing
[VOD] First Look At Claude 3 - Can It Beat GPT-4?
→ Compare LLMs using evaluation metrics
→ Analyze AI model performance
Advanced LLM Evaluation Techniques: Chapter 22
→ Evaluate LLM model performance
→ Implement advanced evaluation techniques
How to evaluate your Gen AI models with Vertex AI
→ Evaluate Gen AI models with Vertex AI
→ Scale RAG models for reliable results
open-rag-eval: RAG Evaluation without "golden" answers — Ofer Mendelevitch, Vectara
→ Evaluate RAG models without golden answers
→ Use LLM judges for scalable evaluation
Evaluate your AI with Stax
→ Evaluate AI with Stax
→ Build data-driven AI evaluations
DeepCamp AI