RAG Evaluation — DeepCamp Skills

After completing this skill, you can:

Run RAGAS evaluation on a RAG pipeline
Interpret faithfulness and answer relevance scores
A/B test chunking strategies
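
To make the first three outcomes concrete: a Ragas-style faithfulness score is the share of an answer's claims that the retrieved context supports. An A/B test of chunking strategies compares that score across strategies on the same question set. The sketch below assumes claim extraction and support judgments, normally done by an LLM judge, have already produced one True/False verdict per claim. The function names, strategy names, and verdicts are hypothetical and illustrative. This is not the RAGAS library API.

```python
from statistics import mean


def faithfulness(claim_verdicts: list[bool]) -> float:
    """Fraction of an answer's claims supported by the retrieved context.

    Follows the Ragas-style definition: supported claims / total claims.
    Assumes an upstream LLM judge already produced one verdict per claim.
    """
    if not claim_verdicts:
        raise ValueError("answer produced no claims to check")
    return sum(claim_verdicts) / len(claim_verdicts)


def compare_chunking(results: dict[str, list[list[bool]]]) -> dict[str, float]:
    """Mean faithfulness per chunking strategy, over the same questions."""
    return {
        strategy: mean(faithfulness(verdicts) for verdicts in per_question)
        for strategy, per_question in results.items()
    }


# Hypothetical judge verdicts: three questions answered under two strategies.
results = {
    "fixed_512": [[True, True, False], [True, False], [True, True, True, True]],
    "by_heading": [[True, True, True], [True, True], [True, True, False, True]],
}
print(compare_chunking(results))
```

With only three questions per strategy, a gap like this is anecdotal. A real comparison would use a larger shared question set and look at per-question differences, not just the two means.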

Prerequisites

RAG Basics

Watch (10 videos)

[Evals Workshop] Mastering AI Evaluation: From Playground to Production

AI Engineer · intermediate hands-on

→ Build evaluation frameworks for AI applications
→ Implement offline and online evaluation strategies

[Full Workshop] Building Metrics that actually work — David Karam, Pi Labs (fmr Google Search)

AI Engineer · intermediate hands-on

→ Develop reliable evaluation metrics
→ Optimize LLM performance

Build a RAG Evaluation Tool and Python Library

AI Anytime · intermediate hands-on

→ Build a RAG evaluation tool
→ Implement evaluation metrics
→ Create a Python library

GenAI Interview Questions: LLM Evaluation Pipeline in Production #generativeai

BEPEC · intermediate hands-on

→ Deploy LLM evaluation pipelines to production
→ Implement evaluation metrics for GenAI applications

Your mental model for AI testing: evals, LLM judges, and test layering

Chrome for Developers · intermediate hands-on

→ Evaluate AI models using LLM judges and test layering
→ Optimize AI app development with AI testing

[VOD] First Look At Claude 3 - Can It Beat GPT-4?

bycloud · intermediate hands-on

→ Compare LLMs using evaluation metrics
→ Analyze AI model performance

Advanced LLM Evaluation Techniques: Chapter 22

Weights & Biases · intermediate hands-on

→ Evaluate LLM model performance
→ Implement advanced evaluation techniques

How to evaluate your Gen AI models with Vertex AI

Google Cloud Tech · beginner hands-on

→ Evaluate Gen AI models with Vertex AI
→ Scale RAG models for reliable results

open-rag-eval: RAG Evaluation without "golden" answers — Ofer Mendelevitch, Vectara

AI Engineer · advanced hands-on

→ Evaluate RAG models without golden answers
→ Use LLM judges for scalable evaluation

Evaluate your AI with Stax

Google for Developers · intermediate hands-on

→ Evaluate AI with Stax
→ Build data-driven AI evaluations

Read (10 articles)

RAG in Practice — Part 7: Your RAG System Is Wrong. Here's How to Find Out Why.

Dev.to · Gursharan Singh · 2026-04-24

Chunking for RAG: stop tuning the wrong knob

Dev.to · saurabh naik · 2026-05-18

RAG Evaluation with RAGAS: Measuring Faithfulness, Context Precision, and Recall in Production

Dev.to · Anna Danilec · 2026-05-18

Four Metrics That Actually Tell You Whether Your Enterprise RAG Is Working

Dev.to · Manjunath · 2026-05-21

RAG Retrieval Quality: Are Large Models Really Necessary?

Dev.to · Mustafa ERBAY · 2026-06-06

Stop Trusting Your RAG Retriever Blindly — Here’s How to Actually Make It Smart

Medium · Machine Learning · 2026-04-30

THE EVALUATION PROBLEM

Medium · AI · 2026-04-22

Why Your RAG Pipeline Lies to You

Medium · NLP · 2026-04-19

My Electricity-Theft Detector Scored 90%. Then I Caught It Lying

Medium · Machine Learning · 2026-06-01

Evaluation: Prove it before you ship it

Dev.to AI · 2026-05-18