Advanced LLM Evaluation Techniques: Chapter 22

Weights & Biases ยท Intermediate ยท๐Ÿง  Large Language Models ยท2y ago
๐Ÿค– LLM Evaluation Deep Dive - Join as we explore sophisticated evaluation techniques for LLM applications. ๐Ÿง‘๐Ÿพโ€๐ŸŽ“ Full course with certification and class materials available free at http://wandb.me/building-llm-powered-apps ๐Ÿ† Daily swag draw and grand prize Airpods draw from Dec 1 and 31, 2023. Details at http://wandb.me/llm-apps-contest ๐Ÿ—ฃ๏ธ Join the course conversation on our Discord channel at http://wandb.me/course-discord *Episode Description* In this chapter of our "Building LLM-Powered Apps" course, offered by Weights & Biases, Darek Kleczek, Machine Learning Engineer, guides you through the process of evaluating Large Language Model (LLM) applications. Learn how to implement a model-based evaluation approach using synthetic datasets and understand the importance of tracking data lineage for accurate assessments. ๐ŸŒŸ Chapter Highlights -Implementing an Evaluation Script: Explore the steps involved in setting up an evaluation script for LLM applications. -Loading and Tracking Evaluation Data: Understand the process of loading evaluation datasets and tracking their versions using Weights & Biases artifacts. -Using QA Chains for Evaluation: Discover how to utilize conversational retrieval chains for generating model responses to evaluation questions. -Creating Evaluation Prompts: Learn about constructing effective prompts to evaluate the correctness of LLM-generated answers. -Analyzing Evaluation Results: Gain insights into calculating model accuracy and logging results for interactive exploration and further analysis. ๐ŸŽ“ Enroll for Free: Join us on this educational journey to master the art of building LLM-powered applications. Enroll at http://wandb.me/building-llm-powered-apps. ๐Ÿ‘‰ Next Chapter Sneak Peek: Stay tuned for our course conclusion where we recap key learnings and explore next steps in LLM application development.
Watch on YouTube โ†— (saves to browser)
Sign in to unlock AI tutor explanation ยท โšก30

Playlist

Uploads from Weights & Biases ยท Weights & Biases ยท 0 of 60

โ† Previous Next โ†’
1 0. What is machine learning?
0. What is machine learning?
Weights & Biases
2 1. Build Your First Machine Learning Model
1. Build Your First Machine Learning Model
Weights & Biases
3 Intro to ML: Course Overview
Intro to ML: Course Overview
Weights & Biases
4 2. Multi-Layer Perceptrons
2. Multi-Layer Perceptrons
Weights & Biases
5 3. Convolutional Neural Networks
3. Convolutional Neural Networks
Weights & Biases
6 Weights & Biases at OpenAI
Weights & Biases at OpenAI
Weights & Biases
7 Why Experiment Tracking is Crucial to OpenAI
Why Experiment Tracking is Crucial to OpenAI
Weights & Biases
8 4. Autoencoders
4. Autoencoders
Weights & Biases
9 5. Sentiment Analysis
5. Sentiment Analysis
Weights & Biases
10 6. Recurrent Neural Networks [RNNs]
6. Recurrent Neural Networks [RNNs]
Weights & Biases
11 7. Text Generation using LSTMs and GRUs
7. Text Generation using LSTMs and GRUs
Weights & Biases
12 8. Text Classification Using Convolutional Neural Networks
8. Text Classification Using Convolutional Neural Networks
Weights & Biases
13 9. Hybrid LSTMs [Long Short-Term Memory]
9. Hybrid LSTMs [Long Short-Term Memory]
Weights & Biases
14 Toyota Research Institute on Experiment Tracking with Weights & Biases
Toyota Research Institute on Experiment Tracking with Weights & Biases
Weights & Biases
15 Weights and Biases - Developer Tools for Deep Learning
Weights and Biases - Developer Tools for Deep Learning
Weights & Biases
16 Introducing Weights & Biases
Introducing Weights & Biases
Weights & Biases
17 10. Seq2Seq Models
10. Seq2Seq Models
Weights & Biases
18 11. Transfer Learning for Domain-Specific Image Classification with Small Datasets
11. Transfer Learning for Domain-Specific Image Classification with Small Datasets
Weights & Biases
19 12. One-shot learning for teaching neural networks to classify objects never seen before
12. One-shot learning for teaching neural networks to classify objects never seen before
Weights & Biases
20 13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow
13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow
Weights & Biases
21 14. Data Augmentation | Keras
14. Data Augmentation | Keras
Weights & Biases
22 15. Batch Size and Learning Rate in CNNs
15. Batch Size and Learning Rate in CNNs
Weights & Biases
23 Applied Deep Learning Fellowship Overview and Project Selection with Josh Tobin (2019)
Applied Deep Learning Fellowship Overview and Project Selection with Josh Tobin (2019)
Weights & Biases
24 Grading Rubric for AI Applications with Sergey Karayev  (2019)
Grading Rubric for AI Applications with Sergey Karayev (2019)
Weights & Biases
25 16. Video Frame Prediction using CNNs and LSTMs (2019)
16. Video Frame Prediction using CNNs and LSTMs (2019)
Weights & Biases
26 Image to LaTeX - Applied Deep Learning Fellowship (2019)
Image to LaTeX - Applied Deep Learning Fellowship (2019)
Weights & Biases
27 17.  Build and Deploy an Emotion Classifier (2019)
17. Build and Deploy an Emotion Classifier (2019)
Weights & Biases
28 Applied Deep Learning - Data Management with Josh Tobin (2019)
Applied Deep Learning - Data Management with Josh Tobin (2019)
Weights & Biases
29 Snorkel: Programming Training Data with Paroma Varma of Stanford University (2019)
Snorkel: Programming Training Data with Paroma Varma of Stanford University (2019)
Weights & Biases
30 Applied Deep Learning - Troubleshooting and Debugging with Josh Tobin (2019)
Applied Deep Learning - Troubleshooting and Debugging with Josh Tobin (2019)
Weights & Biases
31 Troubleshooting and Iterating ML Models with Lee Redden (2019)
Troubleshooting and Iterating ML Models with Lee Redden (2019)
Weights & Biases
32 Designing a Machine Learning Project with Neal Khosla (2019)
Designing a Machine Learning Project with Neal Khosla (2019)
Weights & Biases
33 Lukas Beiwald on ML Tools and Experiment Management (2019)
Lukas Beiwald on ML Tools and Experiment Management (2019)
Weights & Biases
34 Building Machine Learning Teams with Josh Tobin (2019)
Building Machine Learning Teams with Josh Tobin (2019)
Weights & Biases
35 Pieter Abeel on Potential Deep Learning Research Directions  (2019)
Pieter Abeel on Potential Deep Learning Research Directions (2019)
Weights & Biases
36 Testing and Deployment of Deep Learning Models with Josh Tobin (2019)
Testing and Deployment of Deep Learning Models with Josh Tobin (2019)
Weights & Biases
37 Five Lessons for Team-Oriented Research with Peter Welder (2019)
Five Lessons for Team-Oriented Research with Peter Welder (2019)
Weights & Biases
38 Applied Deep Learning - Rosanne Liu on AI Research (2019)
Applied Deep Learning - Rosanne Liu on AI Research (2019)
Weights & Biases
39 Making the Mid-career Leap from Urban Design to Deep Learning/Data Science
Making the Mid-career Leap from Urban Design to Deep Learning/Data Science
Weights & Biases
40 Organizing ML projects โ€” W&B walkthrough (2020)
Organizing ML projects โ€” W&B walkthrough (2020)
Weights & Biases
41 Brandon Rohrer โ€” Machine Learning in Production for Robots
Brandon Rohrer โ€” Machine Learning in Production for Robots
Weights & Biases
42 Nicolas Koumchatzky โ€” Machine Learning in Production for Self-Driving Cars
Nicolas Koumchatzky โ€” Machine Learning in Production for Self-Driving Cars
Weights & Biases
43 My experiments with Reinforcement Learning with Jariullah Safi
My experiments with Reinforcement Learning with Jariullah Safi
Weights & Biases
44 Applications of Machine Learning to COVID-19 Research with Isaac Godfried
Applications of Machine Learning to COVID-19 Research with Isaac Godfried
Weights & Biases
45 Testing Machine Learning Models with Eric Schles
Testing Machine Learning Models with Eric Schles
Weights & Biases
46 How Linear Algebra is not like Algebra with Charles Frye
How Linear Algebra is not like Algebra with Charles Frye
Weights & Biases
47 Predicting Protein Structures using Deep Learning with Jonathan King
Predicting Protein Structures using Deep Learning with Jonathan King
Weights & Biases
48 Rachael Tatman โ€” Conversational AI and Linguistics
Rachael Tatman โ€” Conversational AI and Linguistics
Weights & Biases
49 Reformer by Han Lee
Reformer by Han Lee
Weights & Biases
50 Sequence Models with Pujaa Rajan
Sequence Models with Pujaa Rajan
Weights & Biases
51 GitHub Actions & Machine Learning Workflows with Hamel Husain
GitHub Actions & Machine Learning Workflows with Hamel Husain
Weights & Biases
52 Look Mom, No Indices! Vector Calculus with the Frรฉchet Derivative by Charles Frye
Look Mom, No Indices! Vector Calculus with the Frรฉchet Derivative by Charles Frye
Weights & Biases
53 Jack Clark โ€” Building Trustworthy AI Systems
Jack Clark โ€” Building Trustworthy AI Systems
Weights & Biases
54 Surprising Utility of Surprise: Why ML Uses Negative Log Probabilities - Charles Frye
Surprising Utility of Surprise: Why ML Uses Negative Log Probabilities - Charles Frye
Weights & Biases
55 Track your machine learning experiments locally, with W&B Local - Chris Van Pelt
Track your machine learning experiments locally, with W&B Local - Chris Van Pelt
Weights & Biases
56 Antipatterns in open source research code with Jariullah Safi
Antipatterns in open source research code with Jariullah Safi
Weights & Biases
57 Attention for time series forecasting & COVID predictions - Isaac Godfried
Attention for time series forecasting & COVID predictions - Isaac Godfried
Weights & Biases
58 Made with ML - Goku Mohandas
Made with ML - Goku Mohandas
Weights & Biases
59 Angela & Danielle โ€” Designing ML Models for Millions of Consumer Robots
Angela & Danielle โ€” Designing ML Models for Millions of Consumer Robots
Weights & Biases
60 Deep Learning Salon by Weights & Biases
Deep Learning Salon by Weights & Biases
Weights & Biases

Related AI Lessons

โšก
GPT-5.5 vs Claude Opus 4.7: Pricing, Speed, and Benchmarks
Compare GPT-5.5 and Claude Opus 4.7 pricing, speed, and benchmarks to choose the best AI language model for your project
Dev.to AI
โšก
From Idea to Image: A Practical Midjourney Prompting Guide
Learn to craft effective Midjourney prompts to generate high-quality images, focusing on clarity and creative briefs
Dev.to AI
โšก
Dell Becomes OpenAI's On-Prem Channel For Frontier Models
Dell partners with OpenAI to bring Codex to on-premises environments, expanding access to frontier AI models for enterprises
Forbes Innovation
โšก
Beyond Simple RAG:Creating an Evidence-Driven Coordination Environment for Local AI
Create a testable local AI environment using evidence-driven coordination, and learn how to prioritize data collection for better results
Medium ยท Programming
Up next
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)
Watch โ†’