The Role of Validation Sets in Model Training | Train-Test-Validation Splits | Clearly explained!

AI For Beginners · Beginner · 📄 Research Papers Explained · 2y ago

Skills: ML Pipelines 90% · Supervised Learning 80%

Key Takeaways

The video explains the role of validation sets in model training: how the train, validation, and test splits fit into the model-creation workflow alongside data analysis and feature engineering, and how the validation set supports hyperparameter tuning and model selection.

Full Transcript

In our previous video, we talked about the train-test split: why it is important and how to properly split the dataset. In this video, we will look at validation data, a portion of the overall dataset that plays a significant role.

Machine learning model creation has the following steps: first, splitting the data into train, validation, and test sets; second, doing data analysis and feature engineering; third, training models, hyperparameter tuning, and model selection; and last, final model testing. The training and validation sets are used for training and preparing the final model, while the test set is used only for the final evaluation.

Validation data can be defined as a set that is not used for training, but whose results we use during the training process to select the appropriate model and configure its hyperparameters. So while it can be referred to as unseen data, we still use its information to select our final model. Evaluating the final model only on the validation set will provide an optimistic estimate of its performance; in other words, we have already maximized the performance of the final model on the validation data during the training process.

Training performance and validation performance are evaluated to see how the model is improving. If performance on both sets is improving, we are on the right path. The validation set helps us check that the model does not just memorize the training data but learns patterns that apply to new data as well. Remember that at some point the training performance will continue to improve, but we will start seeing declining performance on the validation set. This is the point at which we should stop training, because the model is starting to overfit, a concept we will cover in upcoming videos.

As a result, you select the model and the set of hyperparameters that provide the highest validation performance. Finally, you obtain an unbiased performance estimate using the test data.

The size of the validation set is often similar to the size of the test set, sometimes a bit smaller. Just be sure it is large enough to provide a reliable estimate of the model's performance.

If you want to learn more about artificial intelligence, subscribe to our channel to stay up to date with new videos, press the like button, and let's discuss AI in the comments section.
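
The first step the transcript describes, splitting the data into train, validation, and test sets, can be sketched in a few lines of plain Python. This is only an illustration: the function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for the example, not values given in the video.

```python
import random

def train_val_test_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

data = list(range(100))  # stand-in for 100 labelled examples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 60 20 20
```

Every example lands in exactly one of the three lists, so the test set stays untouched until the final evaluation.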

Original Description

🔥 In this video we referred to the validation set, a portion of the overall dataset that has a very significant role! The validation dataset is used for final model selection and hyperparameter tuning, as well as to understand whether your model learns patterns or just overfits the training data. It gives a rough estimate of the performance of the model on "unseen" data. Remember to use the test dataset for final evaluation. You can't rely on the results from the validation set alone, as you used its feedback to tune your hyperparameters and select the best model!

🔍 Key points covered:
0:00 - Introduction.
0:15 - How different data splits are used in the model creation procedure?
0:41 - How we define the validation set?
0:52 - How is validation different from test and train?
1:00 - What if you evaluate the model based on the validation set?
1:12 - How is validation data used during the training?
1:33 - At what point the validation performance will start declining?
1:48 - How you select the best model based on the validation results?
1:54 - How to evaluate the final performance?
1:59 - The size of the validation set.
2:10 - Subscribe to us!

🔔 Don't forget to like, subscribe, and hit the bell icon to stay updated with our latest videos!

🤖 Note that we use synthetic generations, such as AI-generated images and voices, to enhance the appeal and engagement of our content.

🌐 If you have any questions or topics you want us to cover, leave a comment below. Additionally, share your thoughts about the content: how do you think we can make it better? Thanks for watching!

The video explains the importance of validation sets in machine learning model creation, including their role in hyperparameter tuning, model selection, and preventing overfitting. It highlights the need for a reliable estimate of model performance and provides guidance on selecting the right model and hyperparameters. By understanding the role of validation sets, viewers can improve their model training and evaluation skills.

Key Takeaways

Split the data into train, validation, and test sets
Perform data analysis and feature engineering
Train models and perform hyperparameter tuning
Evaluate model performance on the validation set
Select the model and hyperparameters with the highest validation performance
Evaluate the final model on the test set
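
The selection steps above can be sketched end to end with a toy example. Everything specific here is an assumption made for illustration and does not come from the video: the synthetic two-cluster data, the small k-nearest-neighbours classifier, and the candidate values of the hyperparameter `k`. The point is only the order of operations: score each candidate on the validation set, pick the best one, and touch the test set once at the end.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Two overlapping 2-D Gaussian clusters, labelled 0 and 1 (synthetic)."""
    points = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 1.5
        points.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return points

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, eval_set, k):
    """Fraction of eval_set examples that k-NN labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)

rng = random.Random(42)
data = make_data(300, rng)  # already in random order
train, val, test = data[:180], data[180:240], data[240:]

# Score each hyperparameter candidate on the validation set only.
candidate_ks = [1, 5, 15]  # odd values avoid tied votes with two classes
val_scores = {k: accuracy(train, val, k) for k in candidate_ks}
best_k = max(val_scores, key=val_scores.get)

# The test set is used once, after the choice has been made.
test_score = accuracy(train, test, best_k)
print("validation accuracy per k:", val_scores)
print("selected k:", best_k, "test accuracy:", test_score)
```

Because `best_k` was chosen to maximize the validation score, that score is an optimistic estimate; the test accuracy is the number to report.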

💡 The validation set plays a crucial role in preventing overfitting and ensuring that the model learns patterns that apply to new data, rather than just memorizing the training data.
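
One common way to act on this is early stopping: keep training while the validation loss improves and stop once it has failed to improve for a few epochs. The sketch below is a minimal illustration under stated assumptions; the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are all made up for the example and are not taken from the video.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, scanning until the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch

# Made-up curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.90, 0.70, 0.55, 0.45, 0.38, 0.32, 0.27]
val_losses = [0.95, 0.78, 0.66, 0.61, 0.63, 0.67, 0.72]

print(early_stopping_epoch(val_losses))  # 3
```

In these illustrative curves the training loss improves at every epoch, but the validation loss is lowest at epoch 3 (counting from 0), which is the model you would keep.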

Chapters (11)

0:00 Introduction.

0:15 How different data splits are used in the model creation procedure?

0:41 How we define the validation set?

0:52 How is validation different from test and train?

1:00 What if you evaluate the model based on the validation set?

1:12 How is validation data used during the training?

1:33 At what point the validation performance will start declining?

1:48 How you select the best model based on the validation results?

1:54 How to evaluate the final performance?

1:59 The size of the validation set.

2:10 Subscribe to us!
