Interpretable Machine Learning Models

Data Professor · Beginner ·📐 ML Fundamentals ·6y ago

Key Takeaways

The video discusses the importance of interpretable machine learning models, covering concepts such as data pre-processing, hyperparameter optimization, feature selection, and model evaluation metrics like R-squared (R²), mean squared error, and accuracy. It also touches on handling imbalanced data sets and model retraining.
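As a minimal sketch of the regression metrics named above, the following computes mean squared error, root mean squared error and R-squared in plain Python, assuming `y_true` and `y_pred` are equal-length lists of numbers. The function names are illustrative and not taken from the video; in practice, scikit-learn's `sklearn.metrics` module provides `mean_squared_error` and `r2_score`.

```python
from math import sqrt
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs (zip would silently truncate them)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared residuals (y_true - y_pred)."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of MSE, expressed in the same units as y."""
    return sqrt(mean_squared_error(y_true, y_pred))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print(mean_squared_error(y_true, y_pred))       # 0.5625
print(root_mean_squared_error(y_true, y_pred))  # 0.75
print(round(r_squared(y_true, y_pred), 4))      # 0.8475
```

RMSE is often easier to communicate than MSE because it is in the units of Y, while R-squared compares the model's errors against simply predicting the mean.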

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat, and I'm an associate professor of bioinformatics. On this YouTube channel we cover data science concepts and practical tutorials, so if you're into this kind of content, please consider subscribing.

Okay, so by now you have probably created some data science models, and you're ready to move your model to the next stage, which is to deploy it. But before doing that, let's consider an important point: the interpretability of your data science model. This essentially boils down to: can you make sense of your model? Does it make sense? Can you interpret it? Do you know which features are important for making the prediction, which are important for class A, and which are important for class B? These are some of the questions that we're going to cover today, so without further ado, let's get started.

First, head over to the Data Professor GitHub, where we will go to the infographic repository. Here I have compiled all of the infographics that I have drawn so far, and so far there are five. The first one, from the start of the new year, was about building the machine learning model. It was also converted into a YouTube video, and shortly after, Dr. Tatiana translated it into Portuguese, and the Portuguese version of the infographic is posted here, so big thanks to her for that. The second infographic was about handling missing data. The idea for this infographic, and for the subsequent video, which I will link here as well, was suggested by Marco, so a big shout-out to Marco for providing that idea. With it we started the data pre-processing series, beginning with handling missing data. This infographic was also translated into Portuguese by Dr.
Tatiana. The third infographic is about the machine learning learning curve; its Portuguese translation is also in the making and will be released soon. The fourth infographic was based on a popular question on social media: what are the skill sets for becoming a data scientist? I have summarized these into the skill sets mentioned here, which are also covered in the video on exploring the landscape of data science, so if you haven't watched that, check it out; the links are up in the card here. The most recent one is the interpretability of data science models, which is the topic we're going to cover today. Let's open that one up and click on Download, because the file is rather big.

You'll see that this one starts out with the trained model. So before beginning, let's head over to the first infographic. As you can see, the trained model is almost the last stage of building the machine learning model. You normally start out with the initial data set that you would like to model. Then you do some data pre-processing: you clean the data, curate the data, and remove redundant features, and you get the pre-processed data set. Then you perform some form of data splitting using different ratios; it could be 80/20, 70/30, or 60/40, or there could be more than two splits, depending on your needs. Then you apply some learning algorithms, perform some form of hyperparameter optimization, and do feature selection in order to reduce a potentially high number of features to a smaller set. Then you get your trained model, and the subsequent stages are described in the next infographic. Once you have your trained model, you use it to make predictions: you predict some Y values, and then you're going to
evaluate your model performance. Before we can make any meaningful interpretation or meaningful use of a predictive model, we must first verify that the model is robust and performs well. How do we do that? It depends on your Y variable, which could be quantitative or qualitative. If it is quantitative, you have a regression problem, and you could use R-squared, mean squared error, or root mean squared error. If your Y variable is qualitative, you have a classification problem, and you could use accuracy, sensitivity, specificity, and the Matthews correlation coefficient. Based on these performance metrics, you determine whether your predictive model is robust or not.

If it is not, you have to retrain the model, which means adjusting some of the model's composition. You might add features: after evaluating the model, you might figure out that you have left out some important feature, which you would then compute and add to your data set. Or you might expand your data set by collecting additional samples and doing the model building process again. If the model is still not robust, you repeat that step until you get a satisfactory model. Once you have a satisfactory model, you're ready to interpret it.

Some of the key issues in interpreting a model include looking under the hood of the predictive model and figuring out which features are important for making the prediction. For example, if you want to classify whether a molecule is active or inactive, that is essentially a classification problem: you want to classify whether the drug is active or inactive. Once you have identified which features contribute to your predictive model, you have to determine whether they contribute to the active group of data samples or to the inactive group. For example, you might have a molecular weight feature, call it feature one. If feature one is
contributing to good prediction, is feature one favorable for active compounds or for inactive molecules? How do you find out? You might use some form of statistical analysis, essentially comparing the means of your stratified data set, where stratifying means that you subset your data further. For the feature one analysis, you subset your data into active and inactive, and then determine the mean value of feature one for the active group and for the inactive group. Based on that, you can determine which group has the higher or lower mean, or whether they are about the same, and you can then use a two-sample (unpaired) t-test to check whether the difference is statistically significant, since the active and inactive samples are independent groups. You could also use box plots to compare the distribution of feature one between the active and inactive groups: whether the boxes have roughly the same distribution, whether one has a narrower Q1-to-Q3 range than the other, and whether one is relatively higher or lower than the other. So you can work this out by performing some additional analysis.

The next important question is: what is the consequence of the prediction? If your model produces a wrong prediction, what are the consequences? Is it life-threatening? Is the predictive model influencing a life-or-death matter for a patient or user? For example, it could be assisting surgeons or physicians in diagnosing or operating on patients, so a wrong prediction could lead to a life-or-death situation. Similarly, for a self-driving car, if the prediction model analyzes the situation and produces wrong advice, it could lead to an accident. So the next logical question is what led to the wrong prediction. You want to understand what led to good prediction
or accurate prediction and what leads to wrong prediction. One way to do that is to check whether your data set has any potential outliers, or whether you are missing some potential features. Based on your feature analysis, you could think through the data set again: are you missing some important features that your predictive model does not consider? Or maybe some situations are extreme cases, outliers that fall outside the applicability domain of your data set or of your predictive model. For example, say you're creating a predictive model that classifies apples by color as red or green, and you feed in an image of an orange. That orange would be outside the applicability domain of your predictive model, because your model has not been trained to recognize an orange. What can you do in this situation? You retrain the model by feeding in examples of oranges, so your predictive model is updated with information about oranges, and the iteration continues on and on if you want to train it with pears, bananas, or other fruits.

Another potential issue is that your data set may be imbalanced: your class labels are imbalanced, meaning that the number of apples is much larger than the number of oranges. You might have ten times as many apples as oranges, or a thousand apples and only one orange, so your model is inherently biased toward predicting apples. There are ways to handle this situation, such as undersampling or oversampling; that is outside the scope of this video, and I could cover it in a future video.

Okay, so those are some of the major issues that you should consider when interpreting your model. You're trying to make sense of the model, you're trying to add value to
the data, and you're trying to provide actionable insights that could help the decision-making process. So take a critical look and analyze your model together with your stakeholders and with other departments of your organization, such as marketing, business, or engineering, and figure out together how you can resolve the issues. Once you understand your model and can interpret it, you can put it into action by deploying it. This is inherently tied to model deployment: understand the model, interpret the model, and then finally deploy the model. Thank you for watching, please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.
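The transcript suggests stratifying a feature such as molecular weight by class, then comparing the group means with a t-test and the distributions with box plots. A minimal standard-library sketch, assuming toy (made-up) feature values for active and inactive molecules, computes per-class means and quartiles (the numbers a box plot draws) plus Welch's two-sample t statistic. Welch's variant is swapped in here because it does not assume equal variances in the two groups; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test.

```python
from math import sqrt
from statistics import mean, quantiles, variance


def summarize_by_class(values, labels):
    """Group one feature's values by class label and summarize each group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    summary = {}
    for label, vals in groups.items():
        # quantiles(n=4) returns the three cut points: Q1, median, Q3
        q1, median, q3 = quantiles(vals, n=4)
        summary[label] = {"n": len(vals), "mean": mean(vals),
                          "q1": q1, "median": median, "q3": q3}
    return groups, summary


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    se_a = variance(a) / len(a)  # squared standard error of mean(a)
    se_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical "feature one" values (e.g. molecular weight); toy numbers only.
feature_one = [310, 325, 298, 340, 332, 420, 398, 445, 410, 431]
labels = ["active"] * 5 + ["inactive"] * 5

groups, summary = summarize_by_class(feature_one, labels)
t, df = welch_t(groups["active"], groups["inactive"])
for label, stats in summary.items():
    print(label, stats)
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

A large negative t here would indicate that feature one tends to be lower in the active group; whether that matters still depends on the p-value, the effect size, and domain knowledge.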

Original Description

In this video, I will be discussing about the importance of interpretable machine learning models as well as some of the issues that you should think about in interpreting and explaining models. 🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor 📎INFOGRAPHIC: https://github.com/dataprofessor/infographic/blob/master/05-Interpretability-of-Data-Science-Models.JPG ⭕ Playlist: Check out our other videos in the following playlists. ✅ Data Science 101: https://bit.ly/dataprofessor-ds101 ✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast ✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship ✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics ✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox ✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit ✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny ✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab ✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas ✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds ✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds ⭕ Subscribe: If you're new here, it would mean the world to me if you would consider subscribing to this channel. ✅ Subscribe: https://www.youtube.com/dataprofessor?sub_confirmation=1 ⭕ Recommended Tools: Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it! ✅ Check out Kite: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only ⭕ Recommended Books: ✅ Hands-On Machine Learning with Scikit-Learn : https://amzn.to/3hTKuTt ✅ Data Science from Scratch : https://amzn.to/3fO0JiZ ✅ Python Data Science Handbook : https://amzn.to/37Tvf8n ✅ R fo

This video explains why interpretability matters before a machine learning model is deployed. It reviews the model-building workflow (data pre-processing, data splitting, hyperparameter optimization, and feature selection), the evaluation metrics used to verify that a model is robust, and the key questions in interpreting a model: which features drive its predictions, what the consequences of a wrong prediction are, and how outliers, missing features, and imbalanced data sets can lead to errors.
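For the classification case, the transcript lists accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). A minimal sketch, assuming a binary problem with a named positive class (here the hypothetical label "active"), derives all four from confusion-matrix counts. The usage example also shows why accuracy alone can mislead on an imbalanced data set: a model that always predicts the majority class scores 0.9 accuracy but an MCC of 0. scikit-learn offers the same metric as `sklearn.metrics.matthews_corrcoef`.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive="active"):
    """Accuracy, sensitivity (recall of positives), specificity and MCC."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC = 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical imbalanced labels: 9 inactive, 1 active; the model always says "inactive".
y_true = ["inactive"] * 9 + ["active"]
y_pred = ["inactive"] * 10
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0, 'mcc': 0.0}
```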

Key Takeaways
  1. Retrain model if performance metrics indicate poor performance
  2. Adjust model composition
  3. Add features
  4. Collect additional samples
  5. Interpret the model by identifying important features and whether they favor one class or the other
  6. Retrain the model with new examples when inputs fall outside its applicability domain
  7. Handle imbalanced data through undersampling or oversampling
  8. Analyze the data set for potential outliers or missing features
  9. Deploy the model after understanding and interpreting it
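Item 7 can be illustrated with random oversampling (duplicating minority-class samples) and random undersampling (dropping majority-class samples). This is a minimal standard-library sketch with hypothetical fruit data and illustrative function names; dedicated libraries such as imbalanced-learn offer more complete implementations. Resample only the training split, after data splitting, so that duplicated samples cannot leak into the test set.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Map each class label to the positions where it occurs."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _indices_by_class(y)
    target = max(len(idx) for idx in by_class.values())
    X_out, y_out = list(X), list(y)
    for idx in by_class.values():
        # Sample with replacement; k is 0 for the majority class.
        for i in rng.choices(idx, k=target - len(idx)):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _indices_by_class(y)
    target = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, target))
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical training split: 10 apples and 2 oranges, one toy feature per sample.
X_train = [[float(i)] for i in range(12)]
y_train = ["apple"] * 10 + ["orange"] * 2

X_over, y_over = random_oversample(X_train, y_train)
X_under, y_under = random_undersample(X_train, y_train)
print(Counter(y_over))   # Counter({'apple': 10, 'orange': 10})
print(Counter(y_under))  # Counter({'apple': 2, 'orange': 2})
```

Oversampling keeps all the data but can encourage overfitting to the duplicated minority samples; undersampling avoids duplicates but throws away majority-class information.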
💡 Interpretable machine learning models are crucial for making informed decisions and avoiding life-threatening consequences, especially in applications like medical diagnosis or self-driving cars.
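The takeaway above, and the transcript's apples-and-oranges example, point to checking whether a new input lies inside the model's applicability domain before trusting its prediction. One simple approach, swapped in here as an illustration rather than the video's own method, is a per-feature range (bounding-box) check against the training data; the features, values and function names below are hypothetical. Range checks are crude, since a sample can sit inside every individual range yet still be unlike the training data, so distance- or density-based checks are common alternatives.

```python
from typing import List, Sequence, Tuple


def fit_ranges(X_train: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Record the min and max of each feature column in the training data."""
    columns = list(zip(*X_train))
    return [(min(col), max(col)) for col in columns]


def in_domain(x: Sequence[float], ranges: List[Tuple[float, float]],
              tolerance: float = 0.0) -> bool:
    """True if every feature of x lies in its training range, widened by tolerance * width."""
    if len(x) != len(ranges):
        raise ValueError("x must have one value per feature")
    for value, (low, high) in zip(x, ranges):
        margin = tolerance * (high - low)
        if value < low - margin or value > high + margin:
            return False
    return True


# Hypothetical features: [diameter_cm, peel_thickness_mm] for apples only (toy values).
apples = [[7.0, 0.20], [7.6, 0.30], [6.9, 0.25], [8.1, 0.20]]
ranges = fit_ranges(apples)

print(in_domain([7.3, 0.22], ranges))  # True: resembles the training apples
print(in_domain([8.0, 4.00], ranges))  # False: peel far thicker than any training apple
```

Flagged inputs are candidates for the retraining loop described in the transcript: collect labeled examples of the new kind of input and rebuild the model.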
