Interpretable Machine Learning Models

Data Professor · Beginner · 📐 ML Fundamentals · 6y ago

Skills: ML Maths Basics 80%, Supervised Learning 70%, Unsupervised Learning 60%, ML Pipelines 50%

Key Takeaways

The video discusses the importance of interpretable machine learning models, covering concepts such as data pre-processing, hyperparameter optimization, feature selection, and model evaluation metrics like R-squared, mean squared error, and accuracy. It also touches on handling imbalanced data sets and model retraining.
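As a rough illustration of the evaluation metrics named in the video, the sketch below computes them from scratch with the Python standard library. The function names and the small example labels are assumptions made for illustration; the video itself does not show code.

```python
import math

def regression_metrics(y_true, y_pred):
    """R-squared, MSE and RMSE for a quantitative Y variable."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and MCC for a binary Y variable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def safe_div(a, b):
        # Return 0.0 instead of failing when a class is absent.
        return a / b if b else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),   # true positive rate
        "specificity": safe_div(tn, tn + fp),   # true negative rate
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }

# Made-up labels: 1 = active, 0 = inactive.
print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
```

Writing the metrics out by hand is only meant to show what each one measures; in practice a library implementation would normally be used.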

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat and I'm an associate professor of bioinformatics. On this YouTube channel we cover data science concepts and practical tutorials, so if you're into this kind of content, please consider subscribing.

Okay, so by now you have probably created some data science models and you're ready to move your model to the next stage, which is to deploy it. But before doing that, let's consider an important point, which is the interpretability of your data science model. That essentially boils down to: can you make sense of your model? Does it make sense? Can you interpret it? Do you know which features are important for making the prediction, which are important for class A and which are important for class B? These are some of the questions that we're going to cover today, so without further ado, let's get started.

First, head over to the Data Professor GitHub, where we will go to the infographic repository. Here I have compiled all of the infographics that I have drawn so far, and so far there have been five. The first one, starting out the new year, was on building the machine learning model. It was also converted into a YouTube video, and shortly after, Dr. Tatiana translated it into Portuguese, and I have shared the infographic that's posted here, so big thanks to her for making the Portuguese version. The second infographic was about handling missing data. The idea for this infographic and the subsequent video, which I will link here as well, was suggested by Marco, so a big shout-out to Marco for providing that idea. With it we started our series on data pre-processing, and the first topic was handling missing data. That infographic was also translated into Portuguese by Dr. Tatiana. The third infographic is the machine learning learning curve; the translation to Portuguese is also in the making and will be released soon. The fourth infographic was based on a popular question on social media, which was: what are the skill sets for becoming a data scientist? I have summarized this into the skill sets mentioned here, which are also mentioned in the video on exploring the landscape of data science, so if you haven't watched that, check it out; the links are up in the card here. The most recent one is interpretability of data science models, which is the topic that we're going to cover today.

So let's open that one up and click on download, because the file is rather big. You can see that this one starts out with the trained model, so before beginning, let's head on over to the first infographic. As you can see, the trained model is almost the last stage of building the machine learning model. You normally start out with the initial dataset that you would like to create a model of. After that you do some data pre-processing: you clean the data, you curate the data, you remove redundant features from the data, and then you get the pre-processed dataset. Then you perform some form of data splitting using different ratios. It could be 80/20, 60/40 or 70/30, or it could be more than two data splits, so that depends on you. Then you apply some learning algorithms and perform some form of hyperparameter optimization, and then you do feature selection in order to reduce a potentially high number of features into a smaller set of features. Then you get your trained model, and the subsequent stages are covered in the next infographic.

Once you get your trained model, you're going to use it to make predictions, you're going to predict some Y values, and then you're going to evaluate your model performance. Before we can make any meaningful interpretation or meaningful use of the predictive model, we must first verify that the model is robust and has good performance. How do we do that? It depends on your Y variable, which could be quantitative or qualitative. If it is quantitative, you use regression, where you could use R-squared, mean squared error or root mean squared error. If your Y variable is qualitative, you use classification, where you could use accuracy, sensitivity, specificity and the Matthews correlation coefficient. Based on these performance metrics, you determine whether your predictive model is robust or not. If it is not, you have to retrain the model. By retraining the model you have to adjust some of your model composition. You might add additional features: you might evaluate and figure out that you have left out some important feature, which you will compute and add to your dataset. Or maybe you want to expand your dataset, collect additional samples and do the model building process again. If the robustness of the prediction is low, you have to repeat that step until you get a satisfactory model.

Once you have a satisfactory model, you're ready to interpret it. Some of the key issues in interpreting models include looking under the hood of the predictive model and figuring out which features are important for making the prediction. For example, if you want to classify whether your molecule is an active molecule or an inactive molecule, then essentially that is a classification problem: you want to classify whether the drug is active or inactive. Once you have identified which features are contributing to your predictive model, you have to determine whether they are contributing to the active group of data samples or the inactive group. For example, say you have a molecular weight feature, or feature one. If feature one is contributing to good prediction, is feature one favorable for active compounds or is it favorable for inactive molecules? How do you find that out? You might use some form of statistical analysis. You could essentially compare the means of your stratified dataset, where stratifying means that you subset your data further. So for the feature one analysis, you subset your data into active and inactive, and then you determine the average value, the mean value, of feature one for the active group and for the inactive group. Based on that you will be able to determine which group has a higher or lower mean, or whether they are the same, and then you can use the paired t-test to test whether they are statistically different or not. You could also use a box plot to compare the distribution of feature one between the active and inactive groups: whether the boxes have roughly the same distribution, whether one has a smaller range between Q1 and Q3 and the other a larger one, and whether they sit higher or lower relative to one another. You will be able to determine that by performing some additional analysis.

The next important question is: what is the consequence of the prediction? If your model produces a wrong prediction, what are the consequences of that? Is it life-threatening? Is the predictive model influencing a life-or-death matter for a patient or a user? For example, it could be assisting surgeons or physicians in diagnosing or operating on patients, so if the model produces a wrong prediction, that could lead to a life-or-death matter. In a similar fashion, with a self-driving car, if the prediction model analyzes the situation and produces wrong advice, it could lead to an accident.

So the next logical step would be to ask what led to the wrong prediction. You want to understand what leads to good or accurate predictions and what leads to wrong predictions. That could be done by looking at whether your dataset has any potential outliers, or whether you are missing some potential features. Based on your feature analysis, you could think the dataset over again: are you missing some important features that are not being considered by your predictive model? Or maybe there are some extreme cases, outliers, that are outside the applicability domain of your dataset or of your predictive model. For example, say you're creating a predictive model for classifying apples, and the classification is based on color, so you classify apples as red or green. If you feed in an image of an orange, that orange would be outside the applicability domain of your predictive model, because your model has not been trained to recognize an orange. What can you do in this situation? You retrain the model by feeding in examples of oranges, so your predictive model would then be updated with information about oranges. The iteration continues on and on if you want to train it with pears, bananas or other fruits.

Another potential issue that might arise is that your dataset is imbalanced: your class labels are imbalanced, meaning that the number of apples and the number of oranges are out of proportion. You might have ten times as many apples as oranges, say a thousand apples and only one hundred oranges, so your model is inherently biased toward predictions that favor apples. There are ways to handle this situation, such as performing undersampling or oversampling. That is outside the scope of this video, and I could cover it in a future video.

Okay, so those are some of the major issues that you should consider in interpreting your model. You're trying to make sense of the model, you're trying to add value to the data, and you're trying to provide actionable insights that could help the decision-making process. So take a critical look, analyze your model together with your stakeholders and with other departments of your organization, such as marketing or other business or engineering departments, and figure out together how you can resolve the issue. Once you understand your model and can interpret it, you can put it into action and deploy it, so this is inherently tied to model deployment. Understand the model, interpret the model, and then finally you can deploy the model.

Thank you for watching. Please like, subscribe and share, and I'll see you in the next one. But in the meantime, please check out these videos.
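The stratified comparison described in the transcript (subset feature one by class, compare the group means, then test the difference) might look roughly like the sketch below. The transcript mentions a paired t-test; because active and inactive molecules are separate, unpaired samples, this sketch swaps in Welch's t-test for two independent groups. The rows, the `feature_1` column name and the `welch_t` helper are all hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two independent groups."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error of group A's mean
    vb = variance(group_b) / nb  # squared standard error of group B's mean
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical rows: one feature value plus the class label of each molecule.
rows = [
    {"feature_1": 4.0, "label": "active"},
    {"feature_1": 5.0, "label": "active"},
    {"feature_1": 6.0, "label": "active"},
    {"feature_1": 1.0, "label": "inactive"},
    {"feature_1": 2.0, "label": "inactive"},
    {"feature_1": 3.0, "label": "inactive"},
]

# Stratify: subset the feature values by class label.
active = [r["feature_1"] for r in rows if r["label"] == "active"]
inactive = [r["feature_1"] for r in rows if r["label"] == "inactive"]

t_stat, dof = welch_t(active, inactive)
print("mean active:", mean(active), "mean inactive:", mean(inactive))
print("t statistic:", t_stat, "degrees of freedom:", dof)
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; SciPy's `scipy.stats.ttest_ind(active, inactive, equal_var=False)` performs the same test and returns the p-value directly.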

Original Description

In this video, I will be discussing the importance of interpretable machine learning models as well as some of the issues that you should think about in interpreting and explaining models.

🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor
📎 INFOGRAPHIC: https://github.com/dataprofessor/infographic/blob/master/05-Interpretability-of-Data-Science-Models.JPG

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds

⭕ Subscribe: If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: https://www.youtube.com/dataprofessor?sub_confirmation=1

⭕ Recommended Tools: Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you're typing. I've been using Kite and I love it!
✅ Check out Kite: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only

⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn: https://amzn.to/3hTKuTt
✅ Data Science from Scratch: https://amzn.to/3fO0JiZ
✅ Python Data Science Handbook: https://amzn.to/37Tvf8n
✅ R fo

This video explains why interpretable machine learning models matter and what to check before deploying one: verify performance with evaluation metrics, identify which features drive the predictions and which class they favor, consider the consequences of wrong predictions, and look for outliers, missing features, and imbalanced data sets. Viewers will learn how to understand and interpret a model before deploying it.
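The video does not name a specific method for finding out which features a model depends on. One common choice is permutation importance: shuffle one feature column at a time and measure how much the accuracy drops. The sketch below is a minimal standard-library version under stated assumptions; `toy_model` is a hypothetical stand-in for a trained classifier, and the data values are made up.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=50, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j shuffled.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained classifier: it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 5 else 0

X = [[1, 5], [2, 3], [8, 4], [9, 6]]  # made-up rows: [feature_0, feature_1]
y = [0, 0, 1, 1]

importances = permutation_importance(toy_model, X, y)
print(importances)  # feature 1 scores exactly 0.0 because the model ignores it
```

A feature whose shuffling leaves accuracy unchanged contributes nothing to the predictions, while a large drop marks a feature the model relies on. scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.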

Key Takeaways

Retrain the model if the performance metrics indicate poor performance: adjust the model composition, add features, or collect additional samples
Interpret the model by identifying the important features and which class they contribute to
Retrain the model with new examples when inputs fall outside its applicability domain
Handle imbalanced data through undersampling or oversampling
Analyze the data set for potential outliers or missing features
Deploy the model after understanding and interpreting it
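Oversampling, mentioned in the takeaways above, can be sketched as random duplication of minority-class rows until the classes are the same size. This is a minimal illustration, not a method shown in the video; the `random_oversample` helper and the apple and orange rows are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Made-up imbalanced data: ten apples for every orange.
rows = [[i] for i in range(11)]
labels = ["apple"] * 10 + ["orange"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before resampling
print(Counter(balanced_labels))  # class counts after resampling
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.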

💡 Interpretable machine learning models are crucial for making informed decisions and avoiding life-threatening consequences, especially in applications like medical diagnosis or self-driving cars.
