Compare Machine Learning Classifiers in Python

Data Professor · Beginner ·🛠️ AI Tools & Apps ·6y ago

Key Takeaways

This video demonstrates how to compare the performance of several machine learning classifiers in Python using scikit-learn and Seaborn. It uses scikit-learn, pandas, and Seaborn to generate a synthetic dataset, split the data into training and test sets, and compare the accuracy scores of 14 machine learning models.

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat and I'm an associate professor of bioinformatics. On this YouTube channel we cover data science concepts and practical tutorials, so if you're into this kind of content, please consider subscribing. In this video we will be comparing the performance of 14 machine learning algorithms on a synthetically generated data set. So without further ado, let's get started.

The first thing that you want to do is head over to the GitHub of the Data Professor and click on the code repository. Scroll down and click on the python folder, and then you will see comparing-classifiers.ipynb, so go ahead and click on that one. If you don't have access to a computer where you can code, just follow along on the GitHub page, because it is already in the notebook format. For those of you who would like to follow along, please download a copy by right-clicking on the Raw link and then choosing "Save link as" to save it to your computer. I'm gonna save this into the python folder. Then I'm going to open up my command prompt, head over to the folder, activate my working environment in conda, and then launch Jupyter Notebook. Then I'm gonna click on the notebook.

So let's begin. The first thing that we want to do is create a synthetic data set, and in order to do that we're gonna use the make_classification function of the scikit-learn package. To run this, go ahead and press Shift+Enter. Then we're going to generate the synthetic data set by using make_classification and assign it to the X and y variables, which will be newly generated as a result of this function. For the input arguments, we're going to create a synthetic data set comprising 1000 samples, so n_samples is assigned a value of 1000. For n_classes we're going to assign a value of 2, because we're going to create two classes for this data set. For n_features we're going to have 5, for n_redundant we're going to have 0, and then we're going to assign random_state to be 1 for reproducibility. So let's go ahead and run this cell.

Let's examine the shape of the newly generated variables. X.shape will give us 1000 rows, which is the first value, and the second value corresponds to 5 columns, which is the number of features that we have already assigned a value of 5. y.shape will give us 1000, which is the same dimension because there are 1000 samples. No value follows this one, which means that there is one column, which is the y class label.

The next step is to split the data 80/20. We're gonna import the library; particularly, we're gonna use train_test_split from the scikit-learn package, so go ahead and run that. The split will be performed here using train_test_split. It will take as input the X and y variables, which correspond to the five input features and the class label, and a test_size of 0.2, which is the 20%, and then the 80% will be for training. Here we will generate four variables concurrently: X_train, X_test, y_train and y_test. The two X variables are the five input features, and y_train and y_test are the class labels. X_train and y_train will be used to build the machine learning model, and after we have done that, we're going to apply the machine learning model to make a prediction using X_test. We're gonna cover that below.

Let's examine the data dimensions. X_train.shape will give us 800 by 5, because 80% of 1000 is 800 samples, and y_train.shape will give us 800 rows and one column. X_test.shape will give us 200 by 5, which corresponds to 200 samples and 5 columns, and y_test.shape will give us 200.

So the fun part is right here. We're gonna import all of the modules. The first one will be pandas as pd, for the subsequent generation of the data frame of the results, and then the big chunk of code here will be all of the machine learning algorithms that we're gonna use. So we're gonna run that. The names of all of the machine learning algorithms are shown here in this names list, and classifiers will contain a list of all of the machine learning algorithms that we're gonna use. Here we're gonna use the basic input for the algorithms. In a future video we're probably gonna cover how you can optimize these hyperparameters in an automated way, so stay tuned for that one, but today we're just going to use the default values. Go ahead and run that line.

Now we're gonna iterate the construction of the machine learning models one by one in a for loop. Firstly, we're going to create an empty scores variable. In each iteration, which corresponds to the construction of one machine learning model out of the 14, the model will be built, its score will be calculated, and the score will be appended to the scores variable. Because there are 14 machine learning algorithms, the for loop will run 14 times, and each time it will output an accuracy score and add it to the scores variable. Therefore the scores variable will be a list of 14 accuracy score values. So let's go ahead and run that. Okay, so we haven't run this yet, right? Do it again. The model building will take some time because it is looping over 14 algorithms.

Okay, so model building is finished, and then we're gonna just type in scores so that we see the value of this variable. The accuracy scores of the 14 machine learning models are shown here; they fall in the range of 0.79 to 0.885. Maybe you're thinking that the score list here is informative, but it's a bit too plain, so let's see if we can spice it up a bit. This is the purpose of the fourth section here, analysis of the model performance.

In the first part we are going to import the libraries, so we're going to use pandas and seaborn. Then we're gonna create a data frame of the results: df equals pd.DataFrame, which uses the pandas function, so this will create an empty data frame. Then df['name'] equals names, and the names come from the names list right here, the names of the machine learning classifiers. The scores will come from the scores variable, which contains the 14 accuracy scores from the 14 machine learning algorithms. So here we have name and score: the names of the 14 machine learning algorithms and the corresponding accuracy scores. This looks much better than the previous list that we saw.

Maybe you're wondering, okay, this is good, but can I make it a bit better? So let's have a look here. What about adding some color to this? Here we're gonna use the seaborn light_palette function, and the color is green; feel free to change this to another color that you like. We're going to style the background, as you can see here, using the style.background_gradient function. Okay, so I have to import seaborn as sns, so let me put it up here. Oh, I already have it, but I didn't run it, so let's run it. There you go. You can see that the model with the lowest performance has a lighter shade of green and the best performance has a darker shade of green.

Let's have a look in another way: a bar plot of the model performance. Let's say that we want the background to be white, and then we're going to create a simple bar plot. The y-axis will be the names of the machine learning algorithms and the x-axis will be the accuracy score, and here we specify that the input data comes from the df data frame. This is a graphical view of the same data that we have here, so choose whichever one you like, or you can even use both. Feel free to play around with this code; we're gonna cover seaborn in more depth in future videos.

If there is any additional topic that you would like to be covered, please let me know in the comments. Try changing the input data to data that interests you, play around with the code, modify it, and then upload it to GitHub so that your data science portfolio can grow. As always, the best way to learn data science is to do data science and to build your data science portfolio. If you haven't yet, check out the video that I made about building your data science portfolio; check that out in the link up here. Okay, so until next time, thank you for watching, please like, subscribe and share, and I'll see you in the next one. In the meantime, please check out these videos.
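
The data generation and splitting steps narrated in the transcript can be sketched roughly as follows. The make_classification arguments and test_size=0.2 are the ones mentioned in the video; passing random_state to train_test_split is an assumption added here so that the split is repeatable, and the notebook itself may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, using the arguments named in the video.
X, y = make_classification(
    n_samples=1000,
    n_classes=2,
    n_features=5,
    n_redundant=0,
    random_state=1,
)

# 80/20 split. random_state here is an assumption (not mentioned in the video).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# X is 2-D (samples, features); y is a 1-D array of class labels.
print(X.shape, y.shape)              # (1000, 5) (1000,)
print(X_train.shape, y_train.shape)  # (800, 5) (800,)
print(X_test.shape, y_test.shape)    # (200, 5) (200,)
```

Note that y.shape is reported as (1000,): a one-dimensional array rather than a table with an explicit single column.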

Original Description

In this video, I will show you how to compare the performance of several machine learning classifiers in Python. Particularly, we will generate a synthetic classification dataset and compare an exhaustive set of 14 machine learning algorithms from the scikit-learn package.

🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor
📎 CODE: https://github.com/dataprofessor/code/blob/master/python/comparing-classifiers.ipynb

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds

⭕ Subscribe: If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: https://www.youtube.com/dataprofessor?sub_confirmation=1

⭕ Recommended Tools: Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
✅ Check out Kite: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only

⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : https://amzn.to/3hTKuTt
✅ Data Science from Scratch : https://amzn.to/3fO0JiZ
✅ Python D

Playlist

Uploads from Data Professor · Data Professor · 40 of 60

1 How a Biologist became a Data Scientist
2 WEKA Tutorial #1.1 - How to Build a Data Mining Model from Scratch
3 WEKA Tutorial #1.2 - How to Build a Data Mining Model from Scratch
4 WEKA Tutorial #1.3 - How to Build a Data Mining Model from Scratch
5 Computational Drug Discovery: Machine Learning for Making Sense of Big Data in Drug Discovery
6 Quotes #1 on Big Data and Data Science
7 Quotes #2 on Big Data and Data Science
8 Quotes #3 on Big Data and Data Science
9 Quotes #4 on Big Data and Data Science
10 Quotes #5 on Big Data and Data Science
11 Data Science 101: Starting a Data Science / Data Mining Project
12 Data Science 101: CRISP-DM - Data Mining / Data Science in 6 Steps
13 R Programming 101: How to Define Variables
14 R Programming 101: Read and Write CSV files
15 Data Science 101: Basic Command-Line for Data Science
16 Strategies for Learning Data Science in 2020 (Data Science 101)
17 Building your Data Science Portfolio with GitHub (Data Science 101)
18 R Programming 101: Setting up R programming environment (R, RStudio and RStudio.cloud)
19 Exploratory Data Analysis in R: Towards Data Understanding
20 Exploratory Data Analysis in R: Quick Dive into Data Visualization
21 Machine Learning in R: Building a Classification Model
22 Machine Learning in R: Repurpose Machine Learning Code for New Data
23 Data Science 101: Deploying your Machine Learning Model
24 Machine Learning in R: Deploy Machine Learning Model using RDS
25 Data Pre-processing in R: Handling Missing Data
26 Machine Learning in R: Speed up Model Building with Parallel Computing
27 Data Science 101: Overview of Machine Learning Model Building Process
28 Web Apps in R: Building your First Web Application in R | Shiny Tutorial Ep 1
29 Web Apps in R: Build Interactive Histogram Web Application in R | Shiny Tutorial Ep 2
30 Web Apps in R: Building Data-Driven Web Application in R | Shiny Tutorial Ep 3
31 Web Apps in R: Building the Machine Learning Web Application in R | Shiny Tutorial Ep 4
32 Web Apps in R: Build BMI Calculator web application in R for health monitoring | Shiny Tutorial Ep 5
33 Machine Learning in R: Building a Linear Regression Model
34 What programming language to learn for Data Science? R versus Python
35 How to Become a Data Scientist (Learning Path and Skill Sets Needed)
36 Using Python in R
37 Interpretable Machine Learning Models
38 Making Scatter Plots in R [Data Visualisation in R series]
39 Machine Learning in Python: Building a Classification Model
40 Compare Machine Learning Classifiers in Python
41 Hyperparameter Tuning of Machine Learning Model in Python
42 Practical Introduction to Google Colab for Data Science
43 File Handling in Google Colab for Data Science
44 Pandas for Data Science: Create and Combine DataFrames / Rename Columns
45 Machine Learning in Python: Building a Linear Regression Model
46 Machine Learning in Python: Principal Component Analysis (PCA) for Handling High-Dimensional Data
47 How to Plot an ROC Curve in Python | Machine Learning in Python
48 Installing conda on Google Colab for Data Science
49 Use native R on Google Colab for Data Science
50 How to Save and Download files from Google Colab
51 Easy Web Scraping in Python using Pandas for Data Science
52 Data Science for Computational Drug Discovery using Python (Part 1)
53 Pandas Profiling for Data Science (Quick and Easy Exploratory Data Analysis)
54 Exploratory Data Analysis in Python using pandas
55 Quick tour of PyCaret (a low-code machine learning library in Python)
56 How to Upload Files to Google Colab
57 How to Install and Use Pandas Profiling on Google Colab
58 How to Adjust the Style of Pandas DataFrame
59 How to use Bamboolib for Data Wrangling in Data Science
60 How to use Pandas Profiling on Kaggle

This video teaches how to compare the performance of several machine learning classifiers in Python using scikit-learn and Seaborn. It covers generating a synthetic dataset, splitting the data into training and test sets, and comparing the accuracy scores of 14 machine learning models. The video is useful for beginners who want to learn about machine learning classification and model performance analysis.
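
As a rough illustration of the comparison described above, the sketch below fits several scikit-learn classifiers with default settings in a for loop and collects their test-set accuracy. The five classifiers and their labels are an illustrative subset chosen here, not the 14 used in the video's notebook, and the random_state values on the split and on the tree-based models are assumptions added for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_classes=2, n_features=5,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)  # random_state is an assumption

# Illustrative subset of classifiers (hypothetical labels, default hyperparameters).
names = ["Logistic_Regression", "Nearest_Neighbors", "Decision_Tree",
         "Random_Forest", "Naive_Bayes"]
classifiers = [
    LogisticRegression(),
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=1),
    RandomForestClassifier(random_state=1),
    GaussianNB(),
]

# Fit each model on the training set and record its accuracy on the test set.
scores = []
for name, clf in zip(names, classifiers):
    clf.fit(X_train, y_train)
    scores.append(clf.score(X_test, y_test))  # mean accuracy for classifiers

for name, score in zip(names, scores):
    print(f"{name}: {score:.3f}")
```

Adding another model to the comparison only requires appending one entry to names and one estimator to classifiers, which is what makes the paired-list pattern convenient.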

Key Takeaways
  1. Create a synthetic dataset with the make_classification function (1000 samples, 5 features, 2 classes)
  2. Set random_state to 1 for reproducibility
  3. Examine the shape of the generated variables X and y
  4. Split the data into an 80% training set and a 20% test set using train_test_split
  5. Import pandas and the scikit-learn classifiers
  6. Fit each of the 14 classifiers in a for loop
  7. Calculate and store the accuracy score of each classifier
  8. Create a data frame of the classifier names and scores using pandas
  9. Add a green color gradient to the data frame using Seaborn's light_palette
  10. Plot a bar chart of the accuracy scores with Seaborn
💡 Using Seaborn for visualization can help to effectively compare the performance of different machine learning models
