Compare Machine Learning Classifiers in Python

Data Professor · Beginner · 🛠️ AI Tools & Apps · 6y ago

Skills: Supervised Learning 80% · ML Pipelines 70% · ML Maths Basics 60%

Key Takeaways

This video demonstrates how to compare the performance of several machine learning classifiers in Python using scikit-learn, pandas, and Seaborn. It generates a synthetic dataset with scikit-learn, splits the data into training and test sets, trains 14 supervised classifiers with their default hyperparameters, and compares their accuracy scores on the test set in a styled pandas data frame and a Seaborn bar plot.

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat and I'm an associate professor of bioinformatics. On this YouTube channel we cover data science concepts and practical tutorials, so if you're into this kind of content, please consider subscribing. In this video we will be comparing the performance of 14 machine learning algorithms on a synthetically generated dataset. So without further ado, let's get started.

The first thing that you want to do is head over to the Data Professor GitHub and click on the code repository. Scroll down, click on the python folder, and you will see comparing-classifiers.ipynb, so go ahead and click on that one. If you don't have access to a computer where you can code, just follow along on the GitHub page, because it is already in notebook format. For those of you who would like to follow along, download a copy by right-clicking the Raw link and choosing "Save link as". I'm going to save this into my Python folder. Then I'm going to open my command prompt, head over to that folder, activate my working environment in conda, and launch Jupyter Notebook. Then I'm going to click on the notebook.

So let's begin. The first thing we want to do is create a synthetic dataset, and to do that we're going to use the make_classification function of the scikit-learn package. To run the import, press Shift+Enter. Then we generate the synthetic dataset with make_classification and assign the result to the X and y variables, which are newly created by this function. For the input arguments, we're going to create a synthetic dataset comprising 1,000 samples, so n_samples is assigned a value of 1000. n_classes is assigned a value of 2, because we're going to create two classes for this dataset. n_features is 5, n_redundant is 0, and random_state is set to 1 for reproducibility. Let's go ahead and run this cell.

Let's examine the shape of the newly generated variables. X.shape gives us 1,000 rows as the first value, and the second value corresponds to five columns, which is the number of features we assigned. y.shape gives us 1,000, the same number of rows, because there are 1,000 samples. No second value follows it, which means there is a single column: the y class label.

The next step is to split the data 80/20. We import train_test_split from the scikit-learn package, so go ahead and run that. The split is performed with train_test_split, which takes as input the X and y variables, corresponding to the five input features and the class label, and a test_size of 0.2, which is the 20%; the remaining 80% is for training. This generates four variables at once: X_train, X_test, y_train, and y_test. The two X variables hold the five input features, and y_train and y_test hold the class labels. X_train and y_train will be used to build the machine learning model, and after that we apply the model to make predictions on X_test, which we'll get to below.

Let's examine the data dimensions. X_train.shape gives us 800 by 5, because 80% of 1,000 is 800 samples. y_train.shape gives us 800 rows and one column. X_test.shape gives us 200 by 5, which corresponds to 200 samples and five columns. y_test.shape gives us 200.

The fun part is right here. We're going to import all of the modules. The first one is pandas as pd, for the subsequent generation of the data frame of results, and the big chunk of code here is all of the machine learning algorithms that we're going to use. So we run that. The names of all of the machine learning algorithms are shown in the names list, and the classifiers list contains all of the machine learning algorithms themselves. Here we're using the basic inputs for the algorithms. In a future video we'll probably cover how you can optimize these hyperparameters in an automated way, so stay tuned for that one, but today we're just going to use the default values. Go ahead and run that line.

Now we're going to build the machine learning models one by one in a for loop. First we create an empty scores variable. Each iteration corresponds to the construction of one of the 14 machine learning models: the model is built, its accuracy score is calculated, and that score is appended to the scores variable. Because there are 14 machine learning algorithms, the for loop runs 14 times, so the scores variable ends up as a list of 14 accuracy score values. Let's go ahead and run that. OK, so we haven't run this cell yet; let's do it again. The model building will take some time because it is looping over 14 algorithms.

Model building is finished, so we just type in scores to see the value of this variable. The accuracy scores of the 14 machine learning models are shown here; they fall in the range of 0.79 to 0.885. Maybe you're thinking that the scores list is good and informative, but a bit too plain, so let's see if we can spice it up a bit. That is the purpose of the fourth section: analysis of the model performance.

In the first part we import the libraries; we're going to use pandas and Seaborn. Then we create a data frame of the results. df = pd.DataFrame() uses the pandas DataFrame function to create an empty data frame. Then df['name'] is set to names, which comes from the names list up here, the names of the machine learning classifiers, and df['score'] is set to scores, the variable that contains the 14 accuracy scores from the 14 machine learning algorithms. So we have the names of the 14 machine learning algorithms and the corresponding accuracy scores, and this looks much better than the previous list.

Maybe you're wondering: this is good, but can it be a bit better? Let's have a look. What about adding some color? Here we're going to use the Seaborn light_palette function, and the color is green; feel free to change this to another color that you like. We style the background using the style.background_gradient function. OK, so I have to import seaborn as sns; let me put it up here. Oh, I already have it, but I didn't run it, so let's run it. There you go. You can see that the model with the lowest performance has a lighter shade of green and the best performance has a darker shade of green.

Let's look at it another way by making a bar plot of the model performance. Let's say we want the background to be white, and then we create a simple bar plot. The y-axis is the name of the machine learning algorithm and the x-axis is the accuracy score, and we specify that the input data comes from the df data frame. This is a graphical view of the same data that we have here, so pick whichever one you like, or you can even use both. Feel free to play around with this code; we're going to cover Seaborn in more depth in future videos.

If there is any additional topic that you would like covered, please let me know in the comments. Try changing the input data to data that interests you, play around with the code, modify it, and then upload it to GitHub so that your data science portfolio can grow. As always, the best way to learn data science is to do data science and to build your data science portfolio. If you haven't yet, check out the video I made about building your data science portfolio in the link up here. Until next time, thank you for watching, please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.
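The model-building loop described in the transcript can be sketched roughly as follows. The transcript does not name the 14 algorithms, so the six classifiers below are only an illustrative selection (an assumption, not the video's list), and the random_state passed to train_test_split is likewise an assumption added so that repeated runs give the same split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with the settings given in the transcript.
X, y = make_classification(n_samples=1000, n_classes=2, n_features=5,
                           n_redundant=0, random_state=1)

# 80/20 split. random_state is an assumption; the transcript does not mention one here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Illustrative subset of classifiers with default hyperparameters
# (assumption: the video's own names/classifiers lists hold 14 entries).
names = ["Nearest Neighbors", "SVM", "Decision Tree",
         "Random Forest", "Naive Bayes", "Logistic Regression"]
classifiers = [
    KNeighborsClassifier(),
    SVC(),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    GaussianNB(),
    LogisticRegression(),
]

# Fit each model on the training set and record its accuracy on the test set.
scores = []
for name, clf in zip(names, classifiers):
    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)  # mean accuracy for classifiers
    scores.append(score)
    print(f"{name}: {score:.3f}")
```

Because every scikit-learn classifier exposes the same fit and score methods, adding another algorithm only means appending one entry to each of the two lists.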

Original Description

In this video, I will show you how to compare the performance of several machine learning classifiers in Python. Particularly, we will generate a synthetic classification dataset and compare an exhaustive set of 14 machine learning algorithms from the scikit-learn package.

🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor
📎 CODE: https://github.com/dataprofessor/code/blob/master/python/comparing-classifiers.ipynb

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds

⭕ Subscribe: If you're new here, it would mean the world to me if you would consider subscribing to this channel.
✅ Subscribe: https://www.youtube.com/dataprofessor?sub_confirmation=1

⭕ Recommended Tools: Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
✅ Check out Kite: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only

⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : https://amzn.to/3hTKuTt
✅ Data Science from Scratch : https://amzn.to/3fO0JiZ
✅ Python D

This video teaches how to compare the performance of several machine learning classifiers in Python using scikit-learn and Seaborn. It covers generating a synthetic dataset, splitting the data into training and test sets, and comparing the accuracy scores of 14 machine learning models. The video is useful for beginners who want to learn about machine learning classification and model performance analysis.
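As a minimal sketch of the first two steps, the snippet below generates the synthetic dataset with the argument values quoted in the transcript and performs the 80/20 split, then prints the array shapes. The random_state given to train_test_split is an assumption added for repeatability; the transcript only sets random_state in make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 1,000 samples, 2 classes, 5 features, no redundant features.
X, y = make_classification(n_samples=1000, n_classes=2, n_features=5,
                           n_redundant=0, random_state=1)

print(X.shape)  # (1000, 5): 1,000 rows, 5 feature columns
print(y.shape)  # (1000,): one class label per sample

# test_size=0.2 keeps 20% for testing and 80% for training.
# random_state=42 is an assumption, not taken from the video.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape)  # (800, 5) (800,)
print(X_test.shape, y_test.shape)    # (200, 5) (200,)
```

Note that y is a one-dimensional array, so its shape has a single value; this is what the transcript means by the class label being "one column".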

Key Takeaways

Create a synthetic dataset with the make_classification function (1,000 samples, 2 classes, 5 features)
Set random_state to 1 for reproducibility
Examine the shape of the generated variables X and y
Split the data into an 80% training set and a 20% test set using train_test_split
Import the necessary modules and define the list of 14 classifiers with default hyperparameters
Train each algorithm in a for loop
Calculate the accuracy score for each algorithm and append it to a list
Create a data frame of the results using pandas
Add a color gradient to the data frame using a Seaborn palette
Display model performance as a Seaborn bar plot
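The results table and its color gradient might look something like the sketch below. The names and scores here are placeholder values chosen only for illustration (they are not results from the video); in the tutorial both lists come from the model-building loop. The helper name styled_scores is hypothetical, and restricting the gradient to the score column via subset is an assumption.

```python
import pandas as pd
import seaborn as sns

# Placeholder inputs (hypothetical values, not the video's results).
names = ["model_a", "model_b", "model_c"]
scores = [0.79, 0.85, 0.885]

# Build the results table the way the transcript describes:
# start from an empty data frame and add one column at a time.
df = pd.DataFrame()
df["name"] = names
df["score"] = scores


def styled_scores(frame):
    """Return a pandas Styler that shades higher scores in darker green."""
    cm = sns.light_palette("green", as_cmap=True)
    return frame.style.background_gradient(cmap=cm, subset=["score"])


print(df)
```

In a Jupyter notebook, putting styled_scores(df) on the last line of a cell renders the shaded table; pandas needs the optional jinja2 package for this styling step.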

💡 Using Seaborn for visualization can help to effectively compare the performance of different machine learning models
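A horizontal bar plot of the same results could be drawn along these lines. The data frame holds placeholder values for illustration only (not results from the video), and the "whitegrid" style and the axis labels are assumptions; the transcript only says the background should be white, with names on the y-axis and accuracy scores on the x-axis.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder results (hypothetical values, not the video's results).
df = pd.DataFrame({
    "name": ["model_a", "model_b", "model_c"],
    "score": [0.79, 0.85, 0.885],
})

# White background; "whitegrid" is an assumed choice of Seaborn style.
sns.set_style("whitegrid")

# Classifier names on the y-axis, accuracy scores on the x-axis.
ax = sns.barplot(y="name", x="score", data=df)
ax.set_xlabel("Accuracy score")
ax.set_ylabel("Classifier")
plt.tight_layout()
```

Sorting the data frame first, for example with df.sort_values("score", ascending=False), places the best-performing model at the top of the plot.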
