Introduction to Machine Learning | Learning ML with Scikit | Iris Dataset | Part 2
Key Takeaways
The video demonstrates machine learning concepts using Scikit-learn and the Iris dataset, covering topics such as exploratory data analysis, model evaluation, and selection. It showcases the implementation of various machine learning algorithms, including logistic regression, linear discriminant analysis, and support vector machines, to predict iris types.
Full Transcript
[Music] What's going on, guys? Hope you're having a great day, and welcome to today's video. Today we're going to go through part two of the Iris dataset, where we're gonna finish off the tutorial. Circled in red we have what we're gonna cover in today's video, where we learn how to split, train and test the data, and then we're gonna learn to build numerous ML models and determine which one is the best fit for our data. You're also going to go back over a little bit more of Jupyter and get more familiar with scikit-learn, but we've covered a lot of that in the first video. If you didn't see part 1, I'll link it down in the description below. You should definitely check it out so you can follow along with this video. So let's get into today's video.
We hop back in right where we left off last video, at data visualization. So we'll do a quick debrief of where we caught up to last video: we started with importing the required dependencies, loading the Iris dataset, summarizing our dataset, peeking into our data, doing some stats and looking into our data even deeper. So now we're at data visualization. The goal of data visualization is to see how our fields, our features, correlate together, so we can see what type of model we end up wanting to do, whether we do a linear model, a nonlinear model and stuff like that.
So we'll hop right into it. We did EDA up above, which is exploratory data analysis. We're gonna want to do a visual EDA here, because we're going to plot. So we're gonna do our graphs. We're gonna use, from pandas, the scatter matrix, so pd.plotting.scatter_matrix, and what this does is allow us to do exactly what I said before: see how each feature within our dataset correlates with the other features and see if you can spot any patterns. So taking our data frame, correlation on our y, I mean on our targets, our labels, which is the y, because we want to see how our features correlate together and how that affects our target,
our classified iris, and we'll set the figsize to ten by ten, s is 150, and then we'll just set our marker to a nice easy one, D. So we run this; it might not run, depending on if we need to import scatter_matrix. Oh no, it does. Perfect.
So we see this nice, clean, crisp graph with our features on the left and our features on the horizontal axis. Where these bar-graph-looking plots are is where a feature is correlated with itself, so sepal length and sepal length, stuff like that. But what we really want to look at is how sepal length and all the other features, feature to feature, correlate with the other features, because that tells us how they're related. So we can see right away there are some linear trends: petal length to petal width, which is what you'd expect, seems to have some linear relation, as well as sepal length to petal length. So the lengths look like they're kind of correlated linearly: as you can see, as sepal length increases, petal length looks to increase as well. But then there's ones like this, sepal width to petal length, which really doesn't have correlation; it's all over the place. A bunch of them, you can see, are all over the place. So this tells us that our data is kind of a good mix, and that we could probably use various different linear and nonlinear models, which is why I actually chose this dataset, so I can show you how to do some linear models and some nonlinear models. So that pretty much is it for our data visualization. We can see that we can implement both types of models.
So let's move on to our evaluate algorithms section. I've gone ahead and populated some text to cover what we're gonna go over in this section. In this section we're gonna cover splitting our data into our training and our test set, we're gonna set up validation using the 10-fold cross-validation method, which I'll explain down below, we're gonna build six separate models
for predicting the iris type based off the features, and then we're going to determine the best model for the data. I've also gone up into the import dependencies and added a lot more sklearn, scikit, dependencies that we're gonna need, so you're gonna want to go up here and copy these down. These are all the models we're gonna use, and then scoring and report metrics for determining which model is the best.
Yep, so on down. Let's split our data; we'll open a new one. Let's set up our test set size. I recommend that you do between 80 and 70 percent for your training set, so the rest should be your test set. I'm gonna do 80 for my training, so my test set size is 0.2; that means my training is 0.8. What you want to do is set a seed value; I'm just gonna set it to 7. Setting a seed for this next line makes it so that if you keep running your model over and over again, the test set will keep the same values and your training set will keep the same values. That way, when you keep going back and forth, you can keep the results comparable between models, since you're not splitting into different test and training sets. So it's a good practice, best practice, to do.
So we're going to use train_test_split from scikit to split our data: X_train, X_test, y_train, y_test. This is gonna split it into our training data and training labels, testing data and testing labels. train_test_split is the function from scikit that we've imported up above. It takes in our features and our labels, and now this is our test_size, so that equals our test set size, and the random_state is where we put the seed, so that all of our models are gonna have the same training and test data and it's not going to jumble it all up. So we go Control-Enter and run that. Perfect.
So now we can move down to actually doing our cross-validation. I mentioned above we're going to be doing 10-fold cross-validation. This is a technique to determine the effectiveness of your training. So what it does is
it actually takes the training dataset and splits it into ten parts, or folds as they call them, and then it uses one for validation and the rest for training, and it iterates through and does this for all the different combinations. So it'll take the first nine portions, train with those, validate with the other one, then take the next nine, validate with the first one, and it'll rotate through all the different combinations and really validate our training set. What this does is it allows us to keep our test set pure and unbiased. We won't keep going and validating on our test set; we keep it pure until we're ready to actually test and see whether our model is good. This allows us to tune hyperparameters, like the learning rate or batch size, in our training and not have to do that on our testing set. So that might be a lot thrown at you. Just know that 10-fold cross-validation allows us to keep our test set set aside and not use it until we know that our model is good enough to be used on the test set.
So now we're ready to build our models. Like I said before, we don't know exactly which type of model is gonna be best at solving our dataset. From the data visualization it appeared that some variables are linearly separable, so we're gonna go ahead and just use three linear and three nonlinear models. The algorithms we're gonna use are logistic regression, which is a linear model; linear discriminant analysis, which is also a linear model; K-nearest neighbors, which is nonlinear; classification and regression trees, nonlinear; Gaussian naive Bayes, nonlinear; and support vector machines, which are linear. I'll have future videos where we go in depth about these algorithms and what they're actually doing, but this is just an intro video; we're just going to show you how to create models using these and then see if they're any good. So pop up a cell and create a list of models, so models equals an empty list.
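The fold rotation described above can be sketched with scikit-learn's `KFold`. This is a minimal illustration, not the notebook from the video: the 120-row stand-in array, the variable names, and `shuffle=True` are assumptions made here (`random_state` only has an effect on `KFold` when shuffling is on).

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for the training set: 80% of the 150 Iris rows is 120 rows.
# The values are placeholders; only the row count matters here.
X_train = np.arange(120).reshape(-1, 1)

# Ten folds, seeded so the same partition is produced on every run.
# shuffle=True is an assumption of this sketch.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

fold_sizes = []   # (training rows, validation rows) for each rotation
validated = []    # every row index that served as validation data

for train_idx, val_idx in kfold.split(X_train):
    fold_sizes.append((len(train_idx), len(val_idx)))
    validated.extend(val_idx.tolist())

# Each rotation trains on nine folds (108 rows) and validates on one (12 rows).
print(fold_sizes[0])
# Across the ten rotations every training row is validated exactly once.
print(sorted(validated) == list(range(120)))
```

The test set never appears in this loop, which is the point made above: it stays untouched until a model has been chosen.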
Now we're gonna go ahead and start adding in our models. So models.append, and this one will be our top one, logistic regression, label LR, and then we'll just call LogisticRegression. If we go up here to the top, in our dependencies, we see that we have imported LogisticRegression from scikit-learn's linear_model, so now we can just call LogisticRegression and get the functionality of creating a model. So solver equals liblinear; these are just scikit terms to distinguish what type of model we're wanting. It is multi-class because we have different types of irises, and we'll set that to auto so it does it automatically. We'll run that to see if it works. Okay, yep, no error is thrown, so we can continue on, and we'll just do this for each and every one. I'll come back when I've done it for each and every one.
So I've gone ahead and appended every model into our models list. You can see that we have all six of these. So now we can go ahead and actually cross-validate them. Our results list is empty and our names list is empty, and now we're going to loop through our models and train. So let's have a for loop: for name, model in models. We are going to want to do the cross-validation, so kfold equals KFold from scikit-learn. Our number of splits, or number of folds: we want to do ten. 10-fold cross-validation is a good best practice and usually works the best. And then we're going to set random_state to the seed that we named above. This way every model is using the same data and it's not jumbling it.
So now we're gonna do our cross-validation: results equals cross_val_score from scikit, which takes our model, our X training data and our y training, which is the labels. Then we set cv equal to the kfold we just created, and our scoring metric is going to be accuracy, so this will give us a percentage. So let's take our results and add each one to our results list. Perfect. And now we're gonna go ahead and just do our
printing stuff. Well, actually, first of all, append the names into our names list. And now we can set our printing message. So our message: we'll do our %s, there we go, %f, %f. This is just setting up our print statement to look nice. If you're not familiar with percent symbols and how to format print strings, then I might create a video for you on that. Let me know down below what you guys want more explaining on from this video and we'll go over it. So we're gonna take the mean across each, because as I said before, it's gonna split the training data into nine training folds and one validation fold and keep iterating through all the different combinations, so we're gonna want to take the mean from that. And then we'll also print the standard deviation, and from these values we should be able to see... oh, I didn't print it. Let's go ahead and actually print the message. And, oh, we want to do model in models. Okay, perfect.
So now we can see the average accuracy on the validation set. We're getting 96.6 percent accuracy for our logistic regression, our linear discriminant is 97.5, our KNN is 98.3, which is super high, our CART is 96, our naive Bayes is 97.5, and our SVM is 99.1. Wow. And we can see that our standard deviation is really, really low on this one, and it's pretty much the same on all the rest. So that's evaluating our algorithms on the training data. This will predict how well we're gonna do on the test data and whether we're ready to go to the test data. These are all super high and show that linear models and nonlinear models are both good on the Iris dataset for predicting the iris type.
So for making predictions and doing our testing, I'm just going to move forward with the top two. I'll move forward with K-nearest neighbors and support vector machines, because it's the same thing for each
way to test it; we're just gonna show and choose which one is actually the best of the best. So we'll start making predictions, and we'll start with our KNN. We'll set up our model again, so knn equals our KNeighborsClassifier, and we're gonna train on the whole training set. We're not gonna do the cross-validation anymore; we just used that to see which model would be good moving forward, and now we're gonna use the whole training data to train. So we take our knn and we use scikit-learn's dot fit. This is how you train on the full set, and you give it X_train and y_train. It's that easy. You run it, and you can see the algorithm is auto, the leaf size; this is all just defaults. It defaults the nearest neighbors to five, which is usually best practice, a relatively strong and good nearest-neighbor size, so we're gonna go with that. So that's the training. It's as simple as that: you just set up your model, and then you do dot fit and give it your training set, your training data.
And now we can do predict. So predictions equals knn, our model, dot predict, and you give it the X data of the test set. You don't want to give it the y, because it should be guessing that. And now score_knn equals our accuracy_score from scikit, where we give it y_test and predictions, and it will give us the score of how well it did. Oh, I've got to do print score. Okay, cool. So that shows us that when we train the model and then predict using the test set, we have an accuracy of 90%. So out of every 10 we're getting 9 right and 1 is mislabeled, so that's pretty good.
So now we're going a little deeper, and we'll actually use a couple of tools to see which ones we got wrong and where our model went wrong. We're gonna print the confusion matrix and the classification report. The confusion matrix will give you a matrix showing false positives, false negatives, true positives and stuff like that, so you can see which
ones went wrong, and the classification report is going to tell us the accuracy and the weighted averages. So confusion matrix equals this method from scikit. These are all from scikit; they're all in the dependencies that we put up above. Then we're gonna give it y_test and predictions, same as our scores, and we're gonna print. We'll give this a label so it looks nice: confusion matrix, skip a line, oh no, it's a comma. And then our classification report is the exact same thing: it gets the y_test and predictions, and we will print it the exact same way as well. [Music]
So here's our confusion matrix. It shows that we had seven of our first iris type in our test set. Let's go up and see what our first type is. Our first type is setosa, second type versicolor, then virginica. So for setosa, in our test set there were seven of them, and it got them all right, because there's none that it guessed wrong. For the second iris there were 11 of them, but it guessed two of them as the third type, and there were nine of our third iris, and it guessed one of them to be the second one. So that's three it got wrong, two here and one here. So that's the confusion matrix. The classification report gives us all these values, like the precision; it shows us that we got a hundred percent on the first, on label one, and then it shows all your averages and weights them so you can see.
So we're gonna do all this again with our SVM. I'll just grab that, and I'll quickly put in a markdown cell labeled SVM. [Music] So now we'll change this over to SVM: svm equals SVC, and it takes in gamma, set to auto. Change this to svm so we can now train it with our full set. Perfect. These are the default parameters, showing you what SVC takes in. So now we're gonna do our svm dot predict. I'll just call it predictions2, svm dot predict, and we give it our X test, same as before, and
now we'll do our score_svm equals our accuracy_score, and we'll give it the y_test and predictions, and now we will print our score. Oh, I definitely named something wrong. Ah, this is predictions2; that's what is giving me trouble, a little typo. So we can see that our SVM is actually 93% accurate on the test set, so it looks like it's better than our KNN. We're going to do the exact same confusion matrix on the SVM, print it in here, and I'll just label everything as 2, so change predictions to predictions2, yep, 2, yep, 2. Okay. So you can see, which makes sense, our test set is the exact same as before, but we got a hundred percent on the first two and only one wrong on the third, so that averages out to 93 percent accurate.
So in conclusion, a little markdown, conclusion, and give a header to that: since the accuracy of our SVM model was greater than that of our KNN model, and I'll put in brackets what percentages they were, 93.3% and 90% [unclear], it can be deemed that the SVM model is best for our dataset. It best represents our dataset, as it is almost perfect, highly accurate.
And that's that for this video, guys. From start to finish we've gone over a lot, I know. We imported our required dependencies, loaded our dataset, summarized our Iris dataset, where you can see our four features, and then we did some analysis, some stats and some data visualization. Then, from our data visualization, we evaluated different types of algorithms using cross-validation to see which ones we want to move forward with and train fully. We chose KNN and SVM, as they were the highest rated, and from there our final step is to make predictions. So we train our models on our training set, then predict using our test set, and then assess using accuracy_score from scikit, and that will give us the accuracy on our test set. And then we can do some
reports and matrices to visualize it better and see where it actually went wrong and see how our test set was laid out. And then in conclusion we can deem that one model is more representative of our dataset and trains on it best. So yeah, I hope you learned a lot in this video, guys, and I'm looking forward to creating more content for you. Let me know down in the comments below what kind of content you want to see, and smash that like button. Thanks, guys. [Music]
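The workflow walked through in the transcript (split, 10-fold comparison of six models, then a test-set check of KNN and SVM) can be sketched end to end as follows. This is a hedged reconstruction, not the author's notebook. Assumptions made here: the data is loaded with `load_iris` rather than the Part 1 code, `KFold` uses `shuffle=True`, the decision tree is seeded, and `LogisticRegression` uses its default solver with `max_iter=1000` instead of the `liblinear` and multi-class settings mentioned in the video, since those options have changed across scikit-learn releases. Scores will vary with library version, so no expected numbers are shown.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: load Iris from scikit-learn instead of the Part 1 notebook.
X, y = load_iris(return_X_y=True)

# 80/20 split with a fixed seed so every model sees the same rows.
test_set_size = 0.2
seed = 7
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_set_size, random_state=seed
)

# The six models named in the video. Note that SVC's default kernel is RBF,
# so this SVM draws nonlinear decision boundaries.
models = [
    ("LR", LogisticRegression(max_iter=1000)),  # default solver: an assumption
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),            # n_neighbors defaults to 5
    ("CART", DecisionTreeClassifier(random_state=seed)),
    ("NB", GaussianNB()),
    ("SVM", SVC(gamma="auto")),
]

# 10-fold cross-validation on the training set only.
results, names = [], []
for name, model in models:
    kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring="accuracy")
    results.append(cv_results)
    names.append(name)
    print("%s: %f (%f)" % (name, cv_results.mean(), cv_results.std()))

# Refit the two chosen models on the full training set, then score the test set.
test_scores = {}
for name, model in (("KNN", KNeighborsClassifier()), ("SVM", SVC(gamma="auto"))):
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    test_scores[name] = accuracy_score(y_test, predictions)
    print(name, "test accuracy:", test_scores[name])
    print("Confusion matrix:\n", confusion_matrix(y_test, predictions))
    print(classification_report(y_test, predictions))
```

With only 30 test rows, one misclassified flower moves accuracy by about 3.3 points, so a small gap between two models on this test set is weak evidence on its own.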
Original Description
Introduction to Machine Learning Part 2 - A step-by-step tutorial on how to go about solving simple machine learning problems. This video focuses on learning how to solve the famous Iris Dataset using Scikit Learn, Jupyter notebook and Python.
#machinelearning #iris #tutorial
Watch Part 1 Here: https://www.youtube.com/watch?v=eXTPngsx-as
In this video I cover:
1. How to do data visualization and determine which type of models to use.
2. How to evaluate and use cross-validation on models.
3. How to train, test and split your data.
4. Predict your data and determine how accurate your models are on the test set.
If you want to look at the code you can see it here: https://www.youtube.com/redirect?q=https%3A%2F%2Fgithub.com%2FtheAIGuysCode%2FIris-Tutorial&redir_token=-bBG1SyHJJUfu5NaRuuD2vUxI698MTU3MTk2ODkwNEAxNTcxODgyNTA0&v=eXTPngsx-as&event=video_description
If you enjoyed the video, toss it a like! 👍
To Subscribe: https://www.youtube.com/channel/UCrydcKaojc44XnuXrfhlV8Q?sub_confirmation=1
Thanks so much for watching!
- The AI Guy