Training and Fine-Tuning ML Models with Sklearn | End-to-End ML Project Tutorial - Part 3

Harshit Tyagi · Beginner · ML Fundamentals · 5y ago

Skills: Supervised Learning 90% · ML Pipelines 80% · ML Maths Basics 70%

Key Takeaways

This video tutorial demonstrates training and fine-tuning machine learning models with scikit-learn, covering model selection, cross-validation, hyperparameter tuning with grid search, final evaluation on the test set, and saving the trained model with pickle. It compares linear regression, decision tree, random forest, and support vector machine regressors; deployment with Flask is left to the next video in the series.

Full Transcript

Hello everyone, welcome back to the channel, Data Science with Harshit. This is the third part of the end-to-end machine learning project series, and in this part we will dive deep into machine learning. We will select and train a few machine learning models: linear regression, decision tree, random forest and support vector machines. We will choose the best of them, look into hyperparameter tuning, and evaluate all of these models using a performance metric, root mean squared error (RMSE). Once we have figured out which one works best for us, we will finalize it and test the entire system on the test data that we set aside at the beginning. And once we are sure that this is the model we want to deploy in production, we will save the trained model to a file using the pickle module. That is the curriculum for this part of the series, and in the next video we will cover how to deploy this model using a Flask web service. So let's get started. [Music]

In the third part we are going to select and train a few models, and these are the steps. First, we select and train a few models: linear regression, decision tree, random forest and a few others that we will see. Second, we need performance metrics to evaluate these models so that we can compare them with each other and see which one is doing better. The third step is model evaluation using cross-validation; we will see why we need cross-validation and what its purpose is. Fourth, we deep dive into hyperparameter tuning: once we have figured out which algorithms perform better, we tune the hyperparameters of those shortlisted models using the GridSearchCV class. The fifth step is to check feature importance: we will see which features are
actually contributing more to the prediction. The sixth step is to evaluate the final system on the test data: once you have locked your model, we test it on the test data that we set aside earlier. The seventh and final step: after a few random predictions and some random testing, you can save your model using the pickle module.

First things first: I have imported all the libraries, functions and classes that I need. All the libraries, methods and classes that we used in the previous part are imported in the first cell here, so let's run it. Then I read the file as we did in the previous part. I remove MPG, the target variable, from the data, so I set the data labels aside and separate out the feature variables. We now have the features in one data frame, the data variable, and the target, the mpg column, in my data labels variable.

Next, I have created a function to preprocess origin, the categorical column. It renames 1, 2, 3 as India, USA and Germany and returns the data frame itself. Then we saw how we added the custom attribute adder, which adds acceleration on power and acceleration on cylinder; it works the same way as before.

The next thing, which is important to look at, is this function. We saw how we could create a numerical pipeline and club the numerical and categorical transformations using ColumnTransformer. Here I have encapsulated that functionality into functions, so I have two functions. The first one is the numerical pipeline transformation. It takes the numerical columns of the data frame, and this is how I am
segregating the numerical data, using select_dtypes with the numeric types, which are float and integer. Then I instantiate the Pipeline class and provide the name and the class for each step. First comes imputation with the SimpleImputer class and the median strategy, then the custom attribute adder class that we defined above, and then I scale all the values in my numerical data set. The function returns the numerical attribute data as well as the numerical pipeline object, so all we need to do is call the transform method on it and we have the transformed numerical data.

The second function does everything; this is the main function, the pipeline transformer. It is the complete transformation pipeline for both numerical and categorical data. I have kept the categorical attributes as origin, and the numerical attributes and the numerical pipeline come from the numerical pipeline transformer function defined above, which gives me the numerical data set and the numerical pipeline object. I am also printing the column names of the numerical attributes; that was just for testing, so you can delete it.

Then I have created the full pipeline, which is a ColumnTransformer object. I pass it the names, numerical and categorical, along with the numerical pipeline object for the numerical columns and, for the categorical column, simply a one-hot encoding of the values India, USA and Germany, plus the list of numerical columns and the list of categorical columns. It then calls the fit_transform method on the data that I provide. So I simply pass the data frame to it, it creates a NumPy array, and all of those computations, basically the custom
addition of those attributes and everything else, would be carried out, and I would have the prepared data with me. So I have reduced this whole process to two steps: I go from raw data to processed data in just two steps by encapsulating all of this functionality in functions. Once I have the raw data, I first call the function that preprocesses the origin column, which renames 1, 2, 3 as India, USA and Germany, and then I call the pipeline transformer function, which does all the preprocessing on the numerical as well as the categorical data and gives me the prepared data.

This is what our prepared data looks like, and this is the first row of it. You can see that we have 11 attributes: 6 numerical columns, plus the 2 we added, acceleration on cylinder and acceleration on power, which makes 8, and then 3 from the one-hot encoding of the origin column. That gives 11 attributes in total, so data preparation is working fine.

The next thing is selecting and training models. First I have linear regression. We are working on a regression problem, so the first algorithm that comes to mind is fitting a straight line to our data, and that is why I picked the linear regression model. I imported the LinearRegression class from the linear_model module of the scikit-learn package; we must all have done it or seen it somewhere. I instantiated the LinearRegression class and then invoked the fit method on it, which gives you a linear regression object. So I have trained my model at this point. Calling fit basically means that we
have trained the model. I passed it the prepared data, which I got after running my preprocessed data through the pipeline transformer function, and the data labels that I set aside right at the beginning.

Now I am trying out some predictions. I have picked five rows, which I am calling the sample data, and the first five rows of the labels column as the sample labels. Next I have to transform the sample, so I pass my sample data to the pipeline transformer, which gives me the prepared sample data. Then I call the predict method to predict the mpg value of these five rows, and this prints all the predictions. (The extra list that was printed came from the print statement inside the function; I have removed it and run the cell again.) So these are the predictions for the first five samples of the training data itself.

Now compare these results with the actual sample labels. The first prediction is 29.08 and the actual value was 32; the model predicted 27 where we have 31; then 26, where the actual is also 26, which is fine; then 12 against 18; and 22 for the last one, where the actual value was 26. So these are somewhat close: not good, but somewhat close.

That was how linear regression performed, but eyeballing is not a good way to quantify the difference between the actual and the predicted values. So we use performance metrics, and when it comes
to regression, the typical performance metric used to evaluate a model is root mean squared error. RMSE tells you how much error a system makes in its predictions. It is based on the difference between the actual and the predicted values, and because of the squared term the penalty grows quickly as that difference increases. So if you have a lot of outliers in your data, RMSE is probably not a good choice, and you can use mean absolute error instead.

We are going to use RMSE, and for that we use the mean_squared_error function from the scikit-learn metrics module. First you predict: I have predicted all of the values with the linear regression object's predict method and stored them in the mpg predictions. Then you pass mean_squared_error the data labels, which are the actual values, and the mpg predictions made by the model. That gives the mean squared error, and to get the root mean squared error you pass that value to NumPy's square root function. For the linear regression model it comes out to 2.95. We are going to compare this value for each of the models going ahead.

Let's now try the decision tree model. The steps remain the same for almost every algorithm that you import from the scikit-learn library. First you import the class for that algorithm: from the tree module I imported DecisionTreeRegressor, instantiated it and created an object, tree_reg, so this is a tree regressor. Now you
train the model using the fit method: you provide the prepared data and the labels, run it, and you get a trained decision tree regression model. Once the model is trained, you predict by passing it the prepared training data, because for now we are not using the test data. So it is tree_reg.predict with the prepared data, then the same mean_squared_error function from the scikit-learn metrics module with the actual values and the predictions, and then the square root to get the root mean squared error. This is my tree RMSE. Run it and we get 0.0.

Zero error would be amazing, right? But no model is perfect, so this means the model has overfit the data to a great extent. What do we mean by overfit or underfit? When we say a model is overfitting, it performs really well on the training data but poorly on data it has not seen yet. So an RMSE of 0.0 on the training set means the model is badly overfitting the data.

How should we test it on unseen data? Should we use the test data that we have? That is bad practice; we have to keep it aside for the final model. One thing we can do is split the training data itself using train_test_split or a stratified split, but a great alternative here is scikit-learn's k-fold cross-validation. K-fold cross-validation splits the training set into k folds; you decide how many folds you want, and I have used 10 here. It then trains and evaluates the model k times, so if you picked 5 folds, it trains and evaluates your model 5 times. Now, how does it train that
model? It picks a different fold for evaluation every time and trains on the other k minus 1 folds. So if I create 10 folds, my model is trained on 9 folds and one fold is reserved for testing, and you do that for every fold. You get 10 scores for your model, so it is like testing it across the entire data set, with every fold taking its turn as the validation set. It is a great evaluation method, used in almost every industrial or otherwise good project that I have come across.

For this we have the cross_val_score function from the model_selection module. I invoke the function and pass it the tree regressor, the prepared training data and the training labels, and then the scoring argument, which here is negative mean squared error. scikit-learn's scorers follow a greater-is-better convention, so the error comes back negated, and we add a negative sign ourselves to make it positive, as I have done in the next line. cv equals 10 means I have created 10 cross-validation folds. The last line is the tree regression RMSE scores: cross_val_score gives us 10 scores and I calculate the square root of all of them.

Let me run this and check what we get in the tree regressor RMSE scores. We have ten values in this array, the RMSE for each of the folds that we used. We can calculate the average of all these values; I have used the mean function here, and the average
has turned out to be 3.31 for the decision tree regressor. We can run the same cross-validation process for our linear regression model as well: I pass the linear regression model, the prepared data, the data labels, the scoring method and 10 as the number of cross-validation folds. Run it, and we get ten values again, and then we find the average. For the decision tree it was 3.31, which is okay, and for linear regression it is lower, 3.0757, so linear regression has performed better than the decision tree.

Let's try another model, random forest. It is a combination of a lot of separate decision trees combined into one, so it is an ensemble model. You can import the RandomForestRegressor class from the ensemble module of the scikit-learn library. The process is the same: instantiate, train using the fit method, and give cross_val_score all its arguments, so I have the forest regressor, which is the forest_reg object, the prepared data, the labels, the scoring and the cross-validation folds, which is ten here. Then I take the square root of the negated scores for all ten folds and calculate their average. This is still computing, so let's see what we get. We get 2.5589, so the error has come down from 3.07 to 2.55. That is a huge improvement: random forest has turned out to be the best performer out of random forest, linear regression and decision tree.

The next model that we are going to test is a support vector machine regressor. Again the process remains the same. I imported the SVR class from the svm module of
the scikit-learn library and instantiated it. I am using kernel linear because I want the linear version of the support vector machine here. I call the fit method with the prepared data and the data labels, and then the cross_val_score function, which takes the regressor, the prepared data, the labels, the scoring, cv and so on. When we run this, it gives us 3.08.

So far the random forest regressor has turned out to be the best of these, so we are going to perform hyperparameter tuning on it, find out which set of parameters works best, and see if we can improve the performance of the random forest model from here.

The next task, then, is to fine-tune the hyperparameters of the random forest regressor. One way is to fiddle with some values manually: you keep checking which values and which parameters turn out to be best and see if you can reduce the RMSE for random forest. The smarter approach is to use GridSearchCV from scikit-learn's model_selection module. This class takes the values that you want to experiment with and uses cross-validation to evaluate the model on each set of hyperparameter values that you have provided. It tests all the hyperparameter combinations and gives you the best one, the combination with which the random forest regressor gives the lowest RMSE.

First we need to define the parameter grid. Here I have defined the parameter grid with parameters that the random forest regressor accepts. The parameter grid is a list of dictionaries. My first dictionary contains the number of estimators; these are key-value pairs, where the key is the
parameter and the value is the list of values that I want to try. For the number of estimators I want to see how it performs with 3, 10 and 30. Max features is another hyperparameter of random forest: it is the number of features that each decision tree considers when looking for the best split. The number of estimators for a random forest regressor is how many trees you use in that random forest model. I have three values for the number of estimators and four for max features, so 3 times 4, twelve combinations are tried for this dictionary. In the second dictionary I have provided bootstrap equals False, number of estimators 3 and 10, and max features 2, 3 and 4, so this is 2 times 3, six combinations with bootstrap set to False. Bootstrap is a boolean hyperparameter that tells the model whether it should use bootstrap samples to build each tree. By default it is True, which is why the second dictionary sets it to False explicitly.

Next we instantiate our random forest regressor and then GridSearchCV, and we pass it these arguments: the regression model, the parameter grid that we defined, and the scoring method, which is negative mean squared error because it is again using cross-validation. Then return_train_score is True, since we want to look at the train score as well, and cv equals 10, so we are going to have ten folds.

Now let's run this. We call the fit method on the grid search object and pass the prepared data and the labels, the same way we train any model. It then gives you the best_params_ attribute, which tells you which combination has turned out to be the best. You can see the GridSearchCV configuration that we tested, and the best parameter combination has
turned out to be max_features 8 and n_estimators 30, so these are the best parameters so far.

Next we look at the individual combinations: we want to see which parameters returned which scores, so we club the scores together with the parameters that were used. In cv_results_ I have all the scores as a list, so I capture the scores from there, zip them with the parameters, and print the root mean squared error along with the parameters that were used. These are all of those scores; you can sort them as well. The lowest value we had before was 2.55, so let's see whether we have got something lower than that. We have 2.69 here, we have 2.70, and the best parameters turned out to be this one right here, max_features 8 and n_estimators 30. You can try a different set of parameters and keep iterating; that is the process: keep adding more values to test in your grid search parameter grid. For now we are settling for 2.65, or you can try some other configuration. We could also use the regular random forest model with default settings, which gave us the RMSE of 2.55, but I am going ahead with the best parameters from the grid search.

The next thing we can do is check the importance of each of the features that we have used. The grid search object has the best_estimator_ attribute, and the best estimator in turn has the feature_importances_ attribute. This feature_importances_ attribute basically tells
you the score of how important each feature is. These values might not make sense on their own, so we club them with the feature names. These are the extra attributes that were added, then I have the numerical attributes, and from there I capture the numerical columns, so it is the numerical columns plus the extra columns. This gives us the score for each of the features that we have. Acceleration on power, the added feature that we calculated ourselves, has turned out to be the most important with a score of 0.024, and then we have acceleration on cylinder, weight, model year and so on. The list is sorted with reverse equals True, so it is in descending order and the topmost entry is the most important feature.

After feature importance, you need to evaluate your entire system once you have finalized and locked your model. We are going ahead with the best estimator from the grid search: the best_estimator_ attribute is our trained model that you can use to predict, and I have captured it in my final model variable. Now I want my X test, which is my testing features, so from the stratified test set I first drop mpg, the target variable. Then I separate the labels of my testing data by copying the mpg column from the test set. Then I preprocess the data first, so that all the origins are renamed, and pass the result through the pipeline transformer. So the preprocessed data from the first function goes to the pipeline transformer, which creates the prepared X test data, and I make the final predictions. So this is my final model that I
have finalized: I call the predict method, pass in my prepared test data, calculate the mean squared error and then take the root to get the final root mean squared error. On the test data the value has turned out to be 3.01, which is fine. Our root mean squared error on the training data was 2.58 to 2.65, and this final value on the test data is 3.017. You can go ahead, iterate and try to find a better combination, but we are sticking with this value.

Next, you can see that a lot of things are being repeated, so we have created a function. It takes the configuration of the vehicle, that is displacement, weight, horsepower, acceleration and all of those features. I create a dictionary because I have the deployed model in mind, where I would be passing JSON. So the function accepts either a data frame or a dictionary of the vehicle with all the features, and it predicts the MPG, the fuel efficiency, of that vehicle. I check whether the config is a dictionary; if it is, I convert it into a data frame first, otherwise it stays a data frame. Then I preprocess the data frame by passing it through the function that preprocesses the origin column, and the preprocessed data is transformed using the pipeline transformer function, which gives us the prepared data. We print it, which is not necessary, but I am printing just to check that everything is fine. Then I calculate the prediction using model.predict. So this function takes the configuration
and the model that you want to use, and it generates the predictions for you. Let's run this. Here I am testing the function on a sample. I have created this vehicle configuration with cylinders 4, 6 and 8, so these are three rows, three instances that I am checking. The values are close to what is in the training data, but I have created my own combination as test data, and you can do that for yourself as well. Then I call the predict_mpg function with the vehicle configuration and the final model that we captured. This output is the print of the prepared data frame, which we can remove, and if you run it we get the predictions, three predictions for three rows.

One other situation that we have is about one-hot vectors. The categorical transformation creates one-hot vectors, and the model is trained on three different categories. Because the pipeline transformer fits the encoder again on whatever data it receives, a data frame that contains only one of the categories produces fewer one-hot columns. That creates a mismatch between the number of attributes the model was trained on and the number the transformer generates, and you get an error. This can be worked around, but currently I haven't done that, so I pass it all three categories. You only get the right predictions if the input contains all three categories, 1, 2 and 3. So we are testing it on each of those origins, and based on that our predictions are these: 34.83, 18.50 and twenty point
five six. You can create test samples like these for yourself, and they should be close to what you have seen in the training data.

Once you have finalized the model, the next thing is to save it. For saving I am using the pickle module, which comes built in with Python. We open a file, which I am creating as model.bin, in write-binary mode as f_out, dump the final model into this file and then close the file. It is as simple as that. When you want to load the model and use it to make predictions, you open the same file again in rb mode as f_in and call pickle.load on it, and your model comes back into a variable. Then you can call predict_mpg, pass the configuration and the model, and run it to get the predictions. These are the same as the earlier predictions because it is the same vehicle configuration.

The next step is to deploy this model. We have saved the model, and we are going to use this model file to make predictions using the Flask web framework. So that was all about machine learning, training models and coming up with the final version. Now that we are ready with the model, it needs to be deployed on a cloud service platform. I have chosen Heroku, and I will be using Flask, which is a Python web framework, to deploy this as a web service. All of that detail about deploying the model is in the next video, so schedule
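
The model comparison with 10-fold cross-validation described in the transcript can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the synthetic data from `make_regression` stands in for the prepared Auto MPG features and labels (an assumption), the helper name `rmse_cv` is hypothetical, and the forest size of 30 trees is chosen only to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in for the prepared data (11 attributes) and labels.
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=42)


def rmse_cv(model, X, y, folds=10):
    """Hypothetical helper: per-fold RMSE from k-fold cross-validation."""
    neg_mse = cross_val_score(
        model, X, y, scoring="neg_mean_squared_error", cv=folds
    )
    # Scores are negated MSE values, so flip the sign before the square root.
    return np.sqrt(-neg_mse)


models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=42),
    "random forest": RandomForestRegressor(n_estimators=30, random_state=42),
    "SVR (linear kernel)": SVR(kernel="linear"),
}

cv_rmse = {name: rmse_cv(model, X, y) for name, model in models.items()}
for name, scores in cv_rmse.items():
    print(f"{name}: mean RMSE {scores.mean():.2f} over {len(scores)} folds")

# A fully grown tree reproduces its own training labels, so the RMSE measured
# on the training data is (near) zero even though the cross-validated error is not.
tree = DecisionTreeRegressor(random_state=42).fit(X, y)
train_rmse = np.sqrt(np.mean((y - tree.predict(X)) ** 2))
print(f"decision tree RMSE on its own training data: {train_rmse:.2f}")
```

The numbers printed here will not match the ones quoted in the video, because the data is synthetic; the point is only the pattern of scoring every candidate with the same folds and metric.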
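
The grid search and feature-importance steps from the transcript might look something like the sketch below. Several details are assumptions: the data is synthetic, the four `max_features` values in the first dictionary are guessed (the transcript only says there are four and that 8 was selected), and the feature names are placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic stand-in for the prepared data and labels.
X, y = make_regression(n_samples=150, n_features=11, noise=10.0, random_state=0)

param_grid = [
    # 3 x 4 = 12 combinations (max_features values are assumed).
    {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]},
    # 2 x 3 = 6 combinations with bootstrapping switched off.
    {"bootstrap": [False], "n_estimators": [3, 10], "max_features": [2, 3, 4]},
]

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    return_train_score=True,
    cv=10,
)
grid_search.fit(X, y)
print("best parameters:", grid_search.best_params_)

# Pair every parameter combination with its cross-validated RMSE.
results = grid_search.cv_results_
for mean_score, params in zip(results["mean_test_score"], results["params"]):
    print(f"{np.sqrt(-mean_score):.2f}", params)

# Rank features by importance, highest first (names are placeholders).
importances = grid_search.best_estimator_.feature_importances_
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
ranked = sorted(zip(importances, feature_names), reverse=True)
print(ranked[:3])
```

By default `GridSearchCV` refits the best combination on the full training data, which is why `best_estimator_` can be used directly as the final model.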
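
The transcript mentions that the one-hot mismatch "can be worked around" without showing how. One possible approach, sketched here under assumptions, is to give the encoder a fixed category list so that it always emits three columns. The column name `Origin` and the use of `OneHotEncoder(categories=...)` are illustrative choices, not something the video confirms.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# A request that contains only one of the three origins (hypothetical column name).
single_origin = pd.DataFrame({"Origin": ["India", "India"]})

# Refitting a default encoder on this input learns a single category,
# so it produces one column instead of three.
refit_encoder = OneHotEncoder()
refit_cols = refit_encoder.fit_transform(single_origin).toarray().shape[1]
print("columns when refit on one origin:", refit_cols)

# Declaring the categories up front keeps the output width fixed at three,
# regardless of which origins appear in the input.
fixed_encoder = OneHotEncoder(categories=[["Germany", "India", "USA"]])
fixed = fixed_encoder.fit_transform(single_origin).toarray()
print(fixed)
```

Another common option is to fit the preprocessing pipeline once on the training data, save it alongside the model, and call only `transform` at prediction time.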
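
The save-and-reload step with pickle can be sketched as below. The file name `model.bin` and the handles `f_out` and `f_in` follow the transcript; the synthetic data, the temporary directory and the model settings are assumptions made so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Assumption: a small stand-in for the finalized model.
X, y = make_regression(n_samples=100, n_features=11, noise=5.0, random_state=1)
final_model = RandomForestRegressor(
    n_estimators=30, max_features=8, random_state=1
).fit(X, y)

# Written to a temporary directory here so the sketch leaves no files behind.
model_path = Path(tempfile.mkdtemp()) / "model.bin"

# Save: open in write-binary mode and dump the trained model.
with open(model_path, "wb") as f_out:
    pickle.dump(final_model, f_out)

# Load: open the same file in read-binary mode and restore the model.
with open(model_path, "rb") as f_in:
    loaded_model = pickle.load(f_in)

sample = X[:3]
original_preds = final_model.predict(sample)
loaded_preds = loaded_model.predict(sample)
print(np.allclose(original_preds, loaded_preds))
```

The `with` blocks close the file automatically, which replaces the explicit close call mentioned in the video. Pickle files can execute code when loaded, so only load model files from sources you trust.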

Original Description

Part 3 of the Complete ML Project Series - The series will cover everything from Data Collection to Model Deployment using Flask Web framework on Heroku!

Link to the Dataset: http://archive.ics.uci.edu/ml/datasets/Auto+MPG
GitHub Repository: https://github.com/dswh/fuel-consumption-end-to-end-ml
My Task CheatSheet: https://towardsdatascience.com/task-cheatsheet-for-almost-every-machine-learning-project-d0946861c6d0
Video on Data Science Portfolio: https://www.youtube.com/watch?v=_ANbV9lVA-M
Book for basics of Machine Learning: http://themlbook.com/wiki/doku.php

You can connect with me on:
- LinkedIn: https://www.linkedin.com/in/tyagiharshit/
- Medium, where I write: https://medium.com/@harshit_tyagi
- Twitter: https://twitter.com/tyagi_harshit24

This video tutorial covers the process of training and fine-tuning machine learning models using scikit-learn, including model selection, hyperparameter tuning, and saving the final model for deployment. It provides hands-on experience with various algorithms and techniques, making it suitable for beginners in machine learning.

Key Takeaways

Import necessary libraries
Read and pre-process data
Split data into training and testing sets
Train and evaluate models
Perform hyperparameter tuning using grid search and cross-validation
Save the final model with pickle, ready for deployment using Flask (covered in Part 4)

💡 Hyperparameter tuning is crucial for improving model performance, and techniques like grid search and cross-validation can be used to find the optimal hyperparameters.
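As a rough illustration of grid search with cross-validation, here is a minimal sketch using scikit-learn's `GridSearchCV`. It is not the notebook from the video: the synthetic data from `make_regression`, the choice of `RandomForestRegressor`, and the small parameter grid are all assumptions chosen to keep the example fast and self-contained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the prepared training set.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)

# Hypothetical grid: 2 x 3 = 6 hyperparameter combinations.
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [2, 4, 6],
}

# Each combination is scored with 5-fold cross-validation on negative MSE.
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid_search.fit(X, y)

# best_score_ is the mean negative MSE across folds; negate it and take the root.
best_rmse = np.sqrt(-grid_search.best_score_)
print(grid_search.best_params_)
print(best_rmse)
```

After fitting, `grid_search.best_estimator_` holds a model refit on all of the supplied data with the best combination found, which is the kind of object one would then evaluate on the held-out test set.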