Training and Fine-Tuning ML Models with Sklearn | End-to-End ML Project Tutorial - Part 3
Key Takeaways
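This video tutorial demonstrates training and fine-tuning machine learning models with scikit-learn. It covers model selection, evaluation with cross-validated root mean squared error, hyperparameter tuning with grid search, a final check on the held-out test set, and saving the trained model with pickle. The models compared are linear regression, decision tree, random forest, and support vector machine regressors. Deployment with Flask is left to the next video in the series.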
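The model-comparison step of the tutorial can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: it assumes a synthetic dataset from `make_regression` in place of the prepared Auto MPG data, and the helper name `cv_rmse` is hypothetical, so the printed numbers will not match the ones quoted in the transcript.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic stand-in for the prepared training data (11 features).
X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=42)


def cv_rmse(model, X, y, folds=10):
    """Hypothetical helper: mean RMSE over k cross-validation folds."""
    # scikit-learn reports negated MSE so that a higher score is always better.
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=folds)
    return float(np.sqrt(-neg_mse).mean())


# Scoring a fully grown tree on the data it was trained on hides overfitting.
tree_reg = DecisionTreeRegressor(random_state=42).fit(X, y)
train_rmse = float(np.sqrt(mean_squared_error(y, tree_reg.predict(X))))
print(f"decision tree RMSE on its own training data: {train_rmse:.2f}")

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=42),
    "random forest": RandomForestRegressor(n_estimators=30, random_state=42),
    "SVR (linear kernel)": SVR(kernel="linear"),
}

# Cross-validation scores each model on folds it did not see during training.
cv_scores = {name: cv_rmse(model, X, y) for name, model in models.items()}
for name, score in sorted(cv_scores.items(), key=lambda item: item[1]):
    print(f"{name}: cross-validated RMSE = {score:.2f}")
```

Comparing `train_rmse` with the cross-validated score of the same decision tree shows why the training error alone is not a useful basis for choosing a model.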
Full Transcript
Hello everyone, welcome back to the channel, Data Science with Harshit. This is the third part of the end-to-end machine learning project series, and in this part we will be diving deep into machine learning. We will select and train a few machine learning models: linear regression, decision tree, random forest and support vector machines. We are going to choose the best of them and then look into hyperparameter tuning, evaluating all of these models with a performance metric, root mean squared error (RMSE). At the end, once we have figured out which one works best for us, we will finalize it and test the entire system on the test data that we set aside at the beginning. Once we are sure that this is the model we want to deploy in production, we will save the trained model to a file using the pickle module. That is the curriculum for this part, and in the next video we will cover how to deploy this model as a Flask web service. So let's get started. [Music]

In this third part we are going to go through the following steps. First, we select and train a few models: linear regression, decision tree, random forest and a few others that we will see. Second, we need a performance metric to evaluate these models so that we can compare them with each other and see which one is doing better. The third step is model evaluation using cross-validation; we will see why we need cross-validation and what its purpose is. Fourth, we deep dive into hyperparameter tuning: once we have figured out which algorithms perform better, we tune the hyperparameters of those shortlisted models using the GridSearchCV class. The fifth step is to check feature importance: we will see which features are
actually contributing more to the prediction. The sixth step is to evaluate the final system on the test data: once you have locked your model, you test it on the test data that we set aside earlier. The seventh and final step, after a few random predictions, is to save your model using the pickle module.

First things first: I have imported all the libraries, functions and classes that I need. All the libraries, methods and classes that we used in the previous part are imported in the first cell, so let's run this. Then I am reading the file as we did in the previous part, and I am removing MPG, the target variable, from the data. So I am setting the data labels aside and separating out the feature variables. We have the features in one data frame, the data variable, and the target, the mpg column, in my data labels variable.

Now that we have separated the features and the target, the next thing is this function that I created to preprocess origin, the categorical column. It renames 1, 2 and 3 as India, USA and Germany and returns the data frame itself. Then we saw how we added the custom attribute adder, which adds acceleration on power and acceleration on cylinder; this works the same way as before.

The next thing I have done here, which is important to look at, is this function. We saw how we could create the numerical pipeline and club the numerical and categorical transformations using ColumnTransformer. Here I have encapsulated that functionality into functions. I have created two functions, and this one is the numerical pipeline transformer. It takes the numerical columns of the data frame; this is how I am
segregating the numerical data, using select_dtypes with the numeric types, float and integer. Then I instantiate the Pipeline class, providing a name and the class for each step. I first impute with the SimpleImputer class, using the median strategy, then apply the custom attribute adder class that we defined above, and then scale all the values in my numerical data. This function returns the numerical attributes as well as the numerical pipeline object, so all we need to do is call the transform method on it and we will have the transformed numerical data.

The second function, the main one, is the pipeline transformer. It is the complete transformation pipeline for both numerical and categorical data. I have kept the categorical attribute as origin, and I get the numerical attributes and the numerical pipeline object from the numerical pipeline transformer function defined above. Here I am printing the column names of the numerical attributes; this is what I was testing with, so you can delete it.

Then I have created this full pipeline, which is a ColumnTransformer object. I pass it the names, numerical and categorical, along with the numerical pipeline object and, for the categorical part, simply a one-hot encoder for the values India, USA and Germany, and then the list of numerical columns and the list of categorical columns. The function then calls the fit_transform method on the data that I provide. So I simply provide the data frame and it creates a NumPy array, and all of those computations, basically the
addition of those attributes and everything else, would be carried out, and I would have the prepared data with me. So I have reduced this whole process to two steps: I go from raw data to processed data in just two steps by encapsulating all of this functionality in functions. Once you have your raw data, all you need to do is first call the preprocess origin column function, which renames 1, 2 and 3 as India, USA and Germany, and then call the pipeline transformer function, which preprocesses both the numerical and the categorical data and gives you the prepared data.

This is what our prepared data looks like, and this is its first row. You can see that we have 11 attributes: 6 were the numerical columns, then we added 2, acceleration on cylinder and acceleration on power, which makes 8, and then 3 come from the one-hot encoding of the origin column. So we have 11 attributes in total, and the data preparation is working absolutely fine.

The next thing is selecting and training models. First I have linear regression. We are working on a regression problem, so the first algorithm that comes to mind is fitting a straight line to our data, and that is why I have picked the linear regression model. I have imported the LinearRegression class from the linear_model module of the scikit-learn package; we must all have done this or seen it somewhere. I have instantiated the LinearRegression class and then invoked the fit method on it. Let's run this. It gives you a LinearRegression object, so I have trained my model at this point; calling fit means that we
have trained the model. I passed it the prepared data, which I got by running my preprocessed data through the pipeline transformer function, and the data labels that I separated right at the beginning.

Now I am trying out some predictions. I have picked five rows, which I am calling the sample data, and the top five rows from the labels as the sample labels. Next I have to transform it, so I pass my sample data to the pipeline transformer, which gives me the prepared sample data. Then I call the predict method to predict the mpg values of these five rows. This extra output was appearing because of a print statement that was printing the list of columns; I have removed it, so let me run this again. So what we have are the predictions for the first five samples of the training data itself.

Now we can compare these results with the actual sample labels. For the first one the model predicted 29.08 and the actual value was 32. Then the model predicted 27 where we have 31, 26 where the actual is also 26, which is fine, 12 where the actual was 18, and 22 for the last one where the actual value was 26. So these are somewhat close; not good, but somewhat close.

So this is how linear regression performed, but eyeballing a few rows is not a good method to quantify the difference between the actual values and the predicted values. Instead we use certain performance metrics, and when it comes
to regression, the typical performance metric that we use to evaluate a model is root mean squared error. RMSE tells you how much error a system makes in its predictions. It is based on the difference between the actual value and the predicted value, and because of the squared term, the penalty grows quickly as that difference increases. So if you have a lot of outliers in your data, it would probably not be a good idea to use RMSE; you can use mean absolute error instead.

We are going to use RMSE, and for that we use the mean_squared_error function from the scikit-learn metrics module. All you need to do is predict the results: I have predicted all of the values with the linear regression object's predict method and stored them in the mpg predictions variable. You pass mean_squared_error the data labels, which are the actual values, and the mpg predictions, which are the predictions made by the model. That gives the mean squared error; to get the root mean squared error, you pass that value to NumPy's square root function. For the linear regression model it has come out to be 2.95. We are going to compare this value across each of the models going ahead.

Now let's try the decision tree model. The steps remain the same for almost every algorithm that you import from the scikit-learn library. First you import the class of that particular algorithm: from the tree module I have imported DecisionTreeRegressor, instantiated it and created an object, tree reg. So this is a tree regressor. Now you
train the model using the fit method: you provide the prepared data and the labels, run it, and it gives you the trained decision tree regressor. Once your model is trained, you predict by providing the prepared data, the training data for now, because we are not using the test data yet. So we call predict on the tree regressor with the prepared data and then find the error the same way: using the mean_squared_error function from the scikit-learn metrics module, you pass the actual values and the predictions and then take the square root to get the root mean squared error. This is my tree RMSE. Run this and we get 0.0.

So this is zero error, which would be amazing, right? But no, no model is perfect. This means that the model has overfit the data to a great extent. What do we mean by overfit or underfit? When we say that a model is overfitting the data, it means that it performs really well on the training data but poorly on unseen data. So if I am getting an RMSE of 0.0 on the training data, the model is very likely overfitting.

How should we test it on unseen data? Should we use the test data that we have? That's a bad practice; we have to keep it aside for the final model. Another thing we can do is split the training data itself, using train_test_split or a stratified split, but a great alternative here is scikit-learn's k-fold cross-validation. K-fold cross-validation splits the training set into k folds. You decide how many folds you want; I have used 10 here. It then trains and evaluates the model k times, so if you have picked 5 folds it will train and evaluate your model 5 times. Now how does it train that
model? It holds out a different fold for evaluation each time and trains on the other k minus 1 folds. So if I am creating 10 folds, my model is trained on 9 folds and one fold is reserved for testing it, and you repeat that for every fold. So you get 10 scores for your model. That is like testing it on the entire data set, with every fold taking its turn as the held-out set. It is a great evaluation method, used in almost every industrial or otherwise good project that I have come across.

For this we have the cross_val_score function from the model_selection module. I have invoked the function here and passed it the tree regressor, the prepared training data and the training data labels; you have to provide the labels as well. It also accepts a scoring argument, which here is negative mean squared error. The scikit-learn scoring convention is that higher is better, so the error is reported as a negative value, and we have to add a negative sign ourselves to make it positive, as I have done in the next line. cv equals 10 means 10 folds. So let me run this.

The last line computes the tree regressor RMSE scores: cross_val_score gives us 10 scores, and I am calculating the square root of each of them. If we check the tree regressor RMSE scores, you see that we have ten values in this array, the RMSE values for each of the folds. We can calculate the average of all these values, so I have used the mean function over here. The average
has turned out to be 3.31 for the decision tree regressor. We can repeat the same cross-validation process for our linear regression model as well. I have passed the linear regression model, the prepared data, the data labels, the scoring method and then 10 as the number of cross-validation folds. Run it, and we get ten values again, and the last thing we do is find the average. For our decision tree it was 3.31, which is okay, and for linear regression the error has come down to 3.0757. So linear regression has performed better than the decision tree.

Let's try another model, random forest. It is a combination of many separate decision trees combined into one, so it is an ensemble model. You can import the RandomForestRegressor class from the ensemble module of the scikit-learn library. Again the process is the same: instantiate, train using the fit method, and provide all the arguments to cross_val_score. So I pass the forest regressor, which is the forest reg object, the prepared data, the labels, the scoring and the number of cross-validation folds, which is ten here. Then I calculate the square root of the negated scores for all ten folds and take the average of those scores. This is still computing, so let's see what we get. We get 2.5589, so the error has come down from 3.07 to 2.55. This is a huge improvement: random forest has turned out to be the best performer out of random forest, linear regression and decision tree.

The next model that we are going to test is a support vector machine regressor. Again the process remains the same. I have imported the SVR class from the svm module of
the scikit-learn library and instantiated it. I am using a linear kernel, because I want the linear version of the support vector machine here. Then I call the fit method with the prepared data and the data labels, and then the cross_val_score function, which takes the regressor, the prepared data, the labels, the scoring, cv and so on. When we run this, it gives us 3.08. So far, then, the random forest regressor has turned out to be the best of these, so we are going to perform hyperparameter tuning on it and find out which set of parameters works best, to see if we can improve the performance of the random forest model from here.

So the next task is to fine-tune the hyperparameters of the random forest regressor. One way to do that is to fiddle with some values manually: you keep checking which values of which parameters turn out to be the best and see if you can reduce the RMSE for the random forest. The smarter approach is to use GridSearchCV from the scikit-learn model_selection module. This class takes the values that you want to experiment with and uses cross-validation to evaluate the model on each combination of hyperparameter values that you have provided. So it tests all the hyperparameter combinations and gives you the best one, the combination with which the random forest regressor gives you the lowest RMSE.

First we need to define the parameter grid. Here I have defined it with the hyperparameters that the random forest regressor accepts. The parameter grid is a list of dictionaries. My first dictionary contains the number of estimators; each entry is a key-value pair where the key is the
parameter and the value is the list of values that I want to try out. For the number of estimators I want to see how it performs with 3, 10 and 30. Max features is another hyperparameter of the random forest: it sets how many features each tree considers when looking for a split. The number of estimators, for a random forest regressor, is how many trees you are going to use in that random forest model. So I have three values for the number of estimators and four for max features, which makes 3 times 4, or 12 combinations to try for this dictionary. For the second dictionary I have provided bootstrap as False, the number of estimators as 3 and 10, and max features as 2, 3 and 4, so that is 2 times 3, or 6 combinations. Bootstrap is a boolean hyperparameter that tells the model whether it should use bootstrap samples to build each tree. By default it is True, so this second dictionary tries the forest without bootstrapping.

The next thing we do is instantiate our random forest regressor and then instantiate GridSearchCV, passing it these arguments: the regression model, the parameter grid that we have defined, the scoring method, which is again negative mean squared error because it uses cross-validation, then return_train_score as True, because we want to look at the train score as well, and cv equal to 10, so we are going to have ten folds.

Now let's run this. We call the fit method on the grid search object and pass the prepared data and the labels, the same way we train any model. It then gives you the best_params_ attribute, which tells you which combination has turned out to be the best. You can see the GridSearchCV configuration that we tested, and the best parameters have
turned out to be max features 8 and number of estimators 30. So these are the best parameters so far.

Next we want to see which parameters returned which scores, so we can club the scores together with the parameters that were used. Let's run this. In cv_results_ I have all the scores as a list, so I capture the scores, zip them with the parameters, and print the root mean squared error along with the parameters that were used. These are all of those scores; you can sort them as well. The lowest value that we got earlier was 2.55, so let's see whether we have got anything lower than that. We can see 2.69 here and 2.70 here, and the best parameters turned out to be this one right here, with max features 8 and number of estimators 30.

You can try again with a different set of parameters; keep checking and keep iterating, that's the process. Keep adding more values to test within your grid search parameter grid. For now we are settling for 2.65, or you could try some other configuration. We could also use the plain random forest model from earlier, which gave us an RMSE of 2.55, but I am going ahead with the best parameters from the grid search.

The next thing we can do is check the importance of each of the features that we have used. The grid search object has a best_estimator_ attribute, and the best estimator in turn has a feature_importances_ attribute. The feature importances tell
you how important each particular feature is. On their own these values might not make sense, so we club them with the feature names. These are the extra attributes that were added, then I have the numerical attributes, from which I capture the numerical columns, so it is the numerical columns plus the extra columns. This gives us the score of each of the features that we have. Acceleration on power, the added feature that we calculated ourselves, has turned out to be the most important with a score of 0.024, and then we have acceleration on cylinder, weight, model year and so on. I have used reverse equals True, so these are sorted in descending order, and the topmost entry is the most important feature.

After feature importance, you need to evaluate your entire system, once you have finalized and locked your model. We are going ahead with the best estimator from the grid search. The best_estimator_ attribute is our trained model, which you can use to predict, so I have captured it in my final model variable. Next I want my X test, the test features, so from the stratified test set I first drop mpg, the target variable. Then I separate out the labels of my test data by copying the mpg column from the test set. Then I preprocess the data, passing it through the first function to rename all the origins, and pass the result through the pipeline transformer. This gives me the prepared X test data, and I can make the final predictions. So this is my final model that I
have finalized: I call the predict method, pass in my prepared test data, calculate the mean squared error, and take the square root to get the final root mean squared error. On the test data the value has turned out to be 3.01, which is fine. Our training root mean squared error was around 2.58 to 2.65, and the final value on the test data is 3.017. So this is fine. You can go ahead, iterate and try to find a better combination, but we are sticking with this value.

The next thing is that you see a lot of steps being repeated, so I have created a function. It takes the configuration of the vehicle, so displacement, weight, horsepower, acceleration and all of those features. I create it as a dictionary because I have the deployed model in mind, where I would be passing in JSON. So the function accepts either a data frame or a dictionary of vehicles with all the features, and it predicts the mpg, the fuel efficiency, of those vehicles.

Inside the function, I check whether the config is a dictionary. If it is, I convert it into a data frame first; otherwise it remains a data frame. Then I preprocess the data frame by passing it through the preprocess origin column function, and the preprocessed data is transformed using the pipeline transformer function, which gives us the prepared data. I print it, which is not necessary, just to check that everything is fine. Then I calculate the prediction using the predict method of the model. So this function takes the configuration
and the model that you want to use, and it generates the predictions for you. Let's run this.

Here I am testing this function on a sample. I have created this vehicle configuration, where I have defined cylinders as 4, 6 and 8, so these are three rows, three instances, that I am checking it on. The values I have given are close to the values in the training data, but I have created my own combination as test data, and you can do that yourself as well. Then I call the predict_mpg function with the vehicle configuration and the final model that we captured. This output is the printed prepared data frame, which we can remove. Now if you run this, we get the three predictions for the three rows.

One other situation to be aware of concerns the one-hot vectors. The categorical transformation creates one-hot vectors, and the model is trained on three different categories. If you provide only one of the categories in the data frame, it will give you an error. This can be worked around, but currently I haven't done that, so I am passing all three categories. It will only give you the right predictions if you provide all three categories, 1, 2 and 3. Otherwise there is a mismatch between the number of attributes the model was trained on and the number of attributes the transformer generates after one-hot encoding, because the encoder is fitted again on the data you pass in. If you pass just one of the classes through the pipeline transformer, it generates fewer columns, which gives you an error. So we are testing on each of those origins, and our predictions are 34.83, 18.50 and 20.56. You can create test samples like these yourself; they should be close to what you have seen in the training data.

Once you have finalized the model, you have to save it. For saving I am using the pickle module, which is built into the Python standard library. We open a file, which I am creating as model.bin, for writing as f_out, dump the final model into this file and then close the file. It's as simple as that. When you want to load the model and use it to make predictions, you open the file again in rb mode as f_in and call pickle.load on it, and your model comes into this variable. Then you can call predict_mpg, pass the configuration and the model, and run it to get the predictions. These are the same as the earlier predictions because it is the same vehicle configuration.

The next step is to deploy this model. We have saved the model, and we are going to use this model file to make predictions using the Flask web framework. So that was all about machine learning, training models and coming up with the final version. Now that we are ready with the model, it needs to be deployed on a cloud service platform. I have chosen Heroku, and I will be using Flask, a Python web framework, to deploy it as a web service. All of the detail about deploying the model is in the next video. So schedule
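The tuning, feature-importance and saving steps described in the transcript can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code shown in the video: the data is a synthetic stand-in from `make_regression`, the four `max_features` values in the first dictionary are assumed (the transcript only gives their count), the feature names are placeholders, and the model file is written to a temporary directory.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic stand-in for the prepared training data (11 features).
X, y = make_regression(n_samples=150, n_features=11, noise=10.0, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# 3 x 4 = 12 combinations, plus 2 x 3 = 6 with bootstrapping switched off.
param_grid = [
    {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]},  # values assumed
    {"bootstrap": [False], "n_estimators": [3, 10], "max_features": [2, 3, 4]},
]

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    return_train_score=True,
    cv=10,
)
grid_search.fit(X, y)
print("best parameters:", grid_search.best_params_)

# Pair every tried combination with its cross-validated RMSE.
cv_results = grid_search.cv_results_
for mean_score, params in zip(cv_results["mean_test_score"], cv_results["params"]):
    print(f"{np.sqrt(-mean_score):.2f}", params)

# Feature importances of the refitted best model, highest first.
final_model = grid_search.best_estimator_
importances = final_model.feature_importances_
for importance, name in sorted(zip(importances, feature_names), reverse=True):
    print(f"{name}: {importance:.3f}")

# Save the model with pickle and load it back.
model_path = os.path.join(tempfile.gettempdir(), "model.bin")
with open(model_path, "wb") as f_out:
    pickle.dump(final_model, f_out)
with open(model_path, "rb") as f_in:
    loaded_model = pickle.load(f_in)

print(loaded_model.predict(X[:3]))
```

Using `with` blocks closes the file automatically, which replaces the explicit close call mentioned in the transcript. A pickled model should only be loaded from a trusted source and with a compatible scikit-learn version.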
Original Description
Part 3 of the Complete ML Project Series -
The series will cover everything from Data Collection to Model Deployment using Flask Web framework on Heroku!
Link to the Dataset: http://archive.ics.uci.edu/ml/datasets/Auto+MPG
GitHub Repository: https://github.com/dswh/fuel-consumption-end-to-end-ml
My Task CheatSheet: https://towardsdatascience.com/task-cheatsheet-for-almost-every-machine-learning-project-d0946861c6d0
Video on Data Science Portfolio: https://www.youtube.com/watch?v=_ANbV9lVA-M
Book for basics of Machine Learning: http://themlbook.com/wiki/doku.php
You can connect with me on:
- LinkedIn: https://www.linkedin.com/in/tyagiharshit/
- Medium where I write: https://medium.com/@harshit_tyagi
- Twitter: https://twitter.com/tyagi_harshit24