Ensemble Techniques in Machine Learning
Key Takeaways
Explains ensemble techniques in machine learning, using methods like bagging and boosting to improve predictive accuracy.
Full Transcript
Good morning, good afternoon. Let me briefly introduce myself before we start. I am Ritika Vadhavan, and I work as a data scientist at Johnson & Johnson. I have more than nine years of experience in data analytics and machine learning, and I have been hosting training and mentorship programs for over two years now. I have provided training in Python, Tableau, business statistics, and machine learning algorithms. So that's pretty much about me. Thank you, everyone, for joining today.

I am going to speak about ensemble techniques in machine learning. Let me share my screen so we can quickly look at the agenda. For the first 30 minutes of this webinar, we will understand the entire ensemble process: what it is about, how it works, and what the different methods are. We shall learn about the most popular techniques used in ensemble learning, bagging and boosting, and take some examples to learn about these two techniques in detail. In the next 15 minutes, we will have a quick live demo in Python, where we'll go over a small example and implement one of these algorithms. We'll keep the last 10 to 15 minutes for your queries. If you have any questions or doubts during the session, you can either put them in the chat window or ask me towards the end of this webinar.

All right, let's start with the overview. What exactly is an ensemble? Ensemble literally means a group producing a single effect, and in machine learning it does exactly the same thing: it is a technique that combines several base models, or rather several weak models, to produce one optimal model. There can be multiple sources of error in machine learning algorithms: noise, variance, and bias. Ensemble techniques help to minimize the reducible ones, variance and bias, and thus ensure the accuracy and stability
of machine learning algorithms. How is that done? We'll learn that later. First, let us understand the concept in detail with the help of a few examples.

Let us talk about two kinds of learners in machine learning: weak learners and strong learners. What are they? Weak learners have low prediction accuracy, only slightly better than random guessing. They are also prone to overfitting, which means they are unstable. If you build a classification model that is unstable and prone to overfitting, it will not be able to classify data that differs too much from the original data set. For example, say you are training a model that identifies animals with pointed ears as cats. If the model comes across a cat whose ears are curled, it might fail to recognize it, because the characteristic the model relies on is pointed ears, and this cat does not have them.

Strong learners, on the other hand, have higher prediction accuracy, and an ensemble converts a system of weak learners into a single strong learning system. How? Let's say there are two weak learners. The first one, as before, identifies cats by their pointed ears. The second one identifies cats by their cat-shaped eyes. If we combine them, then after analyzing the image for pointed ears, the system also analyzes it for cat-shaped eyes, which improves the overall accuracy of the system. Ensemble methods take advantage of the blended output of a number of weak learners: they learn something from one model and something from another, and eventually combine all the predictions to make a final call. That is the entire concept of ensemble techniques. Let us look at one more example. Let's say there is a group of
blindfolded people who are told to touch and explore a mini donut factory, which none of them has ever seen before, and then share their personal experience. Each person only gets to explore one part of the machine, so they only know that part. When you go to them for feedback, each blindfolded person can only talk about their own experience, but when you combine their experiences, you get a highly detailed account of the entire machinery. That is what ensemble techniques are all about: they take advantage of the blended output of weak learners, which is superior to that of a solitary weak model, and hence they increase prediction accuracy. So that is what we mean by a strong learner in the context of ensembles: we convert multiple weak learners into a strong learning system.

All right, now let's understand how the process works. We start with what is usually given to you when you begin any analysis: a data set. The next step in model building is always to split your data into training and testing sets. We usually keep the major chunk of the data for training, and a small portion is reserved for testing your model on unseen data. These are the usual steps. What happens in ensemble techniques is that random samples are drawn from this training data. You can define the sample size, and n samples are drawn from the training data. Individual models are then trained on these samples: model 1 takes the first random sample and is trained only on that sample,
model 2 is trained only on its own sample, model 3 on another, and so on, so there are n models, each trained on its own sample. That's how it works. These individual models have to be very different from each other. How is that possible? It is possible if the random samples are independent of each other: one sample should have nothing to do with another. If the samples are independent, the models you train will be very different from each other, and that is one of the necessities you have to ensure if you want your ensemble method to be effective.

So far, we split our data into training and testing sets, drew random samples from the training data, and built n models from these samples. Each of these models makes its own predictions and gives you a set of predicted values, and you can combine all of these predictions into one final prediction. That is how ensemble methods work in a nutshell.

One thing to remember is that the sampling done here is sampling with replacement, that is, with repetition. What is sampling with replacement? It means that when a data point is drawn from the training data, it is recorded in the sample first and then returned to the training data before the next data point is drawn. You draw a data point, record it, and put it back, so when you choose the next data point, the first one is still available and can be chosen again. That is sampling with replacement. Why do we need sampling with replacement here? What is the reason for that?
The reasons are these. Number one, if you do not draw samples with replacement, the size of the data set effectively decreases. As I said before, there has to be a lot of diversity between these random samples, and without replacement you would have to throw away a lot of data to get enough diversity. So, number one, we do not want the size of the data set to decrease. Number two, it helps to make the base models independent. When you sample with replacement, the samples that are generated are independent of each other, which makes them more diverse and hence makes the models independent of each other as well.

So that is the entire process. A few things to keep in mind: number one, n models are generated from n random samples, which are drawn with replacement; and number two, the predictions are combined at the end to get the final prediction. This is the entire idea of ensemble methods in machine learning.

All right, now let's talk about the different methods that are used. The popular ones are bagging, boosting, and stacking. Bagging and boosting follow a somewhat similar idea, where you combine multiple weak learners of the same kind. Stacking, on the other hand, allows you to combine models of different kinds as well. For example, with bagging, if your base model is a decision tree, you can only collect, say, 10 different decision trees and put them together; if you start with a random forest, you can do it with random forests. But with stacking, you could have a random forest, top it with an extreme gradient boosting (XGBoost) model, and top that with yet another algorithm, so models of different kinds can be stacked together.
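As a minimal sketch of the sampling-with-replacement (bootstrap) step described above, the snippet below draws a few bootstrap samples of row indices with NumPy. The helper name `bootstrap_indices`, the data size, the number of samples, and the seed are hypothetical choices for illustration. It also shows a side effect of drawing with replacement: each sample repeats some rows and leaves others out entirely (the "out-of-bag" rows).

```python
import numpy as np

def bootstrap_indices(n_rows, n_samples, sample_size, seed=0):
    """Hypothetical helper: draw n_samples bootstrap samples of row indices."""
    rng = np.random.default_rng(seed)
    # replace=True: a drawn row is "put back" and can be drawn again
    return [rng.choice(n_rows, size=sample_size, replace=True) for _ in range(n_samples)]

n_rows = 1000  # hypothetical training-set size
samples = bootstrap_indices(n_rows, n_samples=3, sample_size=n_rows)

for i, idx in enumerate(samples, start=1):
    unique_rows = np.unique(idx)
    # Rows never drawn into this sample are its out-of-bag rows
    oob_rows = np.setdiff1d(np.arange(n_rows), unique_rows)
    print(f"sample {i}: {len(unique_rows)} unique rows, {len(oob_rows)} out-of-bag rows")
```

When each sample is as large as the training set, roughly 63% of rows appear at least once and roughly 37% are out-of-bag, because the chance that a given row is never drawn is (1 - 1/n)^n, which is close to e^-1 for large n.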
In bagging and boosting, the models have to be of the same kind. Now let us try to understand what bagging and boosting are all about. We won't talk much about stacking today; we'll focus on bagging and boosting.

All right, let's talk about bagging. Going back to the previous slide, where we talked about sampling with replacement: in data science and machine learning, sampling with replacement is called bootstrap, and the process is called bootstrapping. So drawing random samples from the data with replacement is called bootstrapping. The other step we saw on the previous slide was that at the end we combined the predictions of the different base models. Combining predictions is done with the help of aggregation: we apply an aggregation function at the end to combine multiple predictions. I'll show you how with some examples, but these are the two basic steps we talked about, bootstrapping and aggregation, and combined together they give us bagging. So bootstrap plus aggregation is bagging.

The one thing to remember here is that in bagging, all your models run in parallel. If you recall the figure from the previous slide, all the models from model 1 to model n run at the same time, each predicts certain values, and then an aggregation is applied to get the final prediction. The next thing to remember is that there are two classic kinds of problems we deal with in machine learning. The first is classification, where we classify data into labels, and the second is regression, which we use for continuous variables. The aggregation function for classification is usually the mode, and for regression it is the mean.
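To make "bootstrap plus aggregation" concrete, here is a minimal from-scratch sketch, not scikit-learn's own implementation: it trains several decision trees, each on its own bootstrap sample, and aggregates their class predictions with the mode (majority vote). The data set (scikit-learn's breast cancer data, also used in the live demo), the tree depth, the number of models, and the seeds are assumptions for illustration. For regression, the aggregation step would take the mean of the predictions instead of the mode.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
n_models = 5  # odd number of voters, so a two-class vote cannot tie
models = []
for _ in range(n_models):
    # Bootstrap: sample training-row indices with replacement
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    # Each model is trained independently (these fits could run in parallel)
    models.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

# Shape (n_models, n_test): one row of predicted labels per model
all_preds = np.array([m.predict(X_test) for m in models])

# Aggregation for classification: the mode (majority vote) of each column.
# Labels are 0/1 here, so bincount + argmax gives the most frequent label.
vote = np.array([np.bincount(col, minlength=2).argmax() for col in all_preds.T])

individual_acc = [(p == y_test).mean() for p in all_preds]
ensemble_acc = (vote == y_test).mean()
print("individual accuracies:", np.round(individual_acc, 3))
print("majority-vote accuracy:", round(ensemble_acc, 3))
```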
Now, how does that work? Let us take an example and understand it. We'll talk about regression first. We have two models here, model one and model two. Model one is as simple as the average function: the predicted value y is always the average of all the x's. Model two is slightly more complex: it says y is the average plus or minus the standard deviation. These are the two different models we are going to work with.

You have the actual values here, and based on this function you have the predicted values. This is what you usually see in any regression model: an actual value and a predicted value. What is the error here? The error is the difference between these two values. My actual value was 2.6 but the model predicted 3.3, so I take the absolute value of predicted minus actual, because we do not want positive and negative errors to cancel out. That is the absolute error. I calculate this error for all the values, and with that, my mean absolute error for these 10 observations is 0.98: that is the mean error across all 10 observations. You can also calculate the percentage error, which is simply the absolute error divided by the actual value, and that gives you the mean absolute percentage error. So for model number one, the mean absolute error is 0.98 and the mean absolute percentage error is 30 percent, which means that, on average, its predictions are off by about 30 percent.

Now let's consider the other model, the slightly more complex one. We have put a correction on its predicted values, and the correction is nothing but the standard deviation; using that, we have a set of predicted values. Let us calculate the error: I use the absolute function again, predicted minus actual value. These are all the errors.
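The error calculations in this example can be sketched as below. The slide data is not in the transcript (only the first pair, actual 2.6 and predicted 3.3, is mentioned), so the arrays and the helper names `mae` and `mape` are hypothetical, and the printed numbers will not match the 0.98, 1.19, and 0.8 quoted in the talk. What the sketch does illustrate is the averaging step: the MAE of the averaged prediction can never exceed the mean of the two models' MAEs, because |(a + b)/2 - y| <= (|a - y| + |b - y|)/2, and it can fall below both when the models' errors partly cancel.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error: average of |predicted - actual|."""
    return np.mean(np.abs(predicted - actual))

def mape(actual, predicted):
    """Mean absolute percentage error: average of |predicted - actual| / |actual|, in percent."""
    return 100 * np.mean(np.abs(predicted - actual) / np.abs(actual))

# Hypothetical values for illustration (not the slide data)
actual = np.array([2.6, 3.1, 4.0, 2.2, 3.8])
pred_model1 = np.array([3.3, 3.3, 3.3, 3.3, 3.3])  # a constant "average" model
pred_model2 = np.array([2.0, 2.5, 4.9, 1.8, 4.6])  # a second, different model

# Bagging-style aggregation for regression: a plain (unweighted) mean
pred_combined = (pred_model1 + pred_model2) / 2

for name, pred in [("model 1", pred_model1), ("model 2", pred_model2), ("average", pred_combined)]:
    print(f"{name}: MAE={mae(actual, pred):.3f}, MAPE={mape(actual, pred):.1f}%")
```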
Let us calculate the percentage error here. For model number two, the mean absolute error has increased to 1.19, and the percentage error is now approximately 35 percent. So: first model, mean absolute error 0.98 and percentage error 30 percent; second model, mean absolute error 1.19 and percentage error approximately 35 percent.

Now, when we were talking about bagging, we saw that we run individual models and combine their predictions at the end. How can we combine the predictions here? For regression we use the mean, that is, the average. So all we need to do is take the average of prediction number one and prediction number two, and that is the combined prediction. Now let us calculate the error again: the combined prediction minus the actual value, using the absolute function; the actual values are the same as before. This time I am not using a separate model at all; all I have done is take the average of these two predictions. Let us see the impact on the error and on the percentage error. As you can see, the mean absolute error has gone down to 0.8, from 0.98 and 1.19 for the individual models, and the percentage error has gone down to 19 percent, so the accuracy has increased. This is how aggregation works in regression: the mean, for continuous variables.

All right, now let us look at another example, for classification. In a classification problem, you classify labels. Let's say we have these 10 records, labeled as one and zero; these are the actual labels. There are five different models that classify these records and make predictions. Model one classified this one as a one, and so on. If a prediction is right it is colored green, and if it is wrong it is red. Looking at the individual accuracies, this model made 5 right
predictions out of 10, so its accuracy is 50 percent. The accuracy of the second model is again 50 percent, and for the other three models it is 60 percent, meaning four times out of ten the model did not make the right prediction. Now let's build one more, final model by taking the mode. The mode is the most frequently appearing value. Out of these five predictions, which value appears most often? One appears the maximum number of times, hence my final prediction is one. Similarly here, one appears three times, so one becomes the final prediction. In this case the predictions are 0, 1, 0, 1, 0; zero appears three times, hence zero becomes the final prediction. If you take the final prediction as the mode of the different models, this is what you get, and as you can see, this model has an accuracy of 0.8, that is, 80 percent. There are just two mistakes, two red bars, and everything else is green, so the accuracy has increased from 50 or 60 percent to 80 percent. This is how the mode is used for aggregation when you are working on a classification problem.

So that was all about bagging. Going back to the presentation, the two things to keep in mind are: number one, all models run in parallel; and number two, the base estimators have to be as different from each other as possible, so that their errors are independent. These are a few pointers to keep in mind about bagging.

Now let us talk a little about boosting. Boosting creates a model by combining several weak learners sequentially. First, the base model runs; the feedback from the first model is fed into the next model, which learns from the mistakes the previous model made. Every succeeding model learns from the mistakes of the previous one. That is how boosting works. Let us look at this example. These are my actual data points, and at the
beginning, we assign a weight to each of these data points, and they are all assigned equal weights: each one gets a weight of one. This data goes into your base model, and these are the predicted values. The first, second, and third values are predicted correctly; this yellow one was predicted wrong; and the remaining values were predicted correctly. So there is one error in the base model's predictions: this fourth value was not predicted correctly. What happens in the next model? The weight of this incorrect prediction changes: the algorithm increases the weight of the wrongly predicted observation, while all the other weights remain one. So the next time the model runs, it pays more attention to predicting this one correctly. In the next model, you see that the fourth one is now predicted correctly, but the sixth one, the final one, is wrong, so the next model increases the weight of that one. This keeps going until the training error falls below a certain threshold; the entire process keeps repeating. That is how boosting works.

So the main difference between bagging and boosting is this: bagging is all about parallel execution of models, and there is no concept of learning from errors in bagging. Boosting, on the contrary, is all about learning from the errors made by the prior models. Based on what happened in the base model, new weights are assigned, with more weight given to the incorrectly predicted observations, and these steps repeat until a certain threshold is reached. That is all about the boosting algorithm.

Now, talking about the differences between the two: in bagging, as I said, all weak learners are built in parallel, while boosting builds successive learners to improve on the accuracy of the prior weak learners. Bagging gives equal weight to each learner. We saw this earlier in the regression example: we took a plain average, never a weighted average, so model one and model two were given equal weight despite the differences in their accuracy and error. The error was higher for the second model, but we did not discriminate between the two models. That does not happen in boosting, where better models are given more weight in the final prediction. Bagging, as I said, uses independent samples, whereas in boosting the subsequent samples contain more of the observations that had relatively higher errors. As I said, boosting is an error-based learning technique, so its samples are not independent.

Bagging helps to reduce the variance of the model. A model suffers from variance when it is overfitting, that is, when it is too complex: it works well on the training data, but when it is exposed to unseen data, its performance is not consistent. In that case, you could go for bagging. Boosting, on the other hand, can help you reduce the bias of the model. Bias means the model is too simple for the problem: it is not able to capture enough information from the data, and you could use a boosting algorithm to reduce that bias. Some examples of bagging models are the bagging classifier, which is very popular, and the random forest. Boosting algorithms include adaptive boosting (AdaBoost) and gradient boosting. These algorithms have some small tweaks that make them different from each other, but the basic working remains the same, just the way I explained.
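The reweighting loop described above can be sketched in the style of AdaBoost. The transcript does not give an update formula, so the rule here is an assumption: the two-class discrete AdaBoost (SAMME) update, in which misclassified points have their weights multiplied by (1 - err) / err, correctly predicted points keep their weight, and each model's say in the final vote is alpha = ln((1 - err) / err), so better models count more. Decision stumps (depth-1 trees) play the weak learners; the synthetic data set and the number of rounds are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
y_signed = np.where(y == 1, 1, -1)  # labels -1/+1 so the weighted vote is a sign

n = len(X)
weights = np.ones(n) / n  # start with equal weight on every data point
stumps, alphas = [], []

for _ in range(10):  # number of boosting rounds (illustrative)
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_signed, sample_weight=weights)
    wrong = stump.predict(X) != y_signed
    err = weights[wrong].sum() / weights.sum()  # weighted training error
    if err == 0 or err >= 0.5:  # perfect, or no better than chance: stop
        break
    alpha = np.log((1 - err) / err)  # lower error -> larger say in the vote
    weights = weights * np.exp(alpha * wrong)  # increase weight of misclassified points only
    weights /= weights.sum()  # renormalise so weights sum to 1
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote of all stumps
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
train_acc = np.mean(np.sign(scores) == y_signed)
print(f"{len(stumps)} rounds, training accuracy {train_acc:.3f}")
```

In practice, scikit-learn's `AdaBoostClassifier` packages this kind of loop; the sketch only makes the weight bookkeeping visible.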
All right, let us go for a live demo now; I will answer all your questions towards the end. Let us quickly go through how one of these algorithms works. It's a small example; I'm sharing my screen again.

We are going to work with the very popular breast cancer data set, which you can load from the scikit-learn library's datasets module. The other important function we will use is train_test_split, which divides your data into training and testing sets. Then we use a decision tree classifier, which is going to be our base model, the base estimator, and the bagging classifier, one of the algorithms we just talked about. I'm going to create X and y from this data set: X is the set of all the independent variables, and y is the dependent variable, the target variable you wish to predict. Once X and y are defined, we divide our data into training and testing sets, so X and y each get their own train and test sets. We use a test size of 25 percent: 25 percent of the data is reserved for testing, and the remaining 75 percent is used for training.

Here we have called a simple decision tree classifier with a max depth of three, and we want the samples to be generated randomly; this is my base estimator. Now I am going to use this base estimator to call the bagging classifier. This is how we call a bagging classifier. Let us look at some of its hyperparameters. The first one is the base estimator: you have to provide one, and any classification or regression algorithm can be used here, such as logistic regression or a decision tree, whichever base model you wish to explore. For now we are going to use the decision tree we have already created. Next is the number of estimators: as I told you, there can be n
models, which run in parallel for the final prediction, and you define that number here. For now we are going to keep it at five, because we don't want this code to take long to run, so five base estimators will run. Max samples is the sample size, the number of samples to be drawn with replacement for each base estimator, which is 50 in this case. Bootstrap set to true means the sampling is done with replacement, so you have to set this parameter, bootstrap, to true. So we call the bagging classifier with the base estimator set to tree, my decision tree classifier; the number of estimators is 5, max samples is 50, and bootstrap is true. Then we fit this object to our training data, X_train and y_train, and next all you have to do is print the scores.

Let's restart and run it again. Here we have imported all the necessary libraries, loaded the data set, and created a simple base estimator, a decision tree. Next comes the bagging classifier, and as you can see, the accuracy score on the training data set is 94.3 percent and the accuracy score on the test data is 96.5 percent. These are very close to each other, hence we do not see the model suffering from overfitting or underfitting in this case, and that is the overall objective of using a bagging classifier. This is just a simple example; you could try it with a much more complex data set, try different base estimators, and play with these hyperparameters as much as possible. There are other hyperparameters as well. Now I'm going to stop sharing my screen, and let us see what questions were asked.
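The demo notebook itself is not part of the transcript, so the following is a reconstruction from the spoken description: the breast cancer data, a 25 percent test split, a depth-3 decision tree as the base estimator, and a bagging classifier with 5 estimators, max_samples=50, and bootstrap=True. The random_state values are assumptions added so reruns are reproducible, so the printed scores will not necessarily match the 94.3 and 96.5 percent from the session. Recent scikit-learn versions (1.2 and later) renamed BaggingClassifier's `base_estimator` parameter to `estimator`; passing the tree as the first positional argument works with either name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X: independent variables, y: target variable (malignant vs benign)
X, y = load_breast_cancer(return_X_y=True)

# Reserve 25% of the data for testing and train on the remaining 75%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Base estimator: a shallow decision tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)

# Bagging: 5 trees, each fit on 50 rows drawn with replacement (bootstrap=True).
# The tree is passed positionally: the parameter is `estimator` in scikit-learn >= 1.2
# and `base_estimator` in older versions.
bag = BaggingClassifier(
    tree, n_estimators=5, max_samples=50, bootstrap=True, random_state=42
)
bag.fit(X_train, y_train)

train_score = bag.score(X_train, y_train)  # accuracy on the training data
test_score = bag.score(X_test, y_test)     # accuracy on unseen test data
print(f"Train accuracy: {train_score:.3f}")
print(f"Test accuracy:  {test_score:.3f}")
```

A small gap between the two scores is the signal the demo looks for: neither strong overfitting (train much higher than test) nor underfitting (both low).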
The first question is about the difference between bagging with decision trees and random forest. Ashwin, random forest is another ensemble method: your decision tree is a simple base algorithm, while random forest is one of the ensemble methods, and its approach is very similar to bagging. The only difference is at this step; let me share my screen again. Hold on. If you can see my screen now: in random forest, just as we draw random samples of records from the data, we also subset the features. When each tree is grown, only a random subset of the features is considered at each split, and a different random subset is drawn at the next split. So you get a subset of the records plus a subset of the features, and that is the basic difference between the bagging classifier and random forest. Everything else works like this.

"If the samples are drawn with replacement, won't that mean that the random samples are not independent?" No, Aditya, I guess you are thinking of it the other way around: if your samples are drawn with replacement, then they are independent of each other. Let me scroll up the chat window.

Okay: "the accuracy changed." If you observed, we are using a random hyperparameter, so every time you run it, random sampling is done to draw samples from the data, and hence the numbers change a bit.

Recording: yes, it will be on their YouTube account, and you can access the recording from there.

"A voting classifier is a classification algorithm, so is a decision tree a voting classifier? Where does majority voting come into ensemble techniques?" Who asked this... Maria. We just saw the example where we chose the mode to come up with the final prediction; that was nothing but the voting mechanism. The class, out of zero and one, with the maximum number of votes gets the final prediction.
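Going back to the random forest question, here is a small comparison sketch on the same breast cancer data used in the demo, with assumed settings (50 trees, a 25 percent test split, seed 0). In scikit-learn, `BaggingClassifier`'s `max_features` draws one feature subset per base estimator, while `RandomForestClassifier`'s `max_features` limits the features considered at each split; setting `max_features=None` in the forest turns that off, which makes it close to plain bagging of full decision trees. Fixing `random_state` also speaks to the "accuracy changed" question: with a fixed seed and the same data, a rerun gives the same score.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    # Bagging: each tree sees a bootstrap sample of rows and all features
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Random forest: bootstrap rows AND a random subset of features at every split
    "random forest (sqrt features per split)": RandomForestClassifier(
        n_estimators=50, max_features="sqrt", random_state=0
    ),
    # With max_features=None every split sees all features: close to bagged trees
    "random forest (all features)": RandomForestClassifier(
        n_estimators=50, max_features=None, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    scores[name] = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: test accuracy {scores[name]:.3f}")

# Same seed and same data -> the same score on a rerun
rerun_score = (
    RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
    .fit(X_train, y_train)
    .score(X_test, y_test)
)
print("rerun matches:", rerun_score == scores["random forest (sqrt features per split)"])
```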
So for the first record, one appeared the maximum number of times, so one got the maximum number of votes.

"Should we construct the base model before passing it to bagging models?" Srinivasan, you need a base estimator, that's it. You have to declare a base estimator: you can either do it separately at the top, or declare it directly inside the hyperparameters.

About the accuracy scores and the notebook: the code file will be shared with you.

"What is the difference between the test set and the validation set?" When you have to deploy your model in production, your data should be divided into three sets: train, validation, and test. While you are doing hyperparameter tuning, you evaluate the model on the training set, then on the validation set, observe the performance, go back, change the hyperparameters, and evaluate again. So in a way, you are tweaking your model based on the validation results. Data leakage happens when your model is exposed to the data you use for the final evaluation, so you want to keep one data set completely unexposed to the model, to evaluate it once more before it is deployed in production. That is the test set: it is never exposed to the model until you're sure this is the final model you wish to deploy.

I guess they have shared my LinkedIn profile with you: the link to my LinkedIn profile is on the registration page, so you can follow me from there.
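The three-way split from the test-versus-validation answer can be sketched as follows: compare candidate settings on the validation set, then score the chosen model once on the untouched test set. The 60/20/20 proportions, the candidate `max_depth` values for the base decision tree, the number of estimators, the seeds, and the helper name `make_model` are hypothetical choices for illustration. The same loop also shows one simple way to tune a base estimator's hyperparameter inside bagging.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% as the test set, then split the remaining 80% into 75/25 train/validation
# (25% of 80% = 20% of the full data), giving roughly 60/20/20 overall
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

def make_model(depth):
    """Hypothetical helper: bagged decision trees whose max_depth is being tuned."""
    return BaggingClassifier(
        DecisionTreeClassifier(max_depth=depth), n_estimators=10, random_state=0
    )

# Tuning: only the validation set is used to compare candidate depths
val_scores = {}
for depth in [1, 2, 3, 5, None]:
    val_scores[depth] = make_model(depth).fit(X_train, y_train).score(X_val, y_val)

best_depth = max(val_scores, key=val_scores.get)

# The test set is used exactly once, after the model has been chosen
final_model = make_model(best_depth).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print("validation scores:", val_scores)
print(f"chosen max_depth={best_depth}, test accuracy={test_score:.3f}")
```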
"Bagging with decision trees versus random forest": we talked about that already. What else? "How do we find weights for the model?" I'm sorry, your question is not very clear; which weights are we talking about here?

Yes, Chucks, even ensemble methods need to be optimized through hyperparameter tuning. There are times when your data is such that just using ensemble methods won't be sufficient for the problem. In that case, there are certain hyperparameters you can use to tweak the performance, and again, you should know which problem you want to solve: is it overfitting, or is it underfitting?

"Can we do hyperparameter tuning in bagging, for the models running in parallel?" Are you talking about the base estimator here? If I understand correctly, you're asking whether you can tune the hyperparameters of the base models, the base estimators. Yes, you can: you can set certain hyperparameters while you initialize your base model, and then tune the bagging classifier as well. However, make sure you are not increasing the complexity too much.

Oh, you don't have to define those weights; they are assigned by the algorithm itself.

"Can we use two different types of CNN, such as VGG19 and GoogLeNet, in an ensemble approach?" Combining different types is only possible in stacking. You cannot use different kinds of models in bagging and boosting; bagging and boosting work with just one kind of algorithm.

"Can ensemble methods handle the problem of class imbalance?" Interesting. Well, it depends. Usually there are other techniques that we use to solve class imbalance. The first thing to keep in mind is which particular measure you wish to target: is it the overall accuracy, is it recall, or is it precision? So if it's one of them,
either recall or Precision then uh try to go with over sampling or under sampling techniques first and then on the top of that you could use an ensemble model however if the purpose is just to get overall accuracy then I guess uh going with uh any ensembled model should help is there a situation where there is no errors on the previous model if yes do we still need to go ahead with the boosting uh the situation that you have described here that there is no error on the previous model that is too hypothetical I do not believe that that is ever the case so prediction is always prone to Elders Peter in bagging classifier if we have even number of models and equal number of frequency of classes how what mode will work if we have even number of models and even number of frequency of classes how the mode will work what is even number of models and frequency has anything to do with mode how will the aggregation work in this case where all records are unique in each bagging model even if all records are unique uh it always takes an average of all the predictions you do not necessarily have to have same uh Records there the objective is to take an aggregation towards the end okay in customer satisfaction prediction which score we have to focus Precision or recall that's an interesting question let's talk about this one so customer satisfaction prediction I assume that it is whether the customer would be satisfied or not right now let us Define our precision and recall cases here let us define false positives and true uh false negatives here so what would be a false positive your system your model identifies that the customer would be satisfied wouldn't be satisfy uh false positive yes so it uh you're positive here being that the customer is satisfied your model identifies the customer is going to be satisfied however it is not and a false negative would be models is that the customer won't be satisfied but the customer wants so these are the two kinds of error false 
negatives and false positives which one is more harmful for the company the first one where you are telling the model is going back and telling you that the customer would be satisfied but in actual the customer is not so a false positive is more harmful for this kind of model and which measure do we focus on while we talk about false positives precision so here in this case you have to Target precision are there more questions please explain once more about production what about production I'm sorry could you elaborate your question if you wish to optimize hyper parameters in simple learning what are the ways in your opinion so it all depends on the kind of problem that you're dealing with there are quite a number of hyper parameters let me share my screen I don't think this sent recordings to the mail IDs because uh are you talking about production or prediction and which prediction are we talking about if you could clarify I I shall explain it once more all right hyper parameters with let's talk about bagging classifier here just give me one minute all right sharing my screen hope everyone is able to see my screen when we talk about Market classifier these are all the parameters that you could play with number of estimators if you keep on increasing the number of Base estimators you would uh eventually decrease the learning rate but you have to look for a threshold till that point you could probably get a good number of estimators that will make your model even more stronger along this one the number of features to be drawn from X to train each base estimator you could also uh give the number of features to be drawn from this the other one which is very useful is this out of that sample score it's a Boolean value you could either uh by default it's always false you could set it to true what is out of bag samples uh here so what happens in uh especially in a random forest or in bagging what happens is when you are uh creating these samples there are some data 
points here in the training data which do not show up in any of these random samples since we are sampling with replacement here there are certain data points which would not show up in any of these samples you could use that chunk the set of data points for further evaluating your model when this prediction is done test your prediction or test your model test your final model on that data that was not selected as the in the during the process of sampling and whatever error you receive from that evaluation use that error as a feedback to construct the tree again or construct they'll got them again so since those that chunk of data was not selected that chunk is called in in this in terminology that is used is out of bag okay the chunk that was not used during the sampling for training the models is called out of back data when you use that out of back data to evaluate the model further you get it out of that error that error can further guide the building of a model the construction of the algorithm all right so if you set this out of back score as two to estimate the generalization error and it can further enhance the model performance so this is one of the coolest hyper parameters that one could use yeah these are all of them okay for imbalance classes using smart would it be better yes you could a smart is an over sampling technique you could use smooth Vishal it is one of the effective methods this is the difference between gradient boosting and adaptive boosting okay we I uh after answering this question I will talk about precision and Recon okay when to use precision and when to use recall I will talk about that let me first answer this question the difference between gradient boosting and adaptive boosting adaptive boosting is very similar to the traditional boosting algorithm that we just talked about each uh each model learns from the error made by the previous model that is how adaptive learning also works gradient boosting is differs in a way that the 
model then the succeeding models they no longer predict the actual value [Music] errors made by modern so gradient boosting algorithms they learn by predicting residuals and they try to minimize residuals while they are predicting residuals they try to minimize their residuals as they proceed on with every succeeding model they try to bring it down and that is the only difference is there any situation sampling done nothing in unbalanced data I'm sorry Andrew I didn't get your question okay let us talk about precision and recall uh now so Precision is uh two things that you got to remember here number one is uh how do you define your how do you define your uh false positives and false negatives false positive is your model predicts a positive while the actual value was negative model predicts a positive when the actual is negative false negative on the other hand is model predicts a negative which is false which is not right the actual value was positive so you always got to know how to define these two uh errors once these two errors are defined Precision is always affected by your false positives if false positive is something that is affecting your algorithm so much you that you cannot bear to have a false positive in your model that is when then you should work on the Precision on the other hand when uh false negative is something that can hamper your uh model or your solution too much that is when you must Target recall all right are in symbol methods affected by outliers depends question not so much not so much affected by outliers but again it it would uh it would depend okay what kind of noise uh is present in the data it is uh what what happens when outliers are present is your model tends to overfit and what concept are we using in simple techniques we are using the concept of aggregation and generalization when we generalize we are already uh when we aggregate we are already generalizing the effect of outliers on the data so they are not so much affected 
by outliers but again um in case the noise is such that you know it is not it's creating a bias or something like that then you it might then it might if we want to reduce false positives we got to make sure Precision score is high right right everyone needs to fill the feedback form before you leave should I apply a treatment be done in all assembled methods see what happens is uh when you are using Ensemble techniques on decision trees you're using decision trees or random Forest uh such uh such algorithms are very robust to outliers they are not that much prone to outliers so you could skip your outlier treatment if you're going with a random Forester bank classifier they are anyway very robust to outliers but uh it is always a good exercise to be aware of what kind of outliers are present in the data don't just ignore them heart disease prediction which in simple technique is effective now to increase the moderate accuracy so heart disease prediction is a classification problem you could start with bagging try hyper parameter tuning on bagging you could then uh May probably use random Forest hyper parameter tuning on random Forest these are some of the models that you could try what are the methods to Target recall and precision methods to Target recall and precision so there is no such method uh Amina it is an exercise where you have to be very well aware of your data number one number two uh while using logistic regression you could use the ROC AOC curve you could use the Precision recall curve to strike a balance between precision and recall there are a lot of ways actually uh to uh be aware of what is happening and then there are ways to correct that is bagging most suitable for regression there is no such concept arvind you could use tagging for regression as well as classification uh problems there is no such concept of most suitable for any of the two problems explain how to optimize hyper parameter whether by partitioning method or cross validation 
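The question just above asks whether to tune hyperparameters with a partitioning method or with cross-validation. Below is a minimal sketch of both options, assuming scikit-learn, synthetic data from `make_classification`, and a hypothetical grid over the `max_depth` of the base tree inside a `BaggingClassifier`; none of these specifics come from the talk. It keeps one final split that is scored only once, in the spirit of the train/test/validation discussion earlier.

```python
# Minimal sketch: tuning a bagging ensemble by a single partition vs. by cross-validation.
# Assumptions: synthetic data, the depth grid, and n_estimators=50 are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)

# Final hold-out split: never used while choosing hyperparameters.
X_dev, X_final, y_dev, y_final = train_test_split(X, y, test_size=0.2, random_state=0)


def make_model(depth):
    # The base estimator is passed positionally: recent scikit-learn names this
    # parameter `estimator`, older releases called it `base_estimator`.
    return BaggingClassifier(DecisionTreeClassifier(max_depth=depth), n_estimators=50, random_state=0)


depths = [2, 4, 8, None]  # hypothetical candidate depths (None = fully grown trees)

# Option 1: one partition of the development data into a training part and a tuning part.
X_tr, X_tune, y_tr, y_tune = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)
holdout_scores = {d: make_model(d).fit(X_tr, y_tr).score(X_tune, y_tune) for d in depths}

# Option 2: 5-fold cross-validation on the same development data (mean accuracy per depth).
cv_scores = {d: cross_val_score(make_model(d), X_dev, y_dev, cv=5).mean() for d in depths}

# Pick the depth with the best cross-validated accuracy.
best_depth = max(cv_scores, key=cv_scores.get)

# Refit the chosen configuration on all development data, then score it once on the final split.
final_model = make_model(best_depth).fit(X_dev, y_dev)
final_accuracy = final_model.score(X_final, y_final)

print("single partition:", holdout_scores)
print("5-fold CV:", cv_scores)
print("chosen depth:", best_depth, "final accuracy:", round(final_accuracy, 3))
```

Cross-validation averages over five different tuning splits, so the chosen depth is less likely to reflect the quirks of one particular partition; the cost is five times as many fits per candidate.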
Cross-validation is, again, you know, something you do along with grid search. Cross-validation is done to guard against overfitting to a single split. I am not sure which partitioning method you are talking about here, but cross-validation is a good exercise that you must always do.

Thank you. If I'm not able to answer any of the questions, you could ping those questions again.

"Accuracy changed?" That I explained. The code file: I'll try to share it with you. Two different CNNs: no, different algorithms can only be combined in stacking.

"How to be consistent in data science, even when on the job?" Keep learning.

Chucks, I fear I'm not really able to understand your question. "How to optimize hyperparameters by cross-validation" is what you are asking?

Arvind, I will try to host a session. I'll talk to the Analytics team and let's see if we can have a separate session on ROC AUC, precision and recall. Yeah, that would help. I'm available on LinkedIn; you could connect with me there.

Voting classifier: it is the same algorithm we just talked about where we were using the mode. The mode is nothing but taking a vote: whichever class has the maximum number of votes wins. So the classifier we looked at in the example on that Excel sheet was a voting classifier.

"Which is better, oversampling or undersampling?" I would say oversampling, but it depends on a lot of other factors. You could try undersampling, but undersampling is generally not very recommended because we lose a lot of information.

LinkedIn profile: someone has pinged the LinkedIn profile handle above; let me ping it again. This is my LinkedIn profile handle.

"What can I do if my model accuracy is not very good?" Kesha, if the accuracy is not very good, you first have to figure out what the problem is. Is the model not doing a great job on the training data itself? If that is the case, then there is underfitting: your model is too simple to capture the information in the data, and you have to find ways to increase its complexity. How do we increase the complexity of a model? Use a more complex, stronger algorithm, since maybe your algorithm is too simple for the problem, or bring more information into the problem, for example additional informative features. Those are the ways you could make a model more complex. If your accuracy is good on training but not that great on the test data, the model is overfitting: it is not generalizing well to unseen data. How do you get rid of overfitting? There are a number of ways: you could try bagging, or you could use cross-validation. Those are a couple of ways to overcome overfitting.

All right, thank you so much, Satish. Thank you, everyone, for this lovely session. I hope I've managed to answer most of the questions. Thank you.
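The answer to Kesha's question diagnoses underfitting and overfitting by comparing training and test accuracy, and suggests bagging against overfitting. A minimal sketch of that check, assuming scikit-learn and synthetic data with 10% label noise (`flip_y=0.1`); the three models are illustrative choices, not the speaker's demo code:

```python
# Minimal sketch: compare training and test accuracy to spot under- or overfitting.
# Assumptions: synthetic noisy data and the three models below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    # Depth-1 tree: likely too simple for the problem (underfitting).
    "stump (depth 1)": DecisionTreeClassifier(max_depth=1, random_state=1),
    # Fully grown tree: fits the training data perfectly, noise included.
    "single deep tree": DecisionTreeClassifier(random_state=1),
    # Bagging 100 fully grown trees: aggregation averages out much of their variance.
    "bagged deep trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc = model.score(X_test, y_test)     # accuracy on unseen data
    results[name] = (train_acc, test_acc)
    print(f"{name:18s} train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

A low score on both sets points to underfitting; a large train-test gap points to overfitting. In runs like this the bagged ensemble usually narrows the gap left by the single deep tree, though the exact numbers depend on the data and the random seeds.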
Original Description
The bias-variance trade-off is a challenge we all face while training machine learning algorithms. Ensemble methods improve model precision by using a group (or "ensemble") of models which, when combined, outperform individual models when used separately. Different methods could be used to reduce variance or bias.
In this DataHour, Ritika explains in depth the methods of ensembling machine learning models and how they work.
🔗 More action-packed sessions here: https://datahack.analyticsvidhya.com/contest/all/
Stay on top of your industry by interacting with us on our social channels:
Follow us on Instagram: https://www.instagram.com/analytics_vidhya/
Like us on Facebook: https://www.facebook.com/AnalyticsVidhya/
Follow us on Twitter: https://twitter.com/AnalyticsVidhya
Follow us on LinkedIn: https://www.linkedin.com/company/analytics-vidhya