Anomaly detection in time series with Python | Data Science with Marco

Data Science With Marco · Beginner ·📐 ML Fundamentals ·3y ago

Key Takeaways

This lesson covers point-wise anomaly detection in time series data with Python, using three methods: the robust z-score (based on the median absolute deviation), Isolation Forest, and Local Outlier Factor.

Full Transcript

Hey everyone, I am Marco, and welcome to this lesson on anomaly detection in time series. This time I couldn't find any meme for this; it's very hard to find a meme for anomaly detection. Anyway, what are we going to talk about today? First, we'll talk about the different types of anomaly detection tasks in time series, and then we'll take a look at three different methods that you can use to detect outliers in your time series data. We'll take a look at the median absolute deviation, which we will use to compute a more robust z-score, then we'll take a look at the isolation forest algorithm, and finally the local outlier factor.

As always, I like to talk about the theory first and then implement everything in code, so we'll be going back and forth between the slides and the notebook. In the description you'll have everything that you need: the dataset as well as the full source code for this lesson, so that you can follow along. So let's get started.

What is anomaly detection? Anomaly detection is a task where you identify rare events that deviate significantly from the majority of the data. It's very useful: it's used in a wide range of real-life applications, from manufacturing to healthcare.

Now, why would you do anomaly detection at all? For two reasons. First, unexpected events are usually caused by some kind of production fault or system defect, and it's very important to identify those. For example, imagine that you're working at Amazon and you're monitoring the number of visitors to the website. If at any point the number of visitors falls to zero, clearly something is wrong. Maybe your website is down, and so you need to identify that as fast as possible so that you can fix it. This is a pretty extreme example, but you understand why we would do anomaly detection. The second reason is that outliers can actually affect the performance of your
forecasting models. We know that in time series, any forecasting model uses past values to predict the future. So if you have outliers in your past, you need to first identify them and then decide: do you want those outliers to remain in the historical data you train your model on? Is it reasonable to keep them? Should you remove them, or try to normalize their values to bring them back to more normal behavior, so that your forecasting model is more stable in the future? These are really the two main reasons why you would do anomaly detection.

Like I said, there are two types of anomaly detection tasks that you can do with time series data. The first one is point-wise anomaly detection and the second one is pattern-wise anomaly detection, and they're pretty self-explanatory. With point-wise anomaly detection, as the name says, you are looking at isolated points in time that are outliers. Here you can see an example, which is actually the scenario we'll be working with in this lesson: the CPU usage of an AWS EC2 instance. As you can see, the two red dots in the time series are two isolated anomalies that were labeled, and we are interested in identifying them.

With pattern-wise anomaly detection, you are looking to identify a sequence of points that form an anomalous pattern. Here we have another scenario, where we are monitoring the cost per click of an ad, and you can see that the series of red dots is considered to be abnormal. You would use other algorithms to identify this entire sequence of anomalous points. But like I said, in this lesson we will only focus on point-wise anomaly detection. So this is the previous example: we'll try to devise algorithms that are able to identify these two red dots here.

Perfect, so that's it for this little introduction. Before we actually
move on to the first method, let me go back into the code. We'll read the data, import it, visualize it a little bit, and then we'll come back to the slides and learn about our first method.

All right, we are in the Jupyter notebook. Note that you can get this exact same code in the description; I will leave you a link to the GitHub repo. For now, we are just going to import our data, read it, and visualize it a little bit. Of course, I am not writing the entire code from scratch, because some parts are very boring, like library imports or plotting. I really want to focus on the anomaly detection techniques. So let's go ahead and run these first two cells. Again, I'm just importing the usual libraries that we use in data science, such as pandas, and then I am setting the size of my figures.

Here is where you can download the dataset, so you can use the exact same data as I am and reproduce all the results. A bit of information on the data: it's real-life data on the CPU utilization of an EC2 instance in the AWS cloud. Data is recorded every five minutes starting on February 14th at 2:30 pm, and there's a total of 4032 data points. It's made available through the Numenta Anomaly Benchmark, which is a very cool repository. You have a lot of datasets with anomalies and labels as well, and you can do either point-wise or pattern-wise anomaly detection. I'm really grateful to have found this repo, so make sure to check it out and try things with different datasets after this video.

So let's go ahead and read the data. Again, this is the data that I downloaded from the link that you can find here. Once this is done, we are also going to take the labels from the second link that you can see here. In our dataset, we only have two points that are anomalies in this case. Before we move on, as you can see here on
the timestamp, we don't have an actual timestamp type, because of those slashes. So let's convert those dates to timestamps. I will say that df['timestamp'] is equal to pd.to_datetime, and then pass in again the timestamp column. Perfect, now we have the right format, so we can move on.

What I'm going to do now is add an anomaly label. We are trying to do anomaly detection, so of course we need a way to check whether our method is correctly identifying the anomalies or not, and for that we need a label. So df['is_anomaly'], which is a new column that I am creating, is going to be equal to 1 by default. Here, 1 will mean inlier, so not an anomaly, and -1 will be an anomaly. This is the convention that scikit-learn is using as well: in all its models, a normal point will be 1 and an outlier will be -1. We are using the same thing here to keep everything easier later on when we start modeling. So, looping over each of the anomaly timestamps, I'm going to identify my anomalies: I will say df.loc, and then where df['timestamp'] is equal to each, is_anomaly is going to be equal to -1. Perfect. Once this is done, we can go ahead and display the head. As you can see, we have our timestamp, we have the value, and now we have this label in the is_anomaly column telling us if the point is an inlier or an outlier.

Now let's quickly visualize our data. Here I'm just separating my anomaly and inlier data, and then I am making a plot; very standard plotting code here, and you should get the following. Let me zoom out a little bit so you can see everything. Perfect. As you can see, we have our data: in blue are all the normal points, the inliers, and in red the outliers. As expected, just like we have seen in the slides, the two red dots are anomalies, and those are the points that we are trying to find. Now of course, keep in mind
that because it is anomaly detection, we are detecting very rare events, which is why we only have two points here. So it's going to be very hard for our model to find those points, and when we evaluate it, it's really hit or miss: either you find these one or two points or you completely miss them. So it's going to be pretty interesting, and pretty hard as well. Also notice that we have a lot of constant values: we have all those flat lines here at the bottom in blue, which is interesting, and we'll see how different algorithms take that into account. That's really it for this part, so let's go back into the slides so that we can learn about our very first method.

All right, we are back, and let's take a look at the very first method that we will use to identify outliers in time series data, which is the median absolute deviation, or MAD. The MAD is fairly intuitive, and I would say it is some kind of baseline method. It will work, but not all of the time; you really have to use it in very particular situations. The idea behind the MAD is that when your data is normally distributed, you can reasonably say that points at each end of the tails are outliers. This is done using the z-score method. The z-score, as you can see here, is the point minus the mean, divided by the standard deviation, and this is how you can say whether a point is an outlier or not. Usually, when the absolute value of your z-score is above 3 or 3.5, we conclude that it is an outlier. This is what we are visualizing here: you can see this normal distribution, and on the x-axis you have the z-score. I set a threshold of three, so anything above 3 or below -3 you could say is an outlier. Now, because you
have outliers in your data, they will affect the mean, and because you are using the mean when computing your z-score, you are affecting the z-score as well. So we need to find something more robust, something that will not be impacted by the presence of those outliers. This brings us to the robust z-score method: instead of using the mean when computing the z-score, we are going to use the median, which we know is a statistic that is much more robust and stable in the presence of outliers. The MAD is this formula that you see here: MAD = median(|x_i - median(X)|). In other words, it's the median of the absolute differences between the values of a sample and the median of the sample. Then we calculate the robust z-score with the following formula: 0.6745 times the value minus the median of the sample, divided by the MAD.

By the way, you might be wondering why we need to scale by 0.6745. This is because the robust z-score divides by the median absolute deviation, which for normally distributed data is smaller than the standard deviation, at roughly 0.6745 times its value. Because you're dividing by a smaller number, your score is going to be a bit larger, so to bring it back to the scale of the z-score you multiply by 0.6745, and then you can use it just like a z-score.

Now you have to be careful. Like I said, this is a baseline method, very naive. First, the z-score method only works if your data is close to a normal distribution. Second, it only works if your MAD is not equal to zero. Of course, if you're dividing by zero, it's going to be very bad, and that usually happens when more than 50% of the data has the same value. If that is the case, you cannot use the MAD. Now of course, you might remember that
our data has a lot of constant values, a lot of flat lines, so in this case implementing the MAD is definitely not the best of ideas. I am expecting this not to work very well, but we're still going to implement it so that you know how to do it if you ever encounter that kind of situation. That's it for this; now let's go back into the code and implement this robust z-score method.

Okay, we are back in the code. I'm just resuming the same notebook that we started before, and now let's implement our baseline method, which is the median absolute deviation, or MAD. As you remember, the MAD can be used when two conditions hold: your data is close to a normal distribution, and your MAD is not equal to zero, meaning that no more than 50% of your data is equal to the median. So we need to take a look at the distribution of our data and see if it makes sense in this case. Here again I'm just plotting the distribution of my data, very standard, and we get the following.

This is very problematic. As you can see, this line indicates the median, and it matches the peak. That means that a lot of the data falls right on the median. So already, first red flag. I don't know if it's actually 50% of the data that falls on the median, but a big proportion of it does, so the MAD is probably not going to work very well. Also, looking at the distribution, this is definitely not a normal distribution; it's very skewed to the left. So we have two red flags in this case: it's not a normal distribution, and a big portion of the data falls right on the median. I'm really expecting the MAD not to work very well here, but that's okay. I still want to implement it for you so that if you ever encounter this kind of situation, you'll be able to do it. Okay, so
with all that in mind, let's implement this median absolute deviation, or rather the robust z-score method. The median absolute deviation can be imported from SciPy, which is what I'm doing here. So the MAD is median_abs_deviation computed on df['value']. Perfect. Let's also take a look at the median: the median is np.median of df['value']. We are going to need those values later on to compute the robust z-score. Once this is done, we can print them, so let's print the MAD and let's print the median. Then let's define our function to compute this robust z-score. I'm just going to follow the formula that we saw on the slide: compute_robust_z_score takes in some value x, and it simply returns 0.6745 times x minus the median, everything divided by the MAD. So let's go ahead and run this cell, and you should get the following.

As you can see, big problems: our median absolute deviation is equal to 0.002. It's not exactly zero, but it is very close to zero. What's probably going to happen is that, because your MAD is so small, any point that is slightly off from the median, so anything slightly over or slightly under it, is going to get flagged as an outlier. Keep in mind that you're dividing by the MAD, so here you're dividing by 0.002, which is a very small number, and your z-score is going to be very large. Even with a threshold of 3 or 3.5, chances are that the z-score is going to be so large that any value somewhat off from the median is going to be flagged as an outlier. Still, let's keep moving ahead so that you know how to apply this method. Let's compute the z-score for all of our
samples. I will say that df['z-score'] is going to be equal to df['value'], and then we use the apply method with compute_robust_z_score, just like so. Let's go ahead and display the head. As you can see, for each of our samples we now have a z-score. Perfect. Depending on the z-score, we can now determine if a value is an outlier or an inlier, which is what I'm doing here. As a baseline, everything is going to be an inlier, and then I use a threshold of 3.5: if the z-score is greater than 3.5 it's an outlier, and if it's smaller than -3.5 it will also be an outlier, so -1, to make sure that you cover both ends of your distribution. Let's run this. Perfect.

Now we can move on to the evaluation. How are we going to evaluate our model? Very simple: we'll use the confusion matrix, so we can see what the predictions from the model were and whether the right label was assigned to each value. You can import confusion_matrix and ConfusionMatrixDisplay from scikit-learn, and then it's just a matter of plotting it. I will say that cm is equal to confusion_matrix, and then you pass it df['is_anomaly'], you also pass it the baseline column, and the labels in this case are going to be 1 and -1. Then you say that disp_cm, to display the confusion matrix, is going to be a ConfusionMatrixDisplay; you pass it your confusion matrix, and again display_labels is going to be equal to 1 and -1. Then you can simply say disp_cm.plot, with a semicolon at the end. Let's go ahead and run this, and you should get the following. Again, let me zoom out just a little bit.

So this is what you get, and it is fairly bad. Of course, as you can see, it did correctly label the two outliers that we had in our data,
which is the number you see in the bottom right corner. So this is great: the two points that were actually outliers were identified by this robust z-score method. However, you can also see that a lot of data was flagged as an outlier when in fact it is not. Let me zoom out again: the predicted label is -1 here, but the true label is equal to 1, which means that the model incorrectly labeled normal data as outliers. Again, this is to be expected. We had a lot of flat values in this case and the MAD was very close to zero, so we see what is happening: anything that slightly deviated from the median was flagged as an outlier by this method. It doesn't really make sense to use it here, because the data is not normally distributed and the MAD was very close to zero, but now you know how to implement this method. If you ever encounter that kind of situation, you know how to implement this baseline method, and who knows, maybe it's going to work better in some other scenarios. Anyway, that's it for the robust z-score method, so let's go back into the slides. We'll learn about the isolation forest, and then we'll come back into the code to implement it in Python.

All right, now let's take a look at the isolation forest algorithm. No surprise here, this is a tree-based algorithm that is often used for anomaly detection. The algorithm starts by randomly selecting an attribute, and then it randomly selects a split value between the maximum and minimum values for that attribute. This partitioning is done many times until the algorithm has isolated each point in the dataset. The general idea is that if it takes many partitions to isolate a particular point, then the point is an inlier; however, if it takes only a few partitions to isolate it, then the point is an outlier. This really makes
sense with this picture here. As you can see with the red lines, a tree can only make either horizontal or vertical cuts, so every line is a partition. To isolate this blue point with the arrow here, x_i, it required a lot of partitions, which means that this point is likely to be an inlier; it's a normal point. However, if we take a look at this point here, x_j, it only required four partitions to isolate it completely. So this point x_j is likely to be an outlier, because very few partitions were necessary to isolate it completely from the rest. That is really all there is to the isolation forest; as you can see, it's very simple and very intuitive. So let's implement it right now in Python, and then we will come back to take a look at the last method, which is the local outlier factor.

Okay, we are back, and now let's implement the isolation forest algorithm. Very simple: we're going to import it from scikit-learn. Here I would like to do something a little bit different. Instead of looking at all the data and seeing if the model can identify whether each point is an anomaly or not, let's consider the scenario where you have data and you want to see if a future point, a new data point, is an anomaly or not. This is really when things get interesting: you don't necessarily want to know if something in the past was an anomaly, but it would be interesting to know if a new data point coming in is an anomaly or not. This is why we are splitting the data in such a way that we have one anomaly in the training set and one anomaly in the test set. Here we'll evaluate the model on its capacity to identify this new anomalous point in the test set. Okay, so to use those algorithms, you actually need to set a
contamination level. The contamination is really just the proportion of anomalous points in your training data, so in this case it's 1 over the length of train. This is something that I know, because I split the data so that we have one anomaly in the training set and one anomaly in the test set. So the contamination is simply one divided by the total number of samples in the training set. Once this is done, we can go ahead and initialize our model: iso_forest is IsolationForest, the contamination is going to be equal to contamination, and let's also specify a random state, which I will set to 42. Then X_train has to be reshaped: X_train is going to be equal to train['values'], then .values.reshape(-1, 1), so that we can feed it to the model. Then it's just a matter of fitting the model: iso_forest.fit, and you pass it X_train. Perfect, let's run this; it shouldn't take too long to fit. And I made a little mistake here: the column is not called values, the column is called value. All right, now that this mistake is fixed, everything worked correctly.

Now that the model is fit, we can go ahead and make some predictions. preds_iso_forest is going to be equal to iso_forest.predict, and then you pass in your test data, which also has to be reshaped: test['value'].values.reshape(-1, 1). Perfect, let's run this.

Now it's just a matter of evaluating the model, so let's see if our model was able to identify this new anomalous data point in the test set. We want the confusion matrix, but only for the test set here. Just like before, cm is confusion_matrix, passing in test['is_anomaly'] and then your predictions, preds_iso_forest, and the labels are again 1 and -1. Then disp_cm is a ConfusionMatrixDisplay of cm, with display_labels also 1 and minus
one. All right, and then finally we'll say disp_cm.plot, like so. Let's run this. And I made a little mistake, I forgot my equal sign, sorry about that. You should get the following.

So what do we see here? Looking at the confusion matrix, we can see that it didn't manage to identify this new anomaly. We would expect to have a 1 at the bottom right, but it's not the case. In fact, the model predicted that everything is an inlier: according to the model, there are no anomalies in the test set, when in fact there is one. That's why the predicted label was 1 but the true label is -1. So in this case, isolation forest did not manage to find this new outlier in the data. That's it for this model, so let's go back into the slides, learn about the local outlier factor, and then come back into the code to implement it in Python.

All right, now let's take a look at the last method that we'll explore today, which is the local outlier factor. This is an unsupervised method for anomaly detection, and the intuition behind it is that we compare the local density of a point to the local densities of its neighbors. If the density is smaller for that point, then the point is likely to be isolated, and so it is likely an outlier. That is really the general idea behind this algorithm.

This is all based on a metric called the reachability distance, so let me do my best to explain what it is. Let's take a look at this picture. Suppose that we are looking to compute the local outlier factor of point A, and you set the number of neighbors to three. In this case, B, C, and D are the three closest neighbors of point A, and point E is too far away, so we're not going to consider it. What we do is draw a circle around point A, so point A is at the center of this
circle, the black circle. The reachability distance is defined by this formula that you see here: it is the maximum between the k-distance of B and the actual distance between A and B. The k-distance of B is simply the distance from point B to its third nearest neighbor, and that's why in this figure I have another circle, in blue, with point B at its center, to calculate that k-distance of B. So what is the distance from B to its third nearest neighbor? In this case, C is the third nearest neighbor, which is why C is on the edge of this circle. So to calculate the reachability distance between A and B, you take the maximum of the distance from B to C and the distance from A to B.

Once you have that reachability distance calculated for all k nearest neighbors of A, you can calculate the local reachability density, which is simply the inverse of the average of all the reachability distances. Intuitively, the reachability density tells us how far we have to travel to reach a neighboring point: if the density is large, then the points are closer together and we don't have to travel far. The local outlier factor is then simply a ratio of local reachability densities. If we set k to 3, we will have three different ratios, one per neighbor, and we average them. This allows us to compare the local density of a point to that of its neighbors.

Now, how do we tell whether the LOF is representative of an inlier or an outlier? If it is close to 1 or smaller than 1, we conclude that it is an inlier; otherwise, if the LOF is larger than 1, it is an outlier. But this has some drawbacks: what does smaller or larger mean? Does an LOF of 1.1 mean it is an inlier or is it an
outlier? Is it large enough compared to 1, or not really? So it has some drawbacks, and it really depends on the dataset: sometimes it's going to work very well, and sometimes it's not going to work that well. It also depends on how far away your outliers are from the normal data. That's it for this method, and this is the last method we will take a look at today, so let's go back into the code and implement it.

Okay, the last method that we are going to implement in this lesson is the local outlier factor. Again, it's a model that you can import from scikit-learn: from sklearn.neighbors we import LocalOutlierFactor. It works pretty much the same way as the previous model. lof is LocalOutlierFactor, and we also set the contamination level here, so contamination is contamination. Here we also need to set novelty equal to True, because in this case this is not strictly anomaly detection; it is technically novelty detection, because we are fitting on some data and then asking the model to predict whether new data contains anomalies or not. So be careful with this: if you want to use the predict method of LocalOutlierFactor, you need to set novelty equal to True. With that being said, we can fit the model on our training data, like so. Perfect. Then again, just make your predictions: preds_lof is equal to lof.predict, and you pass in your test['value'].values.reshape(-1, 1).

Once you have your predictions, you plot your confusion matrix. cm is equal to confusion_matrix, with test['is_anomaly'], then the preds coming from the local outlier factor, and the labels are again 1 and -1. Then disp_cm is equal to ConfusionMatrixDisplay, passing in your confusion matrix, and display_labels is also equal to 1 and -1. Then it's just a matter of displaying it, so disp_cm.plot, like so, and
you should get the following. As you can see, this is very interesting, because local outlier factor was able to identify this incoming anomaly in the dataset. We now have a perfect confusion matrix: there is a zero at the top right and a zero at the bottom left, which is exactly what you want to see. Everything that was predicted as an inlier is actually an inlier, and the one anomaly was labeled correctly. So for this situation, it turns out that local outlier factor was the best model.

All right, that's it for this lesson. I hope that you enjoyed it and that you learned something new. You can always follow me on LinkedIn, by the way; I will accept your connection. Also, this video was based on a blog article that I wrote, so if you want to read the original article, I will leave the link below in the description. That's it guys, I'll see you in the next one.
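
The robust z-score formula described in the transcript can be written out in a few lines. What follows is a minimal sketch in plain Python, not the notebook's own code: the function name robust_z_scores, the sample readings, and the choice to raise an error when the MAD is zero are assumptions made for illustration. The 0.6745 constant, the 3.5 threshold, and the 1 / -1 labels come from the lesson.

```python
from statistics import median


def robust_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for every x in values."""
    med = median(values)
    # MAD: median of the absolute deviations from the sample median.
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        # Happens, for example, when more than half of the values are identical.
        raise ValueError("MAD is zero; the robust z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]


# Hypothetical CPU-like readings with one obvious spike (not the NAB data).
readings = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, 2.50]
scores = robust_z_scores(readings)

# Same convention as the lesson: 1 = inlier, -1 = outlier.
labels = [-1 if abs(z) > 3.5 else 1 for z in scores]
print(labels)
```

Raising an error on a zero MAD is one possible design choice; it makes the failure mode discussed in the lesson (many constant values) visible instead of silently producing infinite scores.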

Original Description

A hands-on lesson on detecting outliers in time series data using Python.

Full source code: https://github.com/marcopeix/youtube_tutorials/blob/main/YT_02_anomaly_detection_time_series.ipynb
Dataset can be found here: https://github.com/numenta/NAB/blob/master/data/realAWSCloudwatch/ec2_cpu_utilization_24ae8d.csv
Labels can be found here: https://github.com/numenta/NAB/blob/master/labels/combined_labels.json

Chapters:
Introduction - 0:00
Get the data - 4:11
Robust Z-score method - 9:08
Robust Z-score method (code) - 13:12
Isolation forest - 20:48
Isolation forest (code) - 22:33
Local outlier factor - 27:16
Local outlier factor (code) - 31:21
Thank you - 34:01
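
As a companion to the "Local outlier factor" chapter listed above, the three steps explained there (reachability distance, local reachability density, and the final ratio) can be sketched from scratch. This is an illustrative plain-Python version, not the scikit-learn implementation used in the notebook: the helper names and the six sample points are hypothetical, and the sketch assumes there are no duplicate points (which would make a reachability distance zero).

```python
from math import dist


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: dist(points[i], points[j]))[:k]


def k_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor."""
    return dist(points[i], points[k_nearest(points, i, k)[-1]])


def local_reachability_density(points, i, k):
    """Inverse of the mean reachability distance from point i to its neighbors."""
    reach = [
        # Reachability distance: max(k-distance of neighbor, actual distance).
        max(k_distance(points, j, k), dist(points[i], points[j]))
        for j in k_nearest(points, i, k)
    ]
    return len(reach) / sum(reach)


def local_outlier_factor(points, i, k):
    """Average ratio of the neighbors' densities to the density of point i."""
    lrd_i = local_reachability_density(points, i, k)
    neighbors = k_nearest(points, i, k)
    ratios = [local_reachability_density(points, j, k) / lrd_i for j in neighbors]
    return sum(ratios) / len(ratios)


# Hypothetical 2-D points: a tight cluster plus one point far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (6.0, 6.0)]
scores = [local_outlier_factor(points, i, k=3) for i in range(len(points))]
print([round(s, 2) for s in scores])
```

In this toy example the clustered points score close to 1 while the isolated point scores far above 1, which matches the rule of thumb given in the lesson. The brute-force neighbor search is only suitable for small inputs.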

Playlist

Uploads from Data Science with Marco · Data Science with Marco · 35 of 38

1 Linear Regression in Python | Data Science with Marco
2 Classification in Python | logistic regression, LDA, QDA | Data Science With Marco
3 Resampling and Regularization | Data Science with Marco
4 Decision Trees | Data Science with Marco
5 Suppor Vector Machine (SVM) in Python | Data Science with Marco
6 Unsupervised Learning | PCA and Clustering | Data Science with Marco
7 Data Science Portfolio Project: Regression #1 | Data Science with Marco
8 Data Science Portfolio Project: Regression #2 | Data Science with Marco
9 What Are Time Series - Applied Time Series Analysis in Python and TensorFlow
10 Basic Statistics - Applied Time Series Analysis in Python and TensorFlow
11 Autocorrelation and White Noise - Applied Time Series Analysis in Python and TensorFlow
12 Stationarity and Differencing - Applied Time Series Analysis in Python and TensorFlow
13 Random Walk Model - Applied Time Series Analysis in Python and TensorFlow
14 Moving Average Process - Applied Time Series Analysis in Python and TensorFlow
15 Autoregressive Process - Applied Time Series Analysis in Python and TensorFlow
16 ARMA Model - Time Series Analysis in Python and TensorFlow
17 What is data science?
18 Answering DATA SCIENCE questions #1 - Why learn SQL when Python and R exist?
19 R vs Python in the Industry - Data Science Q&A #datascience #datasciencecareer #careeradvice
20 Data science or data engineering - which is best for you? #datascience #datasciencecareer
21 Where to find data for data science projetcs? #datascience #datasciencecareer
22 Data science certificates on resume? #datascience #datasciencecareer #careeradvice
23 Should you aim for data science or data engineering? | Data Science Q&A #1
24 Don't waste time on this | #datascience #datasciencecareer
25 Low-code AI tools - are they good? | #datascience #datasciencecareer #careeradvice
26 How to grow as a data scientist after 2+ years of experience? #datascience #datasciencecareer
27 Transition into DATA SCIENCE without a masters or bootcamp #careertransition
28 How to improve your data science profile?
29 How to learn Python for data science?
30 Does Scrum/Agile work for data science?
31 What are the major roles in analytics and how to choose?
32 Thoughts and advice for a live SQL coding round
33 Data science interview question: difference between type 1 and type 2 error
34 Feature selection in machine learning | Full course
35 Anomaly detection in time series with Python | Data Science with Marco
36 Podcast - TimeGPT, predicting the future, and more
37 Big announcement - Revealing my new book
38 Get Started in Time Series Forecasting in Python | Full Course

This video teaches anomaly detection in time series data using Python, covering methods such as Robust Z-score, Isolation Forest, and Local Outlier Factor. It provides a hands-on lesson with code examples and a real-world dataset.

Key Takeaways
  1. Get the NAB dataset
  2. Preprocess the data
  3. Implement Robust Z-score method
  4. Use Isolation Forest for outlier detection
  5. Apply Local Outlier Factor for anomaly detection
  6. Evaluate the results
💡 The video demonstrates how to use different methods for anomaly detection in time series data, highlighting the importance of choosing the right method for the specific problem.
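
Step 3 of the takeaways can be illustrated with a small sketch. It assumes the commonly used form of the robust z-score, 0.6745 * (x - median) / MAD, where MAD is the median absolute deviation, together with a cutoff of 3.5. Both constants are conventional choices and are assumptions here, as are the function name robust_z_scores, the zero-MAD guard, and the toy series; none of them is taken from the notebook.

```python
import numpy as np


def robust_z_scores(values):
    """Return robust z-scores based on the median and the median absolute deviation."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # Assumption: fail loudly instead of dividing by zero.
        raise ValueError("MAD is zero; the robust z-score is undefined for this data")
    # 0.6745 is the conventional scaling constant (an assumption in this sketch).
    return 0.6745 * (values - median) / mad


# Toy series with one obvious spike (hypothetical data, not the NAB dataset).
series = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0, 11.0, 9.0, 10.0])
scores = robust_z_scores(series)

# Same labeling convention as the lesson: 1 for inliers, -1 for anomalies.
is_anomaly = np.where(np.abs(scores) > 3.5, -1, 1)
print(is_anomaly)
```

Because the median and the MAD are barely affected by the spike itself, the spike stands out clearly, which is the motivation for using this score instead of the classic mean and standard deviation version.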
