Full Machine Learning Project — Low-pass Filter & Principal Component Analysis (Part 5a)

Dave Ebbelaar · Beginner · 📐 ML Fundamentals · 3y ago

Key Takeaways

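This video demonstrates how to apply a Butterworth low-pass filter and Principal Component Analysis (PCA) in Python as part of the feature engineering stage of a machine learning project. It covers imputing missing values by interpolation, smoothing the sensor signals, and reducing their dimensionality, using helper classes for data transformation and temporal abstraction together with pandas and NumPy.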
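
For readers who want to see the PCA idea from the takeaways in code before the walkthrough, here is a minimal NumPy-only sketch. It is not the helper class used in the video: it assumes z-score standardization followed by a singular value decomposition (the course's PrincipalComponentAnalysis class may normalize differently), the function names are hypothetical, and the data is synthetic, built from three hidden signals so that an elbow at three components is expected.

```python
import numpy as np
import pandas as pd


def pca_explained_variance(df, cols):
    """Return the explained-variance ratio of every principal component (hypothetical helper)."""
    X = df[cols].to_numpy(dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: z-score standardization
    # The squared singular values of the centered data are proportional to the component variances
    s = np.linalg.svd(X, full_matrices=False, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def apply_pca(df, cols, n_components):
    """Append columns pca_1 .. pca_n holding projections onto the first n components (hypothetical helper)."""
    X = df[cols].to_numpy(dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_components].T  # rows of vt are the principal directions
    out = df.copy()
    for i in range(n_components):
        out[f"pca_{i + 1}"] = scores[:, i]
    return out


# Synthetic six-column data driven by three hidden signals plus a little noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 6))
cols = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]
df = pd.DataFrame(latent @ mixing + 0.05 * rng.normal(size=(500, 6)), columns=cols)

ratios = pca_explained_variance(df, cols)
print(np.round(ratios, 3))  # plot these against 1..6 to look for the elbow
df_pca = apply_pca(df, cols, n_components=3)
print(df_pca.columns.tolist()[-3:])  # ['pca_1', 'pca_2', 'pca_3']
```

Plotting ratios against the component number gives the elbow chart discussed later in the transcript; with this synthetic data the first three ratios carry almost all of the variance.
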

Full Transcript

Hey everyone, and welcome back to part 5 of this series, where we cover an entire machine learning project. Today is all about feature engineering. More specifically, we're going to filter subtle noise (not outliers; we covered those in the last video), identify the parts of the data that explain most of the variance through principal component analysis, and then add numerical temporal, frequency, and cluster features. So this episode will be packed with methods to add features to your machine learning data and, hopefully, improve our predictive modeling. As usual, the document you're seeing right now is linked in the description, so go check that out. If you're new here, also make sure to check out the playlist linked in the document, which contains all the episodes of this project. We're going to build upon the data sets that we created in the previous episodes, so if you want to follow along completely, make sure to watch those as well; if you just want to learn about the feature engineering techniques, you can simply continue with this episode.

Okay, let's get into it. To get everyone on the same page, let's start with a quick description of what feature engineering is: it is the process of transforming the raw data into additional, more meaningful features that we can use for our machine learning models. We're basically going to look at each of the features currently present in our data and try to come up with ways to manipulate them, combine them, et cetera, in a way that creates new features and potentially gives our model more information. Here is a brief description of what I've just explained, and we will cover most of the examples shown here. This episode will be packed with different methods, and I will
probably split this up into two episodes because it might get too long. Anyway, let's continue to the Python files that we will be using. As usual, you can click on the arrow here to download them; there are three in total this time. build_features is the empty script that we will be building in, and it contains the sections for everything we will be doing. I've also added two additional Python files with functions that we are going to use. These functions are taken from the GitHub repository for Machine Learning for the Quantified Self (ML4QS), mainly from the DataTransformation.py file in chapter 3 and the TemporalAbstraction.py file in chapter 4. I've made small adjustments to them, and you can open the files to see where they come from and what I've changed. Download these files, or copy and paste the Python code into files of your own, but as usual make sure they are in the correct folder: today we work from the src folder, and in there the features folder, which should contain the build_features file. As you can see, it's empty right now. Put DataTransformation.py and TemporalAbstraction.py in the same folder; that way we can import the classes defined in those files (LowPassFilter, PrincipalComponentAnalysis, and also the NumericalAbstraction class) with the following import statements. We are basically creating our own small Python packages that we can import into this file and start using. Coming back to our document: we've briefly covered what feature engineering is, and now the next step is to deal with missing values
which we are going to do through imputation. In the previous episode we removed some outliers and replaced those values with NaNs, so there are now gaps within our data set, and most algorithms cannot properly deal with missing values. We have to come up with a way to tackle them. Let's get into it and start off, as usual, by loading the data frame: we create a df variable and call pd.read_pickle (not read_csv), link to our data folder, go to the interim files, and continue with the 02_outliers_removed_chauvenets pickle file. Let's start up an interactive Python session. This is also a critical point to check that you can import the classes from the helper files: the imports should run without any errors, but only if you have placed these files in the features folder and the naming is correct. For me it succeeded, so now we can read the data frame and have a look at it. This is the exact data frame that we exported in the last episode, which is why using a pickle file is so convenient.

Just like in the last episode, we're also going to define the predictor columns, because we will be referring to them quite often. The predictor columns are all the accelerometer and gyroscope columns, so we take df.columns, which lists all the columns in the data frame, select the subset up to the sixth column (as you can see, that is up to the last gyroscope value), and convert it to a Python list stored in the predictor_columns variable. Next, I'm going to import the plot settings; I just copy and paste these, so you can type them over and run them as
well. We're going to use the fivethirtyeight style again and make sure our images are nice and big.

All right, now we're ready to deal with the missing values. If we look at our data frame and call the .info() method, we can see that there are 9009 entries in total, and the overview shows, for each column, how many non-null values there are, which also tells us how many values are missing. Looking at the predictor columns, basically all of them are missing some values, which is the result of the outlier detection methods that we applied in the previous episode. To show what we're actually going to do about them, I'll first create a subset of the data frame by selecting one specific set, chosen at random. Let's see: set 35. What are we looking at? A heavy row. Let's take a column that has a lot of missing values, the gyroscope y value, and plot it. All right, you can see it here, but let me switch to the whiteboard, make it a bit larger, and let's pretend there are missing values within this set. Let me take a pen and illustrate it: say this is the actual data, and we're missing some values here, here, and here, because the outlier detection model flagged the circled points as outliers. For example, there might have been a point here that dropped all the way down below a certain threshold, so the model would say this point is definitely an outlier, and then it would
introduce a gap here. Same thing here, where it would cut off part of a peak; you get what I'm saying. So there are gaps within the data, and there are several ways we can deal with them. The first is to drop the rows that contain missing values: we look at the whole data set, and any row with a missing value is simply dropped. Of course, this results in less data, so it's not always the preferred method, but when there's enough data it's a solid option. You can also look at imputation, which means we look at the gap and impute a value there. To come up with that value, we can use statistical properties of the data, such as the mean, median, min, or max, but we can also interpolate the data, meaning we connect the last point before the gap with the next point after it to fill in the gap. That is what makes the most sense in this situation, at least in my opinion, so that is what we will apply: anywhere there is a missing value, we connect the dots and interpolate linearly, creating a straight line. Most of the time only one value is missing, so it will just interpolate that single value, like this, and here also like this. So let's see how we can do that in pandas. Let me clear the whiteboard. pandas has a method that we can use out of the box for this, interpolate(): we can take any column, for example like this, call .interpolate() on it, and
this will interpolate the data linearly, just like in the example I've shown you. Now all we have to do is loop over all the predictor columns that we want to interpolate and store the results. So we create a for loop, for col in predictor_columns, and inside it we overwrite df[col] with df[col].interpolate(). Let's run this and see if it works: if we call df.info() again, none of the predictor columns should contain missing values anymore, and indeed there are none. Job done for this part.

Okay, the next thing we're going to focus on is calculating the average duration of a set. This is a preparation step for applying the Butterworth low-pass filter, a filter that removes subtle noise from the data. Let me first illustrate what we want to accomplish with this filter. I'll create a subset by selecting an individual set again; let's take a random one, set 25, which is a heavy overhead press, look at the y acceleration, and create a plot. What we see here is the movement for five repetitions. Remember that the data is split into heavy and medium sets: heavy sets of five repetitions and medium sets of 10 repetitions. Here we can clearly see five peaks (one, two, three, four, five), which correspond to the five repetitions in this heavy set. As you can see, these lines are pretty jagged and sharp. Let's take another example, say set 50. This is a nice example because it's a medium set, a medium bench press. What we basically want to accomplish with the low
pass filter is to remove subtle noise within the exercises so we can clearly see the movement patterns of the individual repetitions. Here we start off at the low end, accelerate to a higher point, and come back down again, and for the bench press we also see clear patterns repeating over and over, corresponding to the different repetitions. By applying the low-pass filter we can make these lines smoother, meaning we only look at the overall movement pattern and not at the tiny differences between every repetition and every participant: small adjustments of the bar or of your hand and foot position that can come up during a rep or a set. We want to filter those out and keep the big movement patterns. That's why we need to know how long a repetition takes: it tells us which frequencies belong to the repetitions themselves, so we can set the filter to keep those and remove the faster fluctuations, which are the noise. This is quite complex and probably still a bit abstract, so let's just get into it and I will demonstrate it with an example.

Okay, let's first calculate the duration of a set. We take a subset, set 1, which is a heavy squat, and look at its index. The first index value is a timestamp: the moment the set started. We then put the last timestamp in front of it; using -1 in the index selects the last index of this subset. If you subtract two timestamps, you get a Timedelta, so let's have a look: the Timedelta says 20 seconds and
400 milliseconds. That is the difference between the last and the first timestamp, so this is the duration of the set. On a Timedelta we can also access .seconds, which gives us 20 seconds; note that this drops the fractional part rather than rounding to the nearest second. So this is how we calculate the duration of a single set. Now let's do that in a loop to calculate the average duration. We loop over all the unique sets present in the data set, for s in df["set"].unique(); a quick check shows that this is simply the array of all unique set numbers. Inside the loop we define start and stop: I copy and paste the two lines from above, replace the 1 with s from the loop (and make sure to spell start correctly), and then calculate the duration as stop minus start, which is a Timedelta. Now we add this duration to the data frame in a new column, taking the set into account: we use df.loc with the condition df["set"] == s as the row selection and a new column, "duration", and set it equal to duration.seconds, just like before. Let's run this and see what we get. We now have a duration column; here we can see, for example, that set 64 took 16 seconds to complete. Using this new column, we group the data frame by category, select only the duration column, and take the mean, and we can see that the average duration of heavy sets is 14 seconds and the average duration of
medium sets is 24 seconds. To take this further, we store that groupby result as duration_df. The first element is the average duration of the heavy sets and the second element is the average duration of the medium sets, and we divide these by the number of repetitions in heavy and medium sets, 5 and 10 respectively. Looking at the result, for the heavy sets the average duration of a single repetition was about three seconds, and for a medium set, which is lighter and therefore easier to perform, it was about two and a half seconds per repetition. Awesome. We could also have looked at the plots and estimated visually how long a repetition takes, but now we can back this up with data and know how long a repetition lasts on average. This gives us information that we can use in the low-pass filter, which we will look into now.

Just like with the previous methods, we're not going to dive into all the technical and mathematical details here, because I want this to be a very practical tutorial that you can follow along with. Of course, it's also important to understand the underlying principles, so if you want to learn more about the Butterworth low-pass filter, look it up on your own. There is some information in the resources section of the document: a brief description and a visual representation of what different frequencies look like. For now, let's continue by defining the parameters that we need. We are going to use the class from DataTransformation.py, but before we get started, let's create a copy of the data frame that we will apply the low-pass filter to, so let's define df_lowpass, which
will be an exact copy, and then define an instance of the LowPassFilter class called LowPass, and run this. In the background, this loads the class and creates an object called LowPass, which has access to the low_pass_filter method. If we look at that method, we can see all the required parameters: a data table, which is the data frame; the column that we want to apply the filter to; a sampling frequency; and a cutoff frequency. The table and column are pretty straightforward; the two frequencies are new, and they are also why we calculated the average duration of a repetition. The sampling frequency follows from the step size between the individual records in our data frame: as you remember from part two, we set that step size to 200 milliseconds, which means 5 instances per second, so 5 is what we fill in as the sampling frequency. The cutoff frequency is the frequency that we set our filter to, and we have to play around with this setting, inspecting the filter's results visually and also considering the average repetition duration that we calculated. I will show you some examples in a bit. So let's start by defining the sampling frequency, fs, as 1000 divided by 200, which results in 5 instances per second: one second is 1000 milliseconds, and there is a step size of 200 milliseconds between consecutive records. So that is where that is coming
from. For the cutoff frequency we're going to start with 1. Let's first apply the filter to a single column: we take df_lowpass and call LowPass.low_pass_filter. Remember, the LowPassFilter class contains the low_pass_filter method; it also takes self, which is not relevant right now. We pass the low-pass data frame and, as usual, start with the y acceleration, plus fs as the sampling frequency and cutoff as the cutoff frequency. There is also an order argument that defaults to 5, which we can leave as is for now. Let's run this and look at the new data frame: there is a new column, the y acceleration with a _lowpass suffix, which holds the filtered values. Let's see what the filter actually did by comparing this new column with the original column. I'll copy and paste a piece of code for that, since data visualization takes a long time to type and this is not the data visualization episode. First we select a subset again, so you can type this over as well to follow along; we select set 45, which is again random: participant E doing a medium deadlift. For this subset we plot the raw data and the data with the filter applied, and we can already tell that the filter has done its job in smoothing the data. So hopefully now you get a better
understanding of what this filter is used for, because before this was quite abstract. Here we clearly see the original, jagged raw lines, and by applying the filter it even got rid of the two peaks that we see here. This brings us to the question: what is the right cutoff frequency to use? Let me give you an example by setting the cutoff frequency higher, to 2 for example. This is a bit counterintuitive, but a higher cutoff frequency lets higher frequencies (faster movements) pass through, which means more jagged and rough lines. To sum up: the higher this number, the less we filter. A high number gives you something close to the raw data; a low number gives you smooth data. Let me show you what that looks like by running this piece of code again. Here you're looking at a cutoff frequency of 1; if we change it to 2, the data looks a lot more like the raw data. It's not the same, and we can definitely tell that some smoothing is applied, but much less. If we set it to 0.5, this results in very smooth lines, which is probably too much considering the problem at hand: creating a model that identifies the different exercises. We now have to figure out the best cutoff frequency, and this is best done by looking at the graphs, considering the duration of the repetitions, and finding a balance between smooth lines without the subtle noise and still seeing the characteristic patterns of each individual exercise. To speed this process up, I'll cut straight to the point: a cutoff frequency of 1.3, which is the value I used in the original paper. Let's have a look: there are still some peaks and valleys, but overall it is much smoother than
the original data. We can also tweak it a little and see what it looks like with 1.2. There is not much difference for this set, so choosing 1.2 or 1.3 probably doesn't matter much, but I'm going to set it to 1.3 to leave slightly more variation in the data. This is also a parameter that we could experiment with later down the line: once our prediction pipeline is ready, we can see what effect changing the cutoff frequency has on the predictive performance of our models. Just like with outlier detection and feature engineering in general, for this parameter in particular you have to tweak, test, and run experiments to find the best value.

The next step is to apply this filter to all of the columns, which we will do in a loop: for col in predictor_columns, we call LowPass.low_pass_filter on df_lowpass with the column we're looping over. This adds the filtered values to the data frame as a new column, but in this case we actually want to overwrite the original values. You could go either way here: you could keep the smoothed values as additional columns and compare them later, or you can overwrite the original columns, which is what we are going to do right now. To do that, we first apply the filter, which adds a new column with the same name plus _lowpass; then we overwrite the original column with that new column; and finally we delete the additional column. So the loop goes over all the predictor columns, applies the low-pass filter, overwrites the original column with the filtered values, and deletes the temporary column. So that's what we are going to do.
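
The apply, overwrite, and delete loop described above can be sketched without the course's helper file. This is a minimal sketch, assuming the filter is a zero-phase Butterworth filter built with SciPy's butter and filtfilt, with the cutoff expressed as a fraction of the Nyquist frequency (fs / 2); the LowPassFilter class from the ML4QS code may differ in its details. The low_pass_filter function, the column names, and the synthetic signal (a slow 0.4 Hz movement plus faster 2 Hz jitter) are hypothetical stand-ins. The sketch also fills one gap with interpolate() before filtering, because a NaN passed to filtfilt would turn the filtered column into NaNs.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt


def low_pass_filter(df, col, sampling_frequency, cutoff_frequency, order=5):
    """Add a zero-phase Butterworth low-pass filtered copy of `col` as `<col>_lowpass` (hypothetical helper)."""
    nyquist = 0.5 * sampling_frequency
    # butter() takes the cutoff as a fraction of the Nyquist frequency
    b, a = butter(order, cutoff_frequency / nyquist, btype="low", analog=False)
    # filtfilt filters forward and backward, so the output is not shifted in time
    df[col + "_lowpass"] = filtfilt(b, a, df[col].to_numpy())
    return df


# Synthetic stand-in for one set: 200 ms steps, a slow 0.4 Hz "repetition"
# movement plus faster 2 Hz jitter.
fs = 1000 / 200  # 5 instances per second
cutoff = 1.3     # Hz
t = np.arange(100) / fs  # 100 samples = 20 seconds
df_lowpass = pd.DataFrame({
    "acc_y": np.sin(2 * np.pi * 0.4 * t) + 0.3 * np.sin(2 * np.pi * 2.0 * t),
    "gyr_y": np.cos(2 * np.pi * 0.4 * t) + 0.3 * np.cos(2 * np.pi * 2.0 * t),
})
df_lowpass.loc[10, "acc_y"] = np.nan  # a gap left behind by outlier removal
predictor_columns = ["acc_y", "gyr_y"]

for col in predictor_columns:
    # Fill gaps by linear interpolation first; filtfilt would propagate NaNs
    df_lowpass[col] = df_lowpass[col].interpolate()
    df_lowpass = low_pass_filter(df_lowpass, col, fs, cutoff, order=5)
    # Overwrite the original column with the filtered values, then drop the helper column
    df_lowpass[col] = df_lowpass[col + "_lowpass"]
    del df_lowpass[col + "_lowpass"]

print(df_lowpass.columns.tolist())  # ['acc_y', 'gyr_y']
```

With a 5 Hz sampling rate the Nyquist frequency is 2.5 Hz, so moving the cutoff up toward 2.5 lets more of the 2 Hz jitter through, and moving it down smooths more, which matches the behavior shown in the plots.
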
So let me make sure that df_lowpass is clean again, without any filters applied, and then we loop over everything. It runs instantly. Beautiful: all of our values are updated and overwritten.

Great, then it's time to move on to principal component analysis, PCA for short, which is another very interesting technique. Just like the Butterworth low-pass filter, its mathematical foundation is pretty complex, and you can find some additional information about principal component analysis in the resources. To summarize, it's a technique used in machine learning to reduce the complexity of data by transforming it into a new set of variables called principal components. The transformation is done in such a way that the new variables capture as much of the information (variance) in the original set as possible while reducing the number of variables needed. So we can take a set of variables, for example all the acceleration and gyroscope data, and combine them in such a way that we reduce them to one, two, or three columns while still explaining most of the variance. This reduces the complexity of the data and makes it easier to analyze and make predictions. The resources link a great video that explains step by step how principal component analysis works, but for our example we'll just apply it and see what it looks like. To get started, let's look at the PrincipalComponentAnalysis class in DataTransformation.py. It contains three functions: one to normalize the data, which is necessary before calculating the principal components; one to determine the explained variance of the principal components, which we will use to determine the total number of principal components that we would like to use
(so, the optimal number, which we'll get into in a bit); and finally the apply_pca function. If you want to understand how this works on a more theoretical level, have a look at these functions and look up more information about PCA. For now, let's create a new instance of the PrincipalComponentAnalysis class, just like we did for the low-pass filter: we define a new object, PCA, and set it equal to the class. Before we continue, let's also create a new df_pca, set equal to a copy of df_lowpass, so we can always come back to this line and reset our data frame. The first step is to determine the optimal number of principal components, which, as just mentioned, we do with the determine_pc_explained_variance function. It takes as input a data table and the columns that we want to include, which will be the predictor columns; its docstring says it performs PCA on the selected columns and returns the explained variance. Let's first look at what it returns, and then I'll explain how we can use the result to determine the right number of principal components. We call PCA.determine_pc_explained_variance with df_pca and the predictor columns, so we apply the principal component analysis to the first six columns of this data frame, and store the result in pc_values. Running this results in a list of six values, because we passed in six columns in total. PCA is a dimensionality reduction method: we want to move from many variables (columns) to fewer. So we want to explore numbers of
principal components up to the total number of columns that we're passing in, but choosing six in this case wouldn't make much sense, because then we wouldn't have achieved anything: we would still end up with six columns. Now that we have these values, the next step is to use them to determine the optimal number of principal components, which we can do with the elbow technique. Let me switch to the document again. This is a technique you also often see in k-means clustering, for example, to determine the optimal value of k. There is a brief description here of how it works: the optimal number of components is the number that captures the most variance without incorporating too many components, so we're looking for an optimum. This is done by plotting the variance captured (the array we just calculated) against the component number and then selecting the point at which the rate of change in variance diminishes, which is called the elbow. Before explaining this further, let me illustrate it: I'll copy and paste a piece of code that plots the values we just calculated. On the y-axis we have the explained variance, and on the x-axis the principal component number. This is a perfect example of an elbow, which we see around component number 3: as the principal component number increases, the explained variance decreases, but after a certain point, after three in this example, the decrease levels off. Coming back to the explanation, this is done by plotting the variance captured against the component number and then selecting the point at which the rate of change in variance diminishes. So it's about the rate of change in variance, and as I've said, this is
a perfect example of how we can illustrate that and how this is is called the elbow technique so in this scenario it's quite straightforward that three is the optimal number for our principal components so let's now continue and Implement that so let's have a look at the apply PCA function apply PCA given the number of components we have selected we add new PCA columns so we include the data table The Columns and the number of components so pretty straightforward so let's see how that works in practice so we'll start off by the PCA data frame again and then we'll set that to PCA and then we say apply PCA and then we provide the data frame the predictor columns and then the number of components which is three in our example let's run that and let's have a look so we now have a data frame and if we scroll all the way to the back we can see that we now have three components over here we have basically summarized these six columns into the three principal components over here while capturing or explaining as much of the variance as possible we will leave the principal components in here next to like all the other columns so um not like the low pass filter we're not going to overwrite the initial values but for now we're going to just keep them in here and later doing feature selection we're going to check whether the principal components actually perform better than the individual columns but for now the last thing that we have to do in here is visualize them to get a better understanding of what we've just done and for this I'm going to create a subset again so let me just copy and paste this over here and now change up the data frame and let's say we're going to look at set 35 so what we got we got a heavy row perfect and now let's look at this subset and then a selection based on only the PCA column so it would be pca1 and then let's just copy paste this once more once more two so we got a selection of only the principal components and then let's just and here we 
can see the result of our principal components for this particular set and that concludes the principal component analysis for now so we can have a look at the data frame you have the original values with the low pass filter applied and then three additional columns created by the principal component analysis all right so then we can move on to the next set of features and I can't say this enough but the math behind the PCA and the low pass filter is quite complex and I really encourage you to look it up and learn more about it if you want to apply it in like your own projects but I just want to show you like how it works and later when we're going to look at predictive modeling it all comes back together and we can see how each of the features is performing and then we can also make better decisions but that's why I'm briefly going over this normally this would require a much more thorough study into the actual results of the principal component analysis to to validate whether they're actually useful all right let's continue with the sum of squares attribute and for this I'm going to refer to the resources again and here is a part from the original report that explains how the sum of squares is calculated basically so to further exploit the data the scalar magnitudes called R of the accelerometer and gyroscope data were calculated R is the scalar magnitude of three combined data points so the X Y and Z and the advantage of using R versus any particular data direction is that it is impartial to to device orientation and can handle Dynamic reorientation R is calculated by basically taking the squares of all the original vectors and then taking the square roots again to basically bring it back to a single positive scalar so let's do that in pandas and we're going to start off by defining the acceleration R which is equal to um and then also let's copy the data frame again so we'll take the PCA data frame and we call this DF squared equals okay so we set it equal to 
the PCA data frame and then from the DF squared we're going to first look at the acceleration so we'll take the X and then we'll square it like this in Python and then we're going to plus this and we're going to do that two more times so X Y and Z of acceleration and we Square it so come back to the formula this is the part under the square root now we're going to do the exact same for the gyroscope data so we're going to select OC in this case hit command D or Ctrl d three times I guess one two three yes and then we type in gyroscope and now we do the same for the gyroscope data so let's do that and this will result in a single series combining all of the three results and now the final step is to take the DF create a new column called acceleration R and we're going to take the acceleration r that we created but now let's take the square root and then do that one more time select acceleration hit command D type in gyroscope check that out as well so we can first have a look at the values over here and then take a look at the square root all right looking good so now finally step is to visualize that again so let's take another subset do the same trick by now we know the deal let's take number 18 for now what we got medium row we also got the row in the previous one I want another one squat medium perfect so we have the subset and what we can now do is we can have a look at the subset and then create another subset actually of the gyroscope are accelerometer R gyroscope R so you get that and then let's just call the plots method and then also say subplots equals true so we get two images over here and instead looking correct yeah I think so alright so now what we can see over here so um we have now combined the acceleration and the gyroscope data into you scalars using the formula over here and we will later explore how this could potentially impact our models but the main goal of implementing a magnitude scaled version of all the values over here is because it is 
impartial to the device orientation and this could help us to make the model generalize better to different participants so let's have a look at the data frame again and we can see that we have five additional features as well as the filtered version of The Columns over here of course and that also concludes this video for now because as you can see we still have quite a lot of topics to cover and otherwise this video will get too long so I will stop this right here and then continue in the next one if you've been following along please like this video And subscribe to the channel and then I'll see you in the next one
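The sum-of-squares (magnitude) features described above can be sketched in pandas as follows. This is a minimal sketch, not the project's exact code: the column names `acc_x` … `gyr_z`, the output names `acc_r`/`gyr_r` and the helper `add_magnitude_columns` are assumptions for illustration. The toy rows use vectors whose lengths are easy to check by hand.

```python
import numpy as np
import pandas as pd


def add_magnitude_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with orientation-independent magnitude columns.

    Hypothetical helper; assumes columns named acc_x/acc_y/acc_z and gyr_x/gyr_y/gyr_z.
    """
    df_squared = df.copy()
    for sensor in ("acc", "gyr"):
        # Sum of the squared axes: the part under the square root
        sum_of_squares = (
            df_squared[f"{sensor}_x"] ** 2
            + df_squared[f"{sensor}_y"] ** 2
            + df_squared[f"{sensor}_z"] ** 2
        )
        # The square root brings it back to a single non-negative scalar r
        df_squared[f"{sensor}_r"] = np.sqrt(sum_of_squares)
    return df_squared


# Toy data: (3, 4, 0) has length 5, (1, 2, 2) has length 3
df = pd.DataFrame({
    "acc_x": [3.0, 0.0], "acc_y": [4.0, 0.0], "acc_z": [0.0, 1.0],
    "gyr_x": [1.0, 0.0], "gyr_y": [2.0, 0.0], "gyr_z": [2.0, 0.0],
})
out = add_magnitude_columns(df)
print(out[["acc_r", "gyr_r"]])
```

Because r depends only on the length of the vector, rotating the sensor (for example, turning the x/y plane by 90 degrees) leaves `acc_r` unchanged, which is the orientation independence mentioned in the video.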

Original Description

Want to get started with freelancing? Let me help: https://www.datalumina.com/data-freelancer
Need help with a project? Work with me: https://www.datalumina.com/solutions

In this video, we will learn how to apply the Butterworth low-pass filter and principal component analysis (PCA) in Python.

👉🏻 Source material for this week: https://docs.datalumina.io/tjGyJjXxfpChiL

⏱️ Timestamps
00:00 Introduction
01:22 What is feature engineering
02:23 Python files
05:00 Loading data
07:08 Dealing with missing values
12:16 Calculating set duration
19:26 Butterworth low-pass filter
30:35 Principal component analysis (PCA)
39:42 Sum of squares features

Project overview (what you will learn)
Part 1 — Introduction, goal, quantified self, MetaMotion sensor, dataset
Part 2 — Converting raw data, reading CSV files, splitting data, cleaning
Part 3 — Visualizing data, plotting time series data
Part 4 — Outlier detection, Chauvenet’s criterion, local outlier factor
Part 5 — Feature engineering, frequency, low pass filter, PCA, clustering
Part 6 — Predictive modelling, Naive Bayes, SVMs, random forest, neural network
Part 7 — Counting repetitions, creating a custom algorithm

Link to playlist: https://youtube.com/playlist?list=PL-Y17yukoyy0sT2hoSQxn1TdV0J7-MX4K
If you find these videos helpful, consider subscribing @daveebbelaar
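As a rough illustration of applying a Butterworth low-pass filter in Python, here is a sketch using SciPy's `butter` and `filtfilt`. The helper name `butter_lowpass`, the 50 Hz sampling rate, the 3 Hz cutoff and the synthetic signal are assumptions chosen for the demo, not the settings used in the video. `filtfilt` runs the filter forwards and then backwards, so the smoothed signal is not shifted in time.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def butter_lowpass(signal, sampling_frequency, cutoff_frequency, order=5):
    """Zero-phase Butterworth low-pass filter (hypothetical helper)."""
    nyquist = 0.5 * sampling_frequency
    # Without the fs argument, butter expects the cutoff as a fraction of Nyquist
    normalized_cutoff = cutoff_frequency / nyquist
    b, a = butter(order, normalized_cutoff, btype="low", analog=False)
    # Forward-backward filtering cancels the phase shift of a single pass
    return filtfilt(b, a, signal)


fs = 50.0                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)                # 10 seconds of samples
clean = np.sin(2 * np.pi * 1.0 * t)         # slow 1 Hz movement pattern
noisy = clean + 0.3 * np.sin(2 * np.pi * 15.0 * t)  # fast 15 Hz jitter
smoothed = butter_lowpass(noisy, sampling_frequency=fs, cutoff_frequency=3.0)

error_before = np.sqrt(np.mean((noisy - clean) ** 2))
error_after = np.sqrt(np.mean((smoothed - clean) ** 2))
print(f"RMS error before: {error_before:.3f}, after: {error_after:.4f}")
```

In a real project, the cutoff is chosen relative to the sensor's actual sampling frequency and must stay below half of it (the Nyquist frequency).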

This video teaches how to apply a low-pass filter and PCA to a dataset in Python, covering key concepts like data transformation, feature engineering, and dimensionality reduction. It provides a practical example of how to preprocess data for machine learning models.
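To make the PCA step concrete, here is a sketch with scikit-learn that mirrors what the transcript describes: six predictor columns summarized into three components, appended next to the original columns rather than overwriting them. The helper `apply_pca`, the column names and the `pca_1`, `pca_2`, ... naming are assumptions, and the video's own helper may normalize the data differently from the `StandardScaler` used here. The random demo data has no real structure; it only shows the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def apply_pca(df, columns, n_components):
    """Return a copy of df with n_components PCA columns appended (hypothetical helper)."""
    # PCA is sensitive to scale, so standardize the predictors first
    scaled = StandardScaler().fit_transform(df[columns])
    components = PCA(n_components=n_components).fit_transform(scaled)
    out = df.copy()  # original columns are kept, not overwritten
    for i in range(n_components):
        out[f"pca_{i + 1}"] = components[:, i]  # hypothetical naming scheme
    return out


# Hypothetical predictor columns filled with random data for the demo
columns = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)), columns=columns)

df_pca = apply_pca(df, columns, n_components=3)
print(df_pca.columns.tolist())
```

Keeping the raw columns means a later feature-selection step can compare the components against the individual sensor axes.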

Key Takeaways
  1. Import necessary libraries and load the dataset
  2. Apply a low-pass filter to the data
  3. Perform Principal Component Analysis
  4. Determine the optimal number of principal components
  5. Apply PCA to the dataset
  6. Visualize the results
💡 The low-pass filter and PCA can be used together to preprocess data and reduce dimensionality, which can potentially improve the performance of machine learning models; whether the new features actually help is checked later, during feature selection.
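Determining the number of components with the elbow technique can also be sketched in code. In the video the elbow is read off the plot by eye; the second-difference rule below is one simple heuristic stand-in, not the method used there. The synthetic data (three hidden signals, each measured by two columns with a different noise level per pair) is an assumption built so that the curve flattens after three components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three hidden signals, each measured by two
# noisy columns, with a different noise level per pair -> six columns
rng = np.random.default_rng(0)
n_samples = 500
hidden = rng.normal(size=(n_samples, 3))
noise_levels = [0.1, 0.5, 0.8]
X = np.column_stack([
    hidden[:, j] + noise * rng.normal(size=n_samples)
    for j, noise in enumerate(noise_levels)
    for _ in range(2)
])

# Fit PCA with all components to get the full explained-variance curve
pca = PCA().fit(StandardScaler().fit_transform(X))
explained = pca.explained_variance_ratio_

# Drop in explained variance from each component to the next ...
drops = -np.diff(explained)
# ... and how sharply that drop shrinks, i.e. where the curve bends
bend = drops[:-1] - drops[1:]
# Heuristic: keep the components before the sharpest bend
n_components = int(np.argmax(bend)) + 1

print(np.round(explained, 3), "->", n_components, "components")
```

To reproduce the plot from the video, one could plot `range(1, len(explained) + 1)` against `explained` with Matplotlib. Curvature rules like this one can be unstable on noisy curves, so it is worth checking the plot as well.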
