Coding Up a Linear Regression Algorithm From Scratch | Machine Learning Project

Data Science Dojo · Beginner ·📐 ML Fundamentals ·5y ago

Key Takeaways

This video tutorial demonstrates how to code a linear regression algorithm from scratch in Python, utilizing vectorized operations, object-oriented programming, and gradient descent optimization. The tutorial covers key concepts such as mean squared error, partial derivatives, and weight updates, and provides a practical implementation using NumPy and scikit-learn.

Full Transcript

Hi everyone, thanks for joining. My name's Nathan; I'm one of the marketing managers at Data Science Dojo. We're here with Awesome Wahid, one of our data scientists as well as an instructor. He's going to be going over coding up a linear regression algorithm from scratch in Python, and we're really excited to have him. He is going to be using a coding notebook that we've made available to everyone. I've just posted the link to it in the chat, and if you're on YouTube, I've also posted it there as well. So, Awesome, why don't you go ahead?

So, hi everyone. I hope you're having a good day or night, wherever you are. This webinar is going to be slightly, in fact very, technical. I do expect that you know at some level how a linear regression algorithm works: you know the basics of what gradient descent is and how it works, and the fact that we're optimizing a mathematical function. The first goal of this webinar is to teach you the difference between vectorized operations and normal operations in Python, and why we vectorize. The second thing we're going to deal with is object-oriented programming in Python, because I've personally noticed that a lot of online courses that teach data science, and even university courses that focus on data science, don't focus a lot on object-oriented programming. In the real world, however, you deal with objects all the time. In fact, when you're using the scikit-learn library, or really any library, you are going to be playing around with objects. What we're about to do here is not something that will almost ever be used directly in the real world, unless you're working on some really complicated research problem where you have to code up mathematical functions yourself. However, it will always be helpful to know these concepts and do this exercise at least once, because once
you've done it, it's there in the back of your mind, and then it's very easy to debug problems that you run into when you're using scikit-learn, or any library for that matter, because it will be easier for you to follow an error and figure out what the problem is once you've done all of these implementations from the get-go. So again, the goal here is to do this extremely complicated, seemingly useless exercise once; once we've done it, we have a much stronger skill set that we can use in our day-to-day lives as data scientists, and we become better programmers along the way.

So let's just get started. I'll be using some very simple libraries. I'm not importing scikit-learn over here, just basic NumPy, pandas, and seaborn. Similarly, because our goal here is not feature engineering but specifically the linear regression algorithm, I'm simply picking up a data set from sklearn, which is the Boston data set. Ideally we would want to go through an entire exploratory data analysis process in which we figure out what all of these different columns mean, what they do, and what their correlation is with the target variable, which in this case is the price. Simply put, this data set uses all of these different features to try to predict the price of a house. And this is not classification; this is linear regression, as in regression itself. Logistic regression is used for classification; here we're sticking to regression. The reason I picked this particular data set is that we don't have to do a lot to it. If we look at the data types, they're all float64, so we don't need to do any data wrangling to convert it into a data set we can use. Similarly, it doesn't have any null values. Again, in the real world your data set will never be this good, but because our goal here is simply to code up an algorithm from scratch, we're going to
stick to a simple data set to which we don't need to do a lot of stuff.

Whenever we're working with pure machine learning, in the sense that we're coding up algorithms ourselves, it's better to convert our data to NumPy arrays rather than pandas objects. In fact, if you look at the source code of the scikit-learn implementation, even there they convert pandas data frames into NumPy arrays, and if anybody has any experience with scikit-learn, you will also notice that scikit-learn generally works better with NumPy arrays. Why is that the case? Simply put, they're built for different things. pandas is better built for data wrangling and data manipulation: running a pivot, running, let's say, a merge, and overall your data cleaning and feature engineering. NumPy, whichever way you want to pronounce it, is better suited to actually doing linear algebra. As we get into this more and more, you'll find out that machine learning at the end of the day is just clever use of linear algebra, so that you can carry out very complex mathematical functions in just a few lines of code that run very efficiently.

So let's get into it. The first prerequisite I'll teach you is what a vectorized implementation is and why vectorization is so important in Python. First of all, in case somebody wants a primer on NumPy: a NumPy array is simply a matrix. You can think of it as a list of lists, but there are very few things you can do with a list of lists, and essentially they are just lists of lists, while a NumPy array is an actual matrix. Because it's an actual matrix, there are a lot of things we can pick up from linear algebra and do directly on NumPy arrays. So if I were to create a list, for example random_list_1 equal to [1, 2, 3] and random_list_2 equal to [2, 3, 4], and I were to do random_list_1 + random_list_2, then it just
concatenates both the lists. But if I do it with NumPy arrays, we're actually adding the values together, so this 2 is simply 1 plus 1, this 3 is 2 plus 1; each element is added to the corresponding element. Similarly, when we're doing subtraction, each element is subtracted from the corresponding element. This is exactly the way it works in linear algebra, and that's exactly the goal here: we want to be able to implement linear algebra operations very easily in Python, and that's what NumPy allows us to do.

Multiplication in linear algebra is not that simple, because there are different ways to find the product of two matrices. The first is called the Hadamard product; in simpler terms, this is element-wise multiplication, and it's done using the star operator (*). All it does is multiply each element of a with the corresponding element of b. The other is the dot product, and what the dot product does is multiply each row of a with each column of b. So this value, first row, first column, is the result of multiplying the first row with the first column; this value is the result of multiplying the first row with the second column, and so on. That's essentially what a dot product is. To do a dot product, you can write a @ b, which is the version I'll be using: it's the simplest to use, it's easy to write, and it just works. There's also a.dot(b), which produces the same result, and then we have np.dot(a, b). So this is just a small primer on NumPy arrays in case anybody didn't remember it.

Now let's start to get into actual linear regression and why vectorization is so important. If we look at the hypothesis of a linear regression, as a reminder, a linear regression equation is just you multiplying a weight with a particular feature of a data point. If your x has multiple different features,
then the number of features is the number of weights that you're going to have. Let's actually look at the shape of our X: we have 13 variables. A normal equation of a line is y = mx + c. When we're dealing with linear regression with multiple variables, we have m1 x1 + m2 x2 + m3 x3, all the way, in our case, to m13 x13, and this is all that this equation is telling us. Again, our goal here is to convert equations to Python functions and Python statements, and to see how to do that easily. For that we need to understand what x represents and how this equation represents x. The first thing we need is a series of thetas, and the thetas are going to be of size j. What is size j? It is the number of features that we have. The hypothesis, the result that we're getting, is for a particular row of data. The number of columns defines how many features you have, and the number of rows defines the number of data points that you have.

So the first thing we're going to do is create our weights. If you look at the weights that I've created, it's just an array of 13 weights, each of which is initialized to one. Now imagine we were to implement this hypothesis function using simple for loops. What are we going to do? We're going to go over each data point in X, which is one row of data. Then we're going to calculate the hypothesis: we go over each weight and the corresponding value, multiply theta by x, and keep adding it to the hypothesis. That's why I have initialized the hypothesis as zero, so that I can simply add theta times x to it. If anybody doesn't know what the zip function does, all it does is pair each weight with the corresponding value. So if I were to show one particular row in X, print the data point, and then just put a break
here, because we only want to see one data point: it's an array, and the size of the array is 13. That's exactly what we're doing when we go over a particular data point: we go over each value one by one, multiply theta by x, and accumulate the hypothesis, and for each data point we append the hypothesis to the predictions. This is the simplest way to implement this function, and if we do that, it works; there's no problem with it. But what I want you to remember and understand is that if we start implementing for loops for every single mathematical formula, we're quickly going to run into problems. The main problem is going to be time. Let me show you how to do the same thing using a vectorized operation. It's as simple as X @ weights. Why is that? If we look at the formula over here, our data is of shape m by n; specifically, the shape of our data is 506 by 13.
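The loop and the one-line version described here can be sketched side by side. This is a minimal illustration, not the notebook's code: it assumes random stand-in data with the same 506 by 13 shape as the Boston features, 1-D weights initialized to one, and the hypothetical function names predict_loop and predict_vectorized.

```python
import numpy as np

# Stand-in data with the same shape as the Boston features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
weights = np.ones(13)  # one weight per feature, all initialized to one


def predict_loop(X, weights):
    """Hypothesis computed with explicit for loops, one row at a time."""
    predictions = []
    for data_point in X:  # one row of data
        hypothesis = 0.0
        for theta, x in zip(weights, data_point):  # pair weight with value
            hypothesis += theta * x
        predictions.append(hypothesis)
    return np.array(predictions)


def predict_vectorized(X, weights):
    """Same hypothesis as a single dot product."""
    return X @ weights


loop_preds = predict_loop(X, weights)
vec_preds = predict_vectorized(X, weights)

print(loop_preds.shape, vec_preds.shape)
print(np.allclose(loop_preds, vec_preds))
```

In a notebook, prefixing each call with %timeit is one way to compare the running times of the two versions; the exact numbers depend on the machine.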
The shape of our weights is 13 by 1. And what do we want to do? We want to multiply each row with the weights. That sounds an awful lot like a dot product. If somebody doesn't know what a dot product is, they can put it in the Q&A and I'll answer it, but I'm assuming that everybody knows what a dot product is. In a dot product, all you're doing is multiplying a particular row with a column. In this case we have just one column, which is the weights, so the result that we're going to get is, in fact I can show it, 506 by 1, and that's a prediction for every single data point. So the loop version is, if I cut it down, eight lines of code, and I was able to do the same thing using just one line of code. Beyond that, the vectorized version took 1.29 milliseconds while the loop took 67.8 milliseconds. Keep in mind this is a very, very small data set, only 506 rows, and the difference is already roughly a factor of 50. If we were to extrapolate this to 10,000 data points with the same factor, you can start to imagine that things get very, very slow. So again, the goal here is, number one, to understand how vectorization helps. The way it helps is that when we convert for loops to simple linear algebra dot products, not only is our code more concise, but it also runs much more efficiently, because all of this is handled by NumPy, and NumPy at the back end is written largely in C and can use optimized, often multi-threaded, linear algebra routines. So all of this is done almost instantaneously compared to actually going over every single value in a for loop. So this is the first aspect, which is the vectorization aspect.

Yeah, awesome. We have a couple of questions. One is: can you go over the dot product really quick? Okay, sure, I can definitely do that. What happens in the dot product is essentially what's happening in this for loop. If I have
a small data set, and I think I created one over here, yeah, I have a and I have b. If I look at a, it's [[1, 2], [3, 4]]. If I look at b, in fact let's change b: b is equal to np.array([[2, 3], [5, 6]]), and now if I look at b, it's [[2, 3], [5, 6]]. What's happening over here, if I do a @ b: how did we get this value 12? We took the first row of a and multiplied it with the first column of b, so that's 1 into 2 plus 2 into 5, and we got 12. That's how we got the first value. For the second value, where is a, where is b, this is a and this is b, we have the first row multiplied by the second column, so 1 into 3 plus 2 into 6. I'll just make this a little bigger so it's easier to see. So: first row multiplied by the first column, and we get 12; first row multiplied by the second column, and we get 15. Second row multiplied by the first column is 3 into 2 plus 4 into 5, and second row multiplied by the second column is 3 into 3 plus 4 into 6. That's how we get this matrix, and this is the simplest form of a dot product: all you're doing is multiplying each row with each column and summing the values.

So what are we doing over here? If we inspect this for loop properly, let's take a step back and look at just one row of the data. If I take out one row, small_x is going to be X[0], and if I look at the shape of small_x, it's simply 13 values. If I want to get one prediction, which is what this function is showing, I would go over each weight: for theta, data_point in zip(weights, small_x). Then I'm simply multiplying theta into the data
point and adding those values up: one_prediction starts at zero, and I add those products to it. So what did I do over here? If I inspect small_x and the weights, we're multiplying one with this value, then one with this, then one with 2.3, then one with 0.0, then one with 5.38, and adding it all together to get this result. This is done over every single data point. So small_x is one row of the data, and our weights array is an array with just one column, and this one column holds every single weight. Instead of going over each data point and calculating this manually, we were able to take a shortcut and simply apply the dot product.

Okay, does that answer your question about the dot product? If somebody can just say yes or no, I can move on, because unless you understand the dot product there's really no point in continuing. Again, all we're doing is multiplying each row with each column. Each row in the data is an entire data point, and instead of multiplying them one by one we can do it with a single dot product. What this dot product is doing is taking each row of X, multiplying it with the column, and in this case there's only one column, which is all of the weights, and putting the results into one array. That's it.

One trick that I'll tell you with a dot product is that whenever you have array1 @ array2, the resulting array will have the shape array1.shape[0] by array2.shape[1]. Let's see this: what is X.shape[1]? It's 13.
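The small dot-product example and the shape rule discussed here can be checked with a short sketch. The matrices a and b are the ones read out in the walkthrough; the zero-filled X and the (13, 1) weights are stand-ins that only reproduce the shapes mentioned (506 by 13 and 13 by 1), not the actual data.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[2, 3],
              [5, 6]])

# Each entry is one row of a multiplied with one column of b, then summed:
# [[1*2 + 2*5, 1*3 + 2*6],
#  [3*2 + 4*5, 3*3 + 4*6]]
product = a @ b
print(product)

# The three spellings of the dot product give the same result.
print(np.array_equal(product, a.dot(b)), np.array_equal(product, np.dot(a, b)))

# Shape rule: (rows of the first array, columns of the second array).
X = np.zeros((506, 13))      # stand-in with the data's shape (assumption)
weights = np.ones((13, 1))   # one column holding every weight
result = X @ weights
print(result.shape == (X.shape[0], weights.shape[1]))
```

Note that the inner dimensions, X.shape[1] and weights.shape[0], must match (13 and 13 here) for the product to be defined at all.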
Sorry, zero. Yeah, that's the rule: when we're doing array1 @ array2, or array1.dot(array2), the resulting shape is the number of rows of the first array by the number of columns of the second, and this is a main trick that I use all the time. If we're taking a dot product between X and weights, the resulting array will have shape 506 by 1. We take the number of rows of the first array, because we're multiplying each row with each column, and the number of columns of the second array. In fact, in most cases, if I'm able to figure out the shapes, half of my problem is solved. So now that we understand this properly, let's move on, and before that I'll take any questions.

So we have some slightly irrelevant questions. Why is ReLU better than sigmoid? That's a good question; it takes a little time to answer. In a nutshell, it's because with ReLU you're not stuck between zero and one; you're able to go all the way to infinity on the positive side, so you have a lot more room to play. But it also has problems, which is why we use leaky ReLU. And: why do we take the harmonic mean in the F1 score, why not AM or GM? I'm assuming AM means the arithmetic mean; I'm not sure. The reason is that with the harmonic mean we want to balance precision and recall, and we can do that. Again, I would like the questions to be a little more relevant, but let's move on.

Okay, now let's move on to the second concept that we'll be using a lot, which is object-oriented programming. Object-oriented programming is a very complicated topic; there are entire university courses dedicated just to it. Here I'll only be teaching you the basics, mostly because there are a lot of things in object-oriented programming that don't exist in Python. If the terms private and public
variables ring a bell, then you would notice that in Python there is no such thing as an enforced private or public variable. The simple idea is that a class is a sort of blueprint that you define, and once you define that blueprint you can create objects of that type. For example, over here I've created a class Dog. The syntax is simply class Dog, and every class should have an init function. This init function is a sort of automatic function, written as __init__, with a double underscore before and after. It is run automatically when an object is created, you can pass some variables to it, and with those variables you can create what are called attributes. Attributes are variables that exist within the class. Then you have methods; a method is just a fancy way to say a function that exists within the class, and using the self variable, that function has access to the attributes within the class. So we have attributes, which are simply variables that exist within the class, and methods, which are functions that exist within the class, and using the self syntax you can access the attributes within the class as well.

So what have I done over here? I've created a class, and I've created one function, __init__, in which I've assigned these variables. By the way, they don't need to be assigned in __init__; they can be assigned anywhere within the class as well, so I can just say species is equal to unknown directly in the class. Then, whenever the class Dog is instantiated, i.e. whenever we create an object, let's say I've created an object and passed it the parameters Tony and 10, that means this is a dog whose name is Tony and whose age is 10. Why? Because now self.name is equal to name and self.age is equal to age. Then, when I print the attributes, species is there, even though I didn't initialize it in __init__, because outside the function I
have this variable, and because these methods can access these variables using self, I can simply use it. If I were to remove this, I would get an error that the Dog object has no attribute called species. And that's exactly what we're doing here: a class is just a way to package things nicely so that we can reuse them again and again.

Similarly, in Python there's no such thing as a private variable. Ideally we would want some sort of control over these attributes so that we cannot change them outside the class, but in Python you can change them easily. So I've changed the dog's name from Tony to Ezekiel, and now if I print the attributes, the name is Ezekiel.

There are a lot of attributes that can exist within a class. In fact, if we want to know all of the attributes that exist within a scikit-learn class, for example, let me go to sklearn and pick a random one, the AdaBoost classifier. I'm going to import this, and suppose I want to see what attributes exist within the AdaBoost classifier. I'm going to use a built-in function called dir, and I can see that there are a lot of things here. To compare, I'm going to look at my own class, Dog, and within Dog I have all of these different things that I didn't initialize myself but that are somehow still there. That's because there are some things that every class gets when it's built in the first place. Some of these are very useful; for example, with the __dict__ attribute I can simply print all of the attributes of the dog in a dictionary, and that's something that's very cool. So if we look at the AdaBoost list, up to here all of these are standard Python things, but then we can see that it has a _boost attribute and a _boost_discrete attribute. In Python, the naming convention is that a single leading underscore means that the name is only supposed
to be used internally. You obviously can access it outside of the class, but usually it is used internally. So _check_n_features and _compute_proba_from_decision are all things that scikit-learn uses to create and run the AdaBoost classifier. However, these are all things that we have access to. Everybody knows the fit function, everybody knows the get_params function, everybody knows the predict function, and without even looking at the documentation, which of course you should do, you can use the dir function to list all of these attributes. At the end of the day, all of these scikit-learn estimators are classes, and there are multiple nested classes, which again is very complicated and we won't be getting into, but this should give you a good primer to at least look at the source code of scikit-learn and figure out how they're doing things if we ever run into problems.

One small thing here is that there are a lot of scikit-learn functions that we can use without instantiating a class. Right now, if I wanted to print the attributes of Dog, I couldn't just do Dog.print_attributes(), because this would give me an error saying it requires one positional argument, self, and I didn't pass any argument. When I ran it here, it still worked. Why? Because I instantiated an object out of the class. So in object-oriented programming there are two things: first you create a class, which is a blueprint, and then you create an object from that class, which is the actual object that contains the values for that particular blueprint. That object is what is passed in the self variable automatically.

So now that we have all of the basics set, I want you guys to strap yourselves in, because this is about to get very complicated very fast. Just one last thing: we have these things called static methods, and static methods
are things that we can use without necessarily instantiating an object. So I've created a class called RandomFunctions, and without even creating an object I can call this function. I didn't create any object, I didn't do anything, I simply called this function and it printed hello world. This default method is what's called an instance method, and I can't run it on the class, because I first need to create an object. Once I create an object, I can run it, and I just print self, and it's telling me that self is a RandomFunctions object. So now we have the basics set. This is, by the way, similar to how we're able to use things like train_test_split: it is a plain function in scikit-learn rather than a method on an object, so we can simply use it without creating any scikit-learn object. There are many other concepts; I've attached a link over here that you can open if you want to learn more about object-oriented programming, but we're going to skip that part. Now let's get into it. Does anybody have any questions? Because again, this is going to get very complicated very quickly, so if anybody has any questions, now's your time to ask. I'm going to wait 10 seconds. Okay, let's get started.

So as a primer, what is the gradient descent process? I'm not going to talk about it intuitively, because there are plenty of resources: if you google how gradient descent works, you're going to get extremely cool animations and visualizations of a ball rolling down a hill, all of which are very good if you want to understand gradient descent intuitively. Our goal here is to convert the mathematical formulas of gradient descent into code. Gradient descent is essentially just a two-step process that's repeated over and over again. The first step is that you calculate the derivative of the loss function with respect to each weight. Let's take a step
back here. The first thing is a loss function, and that can be anything that defines how good or bad your model is doing. You calculate the derivative of the loss function with respect to each weight, i.e. you calculate the partial derivatives, and then you update each weight accordingly.

So what is the equation of the mean squared error? The mean squared error is a very basic error that we use in linear regression all the time. We have y_i, and whenever there's a sum over i up to n, that means the error is going to be different for each data point. For each data point we take the true value of y minus the predicted value of y, which is always written y hat, and y hat is the same as our hypothesis function above: weight 0 multiplied by x0 plus weight 1 multiplied by x1, all the way to weight m multiplied by xm, where m is our number of features and n is our number of data points. So again, what are we doing? We're taking the difference between the true value and the predicted value, squaring it, and then taking the sum. This summation can be done using a for loop: we set total equal to zero, then for true_value, prediction in zip(y, y_pred) we add the square of true_value minus prediction to total, and at the end we divide total by the length of y. This is the for-loop way of doing it. But why create a for loop when we know how to do vectorized operations? In NumPy it's as simple as finding the difference between the two arrays, because, as I showed you, subtracting two arrays subtracts each pair of corresponding values. Then we square it, which squares each individual value, and then we just take the mean. Why am I taking the mean? Because if you notice what the formula is doing, it's summing all of the values and then dividing by n, and that's simply the
mean. So we can create two sample arrays, pass them to our sample MSE function, and get this error. It makes sense, because all of the predictions are correct except for one, so the error is 0.25.

Now let's look at what the derivative of the mean squared error looks like. This is a little more complicated, and I'm not going to show you the derivation; if you're interested, I've shared a link. My goal here is not to teach you calculus, I'm not even that good at teaching calculus, so we're going to focus on the coding aspect. Just take it as it is: the derivative of this function is always taken for a particular weight, so for weight number j and data point i the derivative is minus 2 times x_ij times the difference between the true value and the predicted value. So for each weight, we multiply the data value associated with that weight by the difference between our true value and our predicted value. Again, this sounds an awful lot like a dot product, because we're multiplying each row with each column. If our X is a data set of shape n by m, where we have n data points and m features, our y values are going to be of size n by 1 and our weights are going to be of size m by 1. Remember that in a linear regression the weight update equation is such that each weight is updated by its own partial derivative, so if we have an array of weights, we also have an array of partial derivatives, and the shape of that array is going to be m by 1. This is what gives us our first hint. When we do this calculation for all of the weights together, because we want to do this efficiently, we don't want to run a for loop over each and every weight and calculate the partial derivatives one by one; we want
to do this for all of the weights together. The result of the partial derivative is going to be m by 1, the same as our weights, so that we can simply subtract one from the other. The second thing we should notice is that for a particular weight, only the relevant feature value is multiplied by the difference.

Let's have a look at this. Now we have slightly more complicated data. I mean, it's still toy data, but it has three features and four data points. If you want to initialize the weights, take a second to guess their shape: our weights are going to be of length three, because we have three features, and again it's going to be X.shape[1] by 1. If we get our predictions, just as before, it's X @ weights. Our predictions are of size 4 by 1, because for each data point we're going to have one prediction. I've randomly selected some values so that they're off by just a little: the first value is supposed to be one, I've set it to be eight; the second value is supposed to be six, I've set it to be seven; and so on. Now if we calculate the loss, it's 1.0; so for this particular data the loss is 1.0.

So now let's actually implement this function. The first step is to find the difference between the true values and the predicted values. The difference is of shape 4 by 1, our X is of shape 4 by 3, and the resulting partial derivative that we want should be of shape 3 by 1. Something doesn't line up here, so how do we do this? What we want is to multiply the entire first column of X with the difference. So if this is X, and let me just remove these two columns, I want to multiply this first-row, first-column value with this difference, then the second-row, first-column value with this one, and so on. Instead of multiplying the rows with the columns, I want to multiply the columns with the columns. That doesn't sound directly like a dot product, but if I simply take a transpose, I can do it like
this so i've taken a transpose in which case this column has now become the row and now i can simply do a dot product so minus x dot t at the difference and now the partial derivative is of shape 3 comma 1 zero and one value of the mse you calculated what is this zero and one representing so it's not necessary for it to be one in fact if i were to change this value a bit more the value is going to increase to 3.0 all this is doing is that it's finding the difference between my actual value and my predicted value it's squaring it and then it's finding the mean of it so what we're doing is we're finding the mean squared difference or the mean squared error between our actual value and the predicted value and that is essentially a measure of how good or bad our model is there's many different ways to calculate this error mean squared error is just one of them some people are more familiar with the root mean squared error but i didn't use that here because the derivative is a little more complicated but essentially that's all we're doing um in terms of what they represent that's a good question because it's a little difficult for me to say that one is a good error but 10 is not a good error that generally depends on the data point and it depends on the problem at hand so that's an entire other topic on how you evaluate regression models and i don't want to get into that right now but very easily we were able to calculate this let me just do it again because i think i updated my weights too much so just to run it again we have x we have these weights i created a y true randomly based on my predictions then i found the difference and i found that the error right now is 1.0 then i calculated the partial derivative and then i did the weight update so when we're doing a weight update we want to multiply by a learning rate and again intuitively speaking this is so that our weights don't get updated too drastically too
many times because then we could overshoot and not actually get to the optimal value of the weight um but if you know this intuitively this should not be a big problem to understand and we updated our weights and we saw that just by running this epoch once our loss got better by 0.05 and i can do this again and again and it'll get even better so in just one iteration we reduce our loss from 1.0 to 0.95 and then all i need to do is run these cells again and again and we will get to the same response so i'm going to show you one more loss function and then we'll actually start coding up the class so another popular loss function is what's called the mean absolute error function and in the mean absolute error instead of squaring the difference you're just taking the absolute value of the difference and then you're taking the mean of it but the derivative of the absolute function is not straightforward if i were to go into the mathematical complexity of it it's because if i were to try and graph this value it would be like a v and because of that the mean absolute error function is not differentiable when y true is the same as y predicted however in most cases it is approximated as a stepwise function so this i've broken down into a chain rule and what the chain rule is doing is that to find the partial derivative of the error with respect to weight j you take the derivative of the error with respect to the predicted value and then you take the partial derivative of the predicted value with respect to weight j this is simply going to be x j why because y i is equal to w naught times x naught plus w one times x one so for a particular weight we're only interested in that particular value so for weight naught the derivative is simply going to be x naught so this part is simple but how do you take the derivative of the absolute function and that's a stepwise function simply put if the predicted value is greater than the actual value
your differential is going to be plus 1 if it's less than the actual value your differential is going to be minus 1 so a very interesting difference between the absolute error and the squared error is that in the squared error the magnitude of the difference also matters but in the absolute error the magnitude of the difference doesn't matter only the direction matters so if for a particular problem our predicted value is way off from our actual value then the squared error might actually work faster than the absolute error because the squared error is taking that magnitude into account but the absolute error is not that's just a small side note don't get too hung up on it and this is what the differential looks like and the reason i want to talk about the absolute error is because in a lot of cases we do find stepwise functions in maths as well and when we want to apply stepwise functions in programming languages such as python it sounds daunting but it's actually much easier than it sounds so again if we were to find the sample mean absolute error it's simply going to be y minus y pred and this can be a single number or this can be an entire numpy array and it will work just as fine we find the difference we take the absolute value we find the mean and we return it it's as simple as that but one very important function in numpy is called the np dot where function in this the first parameter is your condition the second parameter is the value to return if the condition is true and the third parameter is the value to return if the condition is false so i've created a small array one two three four five and i've said that if the value of a is less than three then the value should be zero otherwise the value should be a and i can run this and you can see that the first two have become zero and the rest of the three have remained the same because i've said just keep the same value
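The `np.where` call described above can be sketched as follows. The array values follow the talk; the helper name `sample_mae` and the two small input arrays passed to it are assumptions made for illustration, not the notebook's actual code.

```python
import numpy as np


def sample_mae(y, y_pred):
    """Mean absolute error; works for single numbers or NumPy arrays."""
    return np.mean(np.abs(y - y_pred))


a = np.array([1, 2, 3, 4, 5])

# np.where(condition, value_if_true, value_if_false) is applied element-wise:
# entries below 3 become 0, every other entry keeps its own value.
masked = np.where(a < 3, 0, a)
print(masked)

# Hypothetical inputs: both predictions are off by exactly one.
mae = sample_mae(np.array([1.0, 6.0]), np.array([2.0, 7.0]))
print(mae)
```

The second and third arguments can be scalars or arrays that broadcast against the condition, which is why the same pattern later works for a stepwise derivative.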
and i can do this anyway so i can say raise it to the power 10 and this has become 1 1024 i can say multiply it by 10 and this has become 10 20 and i can put values so i can either use the same array i can even use a different array as long as it's the same length so if i do this this is also five values i can just do this as well as long as the length is the same it will work so how do we get the derivative of this the first thing we do is that we find the difference why do we find the difference because it would be much more complicated to try and whip up an if statement to do this directly so we use some clever mathematics we find the difference and in mathematics you can say that oh it's not defined at y hat is equal to y but in programming we can't do that like what do you mean not defined i'm running a program i need to define it something has to happen if that scenario were to come up in the real world that's an assumption i can make as a mathematician but that's not an assumption that i can make as a programmer because there will be some edge case in which y hat is equal to y and if that edge case does end up coming about what do we do intuitively speaking i would ask you guys to think about this for a couple of seconds intuitively speaking if the prediction is perfect what should we do in fact i want somebody to answer this in chat if our prediction is the same as our actual value what should the error be should it be plus one should it be minus one should it be something else entirely so if anybody can take a crack at this in the chat yes so nisha server says it should be zero and yeah that's pretty much the correct answer if our prediction is perfect then we don't want anything to update if we don't want anything to update then our partial derivative which is what updates the weights that partial derivative should be zero and in this case that's precisely what we're
going to do if the difference is zero then we substitute the values with zero otherwise we just take the difference as it is if the difference is positive then we substitute it with one and we keep the other value as it is if the difference is negative then we take the negative value and keep the rest as it is what is the difference again remember i'm taking the difference as prediction minus the true value so if the prediction is greater than the true value then this difference is going to be positive if the prediction is less than the true value then this difference is going to be negative so what do i do i take this absolute i run the same differential that i ran before because i know it works and then i do the same process that i did before i create some random toy data where this is what the difference is i calculate the absolute error i calculate the partial derivative and then when i update the weight the partial derivative has gone down a bit and i can keep running this again and again and we'll get closer and closer this should already start to make sense very simply what we've done is that we have created a simple partial derivative function that we can run again and again and again and again and slowly our weights will start to converge to a point where they are optimized enough to give us the right answer so now let's actually start to write up this class so what do we need for this class the first thing we need is an initialization function that takes in all the important values so we have def init and it's always going to have self in it and let's keep the most simple kind of linear regression that only uses one loss function which is the mean squared error what do i need to define for that i need to define the max number of iterations that i'm going to run whether i want to run this a thousand times or i want to run this 100 times or whatever and i
need to define a learning rate these are the most essential things and what i'm doing over here is that i'm giving it some default values such that even if a person doesn't define the values there's going to be some default values for it then i'm going to define some weights so self.weights initially they're going to be none why because we can only define our weights when we know the shape of our actual data remember the weights have to be the same size as the number of features so we cannot define it right now we're going to say that our self.max iter is the same as max iter we're going to say that our self.learning rate is the same as the learning rate that we pass into it so this is the very basic initialization that we need to run our actual model what's the next thing that we need the next thing we need is some way to initialize the weights okay so we have number of features that get passed into it and initially i initialized it as one but it is good practice to use a normal distribution with mean zero for weight initialization again there is a lot of research on the best way to do weight initialization and there's things like the xavier initialization method there's one created by he et al and we're going to be sticking to the simplest form which is np.random dot normal and the size is going to be the number of features comma one so now we have our two main things set we have a way to initialize our weights and we have a way to initialize the class itself now the next thing we need is a predict function what does this predict function do what this predict function does is that it takes self it takes an x which could be any data set it could be a test data set we say that the y pred is equal to x at self.weights as we were doing before and then we simply return y pred now let's get to the difficult function which is the fit function what are we going to pass into the fit function we're going to be passing self
which is the object itself we're going to pass an x and we're going to pass a y the first thing that we're going to do is just to be sure as a sort of error handling we're going to make sure that the length of x and y are correct so we're going to assert this and if this is not true then we're going to raise an error and we're going to say that x and y should be the same length and then once we do that we're going to initialize the weights so we're going to call the function self.init weights and we're going to pass it x dot shape one we've done this before we're just only putting it together and now we actually get into the actual gradient descent we're going to run for underscore in range self dot max iter and the reason i've put an underscore over here is because i don't need a counter variable here if you do then you can add an i over here but i don't really need it the first thing i'm going to do is that i'm going to get a prediction because this is what we'll use to calculate our loss so use x to predict um the particular prediction then we're going to calculate our loss and for now i'm going to leave this empty right and this is a very important thing that we often forget in programming is the idea of abstraction which is the idea that i can take a bottom up approach where i define every single small function before i define my big functions but then i lose sight of what i'm trying to do so before i actually define the loss function i'm just going to assume that i've created a loss function over here i'm just going to say get loss i haven't defined this yet but i know that i have to do this okay so that's the first thing done the second thing we're going to do is we're going to find the partial derivative so how do we find the partial derivative again we haven't done this yet get partial derivative from function and we're going to define these functions but right now we're going to leave it as it is then we're going to update our weights and we're
going to say self dot learning rate multiplied by the partial derivative that we get and this will update our weights so right now we have a very simple dot fit function that does it in three steps but this isn't enough because we want some way to be able to monitor what's happening and how we're doing it so i'm going to add a couple of more details before we start writing up the loss functions and i want to store my loss history and for that every time i calculate the loss i'm going to say loss history dot append loss this way i can get the history and figure out if my model is even working fine or not another thing that i'm going to do and this is something that's very interesting is in fact i will get to that later let's just run this once and let's start writing up the loss functions so let's say that we have the main loss function which is mean squared loss which takes y y pred and as before i'm going to return np dot mean y minus y pred to the power two and this should be a static method because i should be able to calculate my loss regardless of whether i have a linear regression model or not this should work regardless of anything else but the derivative is something that will only be used internally so self x y y pred and then i'm going to return minus x dot t at y minus y pred and this is something we did before and then we're going to divide it by len x it would be a good idea to wrap this in brackets as well so now we have this so now we can do this easily self dot mean squared loss y y pred partial derivative is going to be self dot mean squared loss derivative x y y pred let's see if this works okay and just for now what i'm going to do is i'm going to print the loss so we can monitor what's happening and i'm doing some basic pre-processing over here where i'm scaling the data and especially when you do this manually this is essential to do in fact in scikit-learn they do some sort of scaling within the linear regression
class but we're just going to do this outside all this does is that it centers it around a mean of zero and a standard deviation of one so if i were to create this regressor uh okay let's remove this for now input one does not have enough dimensions so clearly i've made an error over here let's check it out so we're getting an error here how do we figure out what this error is we print out x dot shape and we print out self weights nonetype object so clearly i did ah self.init weights so there's an error here see and the only way i was able to do this was because i looked at the shapes just by the way i'm coding this actually live um so i'm going to be running into problems and now you can see very clearly that our loss is decreasing i mean it's decreasing very slowly but it's working so we're actually getting somewhere now let's start to add a bit more complications into this okay so what we want to be able to do is that we don't always want to run a thousand uh iterations we want to be able to stop early on as well so if i were to change the learning rate to something like 0.01 then okay even this is not very quick ah you can see over here that even though the loss is a lot the loss is barely changing at all in fact from this point onwards after every 10 iterations it's only decreasing 0.0001 and that's a little stupid isn't it because why are we keeping on running our algorithm when it's clearly reached an almost optimal value so for that we're going to add one more thing into this and that's what's called the tolerance and we're going to say that the tolerance is something like 0.000 we're going to say self.tolerance is equal to tolerance and then within the fit function what we are going to do is we're going to say after we've calculated the loss if np dot absolute loss minus previous loss what is previous loss we haven't defined it yet but don't worry is less than self dot tolerance so remember it
will never be exactly the same but if the difference is smaller than a very very small value then we can simply say that the model has converged and we will break and initially we can say that the previous loss is going to be infinite because before we do anything the loss is infinite and that is what np dot inf does and at the end of it so at the end of our fit function we can say that if the model has converged we'll simply print model converged else we'll print a warning max iterations reached model did not converge so slowly and gradually we're able to add more and more complications into our class and if i run this again uh converged referenced before assignment why because i have not said that converged has to be false max iterations reached model did not converge and if i were to change the tolerance over here we'll notice that it will change okay so this again isn't working interesting let's find out what the error is maybe our loss function isn't working this is going to take a while ah i know what i'm doing wrong i have said that previous loss is infinite but i am not updating it so previous loss is equal to loss cool and now we can check it out and after printing it said model has converged so now we're actually getting somewhere which is pretty cool to be honest okay so let's slowly and gradually start to add even more complications into this so another complication that i want to add is that we want to be able to have some sort of fit intercept as well so if you remember in linear regression we have that weight update equation but we can also introduce a bias that bias in scikit-learn is called fit intercept and this is either going to be true or false and now would be a good idea to add fit intercept by default it's going to be true right so if the fit intercept is true then instead of adding a separate bias term which would require its own derivative and brings up its own host of problems a very simple way to implement this is we
take this fit intercept if it's true which by default it is then we just np dot concatenate we take x we take np dot ones of shape is equal to len x by one so we're just appending another column to it and then we say axis is equal to one so now we've simply added a fit intercept to it with almost no problem at all okay so one more complication added let's check it out and see just by adding the fit intercept my loss got so much better and that's because it gave my model that much room and see once you start to implement these things by hand from scratch is when you start to figure out how much of a difference very very small things can make all i did was add a fit intercept and my loss was able to go from 572 to 29 because i allowed the line to move up and down so there's a question are there some types of data spread where it won't converge yes definitely like that entirely depends on your data there's definitely data where it might not converge at all and that is why there is always a trade-off between number of iterations learning rate and your tolerance i should probably increase the tolerance a little bit but if i were to reduce the tolerance then i'm pretty sure my model will not converge oh it's still converged because i have a lot of iterations let's keep it to a thousand iterations and it's still converged wow my model is too good i can't do anything about it so okay now let's add the last complication because we have 10 minutes left so suppose i want to be able to pick and choose which loss function i wanted to use and the way that scikit-learn does it if you remember is that you add a simple string to it in which you say that you want to use mse or you want to use the absolute uh error function and then we define those two functions so we say at static method yeah so it's static method def mean absolute loss y y pred return as we did before np dot mean np dot absolute y minus y pred and then
we remember the derivative and i'm just going to copy paste the derivative because it's too much and let's just clean it up a little bit okay so now we have our absolute loss so now the thing is that when we have multiple different loss functions the two lines that are going to change are this line and this line and a very simple thing we can do is that we can add a bunch of different if statements something like um did i add it to self i don't remember i did not self dot loss function is equal to loss function so i can simply do something like if self dot loss function is equal equal to mse then do this elif self dot loss function is equal to mae then do the other thing but i'm not going to do that why am i not going to do that because this will get very messy very very fast imagine if you have 10 different loss functions are you really going to create 10 different if statements that's not a very good idea now i'm going to start to go a little quickly because we're running short on time a very cool thing you can do in python and which is why dictionaries are my favorite thing in python is that i can simply attach this string to a function so i've already defined the function mean squared loss and the key is the string the value is the function absolute or in fact i'd rather say m a e self dot mean absolute loss now what i can do is that over here i'm calling a function which function am i calling i don't know i'm going to simply call the loss dictionary within the loss dictionary i'm going to pass the value of loss function which we have defined over here and when i do that it's simply going to replace it with this function to which we're passing these values and this will also make sure that every new loss function that we make has the same format whereby it takes the same input and returns the same output and that's something that's very essential when you're trying to write scalable code and we can do the same thing with our derivative
dictionary self dot mean squared loss derivative and now that we have this all i need to do is change this to self dot and i assure you hopefully it works just as fine and now for one final complication we talked about the loss history let's actually formalize it initially it's going to be none and then once we have the history we're going to say self dot loss history is equal to np dot array converting into an array is always helpful and another thing that i want to store is actually my weights as well because for the last five minutes i want to visualize how changing some things is going to change my model so we have our weights history over here and right after we calculate our weights which is going to be here right before we calculate weights history dot append self.weights and this is going to give an error so what reshape minus one does is that it flattens the array right now it's going to be a number of features by one array in this case what i want it to be is just 13 just a single one dimensional array so that's what we're doing over here and because once again numpy arrays are objects that means that if i were to keep appending it to a list when it changes it would keep changing because the only thing that's stored for a numpy array is a reference to the object and when i change the object which i'm doing over here even if it's stored in a list it's going to get changed within that list as well so what do we have to do we have to call the copy function so that it creates a separate copy and then it passes it and simply weights history is equal to np dot array weights history and i will finally create a function to return these def get training history self returns self dot loss history and lastly one neat thing that's always useful is the get coefficients method which is there in sklearn if you guessed it it's simply self dot weights so now let's run it and wow so that's the randomness of it sometimes it works and sometimes it doesn't so because it
converged early even though our max iterations were 1000 we have 874 values over here and we can visualize the performance we see that the training loss goes down and the weights sort of try to find an optimal value over here and over here i'm only looking at one of the weights i can easily change the score to look at a separate weight i can even look at all of them together but that's not a very good visualization and you can see they all tend to go towards one value and something very interesting that i want to show you guys over here is that if i were to use the absolute loss function so loss function is equal to mae hopefully i didn't make any error while i was copy pasting this ah can somebody guess what's happening over here in fact there's not enough time to guess the training loss is going up and down and up and down and up and down so the simplest solution to this is to reduce the learning rate and now it converged in only 109 steps and see if you notice for the absolute function the weights are jumping about a lot and the reason is precisely because the magnitude doesn't matter when we were using the squared error function then we could clearly see that it was a very smooth line and that's because as we get closer to the point as our predictions become more and more accurate the magnitude decreases but in an absolute function the magnitude doesn't matter so it doesn't matter how close or far we get the partial derivative is only dependent on the particular data point which is why this is a lot more jittery as compared to the squared loss function so i'm going to wrap it up now because i want to respect everybody else's time um the point of all of this is for you to actually dive deep into all of this coding at least once in your life as a data scientist because once you do this once you will figure out how very very small things such as including this fit intercept create such a massive difference in our models you get an in-depth look into how
our models work and once you've done this i assure you that it will be that much easier for you to figure out the problems in your models because you've done this once you know exactly how the model works it's like if you've built a car then it's easier for you to diagnose what the problem in your car is but if you've never built a car in your life if you know nothing about the car and your car breaks down you don't know how to fix it so if you built the car just once then you're able to fix any car later as well and all of these resource links we should have emailed them to you the only thing that matters is this notebook this notebook has all of the links within it hopefully we'll convert it into a blog post and um this has also been recorded and it will be uploaded so i'll just pass it over to nathan who can wrap it up hey thanks asim and thanks everyone for joining today um if you have any additional questions about this webinar or the notebook we will be posting this um and a recording on our youtube channel as well as on our tutorial site and our blog feel free to leave a comment there and we'll make sure it gets passed on to asim asim thank you very much i think everybody really enjoyed this we do have another webinar coming up on may 12th titled building trust with the ai solutions um it's with david frostly a principal ai architect at microsoft i've posted the link in the chat so if you're interested go ahead and give it a click and see what it's all about again thank you asim thank you everyone for joining and i hope you all have a great rest of your day
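The pieces built up over the walkthrough (initialization, loss dictionaries, intercept column, tolerance-based early stopping, loss history) can be pulled together into one minimal sketch. This is not the notebook's code: the class name `LinearRegressionGD`, the private helper names, and the default values are all assumptions chosen for illustration, and the MSE gradient here keeps the factor of 2 from the formula rather than the shortened version typed in the live session.

```python
import numpy as np


class LinearRegressionGD:
    """Linear regression fitted by batch gradient descent (illustrative sketch)."""

    def __init__(self, max_iter=1000, learning_rate=0.01, tolerance=1e-6,
                 fit_intercept=True, loss_function="mse"):
        self.max_iter = max_iter
        self.learning_rate = learning_rate
        self.tolerance = tolerance
        self.fit_intercept = fit_intercept
        self.loss_function = loss_function
        self.weights = None          # shape is only known once fit() sees the data
        self.loss_history = None
        self.converged = False
        # Map each loss name to its function so fit() needs no if/elif chain.
        self._losses = {"mse": self.mean_squared_loss,
                        "mae": self.mean_absolute_loss}
        self._derivatives = {"mse": self._mse_derivative,
                             "mae": self._mae_derivative}

    @staticmethod
    def mean_squared_loss(y, y_pred):
        return np.mean((y - y_pred) ** 2)

    @staticmethod
    def mean_absolute_loss(y, y_pred):
        return np.mean(np.abs(y - y_pred))

    @staticmethod
    def _mse_derivative(X, y, y_pred):
        # Gradient of the MSE with respect to the weights, shape (m, 1).
        return -2 * X.T @ (y - y_pred) / len(X)

    @staticmethod
    def _mae_derivative(X, y, y_pred):
        # Step function of (prediction - truth): +1, -1, or 0 when they are equal.
        diff = y_pred - y
        step = np.where(diff > 0, 1.0, np.where(diff < 0, -1.0, 0.0))
        return X.T @ step / len(X)

    def _add_intercept(self, X):
        # Append a column of ones so the last weight acts as the bias term.
        return np.concatenate([X, np.ones((len(X), 1))], axis=1)

    def fit(self, X, y):
        assert len(X) == len(y), "X and y should be the same length"
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        if self.fit_intercept:
            X = self._add_intercept(X)
        self.weights = np.random.normal(size=(X.shape[1], 1))
        history, previous_loss = [], np.inf
        self.converged = False
        for _ in range(self.max_iter):
            y_pred = X @ self.weights
            loss = self._losses[self.loss_function](y, y_pred)
            history.append(loss)
            # Stop early once the loss has effectively stopped changing.
            if abs(previous_loss - loss) < self.tolerance:
                self.converged = True
                break
            previous_loss = loss
            grad = self._derivatives[self.loss_function](X, y, y_pred)
            self.weights = self.weights - self.learning_rate * grad
        self.loss_history = np.array(history)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        if self.fit_intercept:
            X = self._add_intercept(X)
        return X @ self.weights


# Usage on synthetic, noise-free data (hypothetical coefficients and intercept).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0
model = LinearRegressionGD(learning_rate=0.1, max_iter=2000, tolerance=1e-10)
model.fit(X_demo, y_demo)
print(model.converged, len(model.loss_history))
print(model.weights.ravel())
```

Because the intercept is handled as an extra column of ones, the last entry of `weights` is the bias. Switching to `loss_function="mae"` reuses the same loop, though, as noted in the talk, it usually needs a smaller learning rate to avoid oscillating.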

Original Description

Get started with linear regression, this talk will walk through an advanced Python tutorial in which we will be coding up a Linear Regression algorithm from scratch and making it usable in a manner not so different from sci-kit-learn. Whether we learn data science through online courses, or tutorials, or jump straight into a hands-on project-based approach, few of us take the time to try and learn how some of our favorite libraries are built. All we know is that we can import the sci-kit-learn library, instantiate a Linear Regression model, and call the “.fit()” method on it. What is happening under the hood? How are these different algorithms implemented in an efficient manner? We will be learning about Object-Oriented Programming in Python, vectorized operations, and efficient coding strategies that allow for as much future customization as possible. We will be walking through how to convert mathematical formulas into Python code that runs as efficiently as possible. This webinar is not for the Python beginner, you are expected to know basic Data Science tools and frameworks, such as Pandas, Numpy, Scikit-Learn, etc. You are also expected to know, at least in theory, how Linear Regression works. This webinar will bridge that gap between theory and implementation and in the process teach you some advanced Python tips and tricks. Presenter Bio: Asim Waheed holds a bachelor's degree in Computer Science. He is a Data Scientist at Data Science Dojo where he works on building data products while consulting Fortune 500 companies, and course development for online bootcamps as well as corporate training. Before working as a Data Scientist, he was a Research Assistant at the Security and Internet Analytics lab at Lahore University of Management Sciences, working mostly on problems related to Deep Learning Security, and Internet Analytics. He has worked in collaboration with the Machine Learning Security group at Virginia Tech. Notebook: https://bit.ly/linear_
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30


This video tutorial teaches how to implement linear regression from scratch in Python, covering the key concepts alongside a practical implementation. It is aimed at learners who already know basic Python data science tools and the theory of linear regression, and who want to understand the underlying mechanics and improve their machine learning and Python programming skills.

Key Takeaways
  1. Create a linear regression algorithm from scratch using Python
  2. Use vectorized operations to improve performance
  3. Implement gradient descent optimization
  4. Use mean squared error as the loss function
  5. Update model parameters using partial derivatives
  6. Implement early stopping to prevent overfitting
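
The takeaways above can be sketched as one small class. This is a minimal illustration, not the code from the webinar notebook: the class name `ScratchLinearRegression`, its parameters (`learning_rate`, `n_iters`, `tol`), and the synthetic data are all assumptions made for this example. The stopping rule shown is a simple loss-tolerance check standing in for early stopping; the video may use a different criterion.

```python
import numpy as np


class ScratchLinearRegression:
    """Hypothetical from-scratch model with a scikit-learn-like fit/predict API."""

    def __init__(self, learning_rate=0.05, n_iters=5000, tol=1e-10):
        self.learning_rate = learning_rate  # step size for gradient descent
        self.n_iters = n_iters              # maximum number of updates
        self.tol = tol                      # stop when the loss barely changes
        self.weights_ = None
        self.bias_ = 0.0
        self.loss_history_ = []

    def predict(self, X):
        # Vectorized prediction: one matrix-vector product for all rows.
        return np.asarray(X, dtype=float) @ self.weights_ + self.bias_

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.weights_ = np.zeros(n_features)
        self.bias_ = 0.0
        self.loss_history_ = []

        for _ in range(self.n_iters):
            error = X @ self.weights_ + self.bias_ - y
            loss = np.mean(error ** 2)  # mean squared error

            # Assumed stopping rule: quit once the loss improvement is tiny.
            if self.loss_history_ and abs(self.loss_history_[-1] - loss) < self.tol:
                self.loss_history_.append(loss)
                break
            self.loss_history_.append(loss)

            # Partial derivatives of the MSE with respect to weights and bias.
            grad_w = (2.0 / n_samples) * (X.T @ error)
            grad_b = 2.0 * np.mean(error)

            # Gradient descent update: step against the gradient.
            self.weights_ -= self.learning_rate * grad_w
            self.bias_ -= self.learning_rate * grad_b
        return self


# Synthetic, noise-free data (illustrative only): y = 3*x1 - 2*x2 + 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0

model = ScratchLinearRegression().fit(X, y)
print(model.weights_, model.bias_)
```

Returning `self` from `fit` mirrors the scikit-learn convention, so construction and fitting can be chained on one line.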
💡 Vectorized operations can significantly speed up machine learning code, object-oriented programming keeps that code readable and reusable, and gradient descent is the optimization method used here to train the linear regression model.
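
To make the difference between a plain Python loop and a vectorized operation concrete, here is a small sketch that computes the mean squared error both ways. The function names `mse_loop` and `mse_vectorized` and the sample values are hypothetical, chosen only for this illustration.

```python
import numpy as np


def mse_loop(y_true, y_pred):
    """Mean squared error with an explicit Python loop, one element at a time."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += (t - p) ** 2
    return total / len(y_true)


def mse_vectorized(y_true, y_pred):
    """Same quantity, computed on whole arrays at once with NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Illustrative values only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mse_loop(y_true, y_pred))        # 0.375
print(mse_vectorized(y_true, y_pred))  # 0.375
```

Both functions return the same value. On large arrays the vectorized version is usually much faster, because the element-wise arithmetic runs inside NumPy's compiled routines rather than in the Python interpreter.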
