How Data Science Works

Brandon Rohrer · Beginner ·📐 ML Fundamentals ·10y ago

Skills: ML Maths Basics80%Supervised Learning70%ML Pipelines60%Unsupervised Learning50%

Key Takeaways

The video 'How Data Science Works' by Brandon Rohrer covers the basics of data science, including data collection, preprocessing, and analysis, as well as machine learning concepts such as regression, classification, and clustering, using tools like Azure and GitHub.

Full Transcript

Hello. Welcome to Data Science for Everyone. Even if you're scared of math or don't know how to program a computer or if you've never even heard of data science, this presentation is still for you. What I'm going to talk about is a highlevel description of how I do data science. Other people will give you different answers, but hopefully this will get you started. The very first step is to get data. And if you have data, get more data. So when I say data, I mean numbers and names. That's data that's numerical and categorical. What I mean by that, numbers are things that are amounts or counts, something that can be measured in extent, money, pixel brightness, sound intensity. where names or categorical variables are things like type of dog or a variety of hot drink or the ID, the identification of an aircraft or the model number of a droid or an ice cream flavor or something like the contents of a text. It's a name. If you change it a little bit, it could become a completely different name. If you change a number by a little bit, it's still pretty close to the original number. Now, to confuse this issue, there are names that look like numbers. A phone number, if you change it by just one, points to an entirely different person. And that person's not necessarily closer than if you change it by a thousand or 10,000. A phone number is a name that looks like a number. Similarly, a zip code, if you change it by just one or two, you might find a place that's relatively close geographically, but it will still be a completely different place. And it could be that changing a zip code by a thousand might produce another zip code that's right next to the original one. Again, identification numbers, 007. There's lots of types of identification numbers. serial numbers, credit card numbers, social security numbers. You change any one of those by 10 or by a thousand or by a million. Either way, they'll point to something completely different. And to further confuse the issue, there are numbers that look like names, uh, ordinals for instance, and names that can be turned into numbers. So these are names that have a distinct order like first, second, third. And it helps if the difference between each of these is similar. Small, medium, large. There's a distinct order. And a a medium is about as much bigger than a small as a large is than a medium. Left, middle, right. You can go time zones, train stops, anything that has an order can be turned into a number and interpreted by a machine learning algorithm. The process of measuring and collecting and storing and searching and moving and transforming data names and numbers is a special bit of work, special discipline to itself. Azure has a lot of tools to do this and we're going to talk about them a lot during this conference. I encourage you to keep an eye out for them and learn as much as you can. We won't talk about them in this presentation here. We'll stick to the fundamental conceptual notions of data science, the things that you try to do with all of these tools. I would also like to point you to the Cortana analytics process. You see these five steps below here. These are similar to the steps that I'm going to talk about today. They place special emphasis on the tools that you use and give specific instructions on how to use those tools to get these results. They're a next step. So after this presentation, go and check that out to further your education. After you have data, the next thing you need to do is ask a sharp question. And the thing you're looking for is, is your target in the data? I'll explain what I mean by that. To start with, if you have a vague question, it doesn't have to be answered with a name or a number. You can imagine that there's a genie in a lamp and it emerges from the lamp and it will answer any question you ask it truthfully but it's a mischievous genie and it will ask answer as unhelpfully as it can. So if you ask it a vague question like how can I increase my profits you may get an answer like work harder. true as far as it goes, but it's not specific enough to be helpful. Whereas, if you ask it a sharp question that has to be answered with a name or a number, such as, "How many times will the feature that I built get used by a new user?" The genie can't help but give you the correct numerical answer to that question. There's no wiggle room. That's the type of question you want to ask your data. You need to make sure that your data includes your target. Your target is an example of answers to your question that came in the past. So if your question is what will my stock price be next week, your target is a history of stock prices. So you can gather sales information by region, information on your competitor's project products, you can gather information about your users and how long they've been around and external information about markets. But none of that will do you any good in answering your question until you also have a history of your stock price so that you can line it up with those and find the patterns. If you don't have your target in the data, go get more data. If you do, put your data in a table. This is not quite as trivial as it sounds. It has to go in your table so that there's one target value per row, one instance of your target variable for each row. In this case with our stock price history, we recorded the stock price at the close every day. So we have one row per day. Now a lot of the data that we would like to include doesn't naturally occur once per day. For instance, total users. The way we've stored this data is username and date joined. a long list. In order to get total users, we have to aggregate sum over that list. This is a common method to get things to line up in rows. Another way, the complement to this is to distribute. There are some quantities like last month's sales or uh new users last month that are going to be the same for this entire month. last month is still the same month that it was. And so that number is going to be constant. So in order to line it up one per row, we distribute it and copy it across all of the relevant rows. Another thing we find ourselves doing quite often is computing. So for instance, days since press release or days since product release. In this case, our data may be stored as a date and the name of the press release. In order to find the days since the press release, we have to take the date of the last press release and today's date and subtract them. Do a little calculation. Another thing we may have to do is simply measure. If we need something that we didn't create ourselves, we may have to go out and gather that data. Maybe someone else collected it for us and we can get it from them or maybe we have to get it from scratch. But something like the average Dow Jones industrial average for each day. Another thing we can do is estimate. Sometimes at the time we create our table, we don't have the most recent data and we can't calculate or distribute or aggregate to get the number. So, we may have to fudge it, make some educated guesses. If we know, if we have a basis for making a guess, that's better than leaving it blank. We've still added some information. But when all else fails, we can leave blanks. We don't want to leave too many because the more we leave, the trickier it gets to do quality data science. But there are ways to handle it. So after the data is in a table with one row per instance of the target, the next step is to check for quality. We want to make sure the data is acceptable to use. So here's a table. We can see going through the columns. Um the very last row is our target variable. Where's a cape? This is a bunch of data on different superheroes and super villains. And we want to be able to predict in the future given a new superhero or super villain whether or not they wear a cape based on everything else we know about them. Now, when you go to survey data for quality, I don't know of any shortcuts to the process of walking through the data. What this means is you go to the first column, you look at the title, you think about what it means. If it's not clear, you look up what it means. If it's not written down, you talk to the person who created it and find out what it means. If you still don't know, you make a guess at what it means. With that in mind, you visually scroll through the data. For large data sets, you won't be able to go through the whole data set. So you can look at aggregations, you can look at the unique values, you can look at histograms of how much of each value occur and make sure that it makes sense. Make sure that you don't see anything that makes you your head and squint and think, does that look right? Because if you do, then it probably means there's an error in the data and it may need correction or clarification. This is almost always the case in real data. So to go through this data, first we look at the ID column. We scan through that and it becomes pretty clear that this is just a unique numerical identifier for each row. Very common in databases. We look at the next column, first name. Those look like all fine first names. They're all uh alphabetical. They're all capitalized. Uh they seem formatted well. Same with the last names. All very plausible. No red flags. When we get to birth year, we assume that this is the year when each of these individuals was born. But some of these quantities don't look like years at all. Some of them have stray punctuation. Peter Parker's year looks like it someone retyped the number one several times. Some have quotes. If you look down at Thor 2287 BC, that's a perfectly valid year, but by appending BC on there, it becomes non-numerical and the machine learning algorithm doesn't know how what to do with it. In order to clean this up, we need to make sure these are all formatted the same way after we go through and fix them. This is what it looks like. We even took the 2287 BC and just turned it into negative 2287 so that it could very easily be interpreted by our algorithm. Next, we look at the height. This is nice, easy to interpret, uniformly formatted. The difficulty here is we know that height is numerical. We know that 61 is much closer to 62 than it is to 51. That's not represented when these things are in strings with symbols, the foot and inch tick marks. So, we need to change them to a numerical form. Once we change them to inch inches, now they're naturally interpretable by the algorithm. We look at the birthplace and again these all look like very reasonably uh formatted names. It looks like some cleanup has occurred already on this this column. We do have an unknown in there, but that's reasonable enough. This column identity is secret. So this looks like a yes or no. Does this person have a secret identity? So in the case of Bruce Wayne, yes. In the case of Aurora Monroe, not applicable. It's not totally clear what that means. If we look down at Victor Von Doom, missing. Again, it's not totally obvious what that means. Janet Vanine, nothing at all. So there's some ambiguity in the data. In order to use it to its best effect, we need to figure out what is intended here and some in some cases make some guesses or do a little bit of research. Talk to who recorded it or if they've left the company, talk to someone who heard something about who recorded it and make a guess. After doing that, we end up with a nice every row is a y or an n uniform representation two levels very easily interpretable by an algorithm. Next column can fly. Again, it appears that this column is an indicator of whether an individual can fly, but it looks like whoever entered this was using different standards. Some are numerical. They have a three and a nine and a 12 and a one. And it's clear based on the fact that Clark Kent is a 12 and Bruce Wayne is a three that high numbers mean a good flyer. Uh, we have notes. Uh, jet presumably can fly but in a jet. Uh, Lev, which we do some research and find is short for levitate. We have some no not applicable blank. So again, it takes some interpretation. When we first look at this and we look at all of the unique values and we plot a histogram of how many time each occurs, it's clear that there's a lot of noise and a lot of different standards. After we go through and clean it up, it becomes very simple. Yes. No. In some cases, this is oversimplification, but it helps the machine learning algorithm to interpret it well. Again, alignment another case, lots of different things, lots of different ways to say the same things. In some cases, we go through, we look at what it means, and we unify it. And in this case, we end up with three levels, mostly good, some bad, and one neutral. Selena Kyle is tough to categorize. It's okay to have more than two categories. It's okay to have more than 10 categories as long as the representation is uniform and everything that's intended to be the same is called the same. We finally get to wear the cape. We treat the target variable just like any other. Look through, think about what it means, and unify it. Now we have a nice clean rectangular non-mpy uniform representation. This data is ready to start processing in an algorithm. So the next step transform features and are the new features predictive and I'll talk about what that means. Sometimes the data that you get needs a little bit of massaging before it can help you answer your question. This is called feature engineering. That's just a fancy word for taking the features that you already have and doing something to them to change them, combine them, break them down. I'll show you an example of this. So this data right here, column 0, column one, column two, they're all numerical. Column 2 is our target. We want to use column zero and column one. We'll call them feature zero and feature one to predict feature two, our regression target. Which means we're going to draw a line through a plot and try to predict or model feature two. When we plot feature zero against our target and feature one against our target and we draw the best line we can through those funny fuzzy blobs, it's a flat line. There's no slope to it at all. That's very unsatisfying. What that means is that if you tell me the value of feature one or the value of feature two of zero um I'm going to predict the same value of the regression target every time. The same value for feature two. It's a constant guess which means it's not very predictive. Sure enough, there is a measure of the goodness of that line as a model, as a description of the data. That's called the coefficient of determination. When it's one, the description is perfect. When it's zero, the description is worthless. We're pretty close to zero, 0.016. So now I tell you that feature zero is actually the departure time in hours since midnight of a subway train from the central square stop. And feature one is the arrival time in hours since midnight of the same train at the Kendall Square stop. And that feature two is the maximum speed in kilometers per hour that the train reaches when going between the two stops. So we think about that and realize departure time and arrival time are both related to the speed. And it's peak speed, not average speed, but there should still be a relation there. So they interact alone. They don't help us know what the speed is, but together they do. Well, the default way to handle interaction in data is to multiply the data points together. So, let's create a new data point 0 times one. We'll call it our new feature and we plot it on the right hand side against our target, our peak speed. And we draw a line through it. And unfortunately, again, we see It's a flat line. That's the best fit to that fuzzy blob. And sure enough, our coefficient of determination is still very close to zero, 0.014, just over 1%. So we step back again and we think freshman physics. The average speed is distance divided by elapsed time which is arrival time minus departure time minus not times. So now we create a new feature which is feature one minus feature zero. And this will be the difference of those times. It'll be the time in hours between the two stops. Now when we create this feature and plot it against the peak speed, we get a very different picture. We get a nice swoosh. And sure enough, the best line that fits through that is very predictive. Now, if you tell me the difference between feature one and feature zero, I can make a pretty good guess at what the peak speed was between the two stops. And sure enough, the coefficient of determination of our model now is 0.88, pretty close to one. So in order to get this performance, we had to take the original features which were perfectly accurate and full of good information, but we had to transform them so that the algorithm could get the information that it needed out of them. There are lots of different ways to do feature engineering. Some of them are data specific. They take advantage of the fact that with certain types of data like images, interesting information is stored in repeatable ways. Similarly, with text, you can look at the frequency that words occur and divide that or scale that by the inverse document frequency. There are other domain specific feature engineering tricks depending on your working in whether you're working in economics or agriculture or sociology. Different things matter and sometimes you have to take what you measure and do some tricks with it before you get to the thing that matters. Now you've probably heard about deep learning. It's been used su successfully with images and text and audio to automatically learn all kinds of features. Deep learning strength is that it learns the patterns from the data. And in fact, it's outside the scope of this talk, but I'd like to use it for a teaser. There's another talk that I gave recently that I'll share a link to at the end of this talk on the fundamentals of how deep learning works at the same level. No math, no code, just some of the basic concepts. Now you've taken your data, you've asked a sharp question, put it in a table, checked it, it's high quality, you've done your feature engineering, and now you're ready to answer your question. What you're looking for is whether that answer is clear, whether it lets you do what you want to do. This is where machine learning comes in. It uses statistics to look at patterns in the past to predict patterns in the future. So of all the questions that you can ask of your data, it might surprise you that there are really only five questions that machine learning can answer or five types of question. We're going to step through each of them. The first one is how much or how many Examples of this are what will the temperature be on Tuesday or what will my sales be next quarter. There are questions that ask for a number. Um the type of algorithms that answer this are called regression algorithms and they involve fitting a line or a curve or a surface to the data that can be used as a simplified version, a cartoon version of the data. And we'll show how this works in just a few minutes. But how much or how many is a very common question to ask of your data. Another very common question is which category. If I get a picture, is this picture an image of a cat or a dog? Or if I see a radar signature of all of the aircraft in my library, which aircraft is probably causing this? or I read a news article. Of all the topics that I've seen in the past, which topic or topics does this news article cover? This type of algorithm is called classification and uh it's used to assign a category or set of category to new examples. The third question that you can ask of your data is which groups which groups does it naturally break down into. This is pretty common. Uh sometimes you just want to take imagine someone gives you a bag full of M&M's and says, "Hey, break this into groups that are similar." Doesn't really matter how you do it. You'll probably start doing it by color, but you could also do it by weight or by diameter or by sugar content. There's lots of different ways to do it, but however you do it, it can be helpful. So, common examples of this are which shoppers have similar taste in produce. If you've ever watched movies streaming online, you've probably had movies recommended to you. This is the result of an algorithm that asks the question, which viewers like the same kind of movies? And then it'll look and see which movies your compatriots have seen that you haven't seen yet and recommend them to you. Another very popular question to ask, surprisingly popular, is this weird? You can have a long history of observations and experience and it's very useful to be able to identify when something weird happens. Raise a flag and say, "Hey, I haven't seen this before or at least not very often." So, for instance, if you drive a car that has pressure sensors in the tire, it's nice to be able to answer the question, "Is this pressure reading unusual?" If you have some kind of internet security system, it's nice to be able to answer the question, is this internet message typical? Your credit card company is always asking the question, is this combination of purchases very different from what this customer has made in the past? When things are weird, it doesn't necessarily mean that something's wrong, but it can. And when something's wrong, there's usually something weird about it. So, it's a great place to focus your detective efforts. Finally, the fifth question is, which action should I take? So, in the case where a machine, especially a robot, needs to make a decision, it's a low consequence decision and one that the machine gets to make a lot. Reinforcement learning is a way to do this. and to learn from your experience. So for instance, an automatic temperature control system in a building needs to answer the question, should I raise or lower the temperature? A little vacuum cleaner in your house needs to answer the question, should I vacuum the living room again or stay plugged into my charging system? And a self-driving car may need to answer the question, should I break or accelerate in response to that yellow light? This is a little bit different than the other questions, but I'm including it for completeness. We won't talk about it anymore today. So, machine learning algorithms are kind of magical until you dig in. We're going to pull the curtain back and show that it's not really magic at all. It's just a little bit of patterned identification. So, in this example, I walk in to a jewelry store. I have an old ring that my grandmother used to own, and it has a setting for a 1.35 karat diamond, and I'd like to fill that setting. But first, I want to know how much it's going to cost so I can save. So, what I do is I go to the case and I look at all of the diamonds. And for each one, I note the weight and I note the price. The first one I find is 1.01 carats and the price is $7,366. I note it, I move on, and I fill up a list of all of the diamonds in the case. What I have here is a data set. There's two columns. In this case, price is my target. That's what I want to predict. That is the the question that I'm asking is how much how much will this cost? And then my feature, my other feature is carrots. So in order to go to work here, I first draw a number line that represents the weight in carrots. Looks like the range is about 0 to two. So I make sure that it covers that and draw some intermediate tick marks. And then I draw a perpendicular line to represent price. Again, I see that the range is up almost to $20,000. So I put that in with some intermediate tick marks. And I look at my very first data point, 1.01 carats. And I eyeball a vertical line from 1.01, the carrot line. Price $7,366. I eyeball a horizontal line from 7,366. Where they meet, I put a dot. That data point, literally the dot represents the first diamond on my list. I can do this again and again for every diamond on the list. And I get something that looks like this. Now, this is just our data set. It's another representation. We haven't done anything funny to it. We've just put it in a different form. Now, it's worth noting on a tangent about half the time when you're doing data science, you can stop right here. If I owned a business and I was asked, uh, what size diamond should I choose in order to make a ring as inexpensively as possible? From here, it's obvious that price goes down as weight goes down. So, the answer is choose the lowest weight diamond you possibly can. We're good at pulling out those type of insights just using our eyes. Taking this picture of the data and turning it into an idea. Now, the question that we're asking is a little bit more subtle. So, we need to take it to the next level. If we look at those dots and squint, it looks like a fat fuzzy line. And it's not too tough to eyeball. So you can take a marker and draw a straight line right down the middle of it. Now what we just did is really significant. We've created a model which is kind of a fancy way of saying that we took and made a simplified version of our data. Now it is simplified. It is like a cartoon of our data. You can see that those data points, most of them don't lie right on the line at all. That line is just an approximation. It kind of says what the model does. And the data scientist interpretation of this is the line is the ideal. It's what the baseline price is per weight of a diamond. But because the real world is a little bit gritty, a little bit unpredictable, things happen you just can't account for. Some of are going to be higher, some of it are going to be lower, and that's called noise or variance. So your data is always the model plus a little bit of noise, which is what we have here. So with a model that allows us to answer our question. This we drew it by hand but it can also be the result of a machine learning algorithm. The question is how much. So we know the type is a regression because we're using a line. It's called a linear regression. So a linear regression would use math to fit to these dots and to generate something a whole lot like this. We did it with pen. Now that we have a model, we can answer our question. How much will a 1.35 karat diamond cost? There aren't any diamonds on our list that are 1.35 carats. So we have to refer to our model. We eyeball a line from 1.35 and right where it crosses our model. We eyeball a horizontal line and it hits right at 8,000. Boom. There's the answer. The diamond is going to cost us $8,000. Now, we look at those dots and we think, well, most of those dots don't lie right on the line. So, is it going to be a little bit over or a little bit under 8,000 or a lot over or under? And we can figure this out, too. We go back to our dots in our line and we draw a fat envelope around the line that encompasses most of the dots. It doesn't have to capture all of them. This is called our confidence interval. This is our model extended out to embrace most of what's there. And we're pretty confident the future data points will also be in this envelope. Now we can see where our 1.35 karat line crosses the top and the bottom of this envelope and draw two more horizontal lines over and this becomes the confidence interval of our estimate. So now we have a more complete story. We can say our 1.35 karat diamond will cost about $8,000, but it'll almost certainly be less than 10,200 and greater than 5,800. So, I'd like to point out that what we just did is create a linear regression model to make a prediction without using math or computer programs. Now, that's it's a pretty big deal. You should feel proud of yourself. Imagine though if instead of just weight, we had several other characteristics of the diamond like color, how close it is to being white or the quality of the cut or how many inclusions, how many little cracks or carbon granules are inside the diamond, a lot of other characteristics. Those would translate into more columns, which would in turn translate into more dimensions in our little plot. Because our paper is two-dimensional, it gets hard to represent more than two dimensions when we're drawing dots. And it gets really hard to draw lines and planes in more than two dimensions. This is where math comes in. And it can automatically find the line or the plane that fits as close as possible to the middle of that slew of dots. And then imagine if you had instead of 15 diamonds, you had 15 million. Then you really want to have a computer involved. Otherwise, it's going to take you a very long time to compute it. But the basic ideas are simple enough that you can do it with pen and paper. Now, in order to get a good answer, you need to have enough data. If you don't have enough data, what you get is kind of like this. You can see something, but it's really hard to know what it is, and it's not firm enough to base any decisions on. So, you add data and you try again, and then you end up with just barely enough data. And you know, you have just barely enough data because at that point, you can just barely make the decision that you're trying to make, answer the question that you're trying to answer. So with this, if you kind of lean back and squint, you can say, "Ah, you know what? That looks like a canal. That looks like a gorgeous sunset with clouds in the sky, and those look like buildings. That is a beautiful place. I want to go there on vacation." So I used the data and I was able to interpret it to answer my question. Do I want to go there on vacation? So that's how I knew I had barely enough data. Now, as you gather more and more data, the picture becomes clearer and clearer. And then eventually you're able to look and make finer grain decisions, answer more questions. Now I can say, "Oh, you know those three hotels on the left bank, the nearest one has fascinating architectural features. That's where I want to stay. In fact, I want the room on the third floor. That's the second from the right. I'm going to see if I can get it. So, you need to make sure you have at least barely enough data. More data is even better. So, we've used our data to answer our question, but we're not done yet. We have to use the answer. If a tree falls in the forest and no one's around to hear it, it might make a sound, but if you don't use the beautiful model that you built, it definitely won't delight your customers. So, ways to use the answer we found, you can make a web service. This is something that's exceptionally easy to do with Azure machine learning which the other talks today will cover in depth. You can make a decision. Do I want to go there on vacation? You can set a price. How much do I want to charge for this hot dog? You can take and publish your code on GitHub or in the Azure machine learning gallery. You can write a PDF showing your results and share it around. Or you can build a dashboard, say with PowerBI, to show your results to potential customers or to an employer. There's lots of ways to use it, but you have to do something with it. A model or an analysis all on its own doesn't have any life. Now, there are some things to be aware of, some gaps when using your model. Nearly all machine learning algorithms assume that the world doesn't change. If you gather your data and the world changes in some fundamental way, then unfortunately your data may all be invalid. If I had gathered information on air travel right up until September 10th, 2001 and then tried to make predictions about September 15th, 2001, they would be completely wrong because the world changed fundamentally. People's attitudes changed and their plans change and events changed on September 11th. So, you need to make sure that your data is relevant. the world hasn't changed out from under you. The second gap is related. Most machine learning algorithms take lots of examples to learn. Now, if you're learning something about, say, internet traffic and you're collecting a billion packets in a day, then it won't take you long to collect a nice sizable data set to be able to pull some conclusions out of it. But if you're studying something like global climate change and one data point is a single year, then by the time you've collected 10,000 data points, the world may have changed out from under you. Or the results of your analysis may be no longer relevant because all of the people and even all of the species that cared are no longer around to benefit from the answer. The third gap is that machine learning can't tell what caused what. So take this example. This is real data. We have the amount of cheese consumed in pounds per person each year compared to the amount of people the number of people who died by getting tangled up in their bed sheets. And if you look at that plot, you can see there is a pretty close relationship. It would not be surprising to see this plot in a news article with the conclusion that cheese consumption causes death by bed sheets. Similarly, it wouldn't be surprising to see it in the next newspaper with the conclusion that people getting tangled in their bed sheets and dying caused their relatives to eat more cheese. Either one is plausible based on the data, but the data doesn't tell either of those stories. Machine learning can't tell what's caused what. There's a third option which is that these things are completely [Music] unrelated causally but just random chance they showed a similar pattern. This is more than philosophical. Another example that you might be familiar with, you have a case of the hiccups. You're with your friends and they start offering remedies. Eat a spoonful of sugar, drink three glasses of water, stand on your head, and you try these things in succession. Eventually, while you're doing one of them, your hiccups stop. The friend will say, "See, mine worked." If your other friends are clever, they'll say, "No, no, no. Mine worked. it just took a little while to to kick in. In fact, you can't tell that from the experience. The data just shows that they're correlated, not that they're causally related. Now, this example seems trivial, but it's very closely related to our experience with economic recessions. There's a lot of different theories about what causes them and what ends them. And usually uh successive leaders try different theories in different situations and whatever they happen to be doing when the recession ends is given credit but there's actually no way to know for certain based on that data alone whether it caused it or not. Another example, the name of the conference. Some things are just a coincidence. Now, these gaps are not showstoppers for us. We can close them with human insight and judgment. We're really good at making reasonable guesses without having enough information. In fact, we're so good at it that we will usually make unreasonable guesses even in the presence of enough information. But the upside of that, the survival value of that is that even when we don't have enough information, we'll fill in the gaps and we'll build that bridge. We'll make that intuitive leap until the information catches up and can close it for us. And often it'll close it exactly the way we bridged it, but occasionally it'll surprise us and close it a different way. But the important thing is that not having the information doesn't stop us. So we listen to the data very carefully as far as it goes and then go with our gut after that. So this is the complete cycle of using data to answer a question. When you're done, go back to the beginning, get more data, ask another question, and go through it all again. Now, I've put together some resources related to each of these steps. Please take some time and check it out. Um, there's a link from this video and, uh, if you go down in the comments and check it out, you'll be able to get to the slides, which in turn contain all of these links. If you have any questions or comments, please email me or reach out on Twitter or on LinkedIn. I would love to chat and hear what you have to say. Thanks for listening.

Original Description

Part of the End-to-End Machine Learning School course library at http://e2eml.school A walk through the practice of data science for all audiences. No math, no programming, just plain English. For related videos and copies of the slides, check out the blog post: http://brohrer.github.io/pocket_guide_data_science.html

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Brandon Rohrer · Brandon Rohrer · 21 of 60

← Previous Next →

Robot Learning with a Biologically-Inspired Brain (BECCA)

Robot Learning with a Biologically-Inspired Brain (BECCA)

BECCA talk at AGI 2011

BECCA talk at AGI 2011

Robot Learning with a Biologically-Inspired Brain (BECCA), The Sequel

Robot Learning with a Biologically-Inspired Brain (BECCA), The Sequel

BECCA listens to The Hobbit

BECCA listens to The Hobbit

Learning the building blocks of speech: BECCA extracts a hierarchy of audio features

Learning the building blocks of speech: BECCA extracts a hierarchy of audio features

BECCA listens for sound effects in The Hobbit

BECCA listens for sound effects in The Hobbit

BECCA finds movie trailers while watching the Big Bang Theory

BECCA finds movie trailers while watching the Big Bang Theory

Listening for unexpected sounds: BECCA detects anomalies in audio data

Listening for unexpected sounds: BECCA detects anomalies in audio data

Learning the building blocks of vision: BECCA extracts a spatio-temporal hierarchy of features

Learning the building blocks of vision: BECCA extracts a spatio-temporal hierarchy of features

Watching for the unexpected: BECCA detects anomalies in video data

Watching for the unexpected: BECCA detects anomalies in video data

BECCA finds a stationary target

BECCA finds a stationary target

BECCA finds a stationary target at 3X speed

BECCA finds a stationary target at 3X speed

BECCA watches the X-men and Bruce Lee

BECCA watches the X-men and Bruce Lee

BECCA plays Quidditch

BECCA plays Quidditch

BECCA chases a ball

BECCA chases a ball

BECCA chases a ball, part 2

BECCA chases a ball, part 2

Becca chases a ball, part 3

Becca chases a ball, part 3

BECCA creates features from MNIST

BECCA creates features from MNIST

How reinforcement learning works in Becca 7

How reinforcement learning works in Becca 7

Deep Learning Demystified

Deep Learning Demystified

How Data Science Works

How Data Science Works

How Convolutional Neural Networks work

How Convolutional Neural Networks work

How Bayes Theorem works

How Bayes Theorem works

How Deep Neural Networks Work

How Deep Neural Networks Work

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

How Support Vector Machines work / How to open a black box

How Support Vector Machines work / How to open a black box

How autocorrelation works

How autocorrelation works

Getting closer to human intelligence through robotics

Getting closer to human intelligence through robotics

A minimalist's guide to slicing and indexing pandas DataFrames

A minimalist's guide to slicing and indexing pandas DataFrames

How decision trees work

How decision trees work

Data scientist archetypes

Data scientist archetypes

How to use python's datetime package

How to use python's datetime package

How optimization for machine learning works, part 1

How optimization for machine learning works, part 1

How optimization for machine learning works, part 2

How optimization for machine learning works, part 2

How optimization for machine learning works, part 3

How optimization for machine learning works, part 3

How optimization for machine learning works, part 4

How optimization for machine learning works, part 4

How convolutional neural networks work, in depth

How convolutional neural networks work, in depth

How to pick a machine learning model 4: Splitting the data

How to pick a machine learning model 4: Splitting the data

How to pick a machine learning model 3: Choosing a loss function

How to pick a machine learning model 3: Choosing a loss function

How to pick a machine learning model 2: Separating signal from noise

How to pick a machine learning model 2: Separating signal from noise

How to pick a machine learning model 1: Choosing between models

How to pick a machine learning model 1: Choosing between models

How to pick a machine learning model 5: Navigating assumptions

How to pick a machine learning model 5: Navigating assumptions

What do neural networks learn?

What do neural networks learn?

Interview with iRobot's Director of Data Science Angela Bassa

Interview with iRobot's Director of Data Science Angela Bassa

How Backpropagation Works

How Backpropagation Works

Evolutionary Powell's method: A discrete optimizer for hyperparameter optimization

Evolutionary Powell's method: A discrete optimizer for hyperparameter optimization

1D convolution for neural networks, part 1: Sliding dot product

1D convolution for neural networks, part 1: Sliding dot product

1D convolution for neural networks, part 2: Convolution copies the kernel

1D convolution for neural networks, part 2: Convolution copies the kernel

1D convolution for neural networks, part 3: Sliding dot product equations longhand

1D convolution for neural networks, part 3: Sliding dot product equations longhand

1D convolution for neural networks, part 4: Convolution equation

1D convolution for neural networks, part 4: Convolution equation

1D convolution for neural networks, part 5: Backpropagation

1D convolution for neural networks, part 5: Backpropagation

1D convolution for neural networks, part 6: Input gradient

1D convolution for neural networks, part 6: Input gradient

1D convolution for neural networks, part 7: Weight gradient

1D convolution for neural networks, part 7: Weight gradient

1D convolution for neural networks, part 8: Padding

1D convolution for neural networks, part 8: Padding

1D convolution for neural networks, part 9: Stride

1D convolution for neural networks, part 9: Stride

The Four Grand Challenges of Robots in the Home

The Four Grand Challenges of Robots in the Home

How Convolution Works

How Convolution Works

The Softmax neural network layer

The Softmax neural network layer

Batch normalization

Batch normalization

Getting ready to learn Python, Mac edition #1: Files and directories

Getting ready to learn Python, Mac edition #1: Files and directories

This video provides an introduction to data science and machine learning, covering key concepts such as data collection, preprocessing, and analysis, as well as machine learning algorithms like regression and classification. The video also discusses the importance of data quality and the limitations of machine learning in determining causality.

Key Takeaways

Collect and preprocess data
Apply regression and classification algorithms
Analyze data quality and correlation
Interpret results and make predictions
Deploy machine learning models

💡 Correlation does not imply causation, and machine learning models can only identify patterns in data, not determine causal relationships.

🔒 Pro feature: Ask AI to explain this lesson →

More on: ML Maths Basics

View skill →

Important Steps I Have Followed To Improve My Data Science Skills- Sharing My Experience

Important Steps I Have Followed To Improve My Data Science Skills- Sharing My Experience

Learn Python FAST for Beginners 🚀#coding #conditionals #loops #functions

Learn Python FAST for Beginners 🚀#coding #conditionals #loops #functions

ChethanAIChronicles

“Hello, world” from scratch on a 6502 — Part 1

“Hello, world” from scratch on a 6502 — Part 1

PCA (Principal Component Analysis) in Python - Machine Learning From Scratch 11 - Python Tutorial

PCA (Principal Component Analysis) in Python - Machine Learning From Scratch 11 - Python Tutorial

ROC and AUC in R

ROC and AUC in R

StatQuest with Josh Starmer

Data Science Fundamentals: Data Cleaning in Python

Data Science Fundamentals: Data Cleaning in Python

Related AI Lessons

Beyond the Elephant: On Manifolds, Projections, and the Hidden Assumptions of Neural Geometry

Learn how neural geometry relies on manifolds, projections, and hidden assumptions to understand complex data, and why it matters for AI development

Beyond the Elephant: On Manifolds, Projections, and the Hidden Assumptions of Neural Geometry

Learn how neural geometry relies on manifolds, projections, and hidden assumptions to understand complex data, and why it matters for advancing AI research

Medium · Data Science

Beyond the Elephant: On Manifolds, Projections, and the Hidden Assumptions of Neural Geometry

Explore the geometric assumptions underlying neural networks and their implications on manifold learning and projections

Medium · Deep Learning

Beyond the Elephant: On Manifolds, Projections, and the Hidden Assumptions of Neural Geometry

Learn about the hidden assumptions of neural geometry and how manifolds and projections impact neural network performance

Machine Learning Project for Final Year Students | ML Project Idea @FameWorldEducationalHub

FAME WORLD EDUCATIONAL HUB